Internet-Scale Computing

Cloud Computing · 3 minutes read · Feb 26th, 2015

Recently, I was talking to a team at a new start-up, and they told me that their target market was clients with over 1,000 software engineers supporting over 10,000 software services.  One would think this is a very tiny target market, but it is also a very lucrative one.  That discussion made me realize that leading companies are starting to emulate the Internet-Scale technology leaders such as Google, Amazon, eBay and Netflix, to mention just a few.  I had previously thought that only a handful of companies would aspire to develop software at Internet-Scale, but our discussion pointed out that other companies have both the technical ability and the aspiration to develop similar applications.

So, what does Internet-Scale mean?  One of the best descriptions comes from a colleague, Adrian Cockcroft, who spoke a few years back at the Computer Measurement Group.  At that time, Netflix was using Amazon to deploy its software, across more than 50,000 Amazon instances.  Their architecture assumed one thing – everything breaks.  This is true for the underlying infrastructure, the middleware and the application services themselves.  Within Unisys, software engineers who started their careers working on mission-critical mainframe software for MCP and OS2200-based systems take this maxim as standard practice, but they achieved high reliability without the luxury of thousands of “backup” servers standing by around the world.

For example, at Netflix, for every service running in an instance there are duplicate services, world-wide, with some duplicates running at a previous version level in case an error occurs at the current level and they need to “roll back”.  Few existing open source solutions could be used, because the scale of Netflix broke them.  Eventually, Netflix had to collaborate with other technology leaders, such as Facebook, Amazon and Google.  For example, they worked with Facebook on the open source project called “Cassandra” to deal with bottlenecks in “Hadoop”.
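The idea of keeping duplicate replicas at the previous release so traffic can be shifted back on failure can be sketched in a few lines.  This is a minimal, hypothetical illustration of the pattern – the class and names are my own invention, not Netflix’s actual tooling:

```python
class ServiceRegistry:
    """Hypothetical registry that tracks replicas at two release levels.

    Replicas of the current version serve traffic; replicas of the
    previous version stand by so a "roll back" is a routing change,
    not a redeployment.
    """

    def __init__(self, current_replicas, previous_replicas):
        self.replicas = {
            "current": list(current_replicas),
            "previous": list(previous_replicas),
        }
        self.active = "current"

    def endpoints(self):
        # Return the replicas that should receive traffic right now.
        return self.replicas[self.active]

    def roll_back(self):
        # An error at the current version level: shift all traffic
        # to the standby replicas running the previous version.
        self.active = "previous"
```

A caller would route requests via `endpoints()`, so invoking `roll_back()` instantly redirects traffic to the previous-version replicas without touching the instances themselves.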

Currently, “Google 101” classes are being taught at universities to prepare computer science graduates to understand these Internet-scale concepts.  These graduates will have new skills that can be leveraged to help major corporations develop their own Internet-Scale applications.

As more and more companies develop at Internet-Scale, new methods to monitor, analyze and manage these applications will have to emerge.  Netflix has created its own, and they are quite proprietary.  The current model of Infrastructure as a Service (IaaS) will not scale: if a single application requires 10,000 public cloud instances, the idea that a “cloud user” will commission them one instance at a time does not make sense.  A higher level of management will emerge that can commission the application as a single entity, with intelligence and automation to determine placement of services to achieve world-wide service levels for both user response time and availability.  This management layer will include intelligence to scale the number of instances up or down to match changing workload demands.  It will monitor the services and their service levels to determine if a service is healthy.  If it is not, it is decommissioned and a new service is commissioned in its place.
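The control loop described above – replace unhealthy instances, then scale the fleet to match demand – can be sketched as follows.  This is a deliberately simplified, assumption-laden model (the `FleetManager` class, the fixed per-instance capacity, and the demand figure are all my own illustrative choices, not any real cloud API):

```python
class Instance:
    """A single service instance with a health flag (illustrative only)."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.healthy = True


class FleetManager:
    """Hypothetical manager that treats the application as a single entity.

    Each reconcile pass decommissions unhealthy instances, commissions
    replacements, and scales the fleet up or down to match demand.
    """

    def __init__(self):
        self.instances = []
        self._next_id = 0

    def _commission(self):
        instance = Instance(self._next_id)
        self._next_id += 1
        self.instances.append(instance)

    def reconcile(self, demand, capacity_per_instance=100):
        # Decommission any instance that failed its health check.
        self.instances = [i for i in self.instances if i.healthy]

        # Scale to the number of instances the workload demands
        # (ceiling division, with at least one instance kept alive).
        target = max(1, -(-demand // capacity_per_instance))
        while len(self.instances) < target:
            self._commission()
        while len(self.instances) > target:
            self.instances.pop()
        return len(self.instances)
```

In use, a monitoring loop would call `reconcile()` periodically: a fleet facing a demand of 950 units at 100 units per instance settles at 10 instances, and marking one instance unhealthy causes the next pass to replace it while holding the fleet at the same size.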

Sort of reminds me of the Borg in Star Trek, don’t you think?

Tags: Cloud Computing, Internet-Scale Computing