Back in the late 1800s it was clear that electricity would be the energy form of choice for industry. However, it wasn’t until decades into the 20th century that the big players gave up their big steam-powered local electrical generators for cheap commodity power from consolidated grids. One of the reasons this took 20-plus years was technology: the introduction of AC transmission and electrical transformers made the pieces compatible. Immature business processes were the other roadblock. The utility companies had to figure out how to meter and bill their customers fairly, and had to achieve sufficient scale to drive prices down far enough to make grid power attractive to industry.
A similar thing is happening to Cloud Computing. Virtual machine technology and fast public networks have made computing a mere commodity. Despite this, the majority of large businesses are clinging to their expensive and inefficient data centers. The technology is ready, the business case is there, but the uptake is slow. Why? In my view, there are three barriers:
• My private data on your public machines? You must be joking.
• Where do I plug in? Who’s in charge of provisioning?
• How do I get my gerbil to roar? How do I scale this thing?
In this series of articles, we’ll look at these problems and consider some forward-thinking solutions.
Barrier #1: My private data on your public machines?
Over the last fifteen years, since the Internet took off, we’ve become really good at building firewalls around our computing plants and insulating the pipes that carry data to our offsite workers. We have Virtual Private Networks and Secure Sockets to prevent snooping on information in transit, but we aren’t very good at securing the data once it stops moving. Sometimes we encrypt it on our servers; often we don’t when it’s on our laptops or USB sticks. We entrust encryption keys to too many knowledge workers, and we sometimes get locked out of our own spaces. How can we possibly extend this hodge-podge of storage security to data that lives on someone else’s equipment?
In the noble tradition of “thinking out of the box”, maybe the best way to protect our data is to give it to as many public vendors as possible. That may sound odd, but there’s a catch: we don’t give them intact files. We give them anonymous, encrypted blocks of each file, scattered across a continent of data centers. No single computer holds the entire file; the blocks are distributed like needles in a large number of haystacks. Most importantly, anyone looking for your files won’t recognize them; they will see only billions of blocks on different servers, with no way to tell which of them belong to your file.
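The “anonymous, encrypted blocks” part can be sketched in a few lines. This is a toy illustration only: it uses a SHA-256 counter-mode keystream in place of real encryption, and the function names (`keystream`, `encrypt_and_split`) are mine, not any vendor’s API. The point is simply that every block ends up the same size and indistinguishable from random bytes.

```python
import hashlib

BLOCK_SIZE = 4096

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustration only,
    # not production cryptography.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_and_split(data: bytes, key: bytes) -> list:
    # Pad so every block is exactly BLOCK_SIZE: uniform, anonymous needles.
    padded = data + b"\x00" * (-len(data) % BLOCK_SIZE)
    cipher = bytes(a ^ b for a, b in zip(padded, keystream(key, len(padded))))
    return [cipher[i:i + BLOCK_SIZE] for i in range(0, len(cipher), BLOCK_SIZE)]
```

Because XOR with the same keystream is its own inverse, the holder of the key can reverse the process and reassemble the file; anyone else holds featureless 4 KB blobs.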
So, you may be asking, how would my organization find its own files? There is a clever piece of mathematics within the field of cryptography that solves this beautifully: a generator that produces what appears to be a completely random sequence of large numbers from a single starting number, called a seed. Those numbers identify the ‘haystack’ (the server) and the position within that haystack where each block of a file lives. Your organization needs only to remember the seed for each file while the block data itself sits on public servers.
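One way to build such a generator, sketched here as an assumption rather than the author’s exact scheme, is a keyed hash (HMAC) in counter mode: feeding the seed and a block index in always reproduces the same pseudo-random (server, position) pair, while without the seed the placements look random.

```python
import hmac, hashlib

def block_locations(seed: bytes, n_blocks: int, n_servers: int, slots_per_server: int):
    # Deterministically derive a (server, slot) pair for each block index.
    # Anyone holding the seed regenerates the identical sequence; anyone
    # without it sees placements indistinguishable from random.
    locations = []
    for i in range(n_blocks):
        digest = hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()
        server = int.from_bytes(digest[:4], "big") % n_servers
        slot = int.from_bytes(digest[4:8], "big") % slots_per_server
        locations.append((server, slot))
    return locations
```

The seed is small enough to keep on an internal corporate server, while the blocks themselves can live anywhere.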
This is the solution for data privacy on the Cloud. Corporate data is scattered across public server equipment where it cannot be reconstructed without the seeds that positioned it. Your organization runs applications in this same Cloud. When these apps need to work on your data, they contact your corporation’s internal server and ask for the seed associated with the file, transferred over a secure link (remember, transferring information over a secure channel is one thing that industry *has* perfected over the last decade). The blocks are found and the file is rebuilt within the app, but it is never written to the local file system. The app discards the seed when it’s finished. Your data never leaves the app, and all traces are gone when the app finishes on the public server equipment.
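The whole round trip can be sketched with in-memory dictionaries standing in for the public servers. Everything here is hypothetical scaffolding: the names `scatter` and `gather`, the tiny block size, and the server model are mine, chosen only to show the shape of the flow: place blocks by seed, rebuild in memory, discard the seed.

```python
import hmac, hashlib

BLOCK_SIZE = 4                 # tiny blocks, just for the illustration
N_SERVERS, N_SLOTS = 16, 10**6

# Stand-ins for a fleet of public servers: each maps slot -> block.
servers = [dict() for _ in range(N_SERVERS)]

def locate(seed: bytes, i: int):
    # Seed-driven placement: block i lands at a (server, slot) pair
    # that only a holder of the seed can regenerate.
    d = hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return (int.from_bytes(d[:4], "big") % N_SERVERS,
            int.from_bytes(d[4:8], "big") % N_SLOTS)

def scatter(data: bytes, seed: bytes) -> int:
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for i, block in enumerate(blocks):
        s, slot = locate(seed, i)
        servers[s][slot] = block
    return len(blocks)

def gather(seed: bytes, n_blocks: int) -> bytes:
    # The app rebuilds the file in memory only; nothing touches local
    # disk, and the caller discards the seed afterwards.
    return b"".join(servers[s][slot]
                    for s, slot in (locate(seed, i) for i in range(n_blocks)))
```

Note that `gather` with the wrong seed simply points at empty slots: without the seed, the scattered blocks cannot be reassembled.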
As a side-effect of placing data on the Cloud like this, your organization will have made great strides toward Disaster Recovery planning. By storing the blocks redundantly, your files will be safely and invisibly spread across the Cloud, immune to a failure of your internal hardware or of any single Cloud provider.
Sometimes it takes a substantial change of approach to fix an architecture that doesn’t quite work in a new context. This is what’s happening for data and Cloud Computing. Fortunately the software on which this entire infrastructure is built is very malleable. It’s just a matter of adjusting our own paradigms as part of the evolutionary process.
In the next article I’ll tackle the provisioning problem.