Cloud Storage Appropriate Data

December 11, 2010 By Hoofer
Grazed from InfoWorld.  Author: George Crump.

As vendors continue to mature the cloud storage on-ramps we discussed previously, the use case for cloud storage is becoming more widespread. What was once primarily a backup and archive storage destination is now quickly becoming an option for primary data storage. As a result, the data that you involve in a cloud storage solution may be different than it was only a year ago.

As we discussed in our recent webinar "What’s Your Cloud Strategy, Answering The Top Ten Questions", cloud storage, at least in the public sense, still has a bottleneck in the connection from your data center to the WAN. When using cloud for primary storage, we are looking for use cases where only a small percentage of the data set is active at any point in time. That active data set would be cached in some manner locally…

We are also looking for data sets where requests made to the non-active section of the data set are relatively small in size. That non-active data set could be quite large, as long as we can access small subsets of it. As we discussed in our recent article "How To Add NAS To A SAN", a file sharing function is an excellent example of a small working data set and a large but granularly accessible inactive data set. In most NAS or file server environments only a very small percentage of the total data is active at any point in time. It can be easily cached locally on an appliance or virtual machine. If data is not in the local cache, then we can transfer just the needed file from cloud storage, not the entire file system.
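
To make the caching idea concrete, here is a minimal sketch of a read-through file cache. It is an illustration under stated assumptions, not how any particular appliance works: the `FileCache` class, the least-recently-used eviction policy, and the in-memory dictionary standing in for cloud storage are all hypothetical.

```python
from collections import OrderedDict


class FileCache:
    """Read-through cache: serve active files locally, fetch single files on a miss."""

    def __init__(self, cloud_store, capacity):
        self.cloud_store = cloud_store  # hypothetical stand-in: dict of path -> bytes
        self.capacity = capacity        # maximum number of files kept locally
        self.local = OrderedDict()      # least recently used entry comes first
        self.cloud_fetches = 0          # counts trips across the WAN

    def read(self, path):
        if path in self.local:
            self.local.move_to_end(path)    # mark as most recently used
            return self.local[path]
        data = self.cloud_store[path]       # fetch only the requested file
        self.cloud_fetches += 1
        self.local[path] = data
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used file
        return data


# A large file system in the "cloud", with a small working set of three files.
cloud = {f"/share/doc{i}.txt": f"contents {i}".encode() for i in range(1000)}
cache = FileCache(cloud, capacity=10)
for _ in range(5):
    for i in range(3):
        cache.read(f"/share/doc{i}.txt")
print(cache.cloud_fetches)  # prints 3
```

Fifteen reads cost only three transfers, one per active file, and the other 997 files never cross the WAN.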

This concept of a small working data set and a large but granularly accessible inactive data set can expand well beyond file systems. With some software intelligence and an understanding of how specific applications store data, cloud storage for primary data can be used for semi-structured databases like SharePoint or Exchange. Both of these applications have relatively small databases of either file meta-data or messages, but both have very large blob-type components that store either related files or attachments. Those blobs have portions that are active, such as today’s files, while the rest of the blob is dormant and seldom accessed. As a result, these applications and applications like them have become an ideal data set for cloud storage.

Part of a cloud storage strategy is understanding your data well enough to identify what is appropriate for cloud storage. Avoid data sets that are new, that are very active across the whole data set, or that are extremely large. This means that today, very active databases and storage for virtualized server environments probably need to be local. For the remaining data sets, which for most data centers will be the majority of their storage consumption, cloud storage is worth consideration.
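
The screening criteria above could be applied to a storage inventory with something like the following sketch. The `DataSet` fields, the three threshold values, and the sample inventory are assumptions chosen for illustration; the article gives no numeric cut-offs, so tune them to your own environment.

```python
from dataclasses import dataclass

# Assumed cut-offs for illustration only; the article gives no numbers.
MIN_AGE_DAYS = 90           # younger than this counts as "new"
MAX_ACTIVE_FRACTION = 0.10  # more than this counts as "very active"
MAX_SIZE_TB = 500.0         # more than this counts as "extremely large"


@dataclass
class DataSet:
    name: str
    age_days: int
    active_fraction: float  # share of the data touched in a typical period
    size_tb: float


def reasons_to_keep_local(ds):
    """Return the criteria that argue against cloud storage (empty if none)."""
    reasons = []
    if ds.age_days < MIN_AGE_DAYS:
        reasons.append("new")
    if ds.active_fraction > MAX_ACTIVE_FRACTION:
        reasons.append("active across the whole data set")
    if ds.size_tb > MAX_SIZE_TB:
        reasons.append("extremely large")
    return reasons


# Hypothetical inventory entries.
inventory = [
    DataSet("file shares", 1200, 0.03, 40.0),
    DataSet("OLTP database", 800, 0.85, 2.0),
    DataSet("VM datastore", 400, 0.60, 30.0),
]
for ds in inventory:
    reasons = reasons_to_keep_local(ds)
    verdict = "keep local: " + ", ".join(reasons) if reasons else "consider cloud storage"
    print(f"{ds.name}: {verdict}")
```

Returning the list of reasons rather than a yes/no answer makes it easy to see which criterion ruled a data set out.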