The Cloud: Tailor-made for Unstructured Data?

July 29, 2011 By David
Object Storage
Grazed from IT Business Edge. Author: Arthur Cole.

The primary reason that cloud computing, and cloud storage in particular, is drawing such high interest these days is the enterprise’s need to handle increasing amounts of unstructured data.

To be sure, structured data from databases and other applications is growing as well, but it is the relentless growth of emails, tweets, texts and the like that keeps CIOs up at night. According to IDC, more than 80 percent of enterprise data is unstructured, and it’s increasing at a rate of about 60 percent a year. And to make matters worse, less than 5 percent of it is accessed on a regular basis.
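To put the IDC figures in perspective, here is a minimal Python sketch of what 60 percent annual growth does to a data store. The 100 TB starting size and the function name are assumptions chosen for illustration, not figures from IDC, and the sketch assumes the growth rate compounds yearly and stays constant.

```python
def project_unstructured(total_tb, years, share=0.80, growth=0.60):
    """Project unstructured data volume, compounding once per year.

    total_tb: total enterprise data today (hypothetical figure)
    share:    fraction that is unstructured (about 80 percent per IDC)
    growth:   annual growth rate (about 60 percent per IDC)
    Returns a list with one entry per year, starting at year 0.
    """
    start = total_tb * share
    return [start * (1 + growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Assumed starting point: 100 TB of total enterprise data.
    for year, tb in enumerate(project_unstructured(100.0, 5)):
        print(f"year {year}: {tb:,.1f} TB unstructured")
```

At that rate the unstructured portion grows roughly tenfold in five years, which is why buying capacity up front for theoretical loads gets expensive quickly.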

The cloud, then, is the proverbial knight in shining armor: it offers the opportunity not only to scale storage resources to an incredible degree, but also to provision just the amount of storage an enterprise needs at a given time, with no more over-provisioning, and over-spending, to accommodate theoretical loads.

Extra capacity, however, is only part of the equation. Equally important is the ability to manage and monitor all this stuff once it hits the cloud. As MarketWatch’s Charles Silver points out, the cloud represents a shift from single-server/multi-client architectures to multi-server/multi-client ones. That means existing management systems designed for the old silo days are ill-equipped to handle the demands of a global, heterogeneous environment.

The changes needed to accommodate unstructured data in the cloud are already taking place in some of the latest management systems on the drawing board or in the channel. Microsoft is working on Project Daytona, which seeks to create a simplified version of Google’s MapReduce system for the Azure cloud. The goal is to provide an easier way to manage unstructured data using advanced machine-learning and analytics algorithms. The program is also looking at a new analytics-as-a-service offering called Excel DataScope, designed to handle extremely large data sets.
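For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal single-machine sketch in Python. It is not Project Daytona's API or Google's implementation; the function names are hypothetical, and a real system would spread the map and reduce steps across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(documents):
    """Run map, shuffle, and reduce in-process over a list of text documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    emails = ["the cloud scales", "The cloud stores data"]
    print(map_reduce(emails))
```

The appeal for unstructured data is that each document is mapped independently, so the work divides naturally as the pile of emails, tweets and texts grows.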

Open source developer Gluster, meanwhile, says it has an all-software scale-out NAS solution suitable for internal or external cloud environments that allows firms to tap additional storage in much the same way they draw extra computing power. The GlusterFS system does away with a centralized metadata server as a means to enhance performance and scalability, and offers compatibility with leading cloud offerings like Amazon Web Services and GoGrid. It also conforms to the POSIX standard for easier cloud migration.
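One way to locate files without consulting a metadata store is to compute the storage location from the file name itself. The Python sketch below illustrates that general idea with a plain hash-and-modulo scheme. It is an assumption-laden illustration, not GlusterFS's actual algorithm, and the server names and the `locate` function are hypothetical.

```python
import hashlib


def locate(path, servers):
    """Pick a storage server for a file purely from its path.

    Any client that knows the server list computes the same answer,
    so no central lookup table has to be queried or kept in sync.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an integer and map it to a server.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    servers = ["storage-a", "storage-b", "storage-c"]
    for path in ["/mail/2011/07/inbox.mbox", "/logs/web/access.log"]:
        print(path, "->", locate(path, servers))
```

A plain modulo scheme like this reshuffles most files whenever a server is added or removed, so production systems use more elaborate placement schemes. The sketch only shows why a central lookup step is not strictly required.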

Unstructured data typically evokes images of the movie "The Blob," devouring everyone and everything in its path. The truth is a little less frightening. No question, it’s a big problem, but it’s also a manageable problem. Your approach to data management will have to broaden in scope to a significant degree, but there’s no reason to think that the tools and technology coming your way won’t be able to handle it.