Big Data Is on a Collision Course with the Cloud

May 1, 2011 By David
Grazed from GigaOM.  Author: Derrick Harris.

It’s an interesting time to be involved with information technology, as we’re seeing two of the biggest trends in a long time ascend into the mainstream almost simultaneously. This morning, GigaOM Pro published my wrap-up of the first-quarter news and trends in the infrastructure space (subscription required), and what struck me the most as I looked back was that “big data” has become the new “cloud computing.” I don’t mean that with regard to the technological aspects, but rather to the importance that vendors and customers alike have attached to the term. Just as every vendor now has a cloud product and every company has a cloud strategy in place, big data efforts will also become ubiquitous over the next couple of years, and the two very well might merge in the near future.

That being said, the reasons for embracing big data might be entirely different from the reasons for embracing cloud computing. Cloud computing, at least for many users, is about offloading responsibilities that are a necessary cost of doing business, but that don’t necessarily do much to improve the business. Big data, on the other hand, is a different beast, at least for now. In many cases, it’s about looking at information in entirely new ways in order to improve whatever it is that a company does. Whether they’re improving the effectiveness of advertising or actually inspiring new products, analytics efforts produce real business results.

But as with cloud computing, businesses realize that if they don’t have a big data story to tell or a big data strategy in place, they’ll very soon fall behind the curve. Companies that haven’t at least implemented a private cloud infrastructure will still be wasting resources managing IT tasks while their competitors have automated them and are investing those resources elsewhere. Soon, companies without analytics systems in place will be grasping at straws, relatively speaking, to determine what it is that customers want, while their competitors will be drawing actionable insights from data that tells them what customers want. For proof, just look at the incredible amount of Hadoop, NoSQL, analytic database and business intelligence action over the past year, and the past three months in particular.

That’s not to say that the two trends aren’t on convergent paths: I think they are, but only as a result of cloud computing maturing, as big data is still in its relative infancy. With increasingly inexpensive cloud storage and increasingly powerful cloud processing, the cloud is becoming an ideal place to store and analyze the data that companies are collecting. For one, it’s a risk-free way to experiment with advanced analytics without having to invest piles of cash in the infrastructure otherwise needed to run those types of workloads. And with the advent of big data workflows delivered as cloud services along with every other type of application, users will no longer even need to undertake, at least to the same degree, the sometimes laborious process of teaching themselves new software and new methods of analysis.

I think the next year will be very telling about the degree to which this convergence will happen, as cloud providers and big data vendors alike seek to capitalize on each other’s momentum. We already have Elastic MapReduce and data-as-a-service firms such as InfoChimps, but I suspect they’re just the beginning of what will become a broad base of services applying on-demand cloud resources to analytic workflows, and targeting CIOs who have both of those capabilities at the top of their minds.