Big data and cloud computing: Watch out for these unknowns

April 13, 2012 By David
Grazed from InfoWorld.  Author: David Linthicum.

The concept of big data is simple, as most good ideas are. Big data gives us the ability to use commodity computing to process distributed queries across multiple data sets and return result sets in record time. Cloud computing provides the underlying engine, typically through the use of Hadoop. Because these commodity server instances can be rented as needed, big data becomes affordable for most enterprises.
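As a rough illustration of the "distributed query" idea, here is a minimal single-process sketch of the map-and-reduce pattern that Hadoop popularized. It is not Hadoop's API: the data sets, the function names (`map_partition`, `reduce_counts`, `run_query`), and the sample records are all hypothetical. In a real cluster each data set would sit on a different commodity server and the map step would run in parallel.

```python
from collections import Counter
from functools import reduce

# Hypothetical data sets; in a real deployment each would live on a separate node.
DATA_SETS = [
    ["error disk full", "ok", "error timeout"],
    ["ok", "ok", "error disk full"],
]

def map_partition(records):
    """Map step: count the first word (the status) of each record in one data set."""
    return Counter(record.split()[0] for record in records)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def run_query(data_sets):
    """Run the map step per data set, then fold the partial results together."""
    partials = [map_partition(ds) for ds in data_sets]  # would run in parallel on a cluster
    return reduce(reduce_counts, partials, Counter())

result = run_query(DATA_SETS)
print(dict(result))
```

Each data set is summarized independently and only the small partial results are combined. That independence is why the work spreads well across rented commodity instances.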

We always make discoveries, both good and bad, as we use new technology. In the case of big data, the path to success will come with key lessons. But given the novelty of big data in real-world deployments, there are major questions for which we don’t yet have answers, so be extra careful in these areas:

Management of both structured and unstructured data, which is an advantage of using a nonrelational database, could mean that the unstructured data becomes much harder to deal with in the longer term. At some point, we’ll have to make tough calls about converting unstructured data to structured form. The trouble is that many of the initial database designs will be difficult to change once they’re in production.

The cost of using local servers is going to be high for those who won’t, or can’t, move to cloud-based platforms. We’re talking hundreds to thousands of servers that have to be loaded, powered, and maintained. Although you can avoid the cost of traditional enterprise software licensing, the raw processing power required will still drive many big data implementations over budget. I suspect many big data efforts will initially occur within data centers, where the big data expenses are intermingled with the overall data center costs; count on the final tallies to be a surprise.

Cloud-based big data servers are not at all the same. Amazon Web Services provides very different offerings than Google, for example, and capabilities differ between any pair of platforms you compare. Thus, the time, effort, and talent required to get big data projects to their end state also vary with the technology. I suspect one or two platforms will emerge as the clear paths to success, but we’re not there yet.