How to Make Your Big Data Comfortable in the Cloud

March 16, 2012 By David

Grazed from Computer Technology Review.  Author: Ian Fyfe.

Save the date: Cloud computing and big data analytics are poised for a blissful union, and it figures to be a match made in heaven…er, the cloud.

Big data, of course, is the hot trend in high-performance computing. It’s large scale, it’s often unstructured, and it’s extremely valuable for enterprises looking to make sense out of huge datasets. Cloud computing has been the darling term of the tech world for the last few years as it ushers in a new era of computing as a service, despite lingering concerns over security, availability and cost…

At first glance, the scale of big data can make it seem an unlikely partner for the cloud. With that scale comes great complexity in managing it, and, as such, big data analytics is usually kept on local clusters of servers. The cloud's advantage, meanwhile, lies in using available resources as efficiently as possible. Still, if implemented correctly, there is no reason you shouldn’t be able to combine the benefits of big data with the benefits of the cloud.

Harnessing the power of big data in the cloud through business analytics doesn’t need to be tricky, but there are some specific strategies for making sure your enterprise is set up for optimal efficiency (and maximum ease of use). Big data and the cloud each come with requirements that, taken together, can put an enterprise ahead of the curve in how it accesses and analyzes big data to improve its business operations.

First, let’s focus on the three things your big data business analytics tool must have:

  1. The ability to connect: Use a business analytics tool that can connect – natively – to all of the leading big data sources, such as Hadoop and NoSQL stores.
  2. The ability to manage: Make sure that tool can effectively manage and orchestrate big data tasks as well as traditional IT tasks.
  3. The ability to integrate: Rarely does data for analytics come from a single source. A business analytics tool is only as good as its data integration capabilities; data will need to be efficiently integrated between traditional relational databases and non-traditional big data stores, such as Hadoop and NoSQL databases.
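The integration requirement can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: an in-memory SQLite table stands in for a traditional relational database, a few JSON lines stand in for records exported from a big data store, and the names `customers`, `raw_events`, `count_events` and `integrate` are hypothetical.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical relational side: a small customers table in in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# Hypothetical big data side: JSON-lines records standing in for files
# exported from a Hadoop or NoSQL store.
raw_events = """\
{"customer_id": 1, "event": "impression"}
{"customer_id": 2, "event": "impression"}
{"customer_id": 1, "event": "click"}
{"customer_id": 1, "event": "impression"}
"""

def count_events(lines):
    """Count events per customer_id in a JSON-lines string."""
    counts = Counter()
    for line in lines.splitlines():
        if line.strip():
            counts[json.loads(line)["customer_id"]] += 1
    return counts

def integrate(conn, counts):
    """Join relational customer names with the per-customer event counts."""
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [(name, counts.get(cid, 0)) for cid, name in rows]

report = integrate(conn, count_events(raw_events))
print(report)
```

A real analytics tool would do this at far larger scale and through native connectors, but the shape of the task is the same: aggregate the semi-structured data, then join it to relational reference data on a shared key.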

Your cloud requirements for big data business analytics are just as important. The main benefits of the cloud are elasticity, pay-as-you-go pricing and freedom from managing hardware on-premises. For example, in normal times a media company can meet its data processing needs with its own on-premises private cluster of 50 servers. Around major events such as the Super Bowl or the FIFA World Cup, however, the volume of advertising impressions data to be processed may increase by 8-10x, so the company can burst into a public cloud, such as Amazon Web Services, by temporarily adding another 200 servers. Here are three things your big data analytics tool needs to run successfully in the cloud:

  1. Cloud provider agnostic: Look for an analytics tool that can run on any cloud service, public or private.
  2. Elastic: Make sure it is quick and easy to add computing resources at times of peak load, and reduce costs by reducing resources during normal times.
  3. Data communications bandwidth: Make sure you have the data communications pipes in place to efficiently move that raw big data up into the cloud. Chances are your big data sources – for example, web logs – are already in the cloud, in which case this might be as simple as copying the big data files from one cloud provider service to another.
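The elasticity and bandwidth points lend themselves to some back-of-the-envelope arithmetic. The sketch below revisits the media company example above under loudly stated assumptions: the per-server throughput, the load figures, the dataset size and the link speed are all invented for illustration, and the function names are hypothetical. It also assumes capacity scales linearly with server count, which real clusters rarely do exactly.

```python
import math

def cloud_servers_needed(load_per_hour, per_server_per_hour, on_prem_servers):
    """Servers to rent in the cloud once the on-premises cluster is full.

    Assumes throughput scales linearly with server count (a simplification).
    """
    if per_server_per_hour <= 0 or on_prem_servers < 0:
        raise ValueError("capacity figures must be positive")
    total_needed = math.ceil(load_per_hour / per_server_per_hour)
    return max(0, total_needed - on_prem_servers)

def transfer_hours(size_gb, bandwidth_mbps):
    """Hours to move size_gb gigabytes over a link of bandwidth_mbps megabits/s.

    Uses decimal units (1 GB = 8000 megabits) and ignores protocol overhead.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb * 8000 / bandwidth_mbps / 3600

# Hypothetical figures: each server handles 1M impressions per hour.
normal = cloud_servers_needed(30_000_000, 1_000_000, on_prem_servers=50)
peak = cloud_servers_needed(250_000_000, 1_000_000, on_prem_servers=50)
print(normal, peak)

# Hypothetical: 1,000 GB of web logs over a 1,000 Mbps link.
print(round(transfer_hours(1000, 1000), 2))
```

With these made-up numbers, the normal load fits on the 50 on-premises servers, while the peak calls for 200 rented servers that can be released once the event is over. The transfer estimate shows why it helps when the raw data already lives in the cloud.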

In conclusion, the union of big data and the cloud is a strong one, if done effectively and efficiently. Big data provides the storage and data processing muscle to manage and analyze vast lakes of data, while the cloud provides pay-as-you-go and elastic computing resources ‘in the sky’ that can grow and shrink as your big data analytics needs change from week to week. Truly a blissful union.