Big Data Meets Cloud

August 16, 2012 By David

Grazed from Forbes. Author: Holger Kisker.

Over the past few years, business intelligence (BI) was the overlooked stepchild of cloud solutions and market adoption. Sure, some BI software-as-a-service (SaaS) vendors have been pretty successful in this space, but it was success in a niche compared with the four main SaaS applications: customer relationship management (CRM), collaboration, human capital management (HCM), and eProcurement. While those four applications each reached cloud adoption of 25% and more in North America and Western Europe, BI was leading the field of second-tier SaaS solutions, used by 17% of all companies in our Forrester Software Survey, Q4 2011.

Considering that the main challenges of cloud computing are data security and integration efforts (yes, the story of simply swiping your credit card to get a fully operational cloud solution in place is a fairy tale), 17% cloud adoption is actually not bad at all; BI is all about data integration, data analysis, and security. With BI there is of course the flexibility to choose which data a company considers running in a cloud deployment and which data sources to integrate, a choice that is very limited when implementing, e.g., a CRM or eProcurement cloud solution…

“38% of all companies are planning a BI SaaS project before the end of 2013.”

With the increasing maturity of cloud technology and market understanding of its broad range of benefits (e.g., check out the Forrester report “The Changing Cloud Agenda”), the adoption of public and virtual private cloud solutions for BI will certainly grow much stronger in the coming years: 38% of all companies from our survey are planning a BI SaaS project before the end of 2013. Many of those respondents (27%) plan to complement their existing BI solutions and a smaller number (11%) actually plan to fully replace their existing BI with a cloud solution. But there is a big cloud looming on the horizon that can significantly accelerate this trend: big data.

The Big Data Opportunity

Information is exploding all around us: 1,500 blogs, 98,000 tweets, and 168 million emails every minute just to mention a few of the many sources that contribute to the tremendous data growth.[i] This year we will hit a volume of 2.7 zettabytes of global digital data and Forrester predicts that ongoing data growth will outperform Moore’s Law over the next few years. This is not a threat or challenge; this is a tremendous opportunity. The challenge of handling vast quantities of data has been around since the beginning of the digital age: There has always been more data than companies can analyze. What’s new is that, recently, our ability to collect and analyze huge amounts of data is exploding too. The big data hype is not about the challenge that there is so much data all around us; it is about the opportunity that today we can turn this huge amount of data into business value!

“Global data growth will outperform Moore’s law over the next few years.”

Big data can have many faces. Like cloud computing a few years ago, there is still a lot of confusion about what big data actually is and is not. Forrester defines big data as: “Techniques and technologies that make capturing value from data at an extreme scale economical.”

Please note that this definition includes the fact that big data is about a set of different technologies. Big data does not equal Hadoop and certainly not in-memory computing like some vendors suggest. The best choice to unlock the value of big data for better business decisions, which may or may not include Hadoop or in-memory computing, will depend on the use case scenario (the variety, volume, and velocity of the underlying data). And this is one of big data’s key challenges — it requires several technologies together to cover a broad spectrum of use case scenarios: What works well for sentiment analysis may not work for risk management or asset performance.

Leave Big Data In The Cloud

For many big data scenarios, most of the information is coming from outside the company, such as from social media, demographic data, web data, events, feeds, etc. Organizations recognize the growing importance of social media but are facing challenges to unlock its potential.

“Big data is like crude oil — it needs filtering and refining to unlock its value and make it usable.”

In social media streams, only a fraction of data is relevant; for sentiment analysis, for example, ~20% of all tweets include a link that needs to be opened to understand its context.[ii] Huge amounts of external data need to be filtered, formatted, and prepared for any subsequent analysis. After the analysis, what needs to be stored: only the aggregated result, or also the source data for audit and further analysis? All tweets from the past two years take 0.5 petabytes to store; it simply doesn’t make sense for every company interested in social media to start storing the same big data in-house. To stay for a moment with the analogy of big data and crude oil: Are you a consumer of oil (i.e., gasoline, jet fuel, heating oil), or do you want to build exploration sites and refineries? Are you a data consumer or a data service provider? Most of us will be data consumers (and co-producers of course), but there will be a fast-growing business opportunity for big data service providers, mainly in the form of cloud services, where most of the data sits anyway.

There are three good reasons why (depending on the use case scenario) big data can make a lot of sense in the cloud:

1. Big data requires a spectrum of advanced technologies, skills, and investments. Do you really need/want this all in-house?
2. Big data includes huge amounts of external data. Does it make sense to move and manage all this data behind your firewall?
3. Big data needs a lot of data services. Focus on the value of your differentiated data analysis instead of big data management.

Consequently, we already see a number of vendors that offer big data solutions in the cloud. Some address the technology side in the form of big data platforms that include several complementary big data management and analytics technologies; others focus on the big data service side, including data preparation, storage, or enrichment (e.g., the merging of different data sources).