Chaos Sumo Releases Industry Report on AWS S3 Blind Spots and New Data Lake Use Cases
April 23, 2018 - Chaos Sumo, a cloud-based log data retention and analytics service for object storage, today released the findings of The State of Object Storage 2018 Report: The Emergence of the AWS S3 Data Lake. As object storage such as AWS S3 continues to gain enterprise momentum, with over 70 percent of companies reporting that they use it today, it offers untapped opportunities for promising new use cases such as historical log analytics and application and media hosting. More than one third of respondents in a recent survey, conducted by Chaos Sumo in December-January 2018, are also looking to object storage to streamline and enable data lake usage for historical trend analysis and machine learning. The study also found that the top barriers to S3 innovation are the lack of tools that enable data access and visibility, and the cost of moving data around in order to analyze the growing volumes of disparate object storage data accurately and at scale.
"The current inability of businesses to perform consistent, longitudinal and easy trend and predictive analysis in object storage, including log analytics, is resulting in critical business information being thrown away or archived in an inaccessible manner," says Thomas Hazel, founder and CTO of Chaos Sumo. "This hidden culprit – the increasing costs of storing data for real- or near-time analysis, is the core impediment to doing more with the growing amount of data stored in object storage such as AWS S3, and Chaos Sumo is here to tackle this head on."
Major findings from the report, based on over 120 responses from data science, analytics, engineering, and DevOps/IT professionals across a wide variety of organizations, include:
Object storage has gone mainstream – AWS S3 is here to stay
- 72 percent of respondents report using AWS S3 or another form of cloud-based object storage today, with 40 percent anticipating that their investment in object storage will grow by more than 50 percent in the next year.
As growth of AWS S3 object storage explodes, its intended use cases are shifting toward analytics
- While 83 percent of respondents use the service as a cheap alternative to traditional on-premises storage solutions for backup, storing, and archiving data, object storage is increasingly being used for application hosting (38%), media hosting (34%), and business analytics (32%).
The biggest challenges with object storage are visibility into the stored data, the ability to analyze the data right in S3, and the costs of moving the data
- Despite having a data lake, only 36 percent of respondents can easily access the data, and a mere 7 percent claim it is easy to analyze that data today.
- As object storage data expands, so do concerns about greater storage, compute, and network costs: 37 percent of respondents are worried about the increasing costs. Specifically, for Elastic/ELK users, the prohibitive storage costs associated with exabytes of data in S3 are compounded by the additional effort and resources needed for its scaffolding, which renders most of this rich data inaccessible.
A myriad of analytics tools that each do only part of the job is a major culprit behind analytics challenges
- 42 percent are using home-grown solutions to solve visibility and analytics issues within object storage/S3, while others cite using tools such as Redshift (51%), Amazon Athena (23%), and Elasticsearch, Logstash, Kibana (ELK) (7%).
- These tools are not only inadequate for the jobs that need to be done, they also take a lot of time to set up and manage: 52 percent of respondents say it took them more than three months to build their current analytics architecture.
Data lakes are slowly gaining momentum within the enterprise
- 28 percent of respondents report having data lakes today, with another 18 percent planning to implement one in the next 12-18 months.
Access the full report here for additional insight.