Dremio CEO Identifies Top Big Data & Analytics Predictions for 2020
January 7, 2020
Innovations in the cloud and the rise of more efficient ways to collect, access, and analyze big data have rapidly increased the value enterprises get from their data. In 2020, enterprises will evolve in how they approach data maturity and strategize cloud investments.
According to Tomer Shiran, co-founder and CEO of Dremio, the new year will bring compelling reasons to focus on modern cloud data lakes; increased efficiency of cloud services to remarkably reduce cloud computing costs; easier ways to make IoT data a valuable business asset; and open source innovations to accelerate analytics results. The following five major trends guide his predictions for 2020.
Cloud data warehouses turn out to be a big data detour.
Given the tremendous cost and complexity associated with traditional on-premises data warehouses, it wasn’t surprising that a new generation of cloud-native enterprise data warehouses emerged. But savvy enterprises have figured out that cloud data warehouses are just a better implementation of a legacy architecture, and so they’re avoiding the detour and moving directly to a next-generation architecture built around cloud data lakes. In this new architecture, data doesn’t get moved or copied, and there is no data warehouse and no associated ETL, cubes, or other workarounds. We predict 75 percent of the Global 2000 will be in production or in pilot with a cloud data lake in 2020, using multiple best-of-breed engines for different use cases across data science, data pipelines, BI, and interactive/ad-hoc analysis.
Enterprises say goodbye to performance benchmarks, hello to efficiency benchmarks.
Escalating public cloud costs have forced enterprises to re-prioritize the evaluation criteria for their cloud services, with higher efficiency and lower costs now front and center. The highly elastic nature of the public cloud means that cloud services can (but don’t always) release resources when not in use. Services that deliver the same unit of work with higher performance hold those resources for less time, so they are in effect more efficient and cost less. In the on-premises world of over-provisioned assets, such gains are hard to reclaim. But in the public cloud, time really is money. This has created a new battleground where cloud services compete on service efficiency to achieve the lowest cost per unit of compute, and 2020 will see that battle heat up.
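The arithmetic behind that claim can be sketched in a few lines. This is a minimal illustration, assuming resources are billed only while a job runs; the function name, the price, the node count, and the runtimes are all hypothetical figures chosen for the example, not numbers from any cloud provider.

```python
def cost_per_job(runtime_hours: float, nodes: int, price_per_node_hour: float) -> float:
    """Cost of one job when resources are billed only for the time they are held."""
    return runtime_hours * nodes * price_per_node_hour


# Hypothetical figures, chosen only for illustration.
PRICE_PER_NODE_HOUR = 2.0  # dollars

# The same unit of work on the same cluster size, at two different speeds.
baseline = cost_per_job(runtime_hours=1.0, nodes=10, price_per_node_hour=PRICE_PER_NODE_HOUR)
faster = cost_per_job(runtime_hours=0.25, nodes=10, price_per_node_hour=PRICE_PER_NODE_HOUR)

# Fraction of the baseline cost saved by finishing sooner.
savings = 1 - faster / baseline

print(f"baseline: ${baseline:.2f}, faster: ${faster:.2f}, savings: {savings:.0%}")
```

Under these assumptions, a service that finishes the work in a quarter of the time costs a quarter as much, provided the resources are actually released when the job ends.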
IoT data finally becomes queryable.
The explosion of IoT devices has created a flood of data, typically landing in data lake storage such as Amazon S3 and Microsoft ADLS as the system of record. But while capturing and storing IoT data is easy, the semi-structured nature of IoT data makes it difficult to process and use: data engineers are forced to build and maintain complex, and often brittle, data pipelines to enrich IoT data, add context to it, and accelerate it. Software AG has stepped in to tackle this problem head-on with its Cumulocity IoT Data Hub, and we predict that in 2020 IoT data will be directly queryable at high performance via business intelligence, self-service analytics, machine learning, or SQL-based tools.
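To see why semi-structured data is awkward for tabular tools, consider the kind of flattening step a pipeline typically has to perform. The sketch below is illustrative only: the event payloads, the field names, and the `flatten` helper are hypothetical, and it does not represent how Cumulocity IoT Data Hub or any other product works.

```python
import json
from typing import Any


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested objects into a single level of dotted column names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical device events: the two records do not share the same fields.
raw_events = [
    '{"device": {"id": "pump-1", "site": "plant-a"}, "temp_c": 71.5}',
    '{"device": {"id": "pump-2"}, "vibration": {"x": 0.1, "y": 0.3}}',
]

rows = [flatten(json.loads(event)) for event in raw_events]

# A tabular tool needs one fixed set of columns, so gaps become nulls.
columns = sorted({column for row in rows for column in row})
table = [[row.get(column) for column in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

Every new device type or firmware change can add or rename fields, which is one reason hand-built pipelines of this kind tend to be brittle.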
The rise of data microservices for bulk analytics.
Traditional operational microservices have been designed and optimized for processing small numbers of records, primarily due to bandwidth constraints with existing protocols and transports. But now this long-standing bottleneck has been addressed with the arrival of Apache Arrow Flight, which provides a high-performance, massively parallel protocol for big data transfer across different applications and platforms. We predict that in 2020 Arrow Flight will unleash a new category of data microservices focused on bulk analytical operations with high volumes of records, and in turn these data microservices will enable loosely coupled analytical architectures that can evolve much faster than traditional monolithic analytical architectures.
Apache Arrow becomes fastest project to reach 10M downloads/month.
Apache Arrow (co-created by Dremio) has firmly established the industry standard for columnar, in-memory data representation and sharing, powering dozens of open source and commercial technologies and making data science 100 to 1,000 times faster. Arrow has already achieved over 6M monthly downloads in the three years since release, with downloads continuing to grow exponentially. As a result, we predict Arrow will reach 10M downloads/month in 2020, faster than any other Apache project. And with the release of Apache Arrow Flight (also co-created by Dremio) this past October, the performance benefits of Arrow are being extended to the Remote Procedure Call (RPC) layer, further increasing data interoperability. While Arrow Flight is just getting started, we predict that by 2025 it will replace decades-old ODBC/JDBC as the de facto way in which all modern data systems communicate.
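For readers unfamiliar with the term, the idea of a columnar, in-memory layout can be sketched with the Python standard library alone. This is not Arrow's actual format or API; the sensor records and variable names are hypothetical, and the `array` module stands in only to show values of one column stored together in a typed, contiguous buffer.

```python
from array import array

# Row-oriented layout: each record carries all of its fields together.
rows = [
    {"sensor_id": 1, "reading": 20.5},
    {"sensor_id": 2, "reading": 21.0},
    {"sensor_id": 3, "reading": 18.5},
]

# Column-oriented layout: each field becomes one typed, contiguous buffer.
columns = {
    "sensor_id": array("q", (row["sensor_id"] for row in rows)),  # 64-bit integers
    "reading": array("d", (row["reading"] for row in rows)),      # 64-bit floats
}

# An analytical query over one field scans a single buffer
# instead of visiting every record.
mean_reading = sum(columns["reading"]) / len(columns["reading"])

print(mean_reading)
```

Analytical workloads usually touch a few columns across many rows, which is the access pattern a columnar layout favors.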