Q&A: Li Kang Discusses CelerData Version 3, Bringing High-Performance Analytics to the Data Lakehouse

March 16, 2023 By David

CelerData, a unified analytics platform for the modern, real-time enterprise, announced the latest version of its enterprise analytics platform, CelerData Version 3.

To find out more, CloudCow spoke with Li Kang, VP of Strategy at CelerData.

CloudCow: When we spoke last summer, you were launching StarRocks. What has changed since then?

Li Kang: Yes, we have had a lot of news since then. In late 2022, we announced the incorporation of CelerData. At that time, CelerData focused on leading the development of the StarRocks Project – a high-performance analytical database – while offering CelerData Enterprise, an on-premises deployment of StarRocks.

And just last month, we contributed the StarRocks Project to the Linux Foundation, where the project will continue to grow and thrive as part of the open source community.

CloudCow: What’s new in CelerData 3?

Kang: CelerData is built on top of the open source project StarRocks, one of the fastest MPP SQL databases. With this new release, lakehouse users have the option to conduct high-performance analytics without ingesting data into a central data warehouse. Analysts and data scientists can query across streaming data and historical data in real time, without having to wait for streaming data to be combined into batches for analysis. This greatly simplifies the data architecture and improves the timeliness of lakehouse analytics. CelerData’s advanced query engine can support thousands of concurrent users at 10,000 queries per second (QPS), enabling use cases previously not possible on the data lakehouse.

CloudCow: Why are these new features so important?

Kang: The data lakehouse has added critical capabilities to the data lake architecture by introducing ACID control, table formats and data governance. But analytics capabilities on the lakehouse are still limited and cost-prohibitive. Most query engines are not able to support real-time analytics, and they fall apart when facing a large number of concurrent users. Compared to other common query engines, CelerData improves query performance by at least three times while significantly reducing infrastructure cost.

CloudCow: What are the top capabilities of the new product?

Kang: Enabling high-performance data lake analytics is a key capability. By integrating with open table formats such as Hudi, Iceberg, and Delta Lake, customers can take advantage of the performance of the CelerData query engine on a data lake without data ingestion.

In addition, unlike users of other data lake query engines, CelerData users have the option to bring data into CelerData’s own storage format for the best query performance.

Another key component of CelerData 3 is our cloud-native architecture, which leverages cloud object storage to improve reliability and reduce storage cost.

It also enables better workload and resource isolation so that users can create different warehouses for different use cases.