Alluxio Announces Record Growth for 1H 2020 in Hybrid and Multi-Cloud Data Orchestration
August 13, 2020
Alluxio, the developer of an open source cloud data orchestration system, today announced it has closed 1H 2020 with sales growth of more than 650% over 1H 2019. Alluxio demonstrated continued market strength and leadership in financial services, high tech, telecom, internet, gaming and ecommerce across North America, Asia and Europe.
“2020 has been an unprecedented year as organizations adjusted their priorities to emphasize cost saving and infrastructure modernization to prepare for future growth. For data & AI teams, a key solution is adopting a cloud / hybrid cloud strategy, and Alluxio has become a critical element for that by bringing cost savings, speed and agility to data analytics and AI infrastructures,” said Haoyuan (H.Y.) Li, Founder and CEO, Alluxio. “Today, I am very proud to share that we closed the first half of the year with 650% revenue growth over the same period last year. This could not have been achieved without the strong commitment of our customers, partners and the vibrant Alluxio community. Marching into the second half, we will continue to make further investments to build a stronger data orchestration system bringing more cost savings, efficiency and agility for our customers’ data-driven workloads (Spark, Presto, TensorFlow, PyTorch) in cloud and hybrid cloud environments.”
Continuing Customer Success and New Customer Wins
Alluxio continues to attract new customers and expand existing customer deployments around the globe. Recent notable additions and success stories include: Alibaba, Aunalytics, Datasapiens, EA, Nielsen, Playtika, Roblox, Ryte, Tencent, VIPShop, Walmart, Walkme and WeRide.
Derek Tan, Executive Director of Infra & Simulation at WeRide, said, “WeRide uses Alluxio as a hybrid cloud data gateway for applications on-premises to access public cloud storage like AWS S3. The new data access architecture provides a localized cache per location to eliminate redundant requests to S3. As a result, we reduced the complexity of data synchronization by having a single interface to access data and removed the need to maintain a custom local copy; reduced S3 data-out cost of downloading redundant data; gained fast access to data to boost engineering productivity; and now have an in-office cache of the cloud data.”
Honghan Tian, Sr. Infrastructure Architect, Data Service Center (DSC) at Tencent PCG (Platform and Content Business Group), leverages Alluxio to optimize analytics performance and minimize operating costs in building Tencent Beacon Growing, a real-time data analytics platform. He explained, “In our project ‘Beacon Growing,’ we have deployed Alluxio to improve Impala performance by 2.44x for IO intensive queries and 1.20x for all queries. The query failure rate due to timeout is also reduced by 29%. In the future, we foresee it can reduce disk utilization by over 20% for our planned elastic computing on Impala.”
New Advancements for the Alluxio Data Orchestration Platform
The latest release of Alluxio, version 2.3, shipped in June. It focuses on streamlining the user experience in hybrid cloud deployments where Alluxio is deployed with compute in the cloud to access data on-prem. Specific new features include:
- One Command Deployment on Google Dataproc and AWS EMR – Deploying Alluxio for the first time should be easy, and being able to repeatedly create custom deployments with Alluxio in the stack is key for deployments in the cloud.
- Native Kubernetes Helm Chart Support – Alluxio 2.3 supports data locality on Kubernetes with ephemeral compute (e.g., Spark) without the requirement for host networking.
- Environment Validation Tools – After deployment, the hurdle of connecting on-cloud Alluxio to remote data is the biggest challenge for new Alluxio users. With this release, a guided experience is now available to help users during this first step after deployment.
- Concurrent Metadata Synchronization – For long running and production hybrid cloud deployments, users found it critical for the files and directories virtualized in Alluxio to be synchronized with the on-premises data in near real time. In Alluxio 2.3, the new concurrent metadata synchronization algorithm provides an order of magnitude or more performance improvement.
- Alluxio Structured Data Services – Alluxio Structured Data Services (SDS) is the subsystem in Alluxio that enables integration with OLAP frameworks like Presto and SparkSQL at the structured data level, as opposed to raw files and directories. Alluxio 2.3 further improves the range of compatibility for SDS, especially in cloud environments.
- Glue UDB Support – The Alluxio Catalog Service now supports connecting to AWS Glue for the metadata service. This enables Alluxio Structured Data Services for table metadata stored in AWS Glue, in addition to the existing support for the Hive Metastore.
- ORC File Support – ORC is now a supported input type (in addition to CSV and Parquet) for transformations with the Alluxio Catalog Service.
Open Source Community Contributions
The Facebook Presto team has been collaborating with Alluxio on an open source data caching solution for Presto. This is required for multiple Facebook use-cases to improve query latency for queries that scan data from remote sources such as HDFS. In early experiments, significant improvements in query latencies and IO scans have been observed.