MapR Accelerates Separation of Compute and Storage
April 2, 2019 - MapR Technologies, Inc., visionary creator of the next-generation data platform for AI and Analytics, today announced innovations in the MapR Data Platform that accelerate the compute journey with new, deep integrations with Kubernetes core components for primary workloads on Spark and Drill. These innovations make it easier to manage highly elastic workloads while also facilitating in-time deployments and the ability to scale compute and storage separately. Organizations restructuring their applications or building next-generation real-time data lakes will benefit from these new capabilities in a Kubernetes model, with Spark and Drill, by easily leveraging the elasticity and agility of such clusters.
“Having run a recent survey on organizations’ use of containers to support AI and analytics initiatives, it is clear that a majority of them are exploring the use of containers and Kubernetes in production,” said Mike Leone, senior analyst, ESG. “We are also seeing compute needs are growing rapidly and bursty due to the unpredictability of compute-centric applications and workloads. MapR is solving for this need to independently scale compute while also tightly integrating with Kubernetes in anticipation of organizations’ rapid container adoption.”
In early 2019, MapR enabled persistent storage for compute running in Kubernetes-managed containers through a CSI-compliant volume driver plugin. With this announcement, MapR further expands its portfolio of features and allows the deployment of Spark and Drill as compute containers orchestrated by Kubernetes. This deployment model allows end users, including data engineers, to run compute workloads in a Kubernetes cluster that is independent of where the data is stored or managed. The following core capabilities are included in this release:
- Tenant Operator: Creates tenant namespaces (Kubernetes Namespaces) for running compute applications, allowing for a simple way to start complex applications in containers within Kubernetes. An end user can run Spark, Drill, Hive Metastore, Tenant CLI, and Spark History Server in these namespaces. These tenants can, in turn, point to a storage cluster that is located elsewhere.
- Spark Job Operator: Creates Spark jobs, allowing for separate versions of Spark to be deployed in separate pods, facilitating the multiple stages of dev, test, and QA that are typical in a data engineer’s workflow.
- Drill Operator: Starts a set of Drillbits, allowing for auto-scaling of queries based on demand.
- CSI Driver Operator: Standard plugin to mount persistent volumes to run stateful applications in Kubernetes.
“MapR is paving the way for enterprise organizations to easily do two key things: start separating compute and storage and quickly embrace Kubernetes when running analytical AI/ML apps,” said Suresh Ollala, SVP Engineering, MapR. “Deep integration with Kubernetes core components, like operators and namespaces, allows us to define multiple tenants with resource isolation and limits, all running on the same MapR platform. This is a significant enabler for not only applications that need the flexibility and elasticity but also for apps that need to move back and forth from the cloud.”
In this release, MapR delivers on six key benefits:
- Handle compute bursts by spinning up additional compute containers without having to add more physical host servers;
- Isolate resources and prevent applications from starving each other by setting granular quota limits, and by using Spark job operators to create different Spark clusters;
- Accommodate fluctuating query workload by growing Drillbits dynamically based on load and demand;
- Run different versions of Spark and Drill on the same platform;
- Allow for multiple tenants to co-exist; and
- Deploy Spark and Drill container applications, along with MapR volumes, across multi-cloud environments, including private, hybrid and public clouds.
These capabilities will be available in Q2 2019.