Nebius launches new AI-native NVIDIA cloud platform built from the ground up to accelerate AI innovation

October 18, 2024, by David

Nebius announced the launch of the first cloud computing platform built from scratch specifically for the age of AI.

Leveraging the experience of around 400 cloud engineers and deep AI expertise from Nebius’s in-house LLM R&D team, the new Nebius platform is designed to manage the full machine learning (ML) lifecycle – from data processing and training through to fine-tuning and inference – all in one place.

Built on the NVIDIA accelerated computing platform, it is a truly AI-native cloud computing environment that supports highly intensive, distributed AI and ML workloads with the robustness, reliability and convenient user experience of a hyperscaler.

Roman Chernin, co-founder and Chief Business Officer at Nebius, said:

“The AI industry is changing incredibly fast, and so are the needs of AI practitioners. We spent months listening to our customers, and what they told us is that they need flexibility on capacity, they want real self-service access, and they need more than just basic infrastructure. This is what we have built, all in one place.”

The Nebius AI cloud is built to meet the needs of a wide range of customers: from individual researchers, start-ups and scale-ups who want genuine on-demand, self-service access, through to enterprises and major AI developers who demand reserved capacity and large, interconnected superclusters.

Key features of the new platform include:

  • Flexible, on-demand compute resources. The platform provides scalable compute power, optimized for AI and ML workloads, using NVIDIA H100 and H200 Tensor Core GPUs and L40S GPUs, with the NVIDIA GB200 NVL72 platform coming soon. Users can tailor their cloud environment to fit their exact needs, whether they are running small experiments or large-scale AI deployments. This flexibility means that teams can adjust their compute resources as their workloads grow, without being locked into long-term contracts.
  • High-performance storage. Nebius offers high-speed storage optimized for AI workloads. With speeds of up to 100 GBps and 1M IOPS, users can efficiently handle large datasets, perform model training, and share data across distributed nodes. This ensures that AI projects remain uninterrupted and are able to scale as data grows.
  • Managed services for simplified AI operations. To further simplify the AI development process, Nebius provides fully managed services, including managed Apache Spark, a powerful tool for processing large datasets quickly, helping to streamline data engineering and ML workloads; and managed MLflow, which tracks experiments and model metrics, allowing users to easily manage the entire ML lifecycle and identify the best models for deployment.
  • Enhanced observability. The platform comes with built-in observability tools, offering real-time access to key performance metrics. Users can monitor their compute and storage usage directly from intuitive dashboards, helping them optimize resource utilization and maintain smooth operations without external monitoring tools.
  • Easy-to-use AI environment. Nebius has designed the platform with ease of use in mind. Nebius VMs come pre-configured with the latest versions of all essential AI libraries and drivers. Users can deploy their environments immediately without needing to worry about configuration, updates, or general-purpose tweaks. This minimizes setup time and maximizes productivity.

Andrey Korolenko, co-founder and Chief Product and Infrastructure Officer at Nebius, said:

“Over the past year we have written a new code base to create a fully owned cloud offering specifically for AI. This is a true full-stack AI platform: a fully owned network of large-scale NVIDIA InfiniBand-interconnected GPU clusters built to the NVIDIA reference architecture, with a proprietary cloud platform on top including a suite of managed services, developer tools and applications.”