AWS Announces Amazon EC2 Capacity Blocks for ML Workloads

November 3, 2023 By David

Amazon Web Services, Inc. (AWS) announced the general availability of Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML, an industry-first consumption model that enables any customer to access highly sought-after GPU compute capacity to run short-duration machine learning (ML) workloads. With EC2 Capacity Blocks, customers can reserve hundreds of NVIDIA GPUs colocated in Amazon EC2 UltraClusters designed for high-performance ML workloads. Customers can use EC2 Capacity Blocks with P5 instances, powered by the latest NVIDIA H100 Tensor Core GPUs, by specifying their cluster size, future start date, and duration. EC2 Capacity Blocks help ensure customers have reliable, predictable, and uninterrupted access to the GPU compute capacity required for their critical ML projects. To get started with EC2 Capacity Blocks, visit aws.amazon.com/ec2/capacityblocks/.

Advancements in ML have unlocked opportunities for organizations of all sizes and across all industries to invent new products and transform their businesses. Traditional ML workloads demand substantial compute capacity, and with the advent of generative AI, even greater compute capacity is required to process the vast datasets used to train foundation models (FMs) and large language models (LLMs). Clusters of GPUs are well suited for this task because their combined parallel processing capabilities accelerate both training and inference. However, as more organizations recognize the transformative power of generative AI, demand for GPUs has outpaced supply. As a result, customers who want to use the latest ML technologies, especially those whose capacity needs fluctuate depending on how far along they are in adoption, may struggle to access the clusters of GPUs needed to run their ML workloads. Alternatively, customers may commit to purchasing large amounts of GPU capacity for long durations, only to have it sit idle when they aren’t actively using it. Customers are looking for ways to provision the GPU capacity they require with more flexibility and predictability, without having to make a long-term commitment.

With EC2 Capacity Blocks, customers can reserve the amount of GPU capacity they need for short durations to run their ML workloads, eliminating the need to hold onto GPU capacity when it is not in use. EC2 Capacity Blocks are deployed in EC2 UltraClusters interconnected with second-generation Elastic Fabric Adapter (EFA) petabit-scale networking, which delivers low-latency, high-throughput connectivity and lets customers scale up to hundreds of GPUs. Customers can reserve EC2 UltraClusters of P5 instances powered by NVIDIA H100 GPUs for a duration of one to 14 days, with a start date up to eight weeks in advance, and in cluster sizes of one to 64 instances (512 GPUs). This gives customers the flexibility to run a broad range of ML workloads and pay only for the GPU time they need. EC2 Capacity Blocks are ideal for completing training and fine-tuning of ML models, running short experiments, and handling temporary future surges in inference demand around upcoming product launches as generative AI applications become mainstream. Once an EC2 Capacity Block is scheduled, customers can plan their ML workload deployments with certainty, knowing the GPU capacity will be available when they need it.
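The reservation limits in the paragraph above can be expressed as a quick pre-flight check before searching for a block. The Python sketch below is hypothetical: the names `CapacityBlockRequest`, `validate`, and `gpu_hours` are illustrative, the limits are copied from this announcement and may have changed since launch, and the figure of 8 GPUs per P5 instance is derived from 512 GPUs across 64 instances.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Limits as stated in this announcement for P5 Capacity Blocks (assumed; verify against current AWS docs).
MIN_DAYS, MAX_DAYS = 1, 14
MAX_LEAD_TIME = timedelta(weeks=8)
MIN_INSTANCES, MAX_INSTANCES = 1, 64
GPUS_PER_P5 = 8  # derived: 512 GPUs / 64 instances


@dataclass
class CapacityBlockRequest:
    """Hypothetical container for the three inputs a customer specifies."""
    instance_count: int
    start: date
    duration_days: int


def validate(req: CapacityBlockRequest, today: date) -> list[str]:
    """Return every violated limit; an empty list means the request fits the stated limits."""
    problems = []
    if not MIN_INSTANCES <= req.instance_count <= MAX_INSTANCES:
        problems.append(f"cluster size must be {MIN_INSTANCES}-{MAX_INSTANCES} instances")
    if not MIN_DAYS <= req.duration_days <= MAX_DAYS:
        problems.append(f"duration must be {MIN_DAYS}-{MAX_DAYS} days")
    if req.start < today:
        problems.append("start date is in the past")
    elif req.start - today > MAX_LEAD_TIME:
        problems.append("start date is more than eight weeks out")
    return problems


def gpu_hours(req: CapacityBlockRequest) -> int:
    """Total GPU-hours reserved: instances x GPUs per instance x days x 24 hours."""
    return req.instance_count * GPUS_PER_P5 * req.duration_days * 24


req = CapacityBlockRequest(instance_count=4, start=date(2023, 11, 10), duration_days=2)
print(validate(req, today=date(2023, 11, 3)))  # []
print(gpu_hours(req))  # 1536
```

For example, a four-instance block held for two days works out to 4 × 8 × 2 × 24 = 1,536 GPU-hours, while the largest block (64 instances for 14 days) comes to 172,032 GPU-hours, which is a convenient way to compare reservation sizes against a training plan.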

“AWS and NVIDIA have collaborated for more than 12 years to deliver scalable, high-performance GPU solutions, and we are seeing our customers build incredible generative AI applications that are transforming industries,” said David Brown, vice president of Compute and Networking at AWS. “AWS has unmatched experience delivering NVIDIA GPU-based compute in the cloud, in addition to offering our own Trainium and Inferentia chips. With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire NVIDIA GPU capacity to build, train, and deploy their generative AI applications, without making long-term capital commitments. It’s one of the latest ways AWS is innovating to broaden access to generative AI capabilities.”

Since its founding in 1993, NVIDIA has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI, and is fueling industrial digitalization across markets. “Demand for accelerated compute is growing exponentially as enterprises around the world embrace generative AI to reshape their business,” said Ian Buck, vice president of Hyperscale and HPC Computing at NVIDIA. “With AWS’s new EC2 Capacity Blocks for ML, the world’s AI companies can now rent H100 not just one server at a time but at a dedicated scale uniquely available on AWS, enabling them to quickly and cost-efficiently train large language models and run inference in the cloud exactly when they need it.”

Customers can use the AWS Management Console, the AWS Command Line Interface (CLI), or the AWS SDKs to find and reserve available Capacity Blocks. With EC2 Capacity Blocks, customers pay only for the time they reserve. EC2 Capacity Blocks are available in the AWS US East (Ohio) Region, with availability planned for additional AWS Regions and Local Zones.
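As a rough illustration of the SDK path, the Python sketch below calls the EC2 `DescribeCapacityBlockOfferings` and `PurchaseCapacityBlock` operations as exposed by boto3 (`describe_capacity_block_offerings` and `purchase_capacity_block`). The parameter and response field names reflect our understanding of the boto3 EC2 API and should be checked against current AWS documentation; the wrapper `reserve_cheapest_block`, its cheapest-offering selection rule, and the `Linux/UNIX` platform choice are illustrative assumptions, not an AWS-recommended workflow.

```python
from datetime import datetime
from typing import Any, Optional


def reserve_cheapest_block(ec2: Any, instance_count: int, duration_days: int,
                           earliest_start: datetime, latest_end: datetime,
                           purchase: bool = False) -> Optional[dict]:
    """Search P5 Capacity Block offerings in a date window and optionally buy the cheapest.

    `ec2` is assumed to be a boto3 EC2 client (or any object with the same two methods).
    Only the first page of results is examined; pagination is omitted for brevity.
    """
    resp = ec2.describe_capacity_block_offerings(
        InstanceType="p5.48xlarge",
        InstanceCount=instance_count,
        CapacityDurationHours=duration_days * 24,  # durations are requested in whole days
        StartDateRange=earliest_start,
        EndDateRange=latest_end,
    )
    offerings = resp.get("CapacityBlockOfferings", [])
    if not offerings:
        return None  # no capacity matched the requested size and window
    # UpfrontFee is assumed to come back as a string, so convert before comparing.
    cheapest = min(offerings, key=lambda o: float(o["UpfrontFee"]))
    if not purchase:
        return cheapest  # inspect the offering without committing to it
    return ec2.purchase_capacity_block(
        CapacityBlockOfferingId=cheapest["CapacityBlockOfferingId"],
        InstancePlatform="Linux/UNIX",  # assumed platform; pick the one your AMI needs
    )
```

In practice you might create the client with `boto3.client("ec2", region_name="us-east-2")` (US East (Ohio)), call the function with `purchase=False` to review the returned offering, and only then rerun it with `purchase=True`. Keeping the purchase behind an explicit flag is a deliberate choice, since reserving a block commits spend and available offerings can change between calls.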

Amplify Partners works with engineers, professors, researchers, and open-source project creators to help turn their bold ideas into beloved products and companies. “We have partnered with several founders who leverage deep learning and large language models to bring ground-breaking innovations to market,” said Mark LaRosa, partner at Amplify Partners. “We believe that predictable and timely access to GPU compute capacity is fundamental to enabling founders to not only quickly bring their ideas to life but also continue to iterate on their vision and deliver increasing value to their customers. Availability of up to 512 NVIDIA H100 GPUs via EC2 Capacity Blocks is a game-changer in the current supply-constrained environment, as we believe it will provide startups with the GPU compute capacity they need, when they need it, without making long-term capital commitments. We are looking forward to supporting founders building on AWS by leveraging GPU capacity blocks and its industry-leading portfolio of machine learning and generative AI services.”

Launched in 2013, Canva is a free online visual communications and collaboration platform with a mission to empower everyone in the world to design. “Today, Canva empowers over 150 million monthly active users to create engaging visual assets that can be published anywhere,” said Greg Roodt, head of Data Platforms at Canva. “We’ve been using EC2 P4de instances to train multi-modal models that power new generative AI tools, allowing our users to experiment with ideas freely and quickly. As we look to train larger models, we need the ability to predictably scale hundreds of GPUs during our training runs. It’s exciting to see AWS launching EC2 Capacity Blocks with support for P5 instances. We can now get predictable access to up to 512 NVIDIA H100 GPUs in low-latency EC2 UltraClusters to train even larger models than before.”

Leonardo.Ai provides a robust and dynamic platform for creative production that marries cutting-edge generative AI technology with unparalleled creator control. “Our team at Leonardo leverages generative AI to enable creative professionals and enthusiasts to produce visual assets with unmatched quality, speed, and style consistency. Our foundation rests upon a suite of fine-tuned AI models and powerful tooling, offering granular control both before and after hitting generate,” said Peter Runham, CTO at Leonardo.Ai. “We leverage a wide range of AWS services to not only build and train our models, but also to host them to support usage from millions of monthly active customers. We are delighted with the launch of EC2 Capacity Blocks. It enables us to elastically access GPU capacity for training and experimenting while preserving the option for us to switch to different EC2 instances that might better meet our compute requirements.”

OctoAI’s mission is to empower developers to build AI applications that delight users by leveraging fast models running on the most efficient hardware. “At OctoML, we empower application builders to easily run, tune, and scale generative AI, optimizing model execution and using automation to scale their services and reduce engineering burden,” said Luis Ceze, CEO of OctoML. “Our ability to scale up on GPU capacity for short durations is critical, especially as we work with customers seeking to quickly scale their ML applications from zero to millions of users as part of their product launches. EC2 Capacity Blocks enables us to predictably spin up different sizes of GPU clusters that match our customers’ planned scale-ups, while offering potential cost savings as compared to long-term capacity commits or deploying on-prem.”