Announcing A3 supercomputers with NVIDIA H100 GPUs, purpose-built for AI
Roy Kim
Director, Product Management, Google Cloud
Chris Kleban
Group Product Manager, Google Cloud
Implementing state-of-the-art artificial intelligence (AI) and machine learning (ML) models requires large amounts of computation, both to train the underlying models, and to serve those models once they’re trained. Given the demands of these workloads, a one-size-fits-all approach is not enough — you need infrastructure that’s purpose-built for AI.
Together with our partners, we offer a wide range of compute options for ML use cases such as large language models (LLMs), generative AI, and diffusion models. Recently, we announced G2 VMs, becoming the first cloud to offer the new NVIDIA L4 Tensor Core GPUs for serving generative AI workloads. Today, we’re expanding that portfolio with the private preview launch of the next-generation A3 GPU supercomputer. Google Cloud now offers a complete range of GPU options for training and inference of ML models.
Google Compute Engine A3 supercomputers are purpose-built to train and serve the most demanding AI models that power today’s generative AI and large language model innovation. Our A3 VMs combine NVIDIA H100 Tensor Core GPUs and Google’s leading networking advancements to serve customers of all sizes:
A3 is the first GPU instance to use our custom-designed 200 Gbps IPUs, with GPU-to-GPU data transfers bypassing the CPU host and flowing over separate interfaces from other VM networks and data traffic. This enables up to 10x more network bandwidth compared to our A2 VMs, with low tail latencies and high bandwidth stability.
Our industry-unique intelligent Jupiter data center networking fabric scales to tens of thousands of highly interconnected GPUs and allows for full-bandwidth reconfigurable optical links that can adjust the topology on demand. For almost every workload structure, we achieve workload bandwidth that is indistinguishable from more expensive off-the-shelf non-blocking network fabrics, resulting in a lower TCO.
The A3 supercomputer’s scale provides up to 26 exaFlops of AI performance, which considerably reduces the time and cost of training large ML models.
As companies transition from training to serving their ML models, A3 VMs are also a strong fit for inference workloads, seeing up to a 30x inference performance boost when compared to our A2 VMs, which are powered by NVIDIA A100 Tensor Core GPUs*.
Purpose-built for performance and scale
A3 GPU VMs were purpose-built to deliver the highest-performance training for today’s ML workloads, complete with modern CPUs, improved host memory, next-generation NVIDIA GPUs and major network upgrades. Here are the key features of the A3:
8 H100 GPUs utilizing NVIDIA’s Hopper architecture, delivering 3x compute throughput
3.6 TB/s bisectional bandwidth between A3’s 8 GPUs via NVIDIA NVSwitch and NVLink 4.0
Next-generation 4th Gen Intel Xeon Scalable processors
2 TB of host memory via 4800 MHz DDR5 DIMMs
10x greater networking bandwidth powered by our hardware-enabled IPUs, specialized inter-server GPU communication stack and NCCL optimizations
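To put a few of these numbers in everyday terms, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted in this post (8 GPUs, 2 TB of host memory, and the 200 Gbps IPU figure mentioned earlier). The decimal unit convention, the treatment of 200 Gbps as a single ideal link, and the 100 GB checkpoint size are assumptions made purely for illustration, and the helper function names are hypothetical rather than part of any Google Cloud API.

```python
# Back-of-the-envelope figures derived from numbers quoted in this post.
# Assumption: decimal units (1 TB = 1000 GB, 1 byte = 8 bits); the post
# does not say which convention it uses.

GPUS_PER_VM = 8        # "8 H100 GPUs"
HOST_MEMORY_TB = 2     # "2TB of host memory"
IPU_LINK_GBPS = 200    # "200 Gbps IPUs" (gigabits per second)


def host_memory_per_gpu_gb(host_memory_tb: float, gpus: int) -> float:
    """Host memory available per GPU, in gigabytes."""
    return host_memory_tb * 1000 / gpus


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes over one link.

    Ignores protocol overhead and contention, so real transfers take longer.
    """
    return size_gb / gbps_to_gigabytes_per_s(link_gbps)


if __name__ == "__main__":
    print(host_memory_per_gpu_gb(HOST_MEMORY_TB, GPUS_PER_VM))  # 250.0
    print(gbps_to_gigabytes_per_s(IPU_LINK_GBPS))               # 25.0
    # Hypothetical 100 GB checkpoint over a single 200 Gbps link.
    print(transfer_seconds(100, IPU_LINK_GBPS))                 # 4.0
```

These are idealized upper bounds, not measured results; actual throughput depends on the workload and the communication stack.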
A3 GPU VMs are a step forward for customers developing the most advanced ML models. By considerably speeding up the training and inference of ML models, A3 VMs enable businesses to train more complex ML models faster, creating an opportunity for our customers to build large language models (LLMs), generative AI, and diffusion models to help optimize operations and stay ahead of the competition.
This announcement builds on our partnership with NVIDIA to offer a full range of GPU options for training and inference of ML models to our customers.
“Google Cloud's A3 VMs, powered by next-generation NVIDIA H100 GPUs, will accelerate training and serving of generative AI applications,” said Ian Buck, vice president of hyperscale and high performance computing at NVIDIA. “On the heels of Google Cloud’s recently launched G2 instances, we're proud to continue our work with Google Cloud to help transform enterprises around the world with purpose-built AI infrastructure.”
Fully-managed AI infrastructure optimized for performance and cost
Customers who want to develop complex ML models without the maintenance burden can deploy A3 VMs on Vertex AI, an end-to-end platform for building ML models on fully-managed infrastructure that’s purpose-built for low-latency serving and high-performance training. Today, at Google I/O 2023, we’re pleased to build on these offerings both by opening generative AI support in Vertex AI to more customers and by introducing new features and foundation models.
Customers who want to architect their own custom software stack can also deploy A3 VMs on Google Kubernetes Engine (GKE) and Compute Engine to train and serve the latest foundation models, with support for autoscaling, workload orchestration, and automatic upgrades.
“Google Cloud's A3 VM instances provide us with the computational power and scale for our most demanding training and inference workloads. We're looking forward to taking advantage of their expertise in the AI space and leadership in large-scale infrastructure to deliver a strong platform for our ML workloads.” -Noam Shazeer, CEO, Character.AI
At Google Cloud, AI is in our DNA. We’ve applied decades of experience running global-scale computing for AI. We designed that infrastructure to scale and to be optimized for a wide variety of AI workloads, and now we’re making it available to you. To join the Preview waitlist for the A3, please register with this link.
*Data source: https://www.nvidia.com/en-us/data-center/h100/