
Train, tune, and serve on an AI supercomputer

AI Hypercomputer is the integrated supercomputing system underneath every AI workload on Google Cloud. It is made up of hardware, software, and consumption models designed to simplify AI deployment, improve system-level efficiency, and optimize costs.

Overview

AI-optimized hardware

Choose from compute (including AI accelerators), storage, and networking options optimized for granular, workload-level objectives, whether that's higher throughput, lower latency, faster time-to-results, or lower TCO. Learn more about Cloud TPUs, Cloud GPUs, and the latest in storage and networking.

Leading software, open frameworks

Get more from your hardware with industry-leading software, integrated with open frameworks, libraries, and compilers to make AI development, integration, and management more efficient.

Flexible consumption models

Flexible consumption options let you choose between fixed costs with committed use discounts and dynamic on-demand models, depending on your business needs. Dynamic Workload Scheduler and Spot VMs can help you get the capacity you need without over-allocating (a brief sketch follows). Plus, Google Cloud's cost optimization tools help automate resource utilization, reducing manual tasks for engineers.
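
For illustration, here is a minimal sketch of requesting Spot capacity with the google-cloud-compute Python client. The project, zone, machine type, and image are placeholders, and Dynamic Workload Scheduler has its own request flow that is not shown here:

```python
# Minimal sketch: creating a Spot VM with the google-cloud-compute client.
# All names below (project, zone, machine type, image) are placeholders.
from google.cloud import compute_v1

def create_spot_instance(project: str, zone: str, name: str) -> None:
    instance = compute_v1.Instance()
    instance.name = name
    instance.machine_type = f"zones/{zone}/machineTypes/g2-standard-4"

    # provisioning_model="SPOT" opts into preemptible, deeply discounted capacity.
    instance.scheduling = compute_v1.Scheduling(
        provisioning_model="SPOT",
        instance_termination_action="STOP",  # what happens on preemption
    )

    instance.disks = [
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12"
            ),
        )
    ]
    instance.network_interfaces = [
        compute_v1.NetworkInterface(network="global/networks/default")
    ]

    compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
```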

How It Works

In this keynote from the AI Infra Summit 2025, a Google Cloud leader outlines what's next for the foundations of AI and how to use AI Hypercomputer for inference, sharing the latest best practices you can use today.

AI Hypercomputer diagram

Common Uses

Cost-effectively serve models at scale

Maximize price-performance and reliability for inference workloads

Inference is quickly becoming more diverse and complex, evolving in three main areas:

  • First, how we interact with AI is changing. Conversations now have much longer and more diverse context.
  • Second, sophisticated reasoning and multi-step inference are making Mixture-of-Experts (MoE) models more common. This is redefining how memory and compute scale from initial input to final output.
  • Finally, it's clear that the real value isn't just raw tokens per dollar, but the usefulness of the response. Does the model have the right expertise? Did it answer a critical business question correctly? That's why we believe customers need better measurements, focusing on the total cost of system operations, not the price of their processors (the sketch after this list makes the difference concrete).
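
To make that point concrete, here is a toy calculation; every number in it is invented for illustration:

```python
# Toy comparison: raw throughput-per-dollar vs. cost per *useful* response.
# All figures are made up for illustration only.

def cost_per_useful_response(cost_per_hour: float,
                             responses_per_hour: float,
                             useful_fraction: float) -> float:
    """Total system cost divided by responses that actually solved the task."""
    return cost_per_hour / (responses_per_hour * useful_fraction)

# A cheaper system that is wrong more often can lose on the metric that matters.
cheap  = cost_per_useful_response(cost_per_hour=2.0, responses_per_hour=1000, useful_fraction=0.60)
better = cost_per_useful_response(cost_per_hour=3.0, responses_per_hour=1200, useful_fraction=0.90)

print(f"cheap:  ${cheap:.4f} per useful response")   # ~$0.0033
print(f"better: ${better:.4f} per useful response")  # ~$0.0028
```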

Explore AI Inference resources

AI turns sports fans into kit designers

PUMA partnered with Google Cloud for its integrated AI infrastructure, AI Hypercomputer, using Gemini to handle user prompts and Dynamic Workload Scheduler to dynamically scale inference on GPUs, dramatically reducing costs and generation time.

Impact:

• They slashed AI kit generation time from 2-5 minutes down to just 30 seconds, transforming the platform into a fast, truly interactive experience that kept users engaged.
• In just 10 days, fans created 180,000 kits and cast 1.7 million ratings.
• The project proved a new way for PUMA to connect with its community, moving beyond a simple brand-to-consumer relationship by turning fans into active co-creators and giving the company direct, real-time insight into the creative desires of its most passionate consumers.

New Way Now: With AI Creator, PUMA fans get a shot at designing real-life kits

Run large-scale AI training and pre-training

Powerful, scalable, and efficient AI training

Training workloads need to run as highly synchronized jobs across thousands of nodes in tightly coupled clusters. A single degraded node can disrupt an entire job, delaying time-to-market. You need to:

• Ensure the cluster is set up quickly and tuned for the workload in question
• Predict failures and troubleshoot them quickly
• Continue the workload even when failures do happen

We want to make it extremely easy for customers to deploy and scale training workloads on Google Cloud. The sketch below illustrates the resume-from-checkpoint pattern that keeps a long job moving through node failures.
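
Here is a minimal, framework-agnostic sketch of that pattern; the directory layout, file format, and step counts are illustrative assumptions, not a prescribed API:

```python
# Resume-from-checkpoint sketch: periodically persist training state so a
# rescheduled job continues from the last checkpoint rather than step 0.
# With GCS Fuse, ckpt_dir can point at a mounted Cloud Storage bucket.
import os
import pickle

def latest_checkpoint(ckpt_dir: str):
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.startswith("step_"))
    return os.path.join(ckpt_dir, ckpts[-1]) if ckpts else None

def train(ckpt_dir: str, total_steps: int = 10_000, save_every: int = 500):
    os.makedirs(ckpt_dir, exist_ok=True)
    state = {"step": 0, "params": None}  # stand-in for real model/optimizer state

    # On restart (e.g. after a node failure and reschedule), resume from the
    # last saved step instead of starting over.
    ckpt = latest_checkpoint(ckpt_dir)
    if ckpt:
        with open(ckpt, "rb") as f:
            state = pickle.load(f)

    for step in range(state["step"], total_steps):
        state["step"] = step + 1  # a real train_step(state) would go here
        if (step + 1) % save_every == 0:
            path = os.path.join(ckpt_dir, f"step_{step + 1:08d}.pkl")
            with open(path, "wb") as f:
                pickle.dump(state, f)
```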

To create an AI cluster, get started with one of our tutorials.

Moloco built an ad-serving platform to process billions of daily requests

Moloco relied on AI Hypercomputer's fully integrated stack to automatically scale on advanced hardware like TPUs and GPUs, freeing up Moloco engineers, while integration with Google's industry-leading data platform created a cohesive, end-to-end system for AI workloads.

After launching its first deep learning models, Moloco experienced hockey-stick growth and profitability, growing 5x in 2.5 years and achieving:

• 10x faster model training with Cloud TPUs on GKE, along with a 4x reduction in training costs
• Scaling to serve over 1,000 internal users, giving them access to a planet-scale machine learning system that helps them find profitable growth from their own data

Deploy and orchestrate AI applications

Leverage leading AI orchestration software and open frameworks to deliver AI-powered experiences

Google Cloud provides images that contain common operating systems, frameworks, libraries, and drivers. AI Hypercomputer optimizes these pre-configured images to support your AI workloads.

• AI and ML frameworks and libraries: Use Deep Learning Software Layer (DLSL) Docker images to run ML models such as NeMo and MaxText on a Google Kubernetes Engine (GKE) cluster
• Cluster deployment and AI orchestration: You can deploy your AI workloads on GKE clusters, Slurm clusters, or Compute Engine instances; for more information, see VM and cluster creation overview (a minimal job-submission sketch follows this list)
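
As a rough illustration of the GKE path, this sketch submits a single-GPU training Job with the official Kubernetes Python client. The image, command, and namespace are placeholders; production setups would more often use YAML manifests or higher-level tooling:

```python
# Sketch: submit a one-GPU batch Job to a GKE cluster whose credentials are
# already in your kubeconfig (e.g. via `gcloud container clusters get-credentials`).
from kubernetes import client, config

def submit_training_job(name: str = "train-demo") -> None:
    config.load_kube_config()  # assumes kubectl already points at the cluster

    container = client.V1Container(
        name=name,
        image="us-docker.pkg.dev/PROJECT/REPO/trainer:latest",  # placeholder
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"}  # ask the GPU node pool for one device
        ),
    )
    pod_spec = client.V1PodSpec(restart_policy="OnFailure", containers=[container])
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(spec=pod_spec),
            backoff_limit=3,  # retry a few times before marking the Job failed
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```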

Explore software resources

Priceline: Helping travelers curate unique experiences

"Working with Google Cloud to incorporate generative AI allows us to create a bespoke travel concierge within our chatbot. We want our customers to go beyond planning a trip and help them curate their unique travel experience."

Martin Brodbeck, CTO, Priceline

Open source models on Google Cloud

Serve a model with GKE on a single GPU

Train common models with GPUs

Scale model serving to multiple GPUs

Serve an LLM using multi-host TPUs on GKE with Saxml

Train at scale with the NVIDIA NeMo framework

FAQ

How does AI Hypercomputer compare to just using individual cloud services?

While individual services offer specific capabilities, AI Hypercomputer provides an integrated system where hardware, software, and consumption models are designed to work optimally together. This integration delivers system-level efficiencies in performance, cost, and time-to-market that are harder to achieve by stitching together disparate services. It simplifies complexity and provides a holistic approach to AI infrastructure.

Can AI Hypercomputer be used for hybrid or multicloud strategies?

Yes, AI Hypercomputer is designed with flexibility in mind. Technologies like Cross-Cloud Interconnect provide high-bandwidth connectivity to on-premises data centers and other clouds, facilitating hybrid and multicloud AI strategies. We operate with open standards and integrate popular third-party software to enable you to build solutions that span multiple environments and change services as you please.

How is AI Hypercomputer secured?

Security is a core aspect of AI Hypercomputer. It benefits from Google Cloud's multi-layered security model. Specific features include Titan security microcontrollers (ensuring systems boot from a trusted state), RDMA Firewall (for zero-trust networking between TPUs/GPUs during training), and integration with solutions like Model Armor for AI safety. These are complemented by robust infrastructure security policies and principles like the Secure AI Framework.

How do I choose between GKE, Cluster Director, and Compute Engine?

• If you don't want to manage VMs, we recommend starting with Google Kubernetes Engine (GKE)
• If you need to use multiple schedulers, or can't use GKE, we recommend using Cluster Director
• If you want complete control over your infrastructure, work directly with VMs; for that, Compute Engine is your best option

Is AI Hypercomputer only for large-scale workloads?

No. AI Hypercomputer can be used for workloads of any size. Smaller workloads still realize all the benefits of an integrated system, such as efficiency and simplified deployment. AI Hypercomputer also supports customers as their businesses scale, from small proofs of concept and experiments to large-scale production deployments.

Should I start with Vertex AI or with AI Hypercomputer directly?

For most customers, a managed AI platform like Vertex AI is the easiest way to get started with AI because it has all of the tools, templates, and models baked in, and it is powered by AI Hypercomputer under the hood in a way that is optimized on your behalf. If you prefer to configure and optimize every component of your infrastructure, you can access AI Hypercomputer's components as infrastructure and assemble them in a way that meets your needs.

Are there reference implementations for common workloads?

Yes, we are building a library of recipes on GitHub. You can also use the Cluster Toolkit for pre-built cluster blueprints.

AI-optimized hardware

Storage

• Training: Managed Lustre is ideal for demanding AI training with high throughput and PB-scale capacity. GCS Fuse (optionally with Anywhere Cache) suits larger capacity needs with more relaxed latency. Both integrate with GKE and Cluster Director.
• Inference: GCS Fuse with Anywhere Cache offers a simple solution. For higher performance, consider Hyperdisk ML. If you use Managed Lustre for training, it can also serve inference in the same zone. (A short data-loading sketch follows this list.)
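
To illustrate the GCS Fuse option: once a bucket is mounted (for example, via the Cloud Storage FUSE CSI driver on GKE), training code reads it like a local filesystem. The mount path and shard layout below are assumptions for illustration:

```python
# Sketch: iterate over data shards through a gcsfuse mount point.
# "/mnt/gcs/training-data" and the .tfrecord layout are hypothetical.
import os

DATA_DIR = "/mnt/gcs/training-data"

def iter_shards(data_dir: str):
    """Yield shard paths in a stable order so every restart sees the same sequence."""
    for fname in sorted(os.listdir(data_dir)):
        if fname.endswith(".tfrecord"):
            yield os.path.join(data_dir, fname)

for shard in iter_shards(DATA_DIR):
    with open(shard, "rb") as f:
        chunk = f.read(4096)  # real code would parse records and feed the input pipeline
```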

Networking

• Training: Benefit from technologies like RDMA networking in VPCs, and high-bandwidth Cloud and Cross-Cloud Interconnect for rapid data transfer.
• Inference: Utilize solutions like the GKE Inference Gateway and enhanced Cloud Load Balancing for low-latency serving. Model Armor can be integrated for AI safety and security.

Compute: Access Google Cloud TPUs (Trillium), NVIDIA GPUs (Blackwell), and CPUs (Axion). This allows for optimization based on specific workload needs for throughput, latency, or TCO. The snippet below shows a quick way to confirm which accelerators your code can see.
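
A quick sanity check, shown here with JAX (one of the supported frameworks listed below): on a TPU VM this lists TPU devices, and on a GPU VM with the CUDA-enabled jax install it lists GPUs:

```python
# List the accelerators visible to the current process.
import jax

print(jax.devices())             # e.g. TPU or GPU device objects
print(jax.local_device_count())  # number of devices attached to this host
```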

Leading software and open frameworks

• ML frameworks and libraries: PyTorch, JAX, TensorFlow, Keras, vLLM, JetStream, MaxText, LangChain, Hugging Face, NVIDIA (CUDA, NeMo, Triton), and many more open source and third-party options.
• Compilers, runtimes, and tools: XLA (for performance and interoperability), Pathways on Cloud, Multislice Training, Cluster Toolkit (for pre-built cluster blueprints), and many more open source and third-party options.
• Orchestration: Google Kubernetes Engine (GKE), Cluster Director (for Slurm, non-managed Kubernetes, BYO schedulers), and Google Compute Engine (GCE).

Consumption models

• On Demand: Pay-as-you-go.
• Committed Use Discounts (CUDs): Save significantly (up to 70%) for long-term commitments.
• Spot VMs: Ideal for fault-tolerant batch jobs, offering deep discounts (up to 91%).
• Dynamic Workload Scheduler (DWS): Save up to 50% for batch and fault-tolerant jobs.

The sketch below compares these maximum discounts numerically.
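
A back-of-envelope comparison using the maximum discounts quoted above; the $10/hour on-demand baseline is a made-up number, not a real price:

```python
# Compare hourly and monthly cost under each consumption model,
# using the best-case discounts above and a hypothetical baseline rate.
ON_DEMAND = 10.00  # $/accelerator-hour (hypothetical)

options = {
    "On Demand":            ON_DEMAND,
    "CUD (up to 70% off)":  ON_DEMAND * (1 - 0.70),
    "Spot (up to 91% off)": ON_DEMAND * (1 - 0.91),
    "DWS (up to 50% off)":  ON_DEMAND * (1 - 0.50),
}

HOURS_PER_MONTH = 24 * 30
for name, rate in options.items():
    print(f"{name:>22}: ${rate:5.2f}/hr  ->  ${rate * HOURS_PER_MONTH:8.2f}/month")
```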