AI Hypercomputer is the integrated supercomputing system underneath every AI workload on Google Cloud. It is made up of hardware, software and consumption models designed to simplify AI deployment, improve system-level efficiency, and optimize costs.
Overview
Choose from compute (including AI accelerators), storage, and networking options optimized for granular, workload-level objectives, whether that's higher throughput, lower latency, faster time-to-results, or lower TCO. Learn more about Cloud TPUs, Cloud GPUs, and the latest in storage and networking.
Get more from your hardware with industry-leading software, integrated with open frameworks, libraries, and compilers to make AI development, integration, and management more efficient.
Flexible consumption options let you choose between fixed costs with committed use discounts and dynamic on-demand models to meet your business needs. Dynamic Workload Scheduler and Spot VMs can help you get the capacity you need without over-allocating. Plus, Google Cloud's cost optimization tools help automate resource utilization to reduce manual tasks for engineers.
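As a rough illustration of the on-demand side, here is a minimal sketch that requests a Spot VM through the google-cloud-compute client library. The machine type, boot image, and network below are placeholder assumptions, not recommendations:

```python
# Minimal sketch (assumed values): create a Spot VM with google-cloud-compute.
from google.cloud import compute_v1

def create_spot_vm(project: str, zone: str, name: str) -> None:
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/e2-standard-4",  # placeholder
        scheduling=compute_v1.Scheduling(
            provisioning_model="SPOT",           # Spot pricing instead of on-demand
            instance_termination_action="STOP",  # what happens on preemption
        ),
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    # insert() returns an extended operation; result() blocks until it completes.
    compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    ).result()
```

Dynamic Workload Scheduler itself is requested through higher-level interfaces such as GKE or managed instance groups rather than a per-instance flag; see the current documentation for specifics.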
Common Uses
Inference is quickly becoming more diverse and complex, evolving in three main areas.
PUMA partnered with Google Cloud for its integrated AI infrastructure (AI Hypercomputer), allowing them to use Gemini for user prompts alongside Dynamic Workload Scheduler to dynamically scale inference on GPUs, dramatically reducing costs and generation time.
Training workloads need to run as highly synchronized jobs across thousands of nodes in tightly coupled clusters. A single degraded node can disrupt an entire job, delaying time-to-market.
We want to make it extremely easy for customers to deploy and scale training workloads on Google Cloud.
To create an AI cluster, get started with one of our tutorials, or see the minimal programmatic sketch below.
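For orientation, here is a minimal sketch that creates a small GPU cluster with the google-cloud-container client library. The cluster name, machine type, accelerator type, and node count are illustrative assumptions; the tutorials cover production-grade concerns such as networking, placement, and node repair:

```python
# Minimal sketch (assumed values): create a small GKE cluster with GPU nodes.
from google.cloud import container_v1

def create_gpu_cluster(project: str, zone: str) -> None:
    cluster = container_v1.Cluster(
        name="training-cluster",           # placeholder name
        initial_node_count=2,              # tiny, for illustration only
        node_config=container_v1.NodeConfig(
            machine_type="a2-highgpu-1g",  # one A100 per node (placeholder)
            accelerators=[
                container_v1.AcceleratorConfig(
                    accelerator_count=1,
                    accelerator_type="nvidia-tesla-a100",
                )
            ],
        ),
    )
    request = container_v1.CreateClusterRequest(
        parent=f"projects/{project}/locations/{zone}",
        cluster=cluster,
    )
    # Returns a long-running operation descriptor; poll it to track progress.
    container_v1.ClusterManagerClient().create_cluster(request=request)
```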
Moloco relied on AI Hypercomputer's fully integrated stack to automatically scale on advanced hardware like TPUs and GPUs, which freed up Moloco engineers, while integration with Google's industry-leading data platform created a cohesive, end-to-end system for AI workloads.
After launching its first deep learning models, Moloco experienced hockey-stick growth and profitability, growing 5x in 2.5 years.

AssemblyAI
AssemblyAI uses Google Cloud to train models quickly and at scale

LG AI Research dramatically cut costs and accelerated development while adhering to strict data security and residency requirements

Anthropic announced plans to access up to 1 million TPUs to train and serve Claude models, worth tens of billions of dollars. But how are they running on Google Cloud? Watch this video to see how Anthropic is pushing the computing limits of AI at scale with GKE.
Google Cloud provides images that contain common operating systems, frameworks, libraries, and drivers. AI Hypercomputer optimizes these pre-configured images to support your AI workloads.
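As a hedged example of how these pre-configured images are typically discovered, the google-cloud-compute library can resolve the newest image in a family. The deeplearning-platform-release project hosts Deep Learning VM images; treat the family name below as an assumption to verify against the current catalog:

```python
# Minimal sketch: resolve the latest image in a pre-configured image family.
from google.cloud import compute_v1

image = compute_v1.ImagesClient().get_from_family(
    project="deeplearning-platform-release",  # Deep Learning VM images
    family="pytorch-latest-gpu",              # assumed family name; verify
)
# Use image.self_link as the boot-disk source_image when creating a VM.
print(image.self_link)
```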
"Working with Google Cloud to incorporate generative AI allows us to create a bespoke travel concierge within our chatbot. We want our customers to go beyond planning a trip and help them curate their unique travel experience." Martin Brodbeck, CTO, Priceline
FAQ
How is AI Hypercomputer different from using individual Google Cloud services?
While individual services offer specific capabilities, AI Hypercomputer provides an integrated system where hardware, software, and consumption models are designed to work optimally together. This integration delivers system-level efficiencies in performance, cost, and time-to-market that are harder to achieve by stitching together disparate services. It simplifies complexity and provides a holistic approach to AI infrastructure.
Can I use AI Hypercomputer with on-premises infrastructure or other clouds?
Yes, AI Hypercomputer is designed with flexibility in mind. Technologies like Cross-Cloud Interconnect provide high-bandwidth connectivity to on-premises data centers and other clouds, facilitating hybrid and multicloud AI strategies. We operate with open standards and integrate popular third-party software so you can build solutions that span multiple environments and change services as you please.
How is AI Hypercomputer secured?
Security is a core aspect of AI Hypercomputer. It benefits from Google Cloud’s multi-layered security model. Specific features include Titan security microcontrollers (ensuring systems boot from a trusted state), RDMA Firewall (for zero-trust networking between TPUs/GPUs during training), and integration with solutions like Model Armor for AI safety. These are complemented by robust infrastructure security policies and principles like the Secure AI Framework.
Is AI Hypercomputer only for large-scale workloads?
No. AI Hypercomputer can be used for workloads of any size. Smaller workloads still realize all the benefits of an integrated system, such as efficiency and simplified deployment. AI Hypercomputer also supports customers as their businesses scale, from small proofs of concept and experiments to large-scale production deployments.
Should I use Vertex AI, or build on AI Hypercomputer directly?
For most customers, a managed AI platform like Vertex AI is the easiest way to get started with AI: it has all of the tools, templates, and models baked in, and it is powered by AI Hypercomputer under the hood in a way that is optimized on your behalf. If you prefer to configure and optimize every component of your infrastructure, you can access AI Hypercomputer's components directly and assemble them in a way that meets your needs.
Are there recipes or reference blueprints to help me get started?
Yes, we are building a library of recipes on GitHub. You can also use the Cluster Toolkit for pre-built cluster blueprints.
AI Hypercomputer brings together three layers:
- AI-optimized hardware
  - Compute: Access Google Cloud TPUs (Trillium), NVIDIA GPUs (Blackwell), and CPUs (Axion). This allows for optimization based on specific workload needs for throughput, latency, or TCO. A minimal provisioning sketch follows this list.
  - Storage
  - Networking
- Leading software and open frameworks
- Consumption models
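To make the compute layer concrete, here is a minimal sketch that provisions a TPU VM with the google-cloud-tpu client library. The accelerator type and runtime version are assumptions; replace them with values currently supported in your zone:

```python
# Minimal sketch (assumed values): provision a TPU VM node with google-cloud-tpu.
from google.cloud import tpu_v2

def create_tpu_vm(project: str, zone: str, node_id: str) -> None:
    node = tpu_v2.Node(
        accelerator_type="v5litepod-8",         # assumed; pick a supported type
        runtime_version="v2-alpha-tpuv5-lite",  # assumed; must match the type
    )
    request = tpu_v2.CreateNodeRequest(
        parent=f"projects/{project}/locations/{zone}",
        node_id=node_id,
        node=node,
    )
    op = tpu_v2.TpuClient().create_node(request=request)
    op.result()  # block until the node is provisioned
```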