AI Hypercomputer

Train, serve and operate your AI applications on the agent-native infrastructure powering Google.


What is AI Hypercomputer?

An architecture combining purpose-built hardware, open software, and flexible consumption. Each component is carefully integrated to work well together, improving your performance, cost, and developer productivity.

See the latest announcements (April 2026): What’s next in Google AI infrastructure: Scaling for the agentic era

AI Hypercomputer architecture diagram

Smarter, faster training

Build models in weeks, not months. Use Google’s training stack to speed up development and testing without sacrificing performance.

Develop LLMs 36% faster and squeeze up to 97% productivity (Goodput) out of every accelerator using TPU 8t, together with software co-designed with Google DeepMind and integrated with open source frameworks, from Pathways and Pallas (training) to Ray and Agent Sandbox (tuning). We also know that one size doesn't fit all, so we partner closely with NVIDIA to deliver the latest GPUs; Google Cloud will be among the first to deliver instances based on the next-generation NVIDIA Vera Rubin NVL72 when it becomes available later this year.

Use Gemini Enterprise Agent Platform with BigQuery to train models on proprietary data 16X faster by combining your data estate, ML development, and accelerators in one place. Both are powered by AI Hypercomputer, whether you use G4 VMs or Ironwood TPUs.

Run GPU-based simulations on DeepMind’s MuJoCo-Warp, up to 100X faster than standard MuJoCo. Then simulate impossible, risky, or expensive edge cases using synthetic media from Veo, Genie, and Nano Banana, or ingest petabytes of real-world sensor data in BigQuery. Learn more about building physical agents on Google Cloud.

Responsive, efficient inference

Get validated model profiles plus fully integrated Google and open software to boost application responsiveness with less complexity and waste.

Use integrated inference technologies to deliver useful, responsive services to customers. Cut time-to-first-token by 71% with GKE Inference Gateway, serve up to 120k tokens per second using llm-d for disaggregated serving, and load models 5X faster using Anywhere Cache and TPU 8i to keep your working memory exactly where it’s needed.
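As a rough illustration of how gateway-based serving is wired up, the following Kubernetes route attaches a model-server backend to a gateway. The gateway, route, and service names here are hypothetical placeholders, and GKE Inference Gateway adds its own resources on top of this; consult the GKE documentation for the exact manifests.

```yaml
# Hypothetical sketch: route completion requests through a gateway to a
# model-server Service. All names and the namespace are placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route              # hypothetical route name
  namespace: serving
spec:
  parentRefs:
    - name: inference-gateway  # hypothetical Gateway resource
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/completions
      backendRefs:
        - name: model-server   # hypothetical Service fronting the model
          port: 8000
```

Routing at the gateway layer is what lets features like request-aware load balancing and disaggregated serving sit in front of the model servers without changes to client code.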

Deploy classical ML models 70% faster using one of 200+ models available on Gemini Enterprise Agent Platform, on your choice of TPU or GPU, including A5X VMs (NVIDIA Vera Rubin) and TPU 8i when they become available later this year.

Serve swarms of agents securely in GKE Agent Sandbox, provisioning up to 300 sandboxes per second while instantly pausing and resuming as needed, so you never pay for agents sitting idle.

Inference stack

Flexible, open, reliable operations

Use any framework or accelerator across hybrid and multicloud environments with automated cluster maintenance and management fit for exascale.


TorchTPU removes the TPU learning curve for developers by providing native PyTorch support, so you can use the best available accelerator without complex code rewrites.
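TorchTPU's public API isn't shown here, so as a rough sketch of the "no rewrites" idea: standard PyTorch code can select an accelerator at runtime and fall back to CPU, keeping one code path everywhere. The `torch_xla` import below is an assumption borrowed from the existing PyTorch/XLA project; TorchTPU's actual entry point may differ.

```python
# Sketch: the same PyTorch code targets a TPU when one is available and
# falls back to CPU otherwise. The torch_xla import path is an assumption
# based on the PyTorch/XLA project, not a documented TorchTPU API.
import torch

try:
    import torch_xla.core.xla_model as xm  # present only on TPU hosts
    device = xm.xla_device()
except ImportError:
    device = torch.device("cpu")

model = torch.nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4, device=device)
print(model(batch).shape)  # torch.Size([8, 2])
```

The point of native framework support is that the `model` and `batch` lines are identical on either device; only the device handle changes.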

Based on open source Kubernetes, GKE gives you multicloud portability with enterprise scale, supporting up to 130,000 nodes while integrating natively with Agent Platform and Google Distributed Cloud for hybrid deployments.

Every accelerator on AI Hypercomputer is supported by Cluster Director capabilities, including a pre-deployment bill of health, 360-degree observability dashboards, and always-on health checks.

Connect services across clouds without high-latency links using Cross-Cloud Network, a networking backbone trusted by over 65% of the Fortune 100 that moves over 27 exabytes of data per month.

Our flexible consumption models give you multiple ways to schedule and reduce the cost of accelerators. Save up to 91% on batch or fault-tolerant jobs with Spot VMs, up to 50% on jobs with a flexible start date using Dynamic Workload Scheduler, and up to 50% off when you sign up for committed use discounts.
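To make the discount ceilings above concrete, here is a small illustration of the effective hourly rate under each option. The on-demand rate is a hypothetical placeholder, not a published price; actual discounts depend on machine type, region, and commitment terms.

```python
# Illustrative arithmetic on the discount ceilings quoted above.
# The on-demand rate is a hypothetical placeholder, not a real price.
on_demand_hourly = 10.00  # hypothetical $/accelerator-hour

discounts = {
    "Spot VMs (batch / fault-tolerant jobs)": 0.91,
    "Dynamic Workload Scheduler (flexible start date)": 0.50,
    "Committed use discounts": 0.50,
}

for option, max_discount in discounts.items():
    effective = on_demand_hourly * (1 - max_discount)
    print(f"{option}: as low as ${effective:.2f}/hour")
```

Spot and scheduler discounts trade availability guarantees for price, while committed use discounts trade flexibility, so the right mix depends on how interruptible each workload is.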

Agent-ready systems

Push the limits of performance and use energy responsibly as you scale on the infrastructure foundation trusted by Google and frontier AI labs.

Google Cloud supports 9 of the top 10 AI labs and 70% of funded AI startups. By deploying on AI Hypercomputer, you’re using data centers that reliably processed over 100 billion tokens for nearly 350 customers in December 2025 alone.

Google Cloud’s data centers, including AI Hypercomputer, deliver industry-leading energy efficiency, with six times more computing power per unit of electricity than five years ago. This enables our 8th generation TPU to deliver 80% better price-performance and 20% better energy efficiency than the previous generation.

Google is committed to paying for 100% of the power our data centers use and any new infrastructure costs directly driven by our growth. Partner with us to ensure that as your AI ambitions scale, local households and businesses don’t foot the bill. In the coming years, we will fund new power and infrastructure to serve our models, and continue investing in alternative energy sources like advanced nuclear, geothermal, and long-duration storage.

Our Titanium architecture’s custom Titan chips deliver a verifiable hardware root of trust and zero-trust security. Independent analysis from cloudvulndb.org shows that our systems experience up to 70% fewer critical vulnerabilities than other leading clouds.


Powering the world's leading innovators

How WPP accelerates humanoid robot training 10x with G4 VMs
WPP has significantly optimized humanoid robot training by leveraging Google Cloud’s G4 VMs and NVIDIA Isaac Sim, reducing reinforcement learning cycles from 24 hours to less than one hour. By mastering complex human movements like dancing in simulation, they are bridging the "sim-to-real" gap to enable more precise and natural robotic motion for the film and marketing industries.
AI turns sports fans into kit designers
Puma partnered with Google Cloud for its integrated AI infrastructure (AI Hypercomputer), allowing them to use Gemini for user prompts alongside Dynamic Workload Scheduler to dynamically scale inference on GPUs, dramatically reducing costs and generation time.
Helping frontline factory workers without coding expertise build their own AI solutions
Toyota chose Google Cloud because of Google Kubernetes Engine’s unique scaling performance — four times faster than competitors in their tests — which provided the critical speed and responsiveness needed to successfully democratize AI for frontline factory workers.
Building a powerful, bilingual foundation model to solve complex business problems
LG’s solution accelerated AI development, boosted performance by 1.3x, and enabled secure, enterprise-wide human-AI collaboration across its affiliates.
Major League Baseball serves teams and fans faster with agents on AI Hypercomputer
Major League Baseball used AI Hypercomputer to build AI agents, cutting development from months to weeks and incident response from hours to seconds.

Learn more about AI Hypercomputer

Start your AI journey today

Reach out to one of our infrastructure experts to brainstorm ideas, discuss your next project, or see a demo.
