Tensor Processing Units (TPUs)

Engineered for next-generation AI

Tensor Processing Units (TPUs) are custom accelerators co-designed with open software to power the entire AI lifecycle, from frontier training and large-scale inference to the multi-step reasoning and data demands of agentic AI.

Powered by AI Hypercomputer

TPUs are powered by Google Cloud's AI Hypercomputer, a groundbreaking architecture that brings together purpose-built hardware, open software, and flexible consumption models.

A decade of Tensor Processing Units (TPUs)

TPUs are purpose-built for advanced AI workloads—from large language models and code generation to intelligent agents. They are the engine behind Gemini and Google’s flagship billion-user products, including Search, Photos, and Maps.

Benefits

Open, flexible, and reliable operations

Build on an open ecosystem using familiar libraries and tools. TPUs provide native, high-performance support for PyTorch and JAX, and support the vLLM engine for fast inference. Manage and scale these deployments reliably across global clusters with Google Kubernetes Engine (GKE).

Performance without compromise

1 million+ TPUs in a single logical training cluster

Turn months of training into weeks for frontier models. With up to 1 million TPU chips in a single cluster, TPUs maximize goodput, so that nearly every compute cycle goes toward productive training.

TPU versions

TPU 8t

TPU 8t is built for large-scale pre-training and embedding-heavy workloads at a scale of 9,600 chips in a single superpod, delivering the high-density compute required for frontier models with nearly 3x the compute performance per pod over the previous generation.

Coming soon.

TPU 8i

TPU 8i is optimized for post-training and inference. It provides an 80% performance-per-dollar improvement over previous generations for low-latency inference on large mixture-of-experts (MoE) models.

Coming soon.

Ironwood - 7th generation

Seventh-generation, energy-efficient TPU engineered for large-scale training, reasoning, and inference. Each pod features 9,216 liquid-cooled chips and provides 42.5 exaflops, with 4x better performance per chip than Trillium.
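
As a rough sanity check on the figures above, dividing the quoted pod-level compute by the chip count gives an implied per-chip figure. The sketch below assumes that the 42.5 exaflops figure is the peak for one full 9,216-chip pod and that it divides evenly across chips; the function and variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic using only the two figures quoted above.
# Assumption: 42.5 exaflops is the peak for one full 9,216-chip pod.
POD_EXAFLOPS = 42.5
CHIPS_PER_POD = 9_216


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Implied peak petaflops per chip (1 exaflop = 1,000 petaflops)."""
    return pod_exaflops * 1_000 / chips


ironwood_chip_pf = per_chip_petaflops(POD_EXAFLOPS, CHIPS_PER_POD)
print(f"Implied peak per chip: {ironwood_chip_pf:.2f} petaflops")
```

Under these assumptions the result is roughly 4.6 petaflops per chip.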

Generally available.

Documentation | Pricing

Trillium - 6th generation

Sixth-generation TPU featuring improved energy efficiency and peak compute performance for training and inference. It is 67% more energy efficient and provides 4.7x higher peak compute performance per chip than the previous-generation TPU v5e.

Generally available.

Documentation | Pricing

TPU software stack

The TPU software stack bridges the gap between high-level machine learning code and custom silicon to maximize hardware efficiency.

JAX

Compile complex mathematical operations at near-native hardware speed on TPUs. This open-source library for numerical computation and automatic differentiation scales effortlessly across massive distributed TPU clusters using an explicit device mesh.
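
To make the idea of an explicit device mesh concrete, the following plain-Python sketch computes which slice of an array each device in a grid would own. It is a conceptual illustration only and does not use the JAX API; the function name `shard_slices` and the rule that array axis d is split evenly over mesh axis d are assumptions made for this example.

```python
# Conceptual illustration of sharding an array over a device mesh.
# Plain Python only; this is NOT the JAX API.
from itertools import product


def shard_slices(shape, mesh_shape):
    """Map each mesh coordinate to the (start, stop) range it owns per axis.

    Assumption: array axis d is split evenly across mesh axis d.
    """
    for dim, parts in zip(shape, mesh_shape):
        if dim % parts:
            raise ValueError("each array axis must divide evenly across its mesh axis")
    layout = {}
    for coord in product(*(range(p) for p in mesh_shape)):
        layout[coord] = tuple(
            (c * dim // parts, (c + 1) * dim // parts)
            for c, dim, parts in zip(coord, shape, mesh_shape)
        )
    return layout


# An 8 x 4 array on a hypothetical 4 x 2 mesh (8 devices in total).
layout = shard_slices(shape=(8, 4), mesh_shape=(4, 2))
print(layout[(0, 0)])  # rows 0-1, columns 0-1
print(layout[(3, 1)])  # rows 6-7, columns 2-3
```

Each of the eight devices ends up holding a 2 x 2 tile, which is the kind of layout a mesh-based sharding specification describes.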

View GitHub repository

TorchTPU & PyTorch

Run PyTorch workloads directly on TPUs using standard syntax. Built with Meta, this native, "eager-first" stack enables the migration of existing codebases with minimal changes to unlock major cost-performance benefits—without requiring engineering teams to learn a new framework.

Read the article

OpenXLA

Reduce compilation bottlenecks to extract high performance from TPU hardware. This open-source machine learning compiler automatically fuses operations, optimizes memory usage, and eliminates framework overhead for linear algebra computations across TPU backends.
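
Operator fusion can be illustrated without any compiler. The plain-Python sketch below contrasts three separate passes, each producing a temporary list, with a single fused pass that yields the same result. It only mimics the idea and is not OpenXLA code; the function names `unfused` and `fused` are hypothetical.

```python
# Conceptual illustration of operator fusion in plain Python.
# This is NOT OpenXLA code; it only mimics the idea of avoiding temporaries.

def unfused(x, scale, bias):
    """Three separate passes, each materializing a full intermediate list."""
    scaled = [v * scale for v in x]         # temporary buffer 1
    shifted = [v + bias for v in scaled]    # temporary buffer 2
    return [max(v, 0.0) for v in shifted]   # final output (ReLU)


def fused(x, scale, bias):
    """One pass: scale, shift, and clamp each element with no intermediate lists."""
    return [max(v * scale + bias, 0.0) for v in x]


data = [-2.0, -0.5, 0.0, 1.5, 3.0]
fused_result = fused(data, 2.0, 1.0)
assert unfused(data, 2.0, 1.0) == fused_result
print(fused_result)
```

The fused version touches each input element once, which is the memory-traffic saving that fusion aims for on real hardware.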

View GitHub repository

MaxText

Slash training timelines for massive foundation models. This JAX-based reference implementation provides a highly scalable, out-of-the-box blueprint to achieve peak hardware utilization during LLM pre-training on TPUs.

View GitHub repository

Tunix

Streamline LLM post-training and alignment on TPU infrastructure. This lightweight, open-source JAX library delivers scalable workflows for Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to efficiently turn raw models into production-ready agents.

View GitHub repository

vLLM

Maximize serving concurrency and drive down operational costs for real-time TPU inference. This high-throughput open-source engine leverages memory management techniques like PagedAttention to eliminate waste and deliver highly efficient LLM serving on TPU architecture.
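
The paging idea behind PagedAttention can be sketched as a small block allocator: the key-value cache is divided into fixed-size blocks, and each sequence keeps a table of the blocks it owns, allocated only as tokens arrive. The class below is a toy model under that assumption. `PagedKVCache` and its methods are hypothetical names, not vLLM's API or implementation.

```python
# Toy block allocator illustrating the paging idea behind PagedAttention.
# Hypothetical names; this is not vLLM's API or implementation.

class PagedKVCache:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> [physical ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id: str) -> int:
        """Reserve room for one more token; return its physical block id."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or last block is full
            if not self.free_blocks:
                raise MemoryError("KV cache is out of blocks")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[-1]

    def release(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")   # 3 tokens need 2 blocks
cache.append_token("b")       # 1 token needs 1 block
tables_before_release = {k: list(v) for k, v in cache.block_tables.items()}
print(tables_before_release)
cache.release("a")
print(len(cache.free_blocks))
```

Because blocks are handed out on demand, at most one partially filled block per sequence is left unused, rather than a large region reserved up front.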

View GitHub repository

TPUs are powering the innovators

Citadel Securities runs workloads up to 4x faster with Ironwood

In financial markets, every nanosecond matters. Citadel Securities partnered with Google Cloud to build a scalable, cloud-based, quantitative research environment capable of running workloads 2 to 4 times faster with 30% lower cost. With an infrastructure built on Ironwood TPUs, workloads that used to take weeks or days are now executed in hours or even minutes, helping customers all around the world meet their investment objectives.

Nuro is creating a safe path to a driverless future with Google Cloud

Nuro has significantly accelerated the development of its autonomous driving technology by leveraging Google Cloud TPUs and Google Kubernetes Engine, training AI models twice as fast without incremental cost. By processing petabytes of real-world driving data and running large-scale safety simulations, they are continuously improving the Nuro Driver to safely navigate any vehicle—from passenger cars to semi-trucks—across public roads.

Lightricks trains video diffusion models at scale with JAX on TPU

Lightricks has significantly accelerated the training of its 13-billion-parameter generative video models by migrating its codebase to JAX on Google Cloud TPUs, increasing daily training steps by 40%. By overcoming previous scalability limits, this strategic shift unlocked linear scaling across thousands of TPU cores and doubled developer productivity, enabling Lightricks to bring high-performance, highly controllable AI video tools to the creator economy much faster.

Additional resources

Articles

Building production AI on Google Cloud TPUs with JAX
JAX has become a key framework for developing state-of-the-art foundation models across the AI landscape, and not just at Google. In this article, we are excited to share an overview of the JAX AI Stack, a robust, end-to-end platform that extends JAX, the core numerical library, into an industrial-grade solution for machine learning at any scale.
Read the article

Frequently Asked Questions

What is a Tensor Processing Unit (TPU)?

TPUs are custom-designed application-specific integrated circuits (ASICs) developed by Google. They are purpose-built from the ground up to accelerate machine learning and artificial intelligence workloads. TPUs are optimized specifically for the heavy matrix multiplication and tensor operations that act as the fundamental building blocks of neural networks, powering everything from large language models to complex reasoning agents.

How does the TPU architecture work?

At the heart of a TPU is a specialized Matrix Multiply Unit (MXU) that processes massive numbers of matrix operations simultaneously. TPUs use a "systolic array" architecture that reads data once and flows it continuously through thousands of arithmetic logic units (ALUs), accumulating results without constantly reading from and writing to memory. They also support reduced-precision arithmetic (such as 16-bit or 8-bit floating point), which raises throughput to trillions of operations per second without sacrificing the accuracy required for AI models.
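
The systolic dataflow described above can be simulated in a few lines. The sketch below models a grid of processing elements in which operands enter at the edges, move one step per cycle, and are multiplied and accumulated as they pass. It uses an output-stationary layout for simplicity, which is an assumption made for illustration; the actual MXU dataflow may differ, and `systolic_matmul` is a hypothetical name.

```python
# Cycle-by-cycle simulation of a small systolic array computing C = A @ B.
# Assumption: output-stationary layout, chosen for clarity; real MXUs may differ.

def systolic_matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]      # one accumulator per processing element
    a_reg = [[0] * m for _ in range(n)]    # operand moving left to right
    b_reg = [[0] * m for _ in range(n)]    # operand moving top to bottom
    for t in range(n + m + k - 2):         # cycles until the last operands meet
        # Shift phase: iterate in reverse so each PE reads its neighbor's old value.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    s = t - i              # row i is fed with a delay of i cycles
                    a_reg[i][0] = A[i][s] if 0 <= s < k else 0
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    s = t - j              # column j is fed with a delay of j cycles
                    b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Compute phase: every PE multiplies its two operands and accumulates.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)  # [[19, 22], [43, 50]]
```

Each input value is read from the source matrices once and then passed from neighbor to neighbor, which is why the design avoids repeated trips to memory.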

Which machine learning frameworks do Cloud TPUs support?

Cloud TPUs offer native, high-performance support for leading machine learning frameworks, primarily TensorFlow, PyTorch, and JAX. Code written in these frameworks is compiled by the Accelerated Linear Algebra (XLA) compiler, which automatically optimizes the computational graph to run efficiently on the underlying TPU hardware.

What types of workloads are best suited for TPUs?

TPUs excel at deep learning tasks that require massive parallel matrix operations. They are highly recommended for:

Models dominated by matrix computations
Training runs that take weeks or months to converge
Large foundation models and large language models (LLMs)
Workloads with ultra-large embeddings, such as advanced recommendation engines
Continuous reinforcement learning and high-volume, low-latency inference

What are TPU pods and slices?

A TPU Pod is a massive physical cluster of TPU chips connected together over a specialized, high-speed network (such as Google's Optical Circuit Switching or Virgo network). A single superpod can contain thousands of interconnected chips. A "slice" is a smaller, dedicated subset of a pod that you can rent. This networking architecture allows developers to scale their AI workloads across massive amounts of compute with little to no code changes, effectively operating the entire slice as a single unified machine.

What is ML "Goodput" in the context of TPU training?

Goodput refers to the actual, productive time your infrastructure spends actively training your model, rather than idling. Training massive foundation models involves complex orchestration across thousands of chips, meaning compute cycles can easily be wasted on data transfer delays, hardware resets, or checkpointing. Cloud TPUs are engineered to maximize Goodput through high-bandwidth interconnects and optimized memory caching, ensuring that paid compute cycles contribute directly to model advancement.
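
One simple way to quantify this is as the fraction of total wall-clock time spent on productive training. The sketch below assumes that definition and uses made-up hours for the three sources of waste named above; the `TrainingRun` class and all of its values are hypothetical.

```python
# Goodput as the productive fraction of total wall-clock time.
# Assumption: lost time is the sum of the three categories below.
# All numbers are invented for illustration.
from dataclasses import dataclass


@dataclass
class TrainingRun:
    total_hours: float
    data_stall_hours: float       # waiting on data transfer
    hardware_reset_hours: float   # recovering from failures
    checkpoint_hours: float       # saving and restoring state

    def productive_hours(self) -> float:
        lost = self.data_stall_hours + self.hardware_reset_hours + self.checkpoint_hours
        return self.total_hours - lost

    def goodput(self) -> float:
        return self.productive_hours() / self.total_hours


run = TrainingRun(
    total_hours=1000.0,
    data_stall_hours=40.0,
    hardware_reset_hours=25.0,
    checkpoint_hours=35.0,
)
print(f"Goodput: {run.goodput():.1%}")  # Goodput: 90.0%
```

In this invented example, 100 of 1,000 hours are lost, so reducing any one category raises goodput directly.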

How is orchestration handled at scale for TPUs?

Google Kubernetes Engine (GKE) and Cluster Director provide reliability, comprehensive monitoring, and management for TPU deployments. With GKE for TPU, customers operate in a cloud-native ecosystem that includes Inference Gateway for load balancing and can scale deployments up to 130,000 GKE nodes.