Announcing the general availability of Trillium, our sixth-generation and most advanced TPU to date.
Cloud TPUs optimize performance and cost for all AI workloads, from training to inference. Using world-class data center infrastructure, TPUs offer high reliability, availability, and security.
Not sure if TPUs are the right fit? Learn about when to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads.
Overview
Google Cloud TPUs are custom-designed AI accelerators optimized for training and inference of large AI models. They are ideal for a variety of use cases, including chatbots, code generation, media content generation, synthetic speech, vision services, recommendation engines, and personalization models.
Cloud TPUs are designed to scale cost-efficiently for a wide range of AI workloads, spanning training, fine-tuning, and inference. Cloud TPUs provide the versatility to accelerate workloads on leading AI frameworks, including PyTorch, JAX, and TensorFlow. Seamlessly orchestrate large-scale AI workloads through Cloud TPU integration in Google Kubernetes Engine (GKE). Leverage Dynamic Workload Scheduler to improve the scalability of workloads by scheduling all accelerators needed simultaneously. Customers looking for the simplest way to develop AI models can also leverage Cloud TPUs in Vertex AI, a fully-managed AI platform.
Cloud TPUs are optimized for training large, complex deep learning models that involve many matrix calculations, such as large language models (LLMs). Cloud TPUs also include SparseCores, dataflow processors that accelerate models relying heavily on embeddings, such as recommendation models. Other use cases include healthcare applications such as protein folding modeling and drug discovery.
A GPU is a specialized processor originally designed for manipulating computer graphics. Its parallel structure makes it well suited to algorithms that process the large blocks of data common in AI workloads.
A TPU is an application-specific integrated circuit (ASIC) designed by Google for neural networks. TPUs possess specialized features, such as the matrix multiply unit (MXU) and a proprietary interconnect topology, that make them ideal for accelerating AI training and inference.
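As a rough illustration of the operation an MXU accelerates, here is the plain matrix multiply at the heart of most neural-network layers, written in pure Python. A TPU's MXU performs many thousands of these multiply-accumulates per cycle in hardware; this sketch only shows the computation pattern itself:

```python
# Plain-Python matrix multiply: the multiply-accumulate pattern
# that a TPU's matrix multiply unit (MXU) executes in hardware.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

# A small 2x3 times 3x2 example.
a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]
print(matmul(a, b))  # [[58, 64], [139, 154]]
```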
Cloud TPU versions
Cloud TPU version | Description | Availability |
---|---|---|
Trillium | The most advanced Cloud TPU to date | Trillium is available in North America (US East region), Europe (West region), and Asia (Northeast region) |
Cloud TPU v5p | The most powerful Cloud TPU for training AI models | Cloud TPU v5p is generally available in North America (US East region) |
Cloud TPU v5e | A versatile Cloud TPU for training and inference needs | Cloud TPU v5e is generally available in North America (US Central/East/South/West regions), Europe (West region), and Asia (Southeast region) |
How It Works
Get an inside look at the magic of Google Cloud TPUs, including a rare inside view of the data centers where it all happens. Customers use Cloud TPUs to run some of the world's largest AI workloads and that power comes from much more than just a chip. In this video, take a look at the components of the TPU system, including data center networking, optical circuit switches, water cooling systems, biometric security verification and more.
Common Uses
Maximize performance, efficiency, and time to value with Cloud TPUs. Scale to thousands of chips with Cloud TPU Multislice training. Measure and improve large-scale ML training productivity with ML Goodput Measurement. Get started quickly with MaxText and MaxDiffusion, high-performance, highly scalable open-source reference deployments for large model training.
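Goodput is commonly defined as the fraction of wall-clock time a job spends making useful training progress; time lost to restarts, checkpoint recovery, and idle waits counts against it. The exact metric ML Goodput Measurement reports may differ, so this is only an illustrative sketch with made-up numbers:

```python
# Illustrative goodput calculation: the fraction of total wall-clock
# time spent on productive training steps. The input numbers below
# are made up for illustration.
def goodput(productive_hours, total_hours):
    return productive_hours / total_hours

# e.g. a 100-hour job that lost 8 hours to restarts and recovery.
print(f"{goodput(92.0, 100.0):.0%}")  # 92%
```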
Accelerate AI Inference with JetStream and MaxDiffusion. JetStream is a new inference engine specifically designed for Large Language Model (LLM) inference. JetStream represents a significant leap forward in both performance and cost efficiency, offering unparalleled throughput and latency for LLM inference on Cloud TPUs. MaxDiffusion is a set of diffusion model implementations optimized for Cloud TPUs, making it easy to run inference for diffusion models on Cloud TPUs with high performance.
Cloud TPU v5e enables high-performance and cost-effective inference for a wide range of AI workloads, including the latest LLMs and Gen AI models. TPU v5e delivers up to 2.5x more throughput performance per dollar and up to 1.7x speedup over Cloud TPU v4. Each TPU v5e chip provides up to 393 trillion int8 operations per second, allowing complex models to make fast predictions. A TPU v5e pod delivers up to 100 quadrillion int8 operations per second, or 100 petaOps of compute power.
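The pod-level figure follows directly from the per-chip number. Assuming a 256-chip v5e pod (the pod size is an assumption here, not stated above), 256 × 393 trillion int8 ops/s comes out to roughly 100 petaOps:

```python
# Check the pod-level arithmetic from the per-chip figure.
# Assumes a 256-chip v5e pod (the pod size is an assumption here).
per_chip_ops = 393e12          # int8 operations per second, per chip
chips_per_pod = 256
pod_ops = per_chip_ops * chips_per_pod
print(f"{pod_ops / 1e15:.1f} petaOps")  # 100.6 petaOps
```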
A robust AI/ML platform considers the following layers: (i) infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale, (ii) flexible integration with distributed computing and data processing frameworks, and (iii) support for multiple teams on the same infrastructure to maximize utilization of resources.
Combine the power of Cloud TPUs with the flexibility and scalability of GKE to build and deploy machine learning models faster and more easily than ever before. With Cloud TPUs available in GKE, you can now have a single consistent operations environment for all your workloads, standardizing automated MLOps pipelines.
Customers looking for the simplest way to develop AI models can deploy Cloud TPU v5e with Vertex AI, an end-to-end platform for building AI models on fully managed infrastructure that's purpose-built for low-latency serving and high-performance training.
Pricing
All Cloud TPU pricing is per chip-hour.
Cloud TPU version | Evaluation price (USD) | 1-year commitment (USD) | 3-year commitment (USD) |
---|---|---|---|
Trillium | Starting at $2.7000 per chip-hour | Starting at $1.8900 per chip-hour | Starting at $1.2200 per chip-hour |
Cloud TPU v5p | Starting at $4.2000 per chip-hour | Starting at $2.9400 per chip-hour | Starting at $1.8900 per chip-hour |
Cloud TPU v5e | Starting at $1.2000 per chip-hour | Starting at $0.8400 per chip-hour | Starting at $0.5400 per chip-hour |
Cloud TPU pricing varies by product and region.
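Because all rates are quoted per chip-hour, estimating a job's cost is a simple multiplication of rate × chips × hours. The sketch below uses the starting rates from the table above; the job sizes are hypothetical, and actual billing may differ by region and configuration:

```python
# Illustrative cost comparison using the starting chip-hour rates
# from the pricing table above. Actual billing varies by region.
RATES_USD_PER_CHIP_HOUR = {
    "trillium": {"evaluation": 2.70, "1yr": 1.89, "3yr": 1.22},
    "v5p":      {"evaluation": 4.20, "1yr": 2.94, "3yr": 1.89},
    "v5e":      {"evaluation": 1.20, "1yr": 0.84, "3yr": 0.54},
}

def job_cost(tpu, plan, chips, hours):
    """Cost in USD: chip-hour rate x number of chips x hours."""
    return RATES_USD_PER_CHIP_HOUR[tpu][plan] * chips * hours

# Hypothetical 256-chip v5e run for 24 hours, evaluation vs 3-year rate.
on_demand = job_cost("v5e", "evaluation", 256, 24)
committed = job_cost("v5e", "3yr", 256, 24)
print(f"${on_demand:,.2f} vs ${committed:,.2f}")  # $7,372.80 vs $3,317.76
```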