TPU v5e
This document describes the architecture and supported configurations of Cloud TPU v5e.
TPU v5e supports single and multi-host training and single-host inference. Multi-host inference is supported using Sax. For more information, see Large Language Model Serving.
System architecture
Each v5e chip contains one TensorCore. Each TensorCore has four matrix-multiply units (MXUs), a vector unit, and a scalar unit.
The following diagram illustrates a TPU v5e chip.
The following table shows the key chip specifications and their values for v5e.
Key chip specifications | v5e values |
---|---|
Peak compute per chip (bf16) | 197 TFLOPs |
HBM2 capacity and bandwidth | 16 GB, 819 GBps |
Interchip interconnect bandwidth | 1,600 Gbps |
The following table shows Pod specifications and their values for v5e.
Key Pod specifications | v5e values |
---|---|
TPU Pod size | 256 chips |
Interconnect topology | 2D torus |
Peak compute per Pod | 100 PetaOps (Int8) |
All-reduce bandwidth per Pod | 51.2 TB/s |
Bisection bandwidth per Pod | 1.6 TB/s |
Data center network bandwidth per Pod | 6.4 Tbps |
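As a sanity check, the per-Pod peak follows from the per-chip numbers. Assuming the Int8 rate is twice the bf16 rate in the chip table above (about 394 TOPS per chip, a commonly cited figure that is not stated in this document):

$$
256~\text{chips} \times 394~\text{TOPS/chip} \approx 100{,}864~\text{TOPS} \approx 100~\text{PetaOps (Int8)}
$$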
Configurations
Cloud TPU v5e is a combined training and inference (serving) product. To differentiate between a training and an inference environment, use the AcceleratorType parameter with the TPU API or the --machine-type flag when creating a GKE node pool.
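For example, the following sketch creates a single-host v5e node pool in GKE. The pool name, cluster name, location, and node count are placeholders, so treat this as a starting point rather than a canonical invocation:

```
$ gcloud container node-pools create v5e-serving-pool \
    --cluster=your-cluster \
    --location=us-central1 \
    --machine-type=ct5lp-hightpu-8t \
    --num-nodes=1
```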
Training jobs are optimized for throughput and availability, while serving jobs are optimized for latency. A training job on TPUs provisioned for serving could have lower availability and similarly, a serving job executed on TPUs provisioned for training could have higher latency.
You use AcceleratorType to specify the number of TensorCores you want to use. You specify the AcceleratorType when creating a TPU using the gcloud CLI or the Google Cloud console. The value you specify for AcceleratorType is a string with the format: v$VERSION_NUMBER-$CHIP_COUNT. Because each v5e chip contains a single TensorCore, the chip count equals the TensorCore count; for example, v5litepod-16 specifies a v5e slice with 16 chips.
The following 2D slice shapes are supported for v5e:
Topology | Number of TPU chips | Number of hosts |
---|---|---|
1x1 | 1 | 1/8 |
2x2 | 4 | 1/2 |
2x4 | 8 | 1 |
4x4 | 16 | 2 |
4x8 | 32 | 4 |
8x8 | 64 | 8 |
8x16 | 128 | 16 |
16x16 | 256 | 32 |
Each TPU VM in a v5e TPU slice contains 1, 4, or 8 chips; a full host has 8 chips, so the host counts in the preceding table are the chip count divided by eight. In 4-chip and smaller slices, all TPU chips share the same non-uniform memory access (NUMA) node. For 8-chip v5e TPU VMs, CPU-TPU communication is more efficient within NUMA partitions. For example, in the following figure, CPU0-Chip0 communication is faster than CPU0-Chip4 communication.
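On 8-chip hosts you can therefore pin a process to the NUMA node of the chips it talks to. A minimal sketch using numactl, assuming NUMA node 0 hosts the chips your workload uses and that your-script.py is a placeholder for your actual entry point (verify the real layout with numactl --hardware first):

```
# Run the workload with CPU and memory allocations pinned to NUMA node 0
$ numactl --cpunodebind=0 --membind=0 python3 your-script.py
```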
Cloud TPU v5e types for serving
Single-host serving is supported for up to 8 v5e chips. The following configurations are supported: 1x1, 2x2, and 2x4 slices, which have 1, 4, and 8 chips, respectively.
To provision TPUs for a serving job, use one of the following accelerator types in your CLI or API TPU creation request:
AcceleratorType (TPU API) | Machine type (GKE API) |
---|---|
v5litepod-1 | ct5lp-hightpu-1t |
v5litepod-4 | ct5lp-hightpu-4t |
v5litepod-8 | ct5lp-hightpu-8t |
The following command creates a v5e TPU slice with 8 v5e chips for serving:
```
$ gcloud compute tpus tpu-vm create your-tpu-name \
    --zone=us-central1-a \
    --accelerator-type=v5litepod-8 \
    --version=v2-alpha-tpuv5-lite
```
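To confirm that the slice came up, you can describe it. A sketch, assuming the name and zone used in the creation command above:

```
$ gcloud compute tpus tpu-vm describe your-tpu-name \
    --zone=us-central1-a
```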
For more information about managing TPUs, see Manage TPUs. For more information about the system architecture of Cloud TPU, see System architecture.
Serving on more than 8 v5e chips, also called multi-host serving, is supported using Sax. For more information, see Large Language Model Serving.
Cloud TPU v5e types for training
Training is supported for up to 256 chips.
To provision TPUs for a v5e training job, use one of the following accelerator types in your CLI or API TPU creation request:
AcceleratorType (TPU API) | Machine type (GKE API) | Topology |
---|---|---|
v5litepod-16 | ct5lp-hightpu-4t | 4x4 |
v5litepod-32 | ct5lp-hightpu-4t | 4x8 |
v5litepod-64 | ct5lp-hightpu-4t | 8x8 |
v5litepod-128 | ct5lp-hightpu-4t | 8x16 |
v5litepod-256 | ct5lp-hightpu-4t | 16x16 |
The following command creates a v5e TPU slice with 256 v5e chips for training:
```
$ gcloud compute tpus tpu-vm create your-tpu-name \
    --zone=us-east5-a \
    --accelerator-type=v5litepod-256 \
    --version=v2-alpha-tpuv5-lite
```
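A v5litepod-256 slice spans 32 hosts, so training jobs typically launch the same command on every worker. A sketch using the gcloud SSH wrapper, with hostname standing in for your actual launch command:

```
$ gcloud compute tpus tpu-vm ssh your-tpu-name \
    --zone=us-east5-a \
    --worker=all \
    --command="hostname"
```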
For more information about managing TPUs, see Manage TPUs. For more information about the system architecture of Cloud TPU, see System architecture.