Plan TPUs in GKE


This page describes how to plan your usage of Tensor Processing Units (TPUs) in Google Kubernetes Engine (GKE) to reduce the risk of TPU misconfiguration, non-availability errors, or out-of-quota interruptions.

Before you use TPUs in GKE, ensure that you are familiar with TPU definitions and terminology in GKE.

Plan your TPU configuration

To work with TPUs in GKE clusters, you must plan their configuration. We recommend that you follow these steps:

Choose a GKE mode of operation: Run your workloads on TPUs in a GKE Autopilot or Standard cluster.

Best practice:
Use an Autopilot cluster for a fully managed Kubernetes experience.
Choose the TPU version: Different TPU types have different capabilities, like price-performance ratios, training throughput, and serving latency. The TPU types affect the available CPU and memory capacities.
Validate TPU availability: TPUs are available in specific Google Cloud regions. To use a TPU type in your GKE workload, your cluster must be in a supported region for that type.
Choose the TPU topology: The topology is the physical arrangement of the TPUs within a TPU slice. Select a topology that matches your model's parallelism requirements.

Use the reference tables on this page to identify if your node pools are single-host or multi-host TPU slice nodes.

Choose a GKE mode of operation

You can use TPUs in the available GKE modes of operation for clusters:

Autopilot mode (recommended): GKE manages the underlying infrastructure such as node configuration, autoscaling, auto-upgrades, baseline security configurations, and baseline networking configuration. In Autopilot, you choose a TPU type and topology, then specify them in your Kubernetes manifest. GKE manages provisioning nodes with TPUs and scheduling your workloads.
Standard mode: You manage the underlying infrastructure, including configuring the individual nodes.

To choose the GKE mode of operation that's the best fit for your workloads, see Choose a GKE mode of operation.

Choose a TPU consumption option

When you plan your TPU configuration in GKE, select a consumption option that aligns with your workload needs. Your choice of consumption option impacts the available TPU versions and the quota you need to configure. GKE offers the following TPU consumption options to help you optimize resource allocation and cost while maintaining workload performance:

Flex-start: to provision Flex-start VMs for up to seven days, with GKE automatically allocating the hardware on a best-effort basis based on availability. For more information, see About GPU and TPU provisioning with flex-start provisioning mode.
Spot VMs: to provision Spot VMs at a significant discount. Spot VMs can be preempted at any time, with a 30-second warning. For more information, see Spot VMs.
Future reservation for up to 90 days (in calendar mode): to provision TPU resources for up to 90 days, for a specified time period. For more information, see Request TPUs with future reservation in calendar mode.
TPU reservations: to request a future reservation for one year or longer.

To choose the consumption option that meets your workload requirements, see About accelerator consumption options for AI/ML workloads in GKE.

Choose the TPU version

The VMs in a TPU slice have the following technical characteristics.

Autopilot

TPU version	Machine type	Number of vCPUs	Memory (GiB)	Number of NUMA nodes	Maximum TPU chips in a TPU slice node
TPU Trillium (v6e)	`tpu-v6e-slice`	44 to 180	176 to 1440	1 to 2	256
TPU v5p	`tpu-v5p-slice`	208	448	2	6,144
TPU v5e	`tpu-v5-lite-podslice`	24 to 224	48 to 384	1	256
TPU v4	`tpu-v4-podslice`	240	407	2	4,096
TPU v3 (single-host only)	`tpu-v3-device`	96	340	2	8
TPU v3	`tpu-v3-slice`	48	340	1	256

Standard

TPU version	Machine type	Number of vCPUs	Memory (GiB)	Number of NUMA nodes	Likelihood of being preempted
TPU Trillium (v6e)	`ct6e-standard-1t`	44	448	2	Higher
TPU Trillium (v6e)	`ct6e-standard-4t`	180	720	1	Medium
TPU Trillium (v6e)	`ct6e-standard-8t`	180	1440	2	Lower
TPU v5p	`ct5p-hightpu-4t`	208	448	2
TPU v5e	`ct5lp-hightpu-1t`	24	48	1	Higher
TPU v5e	`ct5lp-hightpu-4t`	112	192	1	Medium
TPU v5e	`ct5lp-hightpu-8t`	224	384	1	Lower
TPU v4	`ct4p-hightpu-4t`	240	407	2
TPU v3 (single-host only)	`ct3-hightpu-4t`	96	340	2
TPU v3	`ct3p-hightpu-4t`	48	340	1

Multi-host ct5lp- machine types are more suitable for serving large models or training. Multi-host ct5lp- machines are interconnected with high-speed links.

Review the TPU specifications and pricing in the Cloud TPU pricing documentation to decide which TPU configuration to use.

Limitations

Consider these limitations when choosing the TPU to use:

TPU Trillium is available in the following versions:
- Standard clusters in version 1.31.1-gke.1846000 and later.
- Autopilot clusters in version 1.31.2-gke.1115000 and later.
TPU Trillium doesn't support configuring SMT set to 2 on ct6e-standard-8t.
TPU v5p autoscaling is supported on GKE clusters with control planes running at least version 1.29.2-gke.1035000 or 1.28.7-gke.1020000.
For capacity reservations, use a specific reservation.
You can run a maximum of 256 Pods in a single TPU VM.
GKE cost allocation and usage metering don't include any data about the usage or costs of TPUs.
The cluster autoscaler cancels TPU node pool scale-up operations that remain in waiting status for more than 10 hours. The cluster autoscaler retries such scale-up operations when resources are available. This behavior might reduce TPU obtainability if you don't use reservations.
Ubuntu nodes are not supported.
TPU Node architecture is deprecated. TPU v3 is the only TPU version that still supports the TPU Node architecture in GKE.

Validate TPU availability in GKE

TPUs are available in specific Google Cloud regions. To use a TPU type in your GKE cluster, your cluster must be in a supported region for that type.

Autopilot

TPU version	`cloud.google.com/gke-tpu-accelerator`	Minimum GKE version	Availability	Zone
TPU Trillium (v6e)	`tpu-v6e-slice`	1.31.2-gke.1384000	GA	`asia-northeast1-b` `europe-west4-a` `southamerica-west1-a` `us-central1-b` `us-east1-d` `us-east5-a` `us-east5-b`
TPU v5e	`tpu-v5-lite-podslice`	1.27.2-gke.2100	GA	`europe-west4-b` `us-central1-a` `us-south1-a` `us-west1-c` `us-west4-a`
TPU v5p	`tpu-v5p-slice`	1.28.3-gke.1024000	GA	`europe-west4-b` `us-central1-a` `us-east5-a`
TPU v4	`tpu-v4-podslice`	1.26.1-gke.1500	GA	`us-central2-b`
TPU v3	`tpu-v3-slice`	1.31.1-gke.1146000	GA	`europe-west4-a` `us-central1-a` `us-central1-b`
TPU v3	`tpu-v3-device`	1.31.0-gke.1500	GA	`europe-west4-a` `us-central1-a` `us-central1-b`

Standard

TPU version	Machine type beginning with	Minimum GKE version	Availability	Zone
TPU Trillium (v6e)	`ct6e-`	1.31.2-gke.1115000	GA	`asia-northeast1-b` `europe-west4-a` `southamerica-west1-a` `us-central1-b` `us-east1-d` `us-east5-a` `us-east5-b`
TPU v5e	`ct5lp-`	1.27.2-gke.2100	GA	`europe-west4-b` `us-central1-a` `us-south1-a` `us-west1-c` `us-west4-a`
TPU v5p	`ct5p-`	1.28.3-gke.1024000	GA	`europe-west4-b` `us-central1-a` `us-east5-a`
TPU v4	`ct4p-`	1.26.1-gke.1500	GA	`us-central2-b`
TPU v3	`ct3p-`	1.31.1-gke.1146000	GA	`europe-west4-a` `us-central1-a` `us-central1-b`
TPU v3	`ct3-`	1.31.0-gke.1500	GA	`europe-west4-a` `us-central1-a` `us-central1-b`

Choose a topology

After you decide on a TPU version, select a topology that's supported by that TPU type. Depending on the TPU type, the topology is two- or three-dimensional. Your model's parallelism requirements help you to decide on a topology. You can identify the number of TPU chips in the slice by calculating the product of each size in the topology. For example:

2x2x2 is an 8-chip multi-host TPU v4 slice
2x2 is a 4-chip single-host TPU v5e slice

If a specific topology supports both single-host and multi-host TPU slice nodes, the number of TPU chips that your workload requests determines the host type.

For example, TPU v5e (tpu-v5-lite-podslice) supports the 2x4 topology as both single- and multi-host. If you:

Request 4 chips in your workload, you get a multi-host node that has 4 TPU chips.
Request 8 chips in your workload, you get a single-host node that has 8 TPU chips.
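The chip-count arithmetic and the host-type rule above can be sketched in a few lines of Python. This is a minimal illustration, not a GKE API: the function names and the `chips_per_vm` parameter are hypothetical, and the sketch assumes that the number of chips attached to each VM matches the number of chips the workload requests.

```python
from math import prod


def chips_in_topology(topology: str) -> int:
    """Return the number of TPU chips in a topology string such as "2x2x2"."""
    return prod(int(size) for size in topology.split("x"))


def host_type(topology: str, chips_per_vm: int) -> tuple[str, int]:
    """Return the host type and the number of VMs for a TPU slice.

    chips_per_vm is an assumed input: the number of TPU chips on each VM.
    """
    chips = chips_in_topology(topology)
    if chips % chips_per_vm != 0:
        raise ValueError(f"{chips} chips cannot be split into VMs of {chips_per_vm}")
    vms = chips // chips_per_vm
    return ("single-host" if vms == 1 else "multi-host", vms)


print(chips_in_topology("2x2x2"))  # 8
print(host_type("2x4", 4))  # ('multi-host', 2)
print(host_type("2x4", 8))  # ('single-host', 1)
```

The last two calls mirror the TPU v5e example: the same 2x4 topology is multi-host with 4 chips per VM and single-host with 8 chips per VM.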

Use the following recommendations and tables to choose the TPU machine type and topology for your use case:

For small-scale model training or inference, use TPU v4 or TPU v5e with single-host TPU slice node pools.
For large-scale model training or inference, use TPU v4 or TPU v5e with multi-host TPU slice node pools.
For large-scale training or inferencing, use Pathways. Pathways simplifies large-scale machine learning computations by enabling a single JAX client to orchestrate workloads across multiple large TPU slices. For more information, see Pathways.

Autopilot

After you choose a TPU type and topology, specify these in your workload manifest. For instructions, see Deploy TPU workloads on GKE Autopilot.
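As a rough sketch of what such a manifest can look like, the following Python snippet builds a Pod manifest for a single-host TPU v5e slice with a 2x4 topology and prints it as JSON, which `kubectl apply -f` accepts. Several names here are assumptions that this page doesn't confirm: the Pod name, container name, and image are placeholders, and the `cloud.google.com/gke-tpu-topology` node selector and the `google.com/tpu` resource name should be checked against the deployment guide before use.

```python
import json


def tpu_pod_manifest(name: str, image: str, accelerator: str,
                     topology: str, chips: int) -> dict:
    """Build a Pod manifest that selects a TPU type and topology."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": {
                # TPU type, as listed in the availability tables on this page.
                "cloud.google.com/gke-tpu-accelerator": accelerator,
                # Assumed label name for the topology selector.
                "cloud.google.com/gke-tpu-topology": topology,
            },
            "containers": [{
                "name": "tpu-workload",  # placeholder
                "image": image,
                "resources": {
                    # Assumed resource name; the value is the chip count.
                    "requests": {"google.com/tpu": str(chips)},
                    "limits": {"google.com/tpu": str(chips)},
                },
            }],
        },
    }


manifest = tpu_pod_manifest(
    name="tpu-example",
    image="example.com/placeholder/image:tag",  # placeholder image
    accelerator="tpu-v5-lite-podslice",
    topology="2x4",
    chips=8,
)
print(json.dumps(manifest, indent=2))
```

A multi-host slice needs one Pod per VM, so it would typically use a Job or similar controller rather than a single Pod.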

TPU version	Machine type	Node pool type	Technical specifications
TPU Trillium (v6e)	`tpu-v6e-slice`	Single-host	Topology: 1x1 Number of TPU chips: 1 Number of VMs: 1
TPU Trillium (v6e)	`tpu-v6e-slice`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1
TPU Trillium (v6e)	`tpu-v6e-slice`	Single-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 1
TPU Trillium (v6e)	`tpu-v6e-slice`	Multi-host	Topology: 4x4 Number of TPU chips: 16 Number of VMs: 4
TPU Trillium (v6e)	`tpu-v6e-slice`	Multi-host	Topology: 4x8 Number of TPU chips: 32 Number of VMs: 8
TPU Trillium (v6e)	`tpu-v6e-slice`	Multi-host	Topology: 8x8 Number of TPU chips: 64 Number of VMs: 16
TPU Trillium (v6e)	`tpu-v6e-slice`	Multi-host	Topology: 8x16 Number of TPU chips: 128 Number of VMs: 32
TPU Trillium (v6e)	`tpu-v6e-slice`	Multi-host	Topology: 16x16 Number of TPU chips: 256 Number of VMs: 64
TPU v5p	`tpu-v5p-slice`	Single-host	Topology: 2x2x1 Number of TPU chips: 4 Number of VMs: 1
TPU v5p	`tpu-v5p-slice`	Multi-host	Topology: 2x2x2 Number of TPU chips: 8 Number of VMs: 2
TPU v5p	`tpu-v5p-slice`	Multi-host	Topology: 2x2x4 Number of TPU chips: 16 Number of VMs: 4
TPU v5p	`tpu-v5p-slice`	Multi-host	Topology: 2x4x4 Number of TPU chips: 32 Number of VMs: 8
TPU v5p	`tpu-v5p-slice`	Multi-host	Topology: 4x4x4 Number of TPU chips: 64 Number of VMs: 16
TPU v5p	`tpu-v5p-slice`	Multi-host	Topology: {A}x{B}x{C} Number of TPU chips: {A}*{B}*{C} Number of VMs: ({A}*{B}*{C}/4)¹
TPU v5e	`tpu-v5-lite-podslice`	Single-host	Topology: 1x1 Number of TPU chips: 1 Number of VMs: 1
TPU v5e	`tpu-v5-lite-podslice`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1
TPU v5e	`tpu-v5-lite-podslice`	Single-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 1
TPU v5e	`tpu-v5-lite-podslice`	Multi-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 2
TPU v5e	`tpu-v5-lite-podslice`	Multi-host	Topology: 4x4 Number of TPU chips: 16 Number of VMs: 4
TPU v5e	`tpu-v5-lite-podslice`	Multi-host	Topology: 4x8 Number of TPU chips: 32 Number of VMs: 8
TPU v5e	`tpu-v5-lite-podslice`	Multi-host	Topology: 8x8 Number of TPU chips: 64 Number of VMs: 16
TPU v5e	`tpu-v5-lite-podslice`	Multi-host	Topology: 8x16 Number of TPU chips: 128 Number of VMs: 32
TPU v5e	`tpu-v5-lite-podslice`	Multi-host	Topology: 16x16 Number of TPU chips: 256 Number of VMs: 64
TPU v5e (single-host only)	`tpu-v5-lite-device`	Single-host	Topology: 1x1 Number of TPU chips: 1 Number of VMs: 1
TPU v5e (single-host only)	`tpu-v5-lite-device`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1
TPU v5e (single-host only)	`tpu-v5-lite-device`	Single-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 1
TPU v4	`tpu-v4-podslice`	Single-host	Topology: 2x2x1 Number of TPU chips: 4 Number of VMs: 1
TPU v4	`tpu-v4-podslice`	Multi-host	Topology: 2x2x2 Number of TPU chips: 8 Number of VMs: 2
TPU v4	`tpu-v4-podslice`	Multi-host	Topology: 2x2x4 Number of TPU chips: 16 Number of VMs: 4
TPU v4	`tpu-v4-podslice`	Multi-host	Topology: 2x4x4 Number of TPU chips: 32 Number of VMs: 8
TPU v4	`tpu-v4-podslice`	Multi-host	Topology: 4x4x4 Number of TPU chips: 64 Number of VMs: 16
TPU v4	`tpu-v4-podslice`	Multi-host	Topology: {A}x{B}x{C} Number of TPU chips: {A}*{B}*{C} Number of VMs: ({A}*{B}*{C}/4)¹
TPU v3	`tpu-v3-slice`	Multi-host	Topology: 4x4 Number of TPU chips: 16 Number of VMs: 4
TPU v3	`tpu-v3-slice`	Multi-host	Topology: 4x8 Number of TPU chips: 32 Number of VMs: 8
TPU v3	`tpu-v3-slice`	Multi-host	Topology: 8x8 Number of TPU chips: 64 Number of VMs: 16
TPU v3	`tpu-v3-slice`	Multi-host	Topology: 8x16 Number of TPU chips: 128 Number of VMs: 32
TPU v3	`tpu-v3-slice`	Multi-host	Topology: 16x16 Number of TPU chips: 256 Number of VMs: 64
TPU v3	`tpu-v3-device`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1

¹ Calculated by the topology product divided by four.

Custom topologies for more than 64 chips are supported. The following conditions apply:
- For more than 64 chips, {A}, {B}, and {C} must be multiples of 4
- The largest topology is 16x16x24
- The values must be {A}≤{B}≤{C}, like 8x12x16.
Custom topologies aren't supported.
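The conditions listed above for custom topologies with more than 64 chips can be checked mechanically. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that "the largest topology is 16x16x24" means each value is capped by the corresponding value in 16x16x24.

```python
from math import prod

# Assumed per-dimension caps, read from "the largest topology is 16x16x24".
MAX_TOPOLOGY = (16, 16, 24)


def custom_topology_problems(topology: str) -> list[str]:
    """Return the violated conditions for a custom topology (empty if none)."""
    dims = tuple(int(size) for size in topology.split("x"))
    if len(dims) != 3:
        return ["topology must have three dimensions, like 8x12x16"]
    problems = []
    if prod(dims) > 64 and any(d % 4 for d in dims):
        problems.append("for more than 64 chips, every value must be a multiple of 4")
    if any(d > limit for d, limit in zip(dims, MAX_TOPOLOGY)):
        problems.append("topology exceeds 16x16x24")
    if list(dims) != sorted(dims):
        problems.append("values must satisfy A <= B <= C")
    return problems


print(custom_topology_problems("8x12x16"))  # []
print(custom_topology_problems("16x8x12"))  # ['values must satisfy A <= B <= C']
```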

Standard

After you choose a TPU type and topology, specify these in your workload manifest. For instructions, see Deploy TPU workloads on GKE Standard.

TPU version	Machine type	Node pool type	Technical specifications
TPU Trillium (v6e)	`ct6e-standard-1t`	Single-host	Topology: 1x1 Number of TPU chips: 1 Number of VMs: 1
TPU Trillium (v6e)	`ct6e-standard-8t`	Single-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 1
TPU Trillium (v6e)	`ct6e-standard-4t`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1
TPU Trillium (v6e)	`ct6e-standard-4t`	Multi-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 2
TPU Trillium (v6e)	`ct6e-standard-4t`	Multi-host	Topology: 4x4 Number of TPU chips: 16 Number of VMs: 4
TPU Trillium (v6e)	`ct6e-standard-4t`	Multi-host	Topology: 4x8 Number of TPU chips: 32 Number of VMs: 8
TPU Trillium (v6e)	`ct6e-standard-4t`	Multi-host	Topology: 8x8 Number of TPU chips: 64 Number of VMs: 16
TPU Trillium (v6e)	`ct6e-standard-4t`	Multi-host	Topology: 8x16 Number of TPU chips: 128 Number of VMs: 32
TPU Trillium (v6e)	`ct6e-standard-4t`	Multi-host	Topology: 16x16 Number of TPU chips: 256 Number of VMs: 64
TPU v5p	`ct5p-hightpu-4t`	Single-host	Topology: 2x2x1 Number of TPU chips: 4 Number of VMs: 1
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x2x2 Number of TPU chips: 8 Number of VMs: 2
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x2x4 Number of TPU chips: 16 Number of VMs: 4
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x4x4 Number of TPU chips: 32 Number of VMs: 8
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: {A}x{B}x{C} Number of TPU chips: {A}*{B}*{C} Number of VMs: ({A}*{B}*{C}/4)¹
TPU v5e	`ct5lp-hightpu-1t`	Single-host	Topology: 1x1 Number of TPU chips: 1 Number of VMs: 1
TPU v5e	`ct5lp-hightpu-4t`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1
TPU v5e	`ct5lp-hightpu-8t`	Single-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 1
TPU v5e	`ct5lp-hightpu-4t`	Multi-host	Topology: 2x4 Number of TPU chips: 8 Number of VMs: 2
TPU v5e	`ct5lp-hightpu-4t`	Multi-host	Topology: 4x4 Number of TPU chips: 16 Number of VMs: 4
TPU v5e	`ct5lp-hightpu-4t`	Multi-host	Topology: 4x8 Number of TPU chips: 32 Number of VMs: 8
TPU v5e	`ct5lp-hightpu-4t`	Multi-host	Topology: 8x8 Number of TPU chips: 64 Number of VMs: 16
TPU v5e	`ct5lp-hightpu-4t`	Multi-host	Topology: 8x16 Number of TPU chips: 128 Number of VMs: 32
TPU v5p	`ct5p-hightpu-4t`	Multi-host	Topology: 2x4x4 Number of TPU chips: 32 Number of VMs: 8
TPU v5p	`ct5p-hightpu-4t`	Single-host	Topology: 2x2x1 Number of TPU chips: 4 Number of VMs: 1
TPU v4	`ct4p-hightpu-4t`	Multi-host	Topology: 2x2x2 Number of TPU chips: 8 Number of VMs: 2
TPU v4	`ct4p-hightpu-4t`	Multi-host	Topology: 2x2x4 Number of TPU chips: 16 Number of VMs: 4
TPU v4	`ct4p-hightpu-4t`	Multi-host	Topology: 2x4x4 Number of TPU chips: 32 Number of VMs: 8
TPU v4	`ct4p-hightpu-4t`	Multi-host	Topology: {A}x{B}x{C} Number of TPU chips: {A}*{B}*{C} Number of VMs: ({A}*{B}*{C}/4)¹
TPU v3	`ct3-hightpu-4t`	Single-host	Topology: 2x2 Number of TPU chips: 4 Number of VMs: 1
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 4x4 Number of TPU chips: 16 Number of VMs: 4
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 4x8 Number of TPU chips: 32 Number of VMs: 8
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 8x8 Number of TPU chips: 64 Number of VMs: 16
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 8x16 Number of TPU chips: 128 Number of VMs: 32
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 16x16 Number of TPU chips: 256 Number of VMs: 64
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 16x32 Number of TPU chips: 512 Number of VMs: 128
TPU v3	`ct3p-hightpu-4t`	Multi-host	Topology: 32x32 Number of TPU chips: 1024 Number of VMs: 256

¹ Calculated by the topology product divided by four.

Advanced configurations

The following sections describe scheduling best practices for advanced TPU configurations.

Autoscaling TPUs in GKE

GKE supports Tensor Processing Units (TPUs) to accelerate machine learning workloads. Both single-host and multi-host TPU slice node pools support autoscaling and auto-provisioning.

With the --enable-autoprovisioning flag on a GKE cluster, GKE creates or deletes single-host or multi-host TPU slice node pools with a TPU version and topology that meets the requirements of pending workloads.

When you use --enable-autoscaling, GKE scales the node pool based on its type, as follows:

Single-host TPU slice node pool: GKE adds or removes TPU nodes in the existing node pool. The node pool can contain any number of TPU nodes between zero and the maximum size of the node pool, as determined by the --max-nodes and --total-max-nodes flags. When the node pool scales, all the TPU nodes in the node pool have the same machine type and topology. To learn how to create a single-host TPU slice node pool, see Create a node pool.
Multi-host TPU slice node pool: GKE atomically scales up the node pool from zero to the number of nodes required to satisfy the TPU topology. For example, a TPU node pool with the machine type ct5lp-hightpu-4t and a topology of 16x16 contains 64 nodes. The GKE autoscaler ensures that this node pool has exactly 0 or 64 nodes. When scaling back down, GKE evicts all scheduled Pods and drains the entire node pool to zero. To learn how to create a multi-host TPU slice node pool, see Create a node pool.
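The all-or-nothing sizing of a multi-host TPU slice node pool can be expressed as a small check. This Python sketch only illustrates the arithmetic in the example above and doesn't call any GKE API: the function names are hypothetical, and `chips_per_vm=4` is an assumption based on the `4t` suffix of the machine type.

```python
from math import prod


def multi_host_node_count(topology: str, chips_per_vm: int) -> int:
    """Return the number of nodes needed to satisfy a multi-host topology."""
    chips = prod(int(size) for size in topology.split("x"))
    return chips // chips_per_vm


def is_valid_multi_host_size(current_nodes: int, topology: str,
                             chips_per_vm: int) -> bool:
    """A multi-host pool holds either zero nodes or the full slice."""
    return current_nodes in (0, multi_host_node_count(topology, chips_per_vm))


print(multi_host_node_count("16x16", 4))  # 64
print(is_valid_multi_host_size(64, "16x16", 4))  # True
print(is_valid_multi_host_size(32, "16x16", 4))  # False
```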

Provision additional storage to a TPU slice

A VM in a TPU slice includes a 100 GiB boot disk. If your TPU slice needs additional storage for training or preprocessing, or if you need to save checkpoints, you can use Google Cloud Hyperdisk or Balanced Persistent Disk storage if it's available for your TPU. For more information about supported disk types for each TPU version, see the TPU support for Hyperdisk and Persistent Disk.

CPU for Standard clusters

This section doesn't apply to Autopilot clusters because GKE places each TPU slice on its own node. To learn more, see How TPUs work in Autopilot mode.

For Standard clusters, consider the following scheduling best practices.

To schedule a non-TPU workload on a VM in a TPU slice node, ensure that your GKE Pod can tolerate the google.com/tpu taint. If you want the workload to be deployed to specific nodes, use node selectors.

Kubernetes resource management and priority treat VMs in TPU slices the same as other VM types. To give scheduling priority to Pods that require TPUs over other Pods on the same nodes, request the maximum CPU or memory for those TPU slices. Low-priority TPU slices should do the following:

Set low CPU and memory requests to ensure that the node has enough allocatable resources for the TPU workloads. To learn more, see How Kubernetes applies resource requests and limits.
Set no CPU limit (unlimited) to ensure that Pods can burst to use all unused cycles.
Set appropriate memory limits to ensure Pods can function correctly without risking node-pressure eviction.
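The following Python sketch shows how these settings might look in a manifest for a low-priority, non-TPU Pod that runs on a TPU slice node. It prints JSON that `kubectl apply -f` accepts. The Pod name, container name, image, and all resource values are placeholders chosen for illustration. The toleration matches the `google.com/tpu` taint by key only, which is an assumption made to avoid guessing the taint's value or effect.

```python
import json

low_priority_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "low-priority-helper"},  # placeholder name
    "spec": {
        # Tolerate the TPU taint by key, whatever its value and effect.
        "tolerations": [{"key": "google.com/tpu", "operator": "Exists"}],
        "containers": [{
            "name": "helper",  # placeholder
            "image": "example.com/placeholder/helper:tag",  # placeholder
            "resources": {
                # Low requests leave allocatable resources for TPU workloads.
                "requests": {"cpu": "100m", "memory": "128Mi"},
                # Memory limit only: no CPU limit, so the Pod can burst.
                "limits": {"memory": "256Mi"},
            },
        }],
    },
}

print(json.dumps(low_priority_pod, indent=2))
```

To pin the Pod to specific nodes, add a `nodeSelector` entry to `spec` with the node labels that you want to match.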

If a Kubernetes Pod doesn't request CPU and memory (even if it requests TPUs), then Kubernetes considers it a best-effort Pod, and there is no guarantee that it receives any CPU or memory. Only Pods that explicitly request CPU and memory have such guarantees. To control how Kubernetes schedules the Pod, configure it with explicit CPU and memory requests. For more information, see Resource Management for Pods and Containers.
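To make the best-effort rule concrete, here is a simplified Python sketch of how a Pod's quality-of-service class follows from its CPU and memory settings. It is an approximation for illustration, not the Kubernetes implementation: the function name is hypothetical, quantities are compared as literal strings, and details such as init containers and zero-valued quantities are ignored.

```python
COMPUTE_RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from its containers' 'requests' and 'limits' dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in COMPUTE_RESOURCES:
            request, limit = requests.get(resource), limits.get(resource)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs a limit and a request equal to it
            # (an omitted request defaults to the limit).
            if limit is None or (request is not None and request != limit):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Requesting only TPUs still leaves the Pod best-effort for CPU and memory.
print(qos_class([{"limits": {"google.com/tpu": "4"}}]))  # BestEffort
print(qos_class([{"requests": {"cpu": "100m", "memory": "128Mi"}}]))  # Burstable
```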

To learn more best practices, see Kubernetes best practices: Resource requests and limits.

Reduce workload interruption

If you are using TPUs to train a machine learning model and your workload is interrupted, all work performed since the last checkpoint is lost. To decrease the probability that your workload is interrupted, do the following:

Set a higher priority for this Job than for all other Jobs: If resources are scarce, the GKE scheduler preempts lower priority Jobs to schedule a higher priority Job. This also ensures that your higher priority workload receives all the resources that it needs (up to the total resources available in the cluster). To learn more, see Pod priority and preemption.
Configure maintenance exclusion: A maintenance exclusion is a non-repeating window of time during which automatic maintenance is forbidden. To learn more, see Maintenance exclusions.
Use extended run time Pods in Autopilot: Use extended run time Pods for a grace period of up to seven days before GKE terminates your Pods for scale-downs or node upgrades.
Use collection scheduling in TPU Trillium: Use collections to indicate that a TPU slice node pool is part of a serving workload. Google Cloud limits and streamlines interruptions to the operations of inference workloads. To learn more, see How collection scheduling works.

These recommendations help to minimize interruptions, but not to prevent them. For example, a preemption due to a hardware failure or preemption for defragmentation can still occur. Similarly, setting a GKE maintenance exclusion doesn't prevent Compute Engine maintenance events.

Best practice:

Save checkpoints frequently and add code to your training script to start from the last checkpoint when resumed.
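A minimal sketch of this checkpoint-and-resume pattern, using only the Python standard library, might look like the following. The file format, function names, and the `stop_at` parameter (which simulates an interruption) are all hypothetical. A real training job would usually rely on its framework's checkpoint API and write to durable storage rather than a local JSON file.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write the checkpoint to a temporary file, then rename it into place,
    so an interruption never leaves a partially written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path: str, total_steps: int, checkpoint_every: int,
          stop_at: Optional[int] = None) -> dict:
    """Run a placeholder training loop that resumes from the last checkpoint."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if stop_at is not None and step == stop_at:
            return state  # simulated interruption: unsaved progress is lost
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}  # placeholder step
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")
    train(ckpt, total_steps=10, checkpoint_every=3, stop_at=7)
    print(load_checkpoint(ckpt)["step"])  # 6
    print(train(ckpt, total_steps=10, checkpoint_every=3)["step"])  # 10
```

In the usage example, the interruption at step 7 loses only the work done after the checkpoint at step 6, and the second call resumes from there. A shorter checkpoint interval reduces the lost work at the cost of more frequent writes.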

Handle disruption due to node maintenance

The GKE nodes that host the TPUs are subject to maintenance events or other disruptions that might cause node shutdown. In GKE clusters with the control plane running version 1.29.1-gke.1425000 and later, you can reduce disruption to workloads by configuring GKE to terminate your workloads gracefully.

To understand, configure, and monitor disruption events that might occur on GKE nodes running AI/ML workloads, see Manage GKE node disruption for GPUs and TPUs.

Maximize TPU utilization

To maximize your investment in TPUs, schedule a mix of Job priorities and queue them to maximize the amount of time that your TPUs are operating. For Job-level scheduling and preemption, you need to use an add-on to Kubernetes that orchestrates Jobs into queues.

Best practice:

Use Kueue to orchestrate Jobs into queues.

What's next

Follow Deploy TPU workloads in GKE to set up Cloud TPU with GKE.
Learn about best practices for using Cloud TPU for your machine learning tasks.
Build large-scale machine learning on Cloud TPUs with GKE.
Serve Large Language Models with KubeRay on TPUs.