This page shows you how to request and deploy workloads that use Cloud TPU accelerators (TPUs) in Google Kubernetes Engine (GKE).
Before you configure and deploy TPU workloads in GKE, you should be familiar with how TPUs work in GKE, including TPU versions, topologies, and single-host and multi-host TPU slices.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update. A minimal example of these commands follows this list.
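For example, a minimal sketch of both setup steps with the gcloud CLI (this assumes the gcloud CLI is already installed and authenticated for your project):

# Enable the GKE API for the current project.
gcloud services enable container.googleapis.com

# Update installed gcloud CLI components to the latest version.
gcloud components update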
TPU availability in GKE
Use GKE to create and manage node pools with TPUs. You can use these purpose-built accelerators to perform large-scale AI model training, tuning, and inference.
See a list of supported TPU versions in GKE.
Plan your TPU configuration
Plan your TPU configuration based on your machine learning model and how much memory it requires. The following steps are relevant when planning your TPU configuration:
Ensure sufficient quota
To check your TPU quota limit and current usage, follow these steps:
Go to the Quotas page in the Google Cloud console.
In the Filter box, do the following:
- Select the Service property, enter Compute Engine API, and press Enter.
- Select the Quota property, and enter the name of the quota based on the TPU version and machine type. For instance, if you plan to create on-demand TPU v5e nodes whose machine type begins with ct5lp-, enter TPU v5 Lite PodSlice chips. Use the following table to find the quota name for your TPU version and machine type.
| TPU version | Machine type begins with | Name of the quota for on-demand instances | Name of the quota for Spot VM instances | Name of the quota for reserved instances |
|---|---|---|---|---|
| TPU v4 | ct4p- | TPU v4 PodSlice chips | Preemptible TPU v4 PodSlice chips | Committed TPU v4 PodSlice chips |
| TPU v5e | ct5l- | TPU v5 Lite Device chips | Preemptible TPU v5 Lite Device chips | Committed TPU v5 Lite Device chips |
| TPU v5e | ct5lp- | TPU v5 Lite PodSlice chips | Preemptible TPU v5 Lite PodSlice chips | Committed TPU v5 Lite PodSlice chips |
Optionally, to apply more advanced filters to narrow the results, select the Dimensions (e.g. locations) property, add the name of the region where you want to create TPUs in GKE, and press Enter. For instance, enter region:us-west4 if you plan to create TPU v5e nodes in the zone us-west4-a.
If no quotas match the filter you entered, then the project has not been granted any of the specified quota, and you must request a TPU quota increase.
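You can also inspect regional Compute Engine quota from the gcloud CLI. The following is a hedged sketch; whether TPU quota metrics appear in this listing depends on your project, and the console filter described above remains the authoritative view:

# Show the Compute Engine quota metrics, limits, and usage for a region.
gcloud compute regions describe us-west4 --format="yaml(quotas)"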
Request TPU quota increase
If you need additional quota to create TPU nodes in GKE, request an increase in your TPU quota by contacting your Google Cloud account representative. The TPUs provisioned with GKE use quota allocated for the Compute Engine API. Quota granted for the Cloud TPU API does not apply when using TPUs in GKE.
Create a cluster
Create a GKE cluster in Standard mode in a region with available TPUs. We recommend that you use regional clusters, which provide high availability of the Kubernetes control plane. You can use the Google Cloud CLI or the Google Cloud console.
gcloud container clusters create CLUSTER_NAME \
--location LOCATION \
--cluster-version VERSION
Replace the following:
- CLUSTER_NAME: The name of the new cluster.
- LOCATION: The region with your TPU capacity available.
- VERSION: The GKE version number. To learn more, see TPU availability in GKE.
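For example, a hedged sketch that creates a regional cluster for TPU v5e capacity in us-west4 (the cluster name and version are illustrative placeholders; choose a version listed in TPU availability in GKE):

gcloud container clusters create tpu-cluster \
--location us-west4 \
--cluster-version 1.28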
Create a node pool
Single-host TPU slice
You can create a single-host TPU slice node pool using the Google Cloud CLI, Terraform, or the Google Cloud console.
gcloud
gcloud container node-pools create POOL_NAME \
--location=LOCATION \
--cluster=CLUSTER_NAME \
--node-locations=NODE_ZONES \
--machine-type=MACHINE_TYPE \
[--num-nodes=NUM_NODES \]
[--spot \]
[--enable-autoscaling \]
[--reservation-affinity=specific \
--reservation=RESERVATION_NAME \]
[--total-min-nodes TOTAL_MIN_NODES \]
[--total-max-nodes TOTAL_MAX_NODES \]
[--location-policy=ANY]
Replace the following:
- POOL_NAME: The name of the new node pool.
- LOCATION: The name of the zone based on the TPU version you want to use:
  - For TPU v4, use us-central2-b.
  - For TPU v5e machine types beginning with ct5l-, use us-central1-a or europe-west4-b.
  - For TPU v5e machine types beginning with ct5lp-, use us-west4-a, us-east1-c, or us-east5-b.
  - For TPU v5p machine types beginning with ct5p-, use us-east5-a.
  To learn more, see Select a TPU version and topology.
- CLUSTER_NAME: The name of the cluster.
- NODE_ZONES: The comma-separated list of one or more zones where GKE creates the node pool.
- MACHINE_TYPE: The type of machine to use for nodes. For more information about TPU-compatible machine types, use the table in Mapping of TPU configuration.
Optionally, you can also use the following flags:
- NUM_NODES: The initial number of nodes in the node pool in each zone. If you omit this flag, the default is 3. If autoscaling is enabled for the node pool using the --enable-autoscaling flag, we recommend that you set NUM_NODES to 0, since the autoscaler provisions additional nodes as soon as your workloads demand them.
- RESERVATION_NAME: The name of the reservation GKE uses when creating the node pool. If you omit this flag, GKE uses available TPUs. To learn more about TPU reservations, see TPU reservation.
- --enable-autoscaling: Create a node pool with autoscaling enabled.
- TOTAL_MIN_NODES: Minimum number of all nodes in the node pool. Omit this field unless autoscaling is also specified.
- TOTAL_MAX_NODES: Maximum number of all nodes in the node pool. Omit this field unless autoscaling is also specified.
- --spot: Sets the node pool to use Spot VMs for the nodes in the node pool. This cannot be changed after node pool creation.
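For example, a hedged sketch that creates a single-host TPU v5e node pool with one node, assuming a cluster named tpu-cluster whose location is us-west4 and the ct5lp-hightpu-4t machine type (pick the machine type for your workload from Mapping of TPU configuration):

gcloud container node-pools create tpu-v5e-pool \
--location=us-west4 \
--cluster=tpu-cluster \
--node-locations=us-west4-a \
--machine-type=ct5lp-hightpu-4t \
--num-nodes=1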
Terraform
- Ensure that you use version 4.84.0 or later of the google provider.
- Add the following block to your Terraform configuration:
resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" {
provider = google
project = PROJECT_ID
cluster = CLUSTER_NAME
name = POOL_NAME
location = CLUSTER_LOCATION
node_locations = [NODE_ZONES]
initial_node_count = NUM_NODES
autoscaling {
total_min_node_count = TOTAL_MIN_NODES
total_max_node_count = TOTAL_MAX_NODES
location_policy = "ANY"
}
node_config {
machine_type = MACHINE_TYPE
reservation_affinity {
consume_reservation_type = "SPECIFIC_RESERVATION"
key = "compute.googleapis.com/reservation-name"
values = [RESERVATION_LABEL_VALUES]
}
spot = true
}
}
Replace the following:
- NODE_POOL_RESOURCE_NAME: The name of the node pool resource in the Terraform template.
- PROJECT_ID: Your project ID.
- CLUSTER_NAME: The name of the existing cluster.
- POOL_NAME: The name of the node pool to create.
- CLUSTER_LOCATION: The compute zone(s) of the cluster. Specify the region where the TPU version is available. To learn more, see Select a TPU version and topology.
- NODE_ZONES: The comma-separated list of one or more zones where GKE creates the node pool.
- NUM_NODES: The initial number of nodes in the node pool in each of the node pool's zones. If omitted, the default is 3. If autoscaling is enabled for the node pool using the autoscaling block, we recommend that you set NUM_NODES to 0, since GKE provisions additional TPU nodes as soon as your workload demands them.
- MACHINE_TYPE: The type of TPU machine to use. To see TPU-compatible machine types, use the table in Mapping of TPU configuration.
Optionally, you can also use the following variables:
- autoscaling: Create a node pool with autoscaling enabled. For a single-host TPU slice, GKE scales between the TOTAL_MIN_NODES and TOTAL_MAX_NODES values.
- TOTAL_MIN_NODES: Minimum number of all nodes in the node pool. This field is optional unless autoscaling is also specified.
- TOTAL_MAX_NODES: Maximum number of all nodes in the node pool. This field is optional unless autoscaling is also specified.
- RESERVATION_NAME: If you use TPU reservation, this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
- spot: Sets the node pool to use Spot VMs for the TPU nodes. This cannot be changed after node pool creation. For more information, see Spot VMs.
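After you fill in the variables, you can apply the configuration with the standard Terraform workflow, for example:

# Run from the directory that contains the Terraform configuration.
terraform init
terraform plan
terraform apply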
Console
To create a node pool with TPUs:
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Click add_box Add node pool.
In the Node pool details section, check the Specify node locations box.
Select the zone based on the TPU version you want to use:
- For TPU v4, use us-central2-b.
- For TPU v5e machine types beginning with ct5l-, use us-central1-a or europe-west4-b.
- For TPU v5e machine types beginning with ct5lp-, use us-west4-a, us-east1-c, or us-east5-b.
- For TPU v5p machine types beginning with ct5p-, use us-east5-a.
From the navigation pane, click Nodes.
In the Machine Configuration section, select TPUs.
In the Series drop-down menu, select one of the following:
- CT4P: TPU v4
- CT5LP: TPU v5e
- CT5P: TPU v5p
In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Mapping of TPU configuration table to learn how to define the machine type and TPU topology combination that creates a single-host TPU node pool.
In the TPU Topology drop-down menu, select the physical topology for the TPU slice.
In the Changes needed dialog, click Make changes.
Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.
Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.
Click Create.
Multi-host TPU slice
You can create a multi-host TPU slice node pool using the Google Cloud CLI, Terraform, or the Google Cloud console.
gcloud
gcloud container node-pools create POOL_NAME \
--location=LOCATION \
--cluster=CLUSTER_NAME \
--node-locations=NODE_ZONE \
--machine-type=MACHINE_TYPE \
--tpu-topology=TPU_TOPOLOGY \
--num-nodes=NUM_NODES \
[--spot \]
[--enable-autoscaling \
--max-nodes MAX_NODES]
[--reservation-affinity=specific \
--reservation=RESERVATION_NAME]
Replace the following:
- POOL_NAME: The name of the new node pool.
- LOCATION: The name of the zone based on the TPU version you want to use:
  - For TPU v4, use us-central2-b.
  - For TPU v5e machine types beginning with ct5lp-, use us-west4-a, us-east1-c, or us-east5-b. TPU v5e machine types beginning with ct5l- are never multi-host.
  To learn more, see Select a TPU version and topology.
- CLUSTER_NAME: The name of the cluster.
- NODE_ZONE: The comma-separated list of one or more zones where GKE creates the node pool.
- MACHINE_TYPE: The type of machine to use for nodes. To learn more about the available machine types, see Mapping of TPU configuration.
- TPU_TOPOLOGY: The physical topology for the TPU slice. The format of the topology depends on the TPU version as follows:
  - TPU v4: Define the topology in 3-tuples ({A}x{B}x{C}), for example 4x4x4.
  - TPU v5e: Define the topology in 2-tuples ({A}x{B}), for example 2x2.
- NUM_NODES: The number of nodes in the node pool. It must be zero or the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM. For multi-host TPU v4 and TPU v5e, the number of chips in each VM is four. Therefore, if your TPU_TOPOLOGY is 2x4x4 (TPU v4 with four chips in each VM), then NUM_NODES is 32/4, which equals 8.
Optionally, you can also use the following flags:
- RESERVATION_NAME: The name of the reservation GKE uses when creating the node pool. If you omit this flag, GKE uses available TPU node pools. To learn more about TPU reservations, see TPU reservation.
- --spot: Sets the node pool to use Spot VMs for the TPU nodes. This cannot be changed after node pool creation. For more information, see Spot VMs.
- --enable-autoscaling: Create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
- MAX_NODES: The maximum size of the node pool. It must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM.
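For example, a hedged sketch that creates a multi-host TPU v4 node pool with a 2x2x4 topology (16 chips across four VMs with four chips each), assuming a cluster named tpu-cluster located in us-central2 and the ct4p-hightpu-4t machine type:

gcloud container node-pools create tpu-v4-pool \
--location=us-central2 \
--cluster=tpu-cluster \
--node-locations=us-central2-b \
--machine-type=ct4p-hightpu-4t \
--tpu-topology=2x2x4 \
--num-nodes=4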
Terraform
- Ensure that you use version 4.84.0 or later of the google provider.
- Add the following block to your Terraform configuration:
resource "google_container_node_pool" "NODE_POOL_RESOURCE_NAME" { provider = google project = PROJECT_ID cluster = CLUSTER_NAME name = POOL_NAME location = CLUSTER_LOCATION node_locations = [NODE_ZONES] initial_node_count = NUM_NODES autoscaling { max_node_count = MAX_NODES location_policy = "ANY" } node_config { machine_type = MACHINE_TYPE reservation_affinity { consume_reservation_type = "SPECIFIC_RESERVATION" key = "compute.googleapis.com/reservation-name" values = [RESERVATION_LABEL_VALUES] } spot = true } placement_policy { type = "COMPACT" tpu_topology = TPU_TOPOLOGY } }
Replace the following:
- NODE_POOL_RESOURCE_NAME: The name of the node pool resource in the Terraform template.
- PROJECT_ID: Your project ID.
- CLUSTER_NAME: The name of the existing cluster to add the node pool to.
- POOL_NAME: The name of the node pool to create.
- CLUSTER_LOCATION: Compute location for the cluster. We recommend having a regional cluster for higher reliability of the Kubernetes control plane. You can also use a zonal cluster. To learn more, see Select a TPU version and topology.
- NODE_ZONES: The comma-separated list of one or more zones where GKE creates the node pool.
- NUM_NODES: The number of nodes in the node pool. It must be zero or the total number of TPU chips divided by four, because in multi-host TPU slices each TPU node has 4 chips. For example, if TPU_TOPOLOGY is 4x8, then there are 32 chips, which means NUM_NODES must be 8. To learn more about TPU topologies, use the table in Mapping of TPU configuration.
- TPU_TOPOLOGY: The desired physical topology for the TPU slice. The format of the topology depends on the TPU version you are using:
  - For TPU v4: Define the topology in 3-tuples ({A}x{B}x{C}), for example 4x4x4.
  - For TPU v5e: Define the topology in 2-tuples ({A}x{B}), for example 2x2.
Optionally, you can also use the following variables:
- RESERVATION_NAME: If you use TPU reservation, this is the list of labels of the reservation resources to use when creating the node pool. To learn more about how to populate the RESERVATION_LABEL_VALUES in the reservation_affinity field, see Terraform Provider.
- autoscaling: Create a node pool with autoscaling enabled. When GKE scales a multi-host TPU slice node pool, it atomically scales up the node pool from zero to the maximum size.
- MAX_NODES: The maximum size of the node pool. It must be equal to the product of the values defined in TPU_TOPOLOGY ({A}x{B}x{C}) divided by the number of chips in each VM.
- spot: Sets the node pool to use Spot VMs for the TPU nodes. This cannot be changed after node pool creation. For more information, see Spot VMs.
Console
To create a node pool with TPUs:
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Click add_box Add node pool.
In the Node pool details section, check the Specify node locations box.
Select the zone based on the TPU version you want to use:
- For TPU v4, use us-central2-b.
- For TPU v5e, use either us-west4-a or us-east1-c.
From the navigation pane, click Nodes.
In the Machine Configuration section, select TPUs.
In the Series drop-down menu, select one of the following:
- CT4P: For TPU v4.
- CT5LP: For TPU v5e.
In the Machine type drop-down menu, select the name of the machine to use for nodes. Use the Mapping of TPU configuration table to learn how to define the machine type and TPU topology combination that creates a multi-host TPU node pool.
In the TPU Topology drop-down menu, select the physical topology for the TPU slice.
In the Changes needed dialog, click Make changes.
Ensure that Boot disk type is either Standard persistent disk or SSD persistent disk.
Optionally, select the Enable nodes on spot VMs checkbox to use Spot VMs for the nodes in the node pool.
Click Create.
Run your workload on TPU nodes
Workload preparation
After you create your cluster and node pools, you can set up your workloads. As a prerequisite, you have to complete the following workload preparation steps:
Frameworks like JAX, PyTorch, and TensorFlow access TPU VMs using the libtpu shared library. libtpu includes the XLA compiler, TPU runtime software, and the TPU driver. Each release of PyTorch and JAX requires a certain libtpu.so version. To use TPUs in GKE, ensure that you use the following versions:
- TPU v4: recommended jax[tpu] version 0.4.4 or later; recommended torchxla[tpuvm] version v2.0.0 or later.
- TPU v5e: recommended jax[tpu] version v0.4.9 or later; recommended torchxla[tpuvm] version v2.1.0 or later.
- TPU v5p: recommended jax[tpu] version 0.4.19 or later; for torchxla[tpuvm], a nightly build from October 23, 2023 is suggested.
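For example, a minimal sketch of installing a JAX release that satisfies the TPU v5e recommendation inside your container image or startup command (the release index URL is the same one used in the example manifests later on this page):

pip install 'jax[tpu]>=0.4.9' -f https://storage.googleapis.com/jax-releases/libtpu_releases.html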
Set the following environment variables for the container requesting the TPU resources:
- TPU_WORKER_ID: A unique integer for each Pod. This ID denotes a unique worker ID in the TPU slice. The supported values for this field range from zero to the number of Pods minus one.
- TPU_WORKER_HOSTNAMES: A comma-separated list of TPU VM hostnames or IP addresses that need to communicate with each other within the slice. There should be a hostname or IP address for each TPU VM in the slice. The list of IP addresses or hostnames is ordered and zero-indexed by the TPU_WORKER_ID.

GKE automatically injects these environment variables by using a mutating webhook when a Job is created with the completionMode: Indexed, subdomain, and parallelism > 1 properties, and requests google.com/tpu resources. GKE adds a headless Service so that DNS records are added for the Pods backing the Service.

In your workload manifest, add Kubernetes node selectors to ensure that GKE schedules your TPU workload on the TPU machine type and TPU topology you defined:
nodeSelector:
  cloud.google.com/gke-tpu-accelerator: TPU_ACCELERATOR
  cloud.google.com/gke-tpu-topology: TPU_TOPOLOGY
Replace the following:
- TPU_ACCELERATOR: The name of the TPU accelerator:
  - For TPU v4, use tpu-v4-podslice.
  - For TPU v5e machine types beginning with ct5l-, use tpu-v5-lite-device.
  - For TPU v5e machine types beginning with ct5lp-, use tpu-v5-lite-podslice.
  - For TPU v5p machine types beginning with ct5p-, use tpu-v5p-slice.
- TPU_TOPOLOGY: The physical topology for the TPU slice. The format of the topology depends on the TPU version as follows:
  - TPU v4: Define the topology in 3-tuples ({A}x{B}x{C}), for example 4x4x4.
  - TPU v5e: Define the topology in 2-tuples ({A}x{B}), for example 2x2.
  - TPU v5p: Define the topology in 3-tuples ({A}x{B}x{C}), for example 4x4x4.
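To confirm that your nodes carry the labels your node selectors target, you can list nodes by label. This is a hedged example that assumes the single-host TPU v5e values used later on this page:

kubectl get nodes -l cloud.google.com/gke-tpu-accelerator=tpu-v5-lite-podslice,cloud.google.com/gke-tpu-topology=2x4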
After you complete the workload preparation, you can run a Job that uses TPUs.
The following sections show examples of how to run a Job that performs a simple computation with TPUs.
Example 1: Run a workload that displays the number of available TPU chips in a TPU node pool
This example includes the following configuration:
- TPU version: v4 (tpu-v4-podslice)
- Topology: 2x2x4
- Type of node pool: Multi-host TPU slice
Create the following tpu-job.yaml manifest:

apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None
  selector:
    job-name: tpu-job-podslice
---
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-job-podslice
spec:
  backoffLimit: 0
  completions: 4
  parallelism: 4
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-svc
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
        cloud.google.com/gke-tpu-topology: 2x2x4
      containers:
      - name: tpu-job
        image: python:3.10
        ports:
        - containerPort: 8471 # Default port using which TPU VMs communicate
        - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
        securityContext:
          privileged: true
        command:
        - bash
        - -c
        - |
          pip install 'jax[tpu]' -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
          python -c 'import jax; print("TPU cores:", jax.device_count())'
        resources:
          requests:
            google.com/tpu: 4
          limits:
            google.com/tpu: 4
Each TPU node has the following node labels:
cloud.google.com/gke-accelerator-type: tpu-v4-podslice
cloud.google.com/gke-tpu-topology: 2x2x4
Apply the manifest:
kubectl apply -f tpu-job.yaml
GKE runs a TPU v4 slice with four TPU VMs (multi-host TPU slice). The node pool has 16 interconnected chips.
Verify that the Job created four Pods:
kubectl get pods
The output is similar to the following:
NAME                       READY   STATUS      RESTARTS   AGE
tpu-job-podslice-0-5cd8r   0/1     Completed   0          97s
tpu-job-podslice-1-lqqxt   0/1     Completed   0          97s
tpu-job-podslice-2-f6kwh   0/1     Completed   0          97s
tpu-job-podslice-3-m8b5c   0/1     Completed   0          97s
Get the logs of one of the Pods:
kubectl logs POD_NAME
Replace POD_NAME with the name of one of the created Pods. For example, tpu-job-podslice-0-5cd8r.

The output is similar to the following:
TPU cores: 16
Example 2: Run a workload that displays the number of available TPU chips in the TPU VM
This example includes the following configuration:
- TPU version: v5e (tpu-v5-lite-podslice)
- Topology: 2x4
- Type of node pool: Single-host TPU slice
Create the following manifest as tpu-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: tpu-job-jax-v5
spec:
  restartPolicy: Never
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x4
  containers:
  - name: tpu-job
    image: python:3.10
    ports:
    - containerPort: 8431 # Port to export TPU runtime metrics, if supported.
    securityContext:
      privileged: true
    command:
    - bash
    - -c
    - |
      pip install 'jax[tpu]' -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
      python -c 'import jax; print("Total TPU chips:", jax.device_count())'
    resources:
      requests:
        google.com/tpu: 8
      limits:
        google.com/tpu: 8
This manifest includes the following node selectors:
cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
cloud.google.com/gke-tpu-topology: 2x4
GKE schedules this Pod on a node in a single-host TPU v5e slice node pool. Each TPU VM has eight chips (single-host TPU slice).
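To run and inspect this example (assuming the manifest is saved as tpu-pod.yaml as described above):

kubectl apply -f tpu-pod.yaml

# After the Pod completes, its log shows the chip count.
kubectl logs tpu-job-jax-v5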
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this guide, consider deleting the TPU node pools that no longer have scheduled workloads. If running workloads must be terminated gracefully, use kubectl drain to clean up the workloads before you delete the node.
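For example, a hedged sketch of draining one of the TPU nodes before deleting its node pool (NODE_NAME is a placeholder for a node name from the kubectl get nodes output):

kubectl get nodes

kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data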
Delete a TPU node pool:
gcloud container node-pools delete POOL_NAME \
--location=LOCATION \
--cluster=CLUSTER_NAME
Replace the following:
- POOL_NAME: The name of the node pool.
- CLUSTER_NAME: The name of the cluster.
- LOCATION: The compute location of the cluster.
Additional configurations
The following sections describe the additional configurations you can apply to your TPU workloads.
Multislice
You can aggregate smaller slices together in a Multislice to handle larger training workloads. For more information, see Multislice TPUs in GKE.
Migrate your TPU reservation
If you have existing TPU reservations, you must first migrate your TPU reservation to a new Compute Engine-based reservation system. You can also create a Compute Engine-based reservation directly, in which case no migration is needed. To learn how to migrate your TPU reservations, see TPU reservation.
Logging
Logs emitted by containers running on GKE nodes, including TPU VMs, are collected by the GKE logging agent and sent to Cloud Logging, where they are visible.
Use GKE node auto-provisioning
You can configure GKE to automatically create and delete node pools to meet the resource demands of your TPU workloads. For more information, see Configuring Cloud TPUs.
TPU node auto repair
If a TPU node in a multi-host TPU slice node pool is unhealthy, the entire node pool is recreated. Conditions that result in unhealthy TPU nodes include the following:
- Any TPU node with common node conditions.
- Any TPU node with an unallocatable TPU count larger than zero.
- Any TPU VM instance that is stopped (due to preemption) or is terminated.
- Node maintenance: If any TPU node (VM) within a multi-host TPU slice node pool goes down for host maintenance, GKE recreates the entire TPU slice.
You can see the repair status (including the failure reason) in the operation history. If the failure is caused by insufficient quota, contact your Google Cloud account representative to increase the corresponding quota.
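For example, a hedged sketch of listing recent operations for your project with the gcloud CLI; node pool repair operations, including their status and any error, appear in this list:

gcloud container operations list --limit=20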
Observability and metrics
Runtime metrics
In GKE version 1.27.4-gke.900 or later, TPU workloads that use JAX version 0.4.14 or later and specify containerPort: 8431 export TPU utilization metrics as GKE system metrics.
The following metrics are available in Cloud Monitoring to monitor your TPU workload's runtime performance:
- Duty cycle: Percentage of time over the past sampling period (60 seconds) during which the TensorCores were actively processing on a TPU chip. Larger percentage means better TPU utilization.
- Memory used: Amount of accelerator memory allocated in bytes. Sampled every 60 seconds.
- Memory total: Total accelerator memory in bytes. Sampled every 60 seconds.
These metrics are located in the Kubernetes node (k8s_node) and Kubernetes container (k8s_container) schemas.
Kubernetes container:
kubernetes.io/container/accelerator/duty_cycle
kubernetes.io/container/accelerator/memory_used
kubernetes.io/container/accelerator/memory_total
Kubernetes node:
kubernetes.io/node/accelerator/duty_cycle
kubernetes.io/node/accelerator/memory_used
kubernetes.io/node/accelerator/memory_total
Host metrics
In GKE version 1.28.1-gke.1066000 or later, TPU VMs export TPU utilization metrics as GKE system metrics. The following metrics are available in Cloud Monitoring to monitor your TPU host's performance:
- TensorCore utilization: Current percentage of the TensorCore that is utilized. The TensorCore value equals the sum of the matrix-multiply units (MXUs) plus the vector unit. The TensorCore utilization value is the number of TensorCore operations performed over the past sample period (60 seconds) divided by the supported number of TensorCore operations over the same period. A larger value means better utilization.
- Memory Bandwidth utilization: Current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period (60s) by the maximum supported bandwidth over the same sample period.
These metrics are located in the Kubernetes node (k8s_node) and Kubernetes container (k8s_container) schemas.
Kubernetes container:
kubernetes.io/container/accelerator/tensorcore_utilization
kubernetes.io/container/accelerator/memory_bandwidth_utilization
Kubernetes node:
kubernetes.io/node/accelerator/tensorcore_utilization
kubernetes.io/node/accelerator/memory_bandwidth_utilization
For more information, see Kubernetes metrics and GKE system metrics.
Run containers without privileged mode
If your TPU node is running a GKE version earlier than 1.28, read the following section:
A container running on a TPU VM needs access to higher limits on locked
memory so the driver can communicate with the TPU chips over direct memory
access (DMA). To enable this, you must configure a higher
ulimit
. If you want to
reduce the permission scope on your container, complete the following steps:
Edit the securityContext to include the following fields:

securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
Increase ulimit by running the following command inside the container before setting up your workloads to use TPU resources:

ulimit -l 68719476736
Note: For TPU v5e, running containers without privileged mode is available in clusters in version 1.27.4-gke.900 and later.
Known issues
- Cluster autoscaler might incorrectly calculate capacity for new TPU nodes before those nodes report their available TPUs. Cluster autoscaler might then scale up further and create more nodes than needed. After a regular scale-down operation, cluster autoscaler scales down the additional nodes if they are not needed.
- Cluster autoscaler cancels scale-up of TPU node pools that remain in waiting status for more than 15 minutes, and retries such scale-up operations later. This behavior might reduce TPU obtainability for customers who don't use reservations.
- Non-TPU workloads that have a toleration for the TPU taint might prevent scale-down of the node pool if they are recreated during draining of the TPU node pool.
What's next
- Serve Large Language Models with Saxml on TPUs
- Learn more about setting up Ray on GKE with TPUs
- Build large-scale machine learning on Cloud TPUs with GKE
- Serve Large Language Models with KubeRay on TPUs
- Troubleshoot TPUs in GKE