GPUs

This page shows you how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Google Kubernetes Engine clusters' nodes.

Overview

With GKE, you can create node pools equipped with NVIDIA Tesla® K80, P100, P4, V100, and T4 GPUs. GPUs provide compute power to drive deep-learning tasks such as image recognition and natural language processing, as well as other compute-intensive tasks such as video transcoding and image processing.

You can also use GPUs with preemptible VMs if your workloads can tolerate frequent node disruptions. Using preemptible VMs reduces the price of running GPUs. To learn more, refer to GPUs on preemptible instances.
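For example, you can create a preemptible GPU node pool by adding the --preemptible flag to the node pool creation command described later on this page. The following is a minimal sketch; the cluster name my-cluster, the zone, and the GPU type are placeholders:

gcloud container node-pools create preemptible-gpu-pool \
--cluster my-cluster --zone us-central1-c \
--preemptible --accelerator type=nvidia-tesla-k80,count=1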

Requirements

GPUs on GKE have the following requirements:

Kubernetes version
For node pools using the Container-Optimized OS node image, GPU nodes are available in GKE version 1.9 or higher. For node pools using the Ubuntu node image, GPU nodes are available in GKE version 1.11.3 or higher.
GPU quota

You must have Compute Engine GPU quota in your desired zone before you can create GPU nodes. To ensure that you have enough GPU quota in your project, refer to Quotas in Google Cloud Platform Console.

If you require additional GPU quota, you must request GPU quota in GCP Console. If you have an established billing account, your project should automatically receive quota after you submit the quota request.

NVIDIA GPU drivers

You must manually install NVIDIA GPU drivers on your nodes. This page explains how to install the drivers.

Limitations

Before using GPUs on GKE, keep in mind the following limitations:

  • You cannot add GPUs to existing node pools.
  • GPU nodes cannot be live migrated during maintenance events.

Availability

GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your clusters.

For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.

You can also see GPUs available in your zone using the gcloud command-line tool.

gcloud

To see a list of all GPU accelerator types supported in each zone, run the following command:

gcloud compute accelerator-types list
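You can also narrow the output to a single zone with a filter. For example (the zone us-central1-c is only illustrative, and the exact filter expression is an assumption):

gcloud compute accelerator-types list --filter="zone:us-central1-c"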

Pricing

For GPU pricing information, refer to the pricing table on the GCP GPU page.

GPU quota

Your GPU quota is the total number of GPUs that can run in your GCP project. To create clusters with GPUs, your project must have sufficient GPU quota.

Your GPU quota should be at least equivalent to the total number of GPUs you intend to run in your cluster. If you enable cluster autoscaling, you should request GPU quota at least equivalent to your cluster's maximum number of nodes multiplied by the number of GPUs per node.

For example, if you create a cluster with three nodes that runs two GPUs per node, your project requires a GPU quota of at least six.

Requesting GPU quota

You request GPU quota using GCP Console.

Console

Perform the following steps:

  1. Visit the Cloud IAM Quotas menu in GCP Console.

    Visit the Quotas menu

  2. From the Metrics drop-down menu, click None, then enter "gpus" in the search field.

  3. From the search results, select the desired GPUs.

  4. Close the Metrics drop-down menu.

Submitting quota request

Console

Perform the following steps:

  1. From the list of GPU quotas, select the quotas for your desired regions, such as us-central1.
  2. Click Edit Quotas. A request form opens on the right side of GCP Console.
  3. Fill in the New quota limit field for each quota request.
  4. Fill in the Request description field with details about your request.
  5. Click Done.
  6. Click Submit request.

Running GPUs

The following sections explain how to run GPUs in GKE clusters.

Creating an autoscaling GPU node pool

To use GPUs on GKE cost-effectively and to take advantage of cluster autoscaling, we recommend creating separate GPU node pools in your clusters.

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, GKE automatically taints the GPU nodes with the following node taint:

  • Key: nvidia.com/gpu
  • Effect: NoSchedule

Additionally, GKE automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.

This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
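For reference, the toleration that GKE applies is equivalent to adding the following to a Pod specification's spec section (a sketch based on the taint described above):

  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"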

You create a GPU node pool in an existing cluster using GCP Console or the gcloud command-line tool.

gcloud

To create a node pool with GPUs, run the following command:

gcloud container node-pools create [POOL_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] [--num-nodes 3 --min-nodes 0 --max-nodes 5 \
--enable-autoscaling]

where:

  • [POOL_NAME] is the name you choose for the node pool.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
  • [AMOUNT] is the number of GPUs to attach to nodes in the node pool.
  • [COMPUTE_ZONE] is the compute zone in which to create the node pool, such as us-central1-c. The cluster must already run in the zone specified.
  • [CLUSTER_NAME] is the name of the cluster in which to create the node pool.
  • --num-nodes specifies the initial number of nodes to be created.
  • --min-nodes specifies the minimum number of nodes to run at any given time.
  • --max-nodes specifies the maximum number of nodes that can run.
  • --enable-autoscaling allows the node pool to autoscale when workload demand changes.

For example, the following command creates an autoscaling node pool, p100, with two P100 GPUs, in the cluster p100-cluster:

gcloud container node-pools create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c --cluster p100-cluster \
--num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

    Visit the Google Kubernetes Engine menu

  2. Select the desired cluster.

  3. Click Edit.

  4. From Node pools, click Add node pool.

  5. Optionally, from the Autoscaling drop-down menu, select On.

  6. Configure your node pool as desired. Then, from Machine type, click Customize.

  7. From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node.

  8. From the GPU type drop-down menu, select the desired GPU type.

  9. Acknowledge the warning by selecting the I understand the limitations checkbox.

  10. Click Save.

Creating a new zonal cluster with GPUs

You create a zonal cluster that runs GPUs using GCP Console or the gcloud command-line tool.

gcloud

To create a zonal cluster with GPUs running in its default node pool, run the following command:

gcloud container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--zone [COMPUTE_ZONE]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
  • [AMOUNT] is the number of GPUs to run in the default node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-c.

For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node:

gcloud container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

    Visit the Google Kubernetes Engine menu

  2. Click Create cluster.

  3. Choose the GPU Accelerated Computing cluster template.

  4. Configure your cluster as desired. Then customize the GPU Node Pool or add additional GPU Node Pools.

  5. Acknowledge the warning by selecting I understand the limitations.

  6. Click Create.

Creating a new regional cluster with GPUs

By default, regional clusters create nodes in three zones of a region. However, no GCP region has GPUs in all three zones. When you create a regional cluster with GPUs, you must manually specify the zones in which GPUs are available. To learn which zones have GPUs, see Availability.

You create a regional GPU cluster using the gcloud command-line tool.

gcloud

To create a regional cluster with GPUs, run the following command:

gcloud container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--region [REGION] --node-locations [ZONE],[ZONE]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
  • [AMOUNT] is the number of GPUs to run per node.
  • [REGION] is the cluster's region, such as us-central1.
  • [ZONE] is a compute zone within the region, such as us-central1-c. Zone(s) must have the GPU types you specify.

For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node, in two zones within us-central1:

gcloud container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--region us-central1 \
--node-locations us-central1-a,us-central1-c

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

    Visit the Google Kubernetes Engine menu

  2. Click Create cluster.

  3. Choose the GPU Accelerated Computing cluster template.

  4. From Location type, choose Regional. From Region, select your desired region.

  5. Configure your cluster as desired. Then customize the GPU Node Pool or add additional GPU Node Pools.

  6. Acknowledge the warning by selecting I understand the limitations.

  7. Click Create.

Installing NVIDIA GPU device drivers

After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers to the nodes. Google provides a DaemonSet that automatically installs the drivers for you.

Refer to the section below for installation instructions for Container-Optimized OS (COS) and Ubuntu nodes.

COS

To deploy the installation DaemonSet, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The installation takes several minutes to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
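One way to confirm that the GPU capacity is visible is to inspect your nodes with kubectl, for example:

kubectl describe nodes | grep "nvidia.com/gpu"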

The following table lists the GKE version from which each NVIDIA driver version is supported:

GKE version      NVIDIA driver version
1.11.5+          410.79
1.10.5-gke.4+    396.46
1.10.2-gke.3+    390.46

Ubuntu

Note that GPU support on Ubuntu nodes requires GKE version 1.11.3 or higher.

To deploy the installation DaemonSet, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml

The installation takes several seconds to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.

The following table lists the GKE version from which each NVIDIA driver version is supported:

GKE version                      NVIDIA driver version
1.11.8-gke.4+ / 1.12.6-gke.6+    410.104
Other 1.11 and 1.12 versions     384.111

Configuring Pods to consume GPUs

You use a resource limit to configure Pods to consume GPUs. You specify a resource limit in a Pod specification using the following key-value pair:

  • Key: nvidia.com/gpu
  • Value: Number of GPUs to consume

Below is an example of a Pod specification that consumes GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    resources:
      limits:
        nvidia.com/gpu: 2
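Assuming you save this specification as my-gpu-pod.yaml, you could create the Pod and then check that the GPUs are visible from inside the container with the nvidia-smi utility, which the driver installer makes available under /usr/local/nvidia/bin:

kubectl apply -f my-gpu-pod.yaml
kubectl exec my-gpu-pod -- /usr/local/nvidia/bin/nvidia-smi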

Consuming multiple GPU types

If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. GKE attaches a unique node selector to GPU nodes to help place GPU workloads on nodes with specific GPU types:

  • Key: cloud.google.com/gke-accelerator
  • Value: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.

You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-k80 # or nvidia-tesla-p100 or nvidia-tesla-p4 or nvidia-tesla-v100 or nvidia-tesla-t4

About the CUDA libraries

CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. The NVIDIA device drivers you install in your cluster include the CUDA libraries.

CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.

CUDA applications running in Pods consuming NVIDIA GPUs need to dynamically discover CUDA libraries. This requires including /usr/local/nvidia/lib64 in the LD_LIBRARY_PATH environment variable.
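If your image does not already set this path, you can set the variable in the container spec. For example (a sketch, to be merged into the containers entry of a Pod specification such as the ones above):

    env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64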

You should use Ubuntu-based CUDA Docker base images for CUDA applications in GKE, where LD_LIBRARY_PATH is already set appropriately. The latest supported CUDA version is 10.0 on both COS (1.11.5+) and Ubuntu (1.11.8-gke.4+, 1.12.6-gke.6+).

Monitoring GPU nodes

GKE exposes the following Stackdriver Monitoring metrics for containers using GPUs. You can use these metrics to monitor your GPU workloads' performance:

  • Duty Cycle (container/accelerator/duty_cycle): Percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. Between 1 and 100.
  • Memory Usage (container/accelerator/memory_used): Amount of accelerator memory allocated in bytes.
  • Memory Capacity (container/accelerator/memory_total): Total accelerator memory in bytes.

These metrics are made available in Stackdriver.

For more information about monitoring your clusters and their resources, refer to Monitoring.

View usage metrics

You view your workloads' GPU usage metrics from the Workloads dashboard in GCP Console.

Console

To view your workloads' GPU usage, perform the following steps:

  1. Visit the Workloads menu in GCP Console.

    Visit the Workloads menu

  2. Select the desired workload.

The Workloads dashboard displays charts for GPU memory usage and capacity, and GPU duty cycle.

What's next
