This page shows you how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Google Kubernetes Engine (GKE) Standard clusters' nodes. For more information about GPUs in GKE, refer to About GPUs in GKE.
You can also use GPUs directly in your Autopilot Pods. For instructions, refer to Deploy GPU workloads in Autopilot.
Overview
With GKE, you can create node pools equipped with NVIDIA Tesla® K80, P100, P4, V100, T4, L4, and A100 GPUs. GPUs provide compute power to drive deep-learning tasks such as image recognition and natural language processing, as well as other compute-intensive tasks such as video transcoding and image processing.
You can also use GPUs with Spot VMs if your workloads can tolerate frequent node disruptions. Using Spot VMs reduces the price of running GPUs. To learn more, refer to Using Spot VMs with GPU node pools.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI.
Requirements
GPUs on GKE have the following requirements:
- Kubernetes version: For node pools using the Container-Optimized OS node image, GPU nodes are available in GKE version 1.9 or higher. For node pools using the Ubuntu node image, GPU nodes are available in GKE version 1.11.3 or higher.
- GPU quota: You must have Compute Engine GPU quota in your desired zone before you can create GPU nodes. To ensure that you have enough GPU quota in your project, refer to Quotas in the Google Cloud console.
  If you require additional GPU quota, you must request GPU quota in the Google Cloud console. If you have an established billing account, your project should automatically receive quota after you submit the quota request.
- NVIDIA GPU drivers: You must manually install NVIDIA GPU drivers on your nodes. The installation instructions appear later on this page.
- A100 GPUs: A100 GPUs are only supported on A2 machine types, and require GKE version 1.18.6-gke.3504 or higher. You must ensure that you have enough quota for the underlying A2 machine type to use A100 GPUs.
- L4 GPUs:
- You must use GKE version 1.22.17-gke.5400 or later.
- You must ensure that you have enough quota for the underlying G2 Compute Engine machine type to use L4 GPUs.
- The GKE version that you choose must include NVIDIA driver version 525 or later in Container-Optimized OS. If driver version 525 or later isn't the default version in your GKE version, you must manually install a supported driver on your nodes.
Limitations
Before using GPUs on GKE, keep in mind the following limitations:
- You cannot add GPUs to existing node pools.
- GPU nodes cannot be live migrated during maintenance events.
- The GPU type you can use depends on the machine series, as follows:
  - A2 machine series: A100 GPUs.
  - G2 machine series: L4 GPUs.
  - N1 machine series: all GPUs except A100 and L4.
  You should ensure that you have enough quota in your project for the machine series that corresponds to your selected GPU type and quantity.
- GPUs are not supported in Windows Server node pools.
- In GKE versions 1.22 to 1.25, the cluster autoscaler only supports basic scaling up and down of nodes with L4 GPUs. This limitation doesn't apply to GKE version 1.26 and later.
Availability
GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your clusters.
For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.
You can also see GPUs available in your zone using the Google Cloud CLI. To see a list of all GPU accelerator types supported in each zone, run the following command:
gcloud compute accelerator-types list
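You can also narrow the output to a single zone by adding a filter. For example, the following command lists the accelerator types available in us-central1-c (the zone here is only an illustration; substitute your own):

gcloud compute accelerator-types list --filter="zone:us-central1-c"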
Pricing
For GPU pricing information, refer to the pricing table on the Google Cloud GPU page.
GPU quota
Your GPU quota is the total number of GPUs that can run in your Google Cloud project. To create clusters with GPUs, your project must have sufficient GPU quota.
Your GPU quota should be at least equivalent to the total number of GPUs you intend to run in your cluster. If you enable cluster autoscaling, you should request GPU quota at least equivalent to your cluster's maximum number of nodes multiplied by the number of GPUs per node.
For example, if you create a cluster with three nodes that run two GPUs per node, your project requires a GPU quota of at least six.
Requesting GPU quota
To request GPU quota, use the Google Cloud console. For more information about requesting quotas, refer to Requesting additional quota in the Compute Engine documentation.
Searching for GPU quotas
To search for GPU quota, perform the following steps in the Google Cloud console:
1. Go to the IAM & Admin Quotas page in the Google Cloud console.
2. In the Metrics drop-down menu, click None, then enter "gpus" in the search field.
3. From the search results, select the desired GPUs.
4. Close the Metrics drop-down menu.
Submitting quota request
To submit a quota request, perform the following steps in the Google Cloud console:
1. From the list of GPU quotas, select the quotas for your desired regions, such as us-central1.
2. Click Edit Quotas. A request form opens.
3. Fill the New quota limit field for each quota request.
4. Fill the Request description field with details about your request.
5. Click Done.
6. Click Submit request.
Running GPUs
When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, GKE automatically taints the GPU nodes with the following node taint:
- Key: nvidia.com/gpu
- Effect: NoSchedule
Additionally, GKE automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.
This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
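For reference, the toleration that the ExtendedResourceToleration admission controller applies to GPU-requesting Pods is equivalent to declaring the following in the Pod specification yourself. You don't need to add it manually; it's shown here only to illustrate what the admission controller does:

tolerations:
- key: "nvidia.com/gpu"
  operator: "Exists"
  effect: "NoSchedule"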
For better cost-efficiency, reliability, and availability of GPUs on GKE, we recommend the following actions:
- Create separate GPU node pools. For each node pool, limit the node location to the zones where the GPUs you want are available.
- Enable autoscaling in each node pool.
- Use regional clusters to improve availability by replicating the Kubernetes control plane across zones in the region.
To create a separate GPU node pool in an existing cluster you can use the Google Cloud console or the Google Cloud CLI. You can also use Terraform for provisioning your GKE clusters and GPU node pool.
gcloud
To create a node pool with GPUs in a cluster, run the following command:
gcloud container node-pools create POOL_NAME \
--accelerator type=GPU_TYPE,count=AMOUNT \
[--machine-type MACHINE_TYPE] \
--region COMPUTE_REGION --cluster CLUSTER_NAME \
--node-locations COMPUTE_ZONE1[,COMPUTE_ZONE2] \
[--num-nodes NUM_NODES] \
[--enable-autoscaling \
--min-nodes MIN_NODES \
--max-nodes MAX_NODES]
Replace the following:
- POOL_NAME: the name you choose for the node pool.
- GPU_TYPE: the GPU type. Can be one of the following:
  - nvidia-tesla-k80
  - nvidia-tesla-p100
  - nvidia-tesla-p4
  - nvidia-tesla-v100
  - nvidia-tesla-t4
  - nvidia-tesla-a100
  - nvidia-a100-80gb
  - nvidia-l4
- AMOUNT: the number of GPUs to attach to nodes in the node pool.
- MACHINE_TYPE: the Compute Engine machine type for the nodes. Required if GPU_TYPE is nvidia-tesla-a100 or nvidia-a100-80gb, which can only use an A2 machine type, or if GPU_TYPE is nvidia-l4, which can only use a G2 machine type. For all other GPUs, this flag is optional.
- COMPUTE_REGION: the cluster's Compute Engine region, such as us-central1. Choose a region that has at least one zone where the requested GPUs are available.
- CLUSTER_NAME: the name of the cluster in which to create the node pool.
- COMPUTE_ZONE1,COMPUTE_ZONE2,[...]: the specific zones where GKE creates the GPU nodes. The zones must be in the same region as the cluster, specified by the --region flag. The GPU types that you define must be available in each selected zone. We recommend that you always use the --node-locations flag when creating the node pool to specify the zone or zones containing the requested GPUs.
- NUM_NODES: the initial number of nodes to be created.
- MIN_NODES: the minimum number of nodes for each zone in the node pool at any time. This value is relevant only if the --enable-autoscaling flag is used.
- MAX_NODES: the maximum number of nodes for each zone in the node pool at any time. This value is relevant only if the --enable-autoscaling flag is used.
For example, the following command creates a highly available autoscaling node pool, p100, with two P100 GPUs for each node, in the regional cluster p100-cluster.
gcloud container node-pools create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--region us-central1 --cluster p100-cluster \
--node-locations us-central1-c \
--num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling
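To confirm that the node pool was created with the expected accelerator configuration, you can describe it. The pool and cluster names here match the preceding example:

gcloud container node-pools describe p100 \
    --region us-central1 --cluster p100-cluster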
Console
To create a node pool with GPUs:
1. Go to the Google Kubernetes Engine page in the Google Cloud console.
2. In the cluster list, click the name of the cluster you want to modify.
3. Click Add Node Pool.
4. Optionally, on the Node pool details page, select the Enable autoscaling checkbox.
5. Configure your node pool as desired.
6. From the navigation pane, select Nodes.
7. Under Machine family, click GPU.
8. Select a GPU type and Number of GPUs to run on each node.
9. Read the warning and select I understand the limitations.
10. Configure the machine as desired.
11. Click Create.
Terraform
You can create a regional cluster with Terraform with GPUs using a Terraform module.
Set the Terraform variables by including the following block in the variables.tf file:

variable "project_id" {
  default     = PROJECT_ID
  description = "the Google Cloud project where GKE creates the cluster"
}

variable "region" {
  default     = CLUSTER_REGION
  description = "the Google Cloud region where GKE creates the cluster"
}

variable "zone" {
  default     = "COMPUTE_ZONE1,COMPUTE_ZONE2"
  description = "the GPU nodes zone"
}

variable "cluster_name" {
  default     = "CLUSTER_NAME"
  description = "the name of the cluster"
}

variable "gpu_type" {
  default     = "GPU_TYPE"
  description = "the GPU accelerator type"
}
Replace the following:
- PROJECT_ID: your project ID.
- CLUSTER_NAME: the name of the GKE cluster.
- CLUSTER_REGION: the compute region for the cluster.
- COMPUTE_ZONE1,COMPUTE_ZONE2,[...]: the specific zones where GKE creates the GPU nodes. The zones must be in the same region specified by the region variable. These zones must have the GPU types you defined available. To learn which zones have GPUs, see Availability. You should use the node_locations variable when creating the GPU node pool to specify the zone or zones containing the requested GPUs.
- GPU_TYPE: the GPU type. Can be one of the following:
  - nvidia-tesla-k80
  - nvidia-tesla-p100
  - nvidia-tesla-p4
  - nvidia-tesla-v100
  - nvidia-tesla-t4
  - nvidia-tesla-a100
  - nvidia-a100-80gb
  - nvidia-l4
Add the following block to your Terraform configuration:
provider "google" { project = var.project_id region = var.region } resource "google_container_cluster" "ml_cluster" { name = var.cluster_name location = var.region node_locations = [var.zone] } resource "google_container_node_pool" "gpu_pool" { name = google_container_cluster.ml_cluster.name location = var.region cluster = google_container_cluster.ml_cluster.name node_count = 3 autoscaling { total_min_node_count = "1" total_max_node_count = "5" } management { auto_repair = "true" auto_upgrade = "true" } node_config { oauth_scopes = [ "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring", "https://www.googleapis.com/auth/devstorage.read_only", "https://www.googleapis.com/auth/trace.append", "https://www.googleapis.com/auth/service.management.readonly", "https://www.googleapis.com/auth/servicecontrol", ] labels = { env = var.project_id } guest_accelerator { type = var.gpu_type count = 1 } image_type = "cos_containerd" machine_type = "n1-standard-1" tags = ["gke-node", "${var.project_id}-gke"] disk_size_gb = "30" disk_type = "pd-standard" metadata = { disable-legacy-endpoints = "true" } } }
Terraform calls Google Cloud APIs to create a new cluster with a node pool that uses GPUs. The node pool initially has three nodes and autoscaling is enabled. To learn more about Terraform, see the google_container_node_pool resource spec.
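To apply the configuration, run the standard Terraform workflow from the directory that contains your configuration files. This is a general sketch; no GKE-specific flags are required:

terraform init
terraform plan
terraform apply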
You can also create a new cluster with GPUs and specify zones using the --node-locations flag. However, we recommend that you create a separate GPU node pool in an existing cluster, as shown in this section.
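If you do create a new cluster with GPUs, the command could look like the following sketch. The cluster name, GPU type, and zones are illustrative placeholders, not recommendations:

gcloud container clusters create CLUSTER_NAME \
    --accelerator type=nvidia-tesla-t4,count=1 \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-c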
Installing NVIDIA GPU device drivers
After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that you can apply to install the drivers. On GPU nodes that use Container-Optimized OS images, you also have the option of selecting between the default GPU driver version or a newer version. The following table describes the GPU driver versions available for specific GKE versions.
Refer to the section below for installation instructions for Container-Optimized OS (COS) and Ubuntu nodes. Terraform instructions are also available.
COS
To deploy the installation DaemonSet and install the default GPU driver version, run the following command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
Alternatively, to install the newer GPU driver version (see table below), run the following command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
The installation takes several seconds to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
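To confirm that GPU capacity is exposed on your nodes after installation, you can list the allocatable nvidia.com/gpu resource for each node:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"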
Each version of Container-Optimized OS image has at least one supported NVIDIA GPU driver version. See the release notes of the major Container-Optimized OS LTS milestones for the default supported version.
The following table lists the NVIDIA driver versions available for each GKE version:

| GKE version | NVIDIA driver |
|---|---|
| 1.26 | R470 (default), R510, or R525 |
| 1.25 | R470 (default), R510, or R525 |
| 1.24 | R470 (default), R510, or R525 |
| 1.23 | R450 (default), R470, R510, or R525 |
| 1.22 | R450 (default), R470, R510, or R525 |
| 1.21 | R450 (default), R470, or R510 |
| 1.20 | R450 (default), R470 |
Ubuntu
Note that GPU support on Ubuntu nodes requires GKE version 1.11.3 or later.
To deploy the installation DaemonSet, run the following command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
The installation takes several seconds to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
The following table lists the NVIDIA driver version available for each GKE version:

| GKE version | NVIDIA driver |
|---|---|
| 1.26 | R470 |
| 1.25 | R470 |
| 1.24 | R470 |
| 1.23 | R470 |
| 1.22 | R450 |
| 1.21 | R450 |
| 1.20 | R450 |
Terraform
You can use Terraform to install the default GPU driver version based on the type of nodes. In both cases, you must configure the kubectl_manifest Terraform resource type.
To install the DaemonSet on COS, add the following block in your Terraform configuration:
data "http" "nvidia_driver_installer_manifest" { url = "https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml" } resource "kubectl_manifest" "nvidia_driver_installer" { yaml_body = data.http.nvidia_driver_installer_manifest.body } }
To install the DaemonSet on Ubuntu, add the following block in your Terraform configuration:
data "http" "nvidia_driver_installer_manifest" { url = "https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml" } resource "kubectl_manifest" "nvidia_driver_installer" { yaml_body = data.http.nvidia_driver_installer_manifest.body } }
Using node auto-provisioning with GPUs
When using node auto-provisioning with GPUs, the auto-provisioned node pools by default do not have sufficient scopes to run the installation DaemonSet. To grant the required scopes, modify the default scopes for node auto-provisioning to add logging.write, monitoring, devstorage.read_only, and compute, as in the following example:
gcloud container clusters update CLUSTER_NAME --enable-autoprovisioning \
--min-cpu=1 --max-cpu=10 --min-memory=1 --max-memory=32 \
--autoprovisioning-scopes=https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/compute
To learn more about auto-provisioning, see Using node auto-provisioning.
Configuring Pods to consume GPUs
You use a resource limit to configure Pods to consume GPUs. You specify the resource limit in a Pod specification using the following key-value pair:
- Key: nvidia.com/gpu
- Value: the number of GPUs to consume
Below is an example of a Pod specification that consumes GPUs:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 2
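To try this example, you could save the manifest to a file such as my-gpu-pod.yaml (the file name is arbitrary), deploy it, and then run the nvidia-smi debug utility, which is typically available at /usr/local/nvidia/bin in GPU containers (see the NVIDIA CUDA-X section later on this page):

kubectl apply -f my-gpu-pod.yaml
kubectl exec my-gpu-pod -- /usr/local/nvidia/bin/nvidia-smi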
Consuming multiple GPU types
If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. GKE attaches a unique node selector to GPU nodes to help place GPU workloads on nodes with specific GPU types:
- Key: cloud.google.com/gke-accelerator
- Value: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, nvidia-tesla-t4, nvidia-tesla-a100, nvidia-a100-80gb, or nvidia-l4.
You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-k80
About the NVIDIA CUDA-X libraries
CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. To use CUDA applications, the libraries must be present in the image you are using. You can do any of the following to add the NVIDIA CUDA-X libraries:
Use an image with the NVIDIA CUDA-X libraries pre-installed. For example, you can use Google's Deep Learning Containers. These containers pre-install the key data science frameworks, the NVIDIA CUDA-X libraries, and tools. Alternatively, NVIDIA's CUDA image contains the NVIDIA CUDA-X libraries only.
Build and use your own image. In this case, include /usr/local/cuda-XX.X/lib64, which contains the NVIDIA CUDA-X libraries, and /usr/local/nvidia/lib64, which contains the NVIDIA device drivers, in the LD_LIBRARY_PATH environment variable. For /usr/local/cuda-XX.X/lib64, the name of the directory depends on the version of the image you used. For example, the NVIDIA CUDA-X libraries and debug utilities in Docker containers can be at /usr/local/cuda-11.0/lib64 and /usr/local/nvidia/bin, respectively. An example of setting the environment variable follows.
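For instance, if you build your own image, you could set the environment variable in the container section of your Pod specification as follows. This is only a sketch; the cuda-11.0 directory name depends on the CUDA version in your image:

env:
- name: LD_LIBRARY_PATH
  value: /usr/local/cuda-11.0/lib64:/usr/local/nvidia/lib64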
To check the minimum GPU driver version required for your version of CUDA, see CUDA Toolkit and Compatible Driver Versions. Ensure that the GKE patch version running on your nodes includes a GPU driver version that's compatible with your chosen CUDA version. For a list of GPU driver versions associated with GKE version, refer to the corresponding Container-Optimized OS page linked in the GKE current versions table.
Monitor GPU nodes
If your GKE cluster has system metrics enabled, then the following metrics are available in Cloud Monitoring to monitor your GPU workload performance:
- Duty Cycle (container/accelerator/duty_cycle): percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. Between 1 and 100.
- Memory Usage (container/accelerator/memory_used): amount of accelerator memory allocated in bytes.
- Memory Capacity (container/accelerator/memory_total): total accelerator memory in bytes.
For more information about monitoring your clusters and their resources, refer to Monitoring.
View usage metrics
You can view your workload GPU usage metrics from the Workloads dashboard in the Google Cloud console.
To view your workload GPU usage, perform the following steps:
1. Go to the Workloads page in the Google Cloud console.
2. Select a workload.

The Workloads dashboard displays charts for GPU memory usage and capacity, and GPU duty cycle.
What's next
- Learn more about node pools.
- Learn how to use a minimum CPU platform for your nodes.
- Learn how to create and set up a local deep learning container with Docker.