This page shows you how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Google Kubernetes Engine clusters' nodes.
Overview
With GKE, you can create node pools equipped with NVIDIA Tesla® K80, P100, P4, V100, and T4 GPUs. GPUs provide compute power to drive deep-learning tasks such as image recognition and natural language processing, as well as other compute-intensive tasks such as video transcoding and image processing.
You can also use GPUs with preemptible VMs if your workloads can tolerate frequent node disruptions. Using preemptible VMs reduces the price of running GPUs. To learn more, refer to GPUs on preemptible instances.
Requirements
GPUs on GKE have the following requirements:
- Kubernetes version
For node pools using the Container-Optimized OS node image, GPU nodes are available in GKE version 1.9 or higher. For node pools using the Ubuntu node image, GPU nodes are available in GKE version 1.11.3 or higher. (A version-check sketch follows this list.)
- GPU quota
You must have Compute Engine GPU quota in your desired zone before you can create GPU nodes. To ensure that you have enough GPU quota in your project, refer to Quotas in Google Cloud Console.
If you require additional GPU quota, you must request GPU quota in Cloud Console. If you have an established billing account, your project should automatically receive quota after you submit the quota request.
- NVIDIA GPU drivers
You must manually install NVIDIA GPU drivers on your nodes. This page explains how to install the drivers.
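Because the minimum GKE version depends on the node image, it can help to confirm your cluster's versions before adding GPU node pools. A minimal sketch using the gcloud command-line tool; the projection fields shown are one reasonable choice:

gcloud container clusters list \
  --format="table(name, currentMasterVersion, currentNodeVersion)"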
Limitations
Before using GPUs on GKE, keep in mind the following limitations:
- You cannot add GPUs to existing node pools.
- GPU nodes cannot be live migrated during maintenance events.
Availability
GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your clusters.
For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.
You can also see GPUs available in your zone using the gcloud command-line tool.
gcloud
To see a list of all GPU accelerator types supported in each zone, run the following command:
gcloud compute accelerator-types list
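To narrow the list to a single zone, you can add a filter. A minimal sketch; the zone is illustrative:

gcloud compute accelerator-types list --filter="zone:us-central1-c"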
Pricing
For GPU pricing information, refer to the pricing table on the GCP GPU page.
GPU quota
Your GPU quota is the total number of GPUs that can run in your GCP project. To create clusters with GPUs, your project must have sufficient GPU quota.
Your GPU quota should be at least equivalent to the total number of GPUs you intend to run in your cluster. If you enable cluster autoscaling, you should request GPU quota at least equivalent to your cluster's maximum number of nodes multiplied by the number of GPUs per node.
For example, if you create a cluster with three nodes that runs two GPUs per node, your project requires a GPU quota of at least six.
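GPU quotas are regional Compute Engine quotas. One way to inspect your current limits from the command line is to describe the region and read its quotas field; a minimal sketch, where the region is illustrative and the metric names (such as NVIDIA_P100_GPUS) follow the usual Compute Engine quota naming:

# Look for GPU quota metrics (for example, NVIDIA_P100_GPUS) in the quotas list.
gcloud compute regions describe us-central1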
Requesting GPU quota
You request GPU quota using Cloud Console.
Searching for GPU quotas
Console
Perform the following steps:
Visit the Cloud Identity and Access Management (Cloud IAM) Quotas menu in Cloud Console.
From the Metrics drop-down menu, click None, then enter "gpus" in the search field.
From the search results, select the desired GPUs.
Close the Metrics drop-down menu.
Submitting quota request
Console
Perform the following steps:
- From the list of GPU quotas, select the quotas for your desired regions, such as us-central1.
- Click Edit Quotas. A request form opens on the right side of Cloud Console.
- Fill the New quota limit field for each quota request.
- Fill the Request description field with details about your request.
- Click Done.
- Click Submit request.
Running GPUs
The following sections explain how to run GPUs in GKE clusters.
Creating an autoscaling GPU node pool
To use GPUs on GKE cost-effectively, and to take advantage of cluster autoscaling, we recommend creating separate GPU node pools in your clusters.
When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, GKE automatically taints the GPU nodes with the following node taint:
- Key: nvidia.com/gpu
- Effect: NoSchedule
Additionally, GKE automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.
This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
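GKE adds the matching toleration for you through this admission controller, but if you also run the same manifests on clusters without it, you can add the toleration yourself. A minimal sketch of the relevant Pod spec fragment, matching the taint above:

# Pod spec fragment: toleration for the GPU node taint.
spec:
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"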
You create a GPU node pool in an existing cluster using Cloud Console or the gcloud command-line tool.
gcloud
To create a node pool with GPUs, run the following command:
gcloud container node-pools create [POOL_NAME] \
  --accelerator type=[GPU_TYPE],count=[AMOUNT] --zone [COMPUTE_ZONE] \
  --cluster [CLUSTER_NAME] [--num-nodes 3 --min-nodes 0 --max-nodes 5 \
  --enable-autoscaling]
where:
- [POOL_NAME] is the name you choose for the node pool.
- [GPU_TYPE] is the GPU type, either nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
- [AMOUNT] is the number of GPUs to attach to nodes in the node pool.
- [COMPUTE_ZONE] is the compute zone in which to create the node pool, such as us-central1-c. The cluster must already run in the specified zone.
- [CLUSTER_NAME] is the name of the cluster in which to create the node pool.
- --num-nodes specifies the initial number of nodes to be created.
- --min-nodes specifies the minimum number of nodes to run at any given time.
- --max-nodes specifies the maximum number of nodes that can run.
- --enable-autoscaling allows the node pool to autoscale when workload demand changes.
For example, the following command creates an autoscaling node pool, p100, with two P100 GPUs per node, in the cluster p100-cluster:
gcloud container node-pools create p100 \
  --accelerator type=nvidia-tesla-p100,count=2 \
  --zone us-central1-c --cluster p100-cluster \
  --num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling
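After the node pool is created, you can confirm that its nodes have joined the cluster by selecting them through the node pool label that GKE applies to nodes. A minimal sketch, assuming the example pool name above:

# cloud.google.com/gke-nodepool records which node pool a node belongs to.
kubectl get nodes -l cloud.google.com/gke-nodepool=p100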
Console
Visit the Google Kubernetes Engine menu in Cloud Console.
Select the desired cluster.
Click Edit.
From Node pools, click Add node pool.
Optionally, from the Autoscaling drop-down menu, select On.
Configure your node pool as desired. Then, from Machine type, click Customize.
From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node.
From the GPU type drop-down menu, select the desired GPU type.
Acknowledge the warning by selecting the I understand the limitations checkbox.
Click Save.
Creating a new zonal cluster with GPUs
You create a zonal cluster that runs GPUs using Cloud Console or the gcloud command-line tool.
gcloud
To create a zonal cluster with GPUs running in its default node pool, run the following command:
gcloud container clusters create [CLUSTER_NAME] \
  --accelerator type=[GPU_TYPE],count=[AMOUNT] \
  --zone [COMPUTE_ZONE]
where:
- [CLUSTER_NAME] is the name you choose for the cluster.
- [GPU_TYPE] is the GPU type, either nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
- [AMOUNT] is the number of GPUs to run in the default node pool.
- [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-c.
For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node:
gcloud container clusters create p100 \
  --accelerator type=nvidia-tesla-p100,count=2 \
  --zone us-central1-c
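Once the cluster is running, you can point kubectl at it before deploying GPU workloads. A minimal sketch, using the example cluster name and zone above:

gcloud container clusters get-credentials p100 --zone us-central1-c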
Console
Visit the Google Kubernetes Engine menu in Cloud Console.
Click Create cluster.
Choose the GPU Accelerated Computing cluster template.
Configure your cluster as desired. Then customize the GPU Node Pool or add additional GPU Node Pools.
Acknowledge the warning by selecting I understand the limitations.
Click Create.
Creating a new regional cluster with GPUs
By default, regional clusters create nodes in three zones of a region. However, no GCP region has GPUs in all three zones. When you create a regional cluster with GPUs, you must manually specify the zones in which GPUs are available. To learn which zones have GPUs, see Availability.
You create a regional GPU cluster using the gcloud command-line tool.
gcloud
To create a regional cluster with GPUs, run the following command:
gcloud container clusters create [CLUSTER_NAME] \
  --accelerator type=[GPU_TYPE],count=[AMOUNT] \
  --region [REGION] --node-locations [ZONE],[ZONE]
where:
- [CLUSTER_NAME] is the name you choose for the cluster.
- [GPU_TYPE] is the GPU type: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
- [AMOUNT] is the number of GPUs to run per node.
- [REGION] is the cluster's region, such as us-central1.
- [ZONE] is a compute zone within the region, such as us-central1-c. The zones must have the GPU types you specify.
For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node, in two zones within us-central1:
gcloud container clusters create p100 \
  --accelerator type=nvidia-tesla-p100,count=2 \
  --region us-central1 \
  --node-locations us-central1-a,us-central1-c
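To confirm that nodes were created only in the zones you specified, you can list the nodes with their zone label. A minimal sketch; failure-domain.beta.kubernetes.io/zone was the standard zone label on nodes in the GKE versions this page covers:

# Show each node together with the zone it was placed in.
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone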
Console
Visit the Google Kubernetes Engine menu in Cloud Console.
Click Create cluster.
Choose the GPU Accelerated Computing cluster template.
From Location type, choose Regional. From Region, select your desired region.
Configure your cluster as desired. Then customize the GPU Node Pool or add additional GPU Node Pools.
Acknowledge the warning by selecting I understand the limitations.
Click Create.
Installing NVIDIA GPU device drivers
After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that automatically installs the drivers for you.
Refer to the sections below for installation instructions for Container-Optimized OS (COS) and Ubuntu nodes.
COS
To deploy the installation DaemonSet, run the following command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
The installation takes several minutes to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
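You can verify the installation by checking that the installer Pods are running and that GPU capacity now appears on the nodes. A minimal sketch; the grep pattern simply filters for the installer's Pod names, and the custom-columns expression is one convenient way to surface the nvidia.com/gpu resource:

# The driver installer runs as Pods in the kube-system namespace.
kubectl get pods -n kube-system | grep nvidia

# Once the drivers and device plugin are ready, nodes report nvidia.com/gpu capacity.
kubectl get nodes \
  "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"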
The following table lists the NVIDIA driver version available in each GKE version:
| GKE version | NVIDIA driver |
|---|---|
| 1.14.2-gke.3 and higher | 418.67 |
| 1.14.2-gke.2 and lower | 410.79 |
| 1.13.6-gke.6 and higher | 418.67 |
| 1.13.6-gke.5 and lower | 410.79 |
| 1.12.x | 410.79 |
| 1.11.5 and higher | 410.79 |
| 1.10.5-gke.4 and higher | 396.46 |
| 1.10.2-gke.3 and higher | 390.46 |
Ubuntu
Note that GPU support on Ubuntu nodes requires GKE v1.11.3 or higher.
To deploy the installation DaemonSet, run the following command:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
The installation takes several seconds to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
The following table lists the NVIDIA driver version available in each GKE version:
| GKE version | NVIDIA driver |
|---|---|
| 1.14.x | 410.104 |
| 1.13.x | 410.104 |
| 1.12.6-gke.6 and higher | 410.104 |
| 1.12.6-gke.5 and lower | 384.111 |
| 1.11.8-gke.4 and higher | 410.104 |
| 1.11.8-gke.3 and lower | 384.111 |
Configuring Pods to consume GPUs
You use a resource limit to configure Pods to consume GPUs. You specify a resource limit in a Pod specification using the following key-value pair:
- Key: nvidia.com/gpu
- Value: Number of GPUs to consume
Below is an example of a Pod specification that consumes GPUs:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    command: ["/bin/bash"]
    resources:
      limits:
        nvidia.com/gpu: 2
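A minimal sketch of creating this Pod from a local manifest and checking which node it was scheduled onto; the file name is illustrative:

kubectl apply -f my-gpu-pod.yaml
kubectl get pod my-gpu-pod -o wide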
Consuming multiple GPU types
If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. GKE attaches a unique node selector to GPU nodes to help place GPU workloads on nodes with specific GPU types:
- Key: cloud.google.com/gke-accelerator
- Value: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    command: ["/bin/bash"]
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-k80 # or nvidia-tesla-p100 or nvidia-tesla-p4 or nvidia-tesla-v100 or nvidia-tesla-t4
About the CUDA libraries
CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. The NVIDIA device drivers you install in your cluster include the CUDA libraries.
CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.
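For example, you can run one of the bundled debug utilities inside a running GPU Pod to confirm the container can see the device. A minimal sketch, assuming the my-gpu-pod example above is running and that nvidia-smi is among the installed utilities:

kubectl exec my-gpu-pod -- /usr/local/nvidia/bin/nvidia-smi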
CUDA applications running in Pods consuming NVIDIA GPUs need to dynamically discover CUDA libraries. This requires including /usr/local/nvidia/lib64 in the LD_LIBRARY_PATH environment variable.
You should use Ubuntu-based CUDA Docker base images for CUDA applications in GKE, where LD_LIBRARY_PATH is already set appropriately. The latest supported CUDA version is 10.0 on both COS (1.11.5+) and Ubuntu (1.11.8-gke.4+, 1.12.6-gke.6+).
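If your container image does not already set the variable, you can set it explicitly in the Pod specification. A minimal sketch of the relevant container fragment; only the env entry is the point here, and the rest mirrors the earlier examples:

# Container spec fragment: make the CUDA libraries discoverable at runtime.
containers:
- name: my-gpu-container
  image: nvidia/cuda:10.0-runtime-ubuntu18.04
  env:
  - name: LD_LIBRARY_PATH
    value: /usr/local/nvidia/lib64
  resources:
    limits:
      nvidia.com/gpu: 1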
Monitoring GPU nodes
GKE exposes the following Stackdriver Monitoring metrics for containers using GPUs. You can use these metrics to monitor your GPU workloads' performance:
- Duty Cycle (container/accelerator/duty_cycle): Percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. Between 1 and 100.
- Memory Usage (container/accelerator/memory_used): Amount of accelerator memory allocated in bytes.
- Memory Capacity (container/accelerator/memory_total): Total accelerator memory in bytes.
These metrics are made available in Stackdriver.
For more information about monitoring your clusters and their resources, refer to Monitoring.
View usage metrics
You view your workloads' GPU usage metrics from the Workloads dashboard in Cloud Console.
Console
To view your workloads' GPU usage, perform the following steps:
Visit the Workloads menu in Cloud Console.
Select the desired workload.
The Workloads dashboard displays charts for GPU memory usage and capacity, and GPU duty cycle.