GPUs

This page shows you how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Kubernetes Engine clusters' nodes.

Overview

With Kubernetes Engine, you can create node pools equipped with NVIDIA Tesla® V100, P100, and K80 GPUs. By running GPUs in your clusters, you can accelerate compute-intensive workloads such as deep learning (for example, image recognition and natural language processing), video transcoding, and image processing.

You can also use GPUs with preemptible VMs if your workloads can tolerate frequent node disruptions. Using preemptible VMs reduces the price of running GPUs. To learn more, refer to GPUs on preemptible instances.
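For example, assuming an existing cluster, the following sketch creates a GPU node pool on preemptible VMs by combining the --preemptible flag with the --accelerator flag described later on this page (bracketed values are placeholders):

gcloud container node-pools create [POOL_NAME] \
--preemptible --accelerator type=nvidia-tesla-k80,count=1 \
--zone [COMPUTE_ZONE] --cluster [CLUSTER_NAME]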

Requirements

GPUs on Kubernetes Engine have the following requirements:

Kubernetes version 1.9+
GPU nodes are available in Kubernetes Engine clusters running Kubernetes version 1.9 and later.
GPU quota
You must have Compute Engine GPU quota in your desired zone before you can create GPU nodes. To ensure that you have enough GPU quota in your project, refer to Quotas in Google Cloud Platform Console.
If you require additional GPU quota, you must request GPU quota in GCP Console. If you have an established billing account, your project should automatically receive quota after you submit the quota request.
NVIDIA GPU drivers

You must manually install NVIDIA GPU drivers on your nodes. This page explains how to install the drivers.

Container-Optimized OS

GPUs are supported only on nodes running the Container-Optimized OS node image.
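Node pools use this image type by default; if you want to set it explicitly when creating a node pool, you can pass the --image-type flag, for example:

gcloud container node-pools create [POOL_NAME] \
--image-type COS --zone [COMPUTE_ZONE] --cluster [CLUSTER_NAME]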

Limitations

Before using GPUs on Kubernetes Engine, keep in mind the following limitations:

  • You cannot add GPUs to existing node pools.
  • GPU nodes cannot be live migrated during maintenance events.

Availability

GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your clusters.

For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.

You can also see GPUs available in your zone using the gcloud command-line tool.

gcloud

To see a list of all GPU accelerator types supported in each zone, run the following command:

gcloud compute accelerator-types list
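To narrow the results to a single zone, add a filter expression, for example:

gcloud compute accelerator-types list --filter="zone:us-central1-c"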

Pricing

For GPU pricing information, refer to the pricing table on the GCP GPU page.

GPU quota

Your GPU quota is the total number of GPUs that can run in your GCP project. To create clusters with GPUs, your project must have sufficient GPU quota.

Your GPU quota must be at least equal to the total number of GPUs you intend to run in your cluster. If you enable cluster autoscaling, request GPU quota at least equal to your cluster's maximum number of nodes multiplied by the number of GPUs per node.

For example, if you create a cluster with three nodes, each running two GPUs, your project requires a GPU quota of at least six.
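You can also inspect your current regional GPU quota from the command line; GPU quota metrics (for example, NVIDIA_K80_GPUS) appear among the quotas listed in the output of:

gcloud compute regions describe us-central1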

Requesting GPU quota

You request GPU quota using GCP Console.

Console

Perform the following steps:

  1. Visit the Cloud IAM Quotas menu in GCP Console.

    Visit the Quotas menu

  2. From the Metrics drop-down menu, click None, then enter "gpus" in the search field.

  3. From the search results, select the desired GPUs.

  4. Close the Metrics drop-down menu.

Submitting a quota request

Console

Perform the following steps:

  1. From the list of GPU quotas, select the quotas for your desired regions, such as us-central1.
  2. Click Edit Quotas. A request form opens on the right side of GCP Console.
  3. Fill in the New quota limit field for each quota request.
  4. Fill in the Request description field with details about your request.
  5. Click Done.
  6. Click Submit request.

Running GPUs

The following sections explain how to run GPUs in Kubernetes Engine clusters.

Creating an autoscaling GPU node pool

To use GPUs on Kubernetes Engine cost-effectively, and to take advantage of cluster autoscaling, we recommend creating separate GPU node pools in your clusters.

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, Kubernetes Engine automatically taints the GPU nodes with the following node taint:

  • Key: nvidia.com/gpu
  • Value: present
  • Effect: NoSchedule

Additionally, Kubernetes Engine automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.

This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
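The added tolerations are equivalent to including the following stanza in a GPU-requesting Pod's spec manually (a sketch of the effective toleration):

tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule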

You create a GPU node pool in an existing cluster using GCP Console or the gcloud command-line tool.

gcloud

To create a node pool with GPUs, run the following command:

gcloud container node-pools create [POOL_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] [--num-nodes 3 --min-nodes 0 --max-nodes 5 \
--enable-autoscaling]

where:

  • [POOL_NAME] is the name you choose for the node pool.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-v100, nvidia-tesla-p100, or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to attach to nodes in the node pool.
  • [COMPUTE_ZONE] is the compute zone in which to create the node pool, such as us-central1-c. The cluster must already run in the zone specified.
  • [CLUSTER_NAME] is the name of the cluster in which to create the node pool.
  • --num-nodes specifies the initial number of nodes to be created.
  • --min-nodes specifies the minimum number of nodes to run at any given time.
  • --max-nodes specifies the maximum number of nodes that can run.
  • --enable-autoscaling allows the node pool to autoscale when workload demand changes.

For example, the following command creates an autoscaling node pool, p100, with two P100 GPUs per node, in the cluster p100-cluster:

gcloud container node-pools create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c --cluster p100-cluster \
--num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling
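You can then confirm the pool's accelerator configuration with the describe command, for example:

gcloud container node-pools describe p100 \
--cluster p100-cluster --zone us-central1-c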

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

    Visit the Kubernetes Engine menu

  2. Select the desired cluster.

  3. Click Edit.
  4. From Node pools, click Add node pool.
  5. Optionally, from the Autoscaling drop-down menu, select On.
  6. Configure your node pool as desired. Then, from Machine type, click Customize.
  7. From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node.
  8. From the GPU type drop-down menu, select the desired GPU type.
  9. Acknowledge the warning by selecting the I understand the limitations checkbox.
  10. Click Save.

Creating a new zonal cluster with GPUs

You create a zonal cluster that runs GPUs using GCP Console or the gcloud command-line tool.

gcloud

To create a zonal cluster with GPUs running in its default node pool, run the following command:

gcloud container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--zone [COMPUTE_ZONE] --cluster-version [CLUSTER_VERSION]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-v100, nvidia-tesla-p100, or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to run in the default node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-c.
  • [CLUSTER_VERSION] is Kubernetes Engine version 1.9.0 or later.

For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node:

gcloud container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c --cluster-version 1.9
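Once kubectl is configured against the new cluster, one quick way to confirm that GPU nodes were created is to list nodes with the accelerator label described later on this page:

kubectl get nodes -L cloud.google.com/gke-accelerator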

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

    Visit the Kubernetes Engine menu

  2. Click Create cluster.

  3. From the Cluster Version drop-down menu, select Kubernetes version 1.9.X or later.
  4. Configure your cluster as desired. Then, from Machine type, click Customize.
  5. Click GPUs.
  6. From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node.
  7. From the GPU type drop-down menu, select the desired GPU type.
  8. Acknowledge the warning by selecting I understand the limitations.
  9. Click Create.

Creating a new regional cluster with GPUs

By default, regional clusters create nodes in three zones of a region. However, no GCP region has GPUs in all three zones. When you create a regional cluster with GPUs, you must manually specify the zones in which GPUs are available. To learn which zones have GPUs, see Availability.

You create a regional GPU cluster using the gcloud command-line tool.

gcloud

To create a regional cluster with GPUs, run the following command:

gcloud container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--region [REGION] --node-locations [ZONE],[ZONE] \
--cluster-version [VERSION]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [VERSION] is Kubernetes Engine version 1.9.0 or later.
  • [GPU_TYPE] is the GPU type: nvidia-tesla-v100, nvidia-tesla-p100, or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to run per node.
  • [REGION] is the cluster's region, such as us-central1.
  • [ZONE] is a compute zone within the region, such as us-central1-c. The zones you specify must have the GPU type you request.

For example, the following command creates a cluster, p100, with two P100 GPUs per node, in two zones within us-central1. Because the default number of nodes (three, when --num-nodes is omitted) applies to each zone, the cluster runs six nodes in total:

gcloud container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--region us-central1 --node-locations us-central1-a,us-central1-c \
--cluster-version 1.10

Installing NVIDIA GPU device drivers

After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that automatically installs the drivers for you.

kubectl

To deploy the installation DaemonSet, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The installation takes several minutes to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
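Once the plugin is running, you can check that nodes report allocatable GPUs. One way (the backslash escapes the dots in the resource name for kubectl's custom-columns output):

kubectl get nodes \
"-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"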

Configuring Pods to consume GPUs

You use a resource limit to configure Pods to consume GPUs. You specify a resource limit in a Pod specification using the following key-value pair:

  • Key: nvidia.com/gpu
  • Value: Number of GPUs to consume

Below is an example of a Pod specification that consumes GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:9.0-runtime-ubuntu16.04  # example Ubuntu-based CUDA image
    resources:
      limits:
        nvidia.com/gpu: 2
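To run this example, save the specification to a file (the filename here is arbitrary) and create the Pod:

kubectl apply -f my-gpu-pod.yaml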

Consuming multiple GPU types

If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. Kubernetes Engine attaches a unique node selector to GPU nodes to help place GPU workloads on nodes with specific GPU types:

  • Key: cloud.google.com/gke-accelerator
  • Value: nvidia-tesla-k80, nvidia-tesla-p100, or nvidia-tesla-v100

You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:9.0-runtime-ubuntu16.04  # example Ubuntu-based CUDA image
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-k80 # or nvidia-tesla-v100 or nvidia-tesla-p100
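For example, assuming a cluster named my-cluster (a placeholder name), the following commands create two node pools whose nodes the selector above can distinguish between:

gcloud container node-pools create k80-pool \
--accelerator type=nvidia-tesla-k80,count=2 \
--zone us-central1-c --cluster my-cluster

gcloud container node-pools create v100-pool \
--accelerator type=nvidia-tesla-v100,count=2 \
--zone us-central1-c --cluster my-cluster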

About the CUDA libraries

CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. The NVIDIA device drivers you install in your cluster include the CUDA libraries.

CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.

CUDA applications running in Pods consuming NVIDIA GPUs need to dynamically discover CUDA libraries. This requires including /usr/local/nvidia/lib64 in the LD_LIBRARY_PATH environment variable.
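If your container image does not already set the variable, you can set it directly in the Pod specification; a minimal sketch based on the earlier example (the image name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-cuda-pod
spec:
  containers:
  - name: my-cuda-container
    image: nvidia/cuda:9.0-runtime-ubuntu16.04  # illustrative CUDA image
    env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64
    resources:
      limits:
        nvidia.com/gpu: 1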

You should use Ubuntu-based CUDA Docker base images for CUDA applications in Kubernetes Engine, where LD_LIBRARY_PATH is already set appropriately. The latest supported CUDA version is 9.0.

Monitoring GPU nodes

Kubernetes Engine exposes the following Stackdriver Monitoring metrics for containers using GPUs. You can use these metrics to monitor your GPU workloads' performance:

  • Duty Cycle (container/accelerator/duty_cycle): Percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. Between 1 and 100.
  • Memory Usage (container/accelerator/memory_used): Amount of accelerator memory allocated in bytes.
  • Memory Capacity (container/accelerator/memory_total): Total accelerator memory in bytes.

These metrics are made available in Stackdriver.

For more information about monitoring your clusters and their resources, refer to Monitoring.

Viewing usage metrics

You view your workloads' GPU usage metrics from the Workloads dashboard in GCP Console.

Console

To view your workloads' GPU usage, perform the following steps:

  1. Visit the Workloads menu in GCP Console.

    Visit the Workloads menu

  2. Select the desired workload.

The Workloads dashboard displays charts for GPU memory usage and capacity, and GPU duty cycle.
