GPUs on Kubernetes Engine

This page explains how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Kubernetes Engine cluster nodes.

Overview

You can create node pools where the nodes are equipped with NVIDIA Tesla® V100, P100, and K80 GPUs. GPUs are useful for accelerating specific workloads in your clusters, such as machine learning and image processing. To learn more about use cases for GPUs, refer to Google Cloud Platform's GPUs page.

Requirements

GPUs on Kubernetes Engine have the following requirements:

Kubernetes version 1.9+
GPU-enabled nodes are available in Kubernetes Engine clusters running Kubernetes version 1.9.0 or later.
GPU quota
You must have Compute Engine GPU quota in your desired zone before you can create clusters or node pools with GPUs. To ensure that you have enough GPUs available in your project, refer to Quotas in Google Cloud Platform Console.
If GPUs are not listed on the quotas page, or if you require additional GPU quota, you must request GPU quota in GCP Console. If you have an established billing account, your project should automatically receive quota after you submit the quota request.
NVIDIA GPU drivers

NVIDIA GPU device drivers are not installed automatically on GPU nodes. You must install them yourself; this page explains how.

Limitations

Support for NVIDIA GPUs on Kubernetes Engine has the following limitations:

  • GPUs are only supported for the Container-Optimized OS node image.
  • You cannot add GPUs to existing node pools.
  • GPU nodes cannot be live migrated during maintenance events.
  • GPU nodes run the NVIDIA GPU device plugin system addon and have the DevicePlugins Kubernetes alpha feature enabled. Kubernetes Engine automatically manages this device plugin, but Google does not provide support for any third-party device plugins.

Availability

GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your clusters.

To see a list of all GPU accelerator types supported in each zone, run the following command:

gcloud compute accelerator-types list
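
To limit the output to a single zone, you can add a filter; for example:

gcloud compute accelerator-types list --filter="zone:us-central1-c"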

For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.

Pricing

For GPU pricing information, refer to the pricing table on the GCP GPU page.

Requesting GPU quota

Your GPU quota is the total number of GPUs that can run in your project. To create clusters with GPUs, your project must have sufficient GPU quota. Your GPU quota should be at least equivalent to the total number of GPUs you intend to run in your cluster.

For example, if you create a cluster with three nodes that run two GPUs each, your project requires a GPU quota of at least six. If you enable cluster autoscaling, you should request a GPU quota at least equal to your cluster's maximum number of nodes multiplied by the number of GPUs per node.
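
To check your project's current GPU quota in a region before filing a request, you can describe the region with the gcloud command-line tool; the quotas section of the output includes per-GPU-type metrics such as NVIDIA_K80_GPUS:

gcloud compute regions describe us-central1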

You request GPU quota using GCP Console.

For more information about requesting quotas, refer to Requesting additional quota.

Console

  1. Visit the Cloud IAM Quotas menu in GCP Console.

    Visit the Quotas menu

  2. From the Metrics drop-down menu, enter "gpus" in the search field, then select the GPU types you need, such as NVIDIA V100 GPUs, NVIDIA P100 GPUs, and NVIDIA K80 GPUs.

  3. From the list of quotas, select the GPU quotas in your desired locations, such as us-central1.
  4. Click Edit Quotas. A request form opens on the right side of GCP Console.
  5. Fill the New quota limit field for each quota request.
  6. Fill the Request description field with details about your request.
  7. Click Done.
  8. Click Submit request.

Running GPUs

The following sections explain how to run GPUs in Kubernetes Engine clusters.

Creating an autoscaling GPU node pool

To use GPUs on Kubernetes Engine cost-effectively and to benefit from cluster autoscaling, we recommend creating separate GPU node pools in your clusters.

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, Kubernetes Engine automatically taints the GPU nodes with the following node taint:

  • Key: nvidia.com/gpu
  • Effect: NoSchedule

Additionally, Kubernetes Engine automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.

This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
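
For reference, the toleration that this admission controller adds to a GPU-requesting Pod is equivalent to including the following in the Pod specification yourself:

tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule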

You create a GPU node pool in an existing cluster using GCP Console or the gcloud command-line tool.

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

    Visit the Kubernetes Engine menu

  2. Select the desired cluster.

  3. Click Edit.
  4. From Node pools, click Add node pool.
  5. Optionally, from the Autoscaling drop-down menu, select On.
  6. Configure your node pool as desired. Then, from Machine type, click Customize.
  7. From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node in the new node pool.
  8. From the GPU type drop-down menu, select the desired GPU type.
  9. Acknowledge the warning by selecting the I understand the limitations checkbox.
  10. Click Save.

gcloud

To create a GPU node pool in an existing cluster, run the following command:

gcloud beta container node-pools create [POOL_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] [--num-nodes 3 --min-nodes 2 --max-nodes 5 \
--enable-autoscaling]

where:

  • [POOL_NAME] is the name you choose for the node pool.
  • [GPU_TYPE] is the GPU type: nvidia-tesla-v100, nvidia-tesla-p100, or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to attach to nodes in the node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-c.
  • [CLUSTER_NAME] is the name of the cluster in which to create the node pool.
  • --num-nodes specifies the initial number of nodes to be created.
  • --min-nodes specifies the minimum number of nodes to run at any given time.
  • --max-nodes specifies the maximum number of nodes that can run.
  • --enable-autoscaling allows the node pool to autoscale when workload demand changes.

For example, the following command creates an autoscaling node pool, p100, with two P100 GPUs per node, in the cluster p100-cluster:

gcloud beta container node-pools create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c --cluster p100-cluster \
--num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling
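
To verify the accelerator configuration of the new node pool, you can describe it (using the example names from the command above):

gcloud container node-pools describe p100 \
--cluster p100-cluster --zone us-central1-c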

Creating a new cluster with GPUs

You create a cluster that runs GPUs using GCP Console or the gcloud command-line tool.

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

    Visit the Kubernetes Engine menu

  2. Click Create cluster.

  3. From the Cluster Version drop-down menu, select Kubernetes version 1.9.X or later.
  4. Configure your cluster as desired. Then, from Machine type, click Customize.
  5. Click GPUs.
  6. From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node.
  7. From the GPU type drop-down menu, select the desired GPU type.
  8. Acknowledge the warning by selecting I understand the limitations.
  9. Click Create.

gcloud

To create a cluster with GPUs attached to the default node pool, run the following command:

gcloud beta container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--zone [COMPUTE_ZONE] --cluster-version [CLUSTER_VERSION]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type: nvidia-tesla-v100, nvidia-tesla-p100, or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to attach to every node in the default node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-c.
  • [CLUSTER_VERSION] is Kubernetes Engine version 1.9.0 or later.

For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node:

gcloud beta container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c --cluster-version 1.9
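
Before running kubectl commands against the new cluster, such as the driver installation in the next section, fetch the cluster's credentials:

gcloud container clusters get-credentials p100 --zone us-central1-c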

Installing NVIDIA GPU device drivers

After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that automatically installs the drivers for you.

To deploy the installation DaemonSet, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The installation takes several minutes to complete. Once the drivers are installed, the NVIDIA GPU device plugin exposes NVIDIA GPU capacity through the Kubernetes API.
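
To verify that the drivers are installed and the GPUs are registered, you can check that nvidia.com/gpu appears in each GPU node's capacity and allocatable resources, for example:

kubectl describe nodes | grep nvidia.com/gpu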

Configuring Pods to consume GPUs

You use a resource limit to configure Pods to consume GPUs. You specify a resource limit in a Pod specification using the following key-value pair:

  • Key: nvidia.com/gpu
  • Value: the number of GPUs to consume

Below is an example of a Pod specification that consumes GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:9.0-runtime   # example CUDA image; replace with your GPU workload's image
    resources:
      limits:
        nvidia.com/gpu: 2
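
Assuming the specification above is saved to a file named my-gpu-pod.yaml (an illustrative filename), you can create the Pod and confirm its GPU request with kubectl:

kubectl apply -f my-gpu-pod.yaml
kubectl describe pod my-gpu-pod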

Consuming multiple GPU types

If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. Kubernetes Engine attaches a unique node selector to GPU nodes to help place GPU workloads on nodes with specific GPU types:

  • Key: cloud.google.com/gke-accelerator
  • Value: nvidia-tesla-k80 or nvidia-tesla-p100 or nvidia-tesla-v100

You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:9.0-runtime   # example CUDA image; replace with your GPU workload's image
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80 or nvidia-tesla-v100

About the CUDA libraries

CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. The NVIDIA device drivers you install in your cluster include the CUDA libraries.

CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.

CUDA applications running in Pods consuming NVIDIA GPUs need to dynamically discover CUDA libraries. This requires including /usr/local/nvidia/lib64 in the LD_LIBRARY_PATH environment variable.
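
If your container image does not already set this variable, a minimal sketch of doing so in the containers section of a Pod spec (reusing the example container name and image from above) is:

containers:
- name: my-gpu-container
  image: nvidia/cuda:9.0-runtime   # example CUDA image
  env:
  - name: LD_LIBRARY_PATH          # make the NVIDIA libraries discoverable
    value: /usr/local/nvidia/lib64
  resources:
    limits:
      nvidia.com/gpu: 1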

You should use Ubuntu-based CUDA Docker base images for CUDA applications in Kubernetes Engine, where LD_LIBRARY_PATH is already set appropriately. The latest supported CUDA version is 9.0.

Monitoring GPU nodes

Kubernetes Engine exposes the following Stackdriver Monitoring metrics for containers using GPUs. You can use these metrics to monitor how your GPU workloads perform:

  • container/accelerator/duty_cycle
  • container/accelerator/memory_total
  • container/accelerator/memory_used

For more information about monitoring your clusters and their resources, refer to Monitoring.
