GPUs on Kubernetes Engine

This page explains how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Kubernetes Engine cluster nodes.

Overview

You can create node pools where the nodes are equipped with NVIDIA Tesla® P100 and K80 GPUs. GPUs are useful for accelerating specific workloads in your clusters, such as machine learning and image processing. To learn more about use cases for GPUs, refer to Google Cloud Platform's GPUs page.

GPU-enabled nodes are available in Kubernetes Engine clusters running Kubernetes version 1.9.0 or later.

Requirements

You must have GPU quota before you can create node pools with GPUs. To ensure that you have enough GPUs available in your project, refer to Quotas in Google Cloud Platform Console.

If GPUs are not listed on the quotas page, or if you require additional GPU quota, you must request additional quota. If you have an established billing account, your project should automatically receive quota after you submit the quota request.
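
If you prefer the command line, GPU quotas also appear among each region's quota metrics. For example, the following command (using us-central1 as an illustrative region) prints the region's quotas, including GPU metrics such as NVIDIA_P100_GPUS:

gcloud compute regions describe us-central1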

During the beta period, NVIDIA GPU device drivers are not installed automatically on new nodes. This page explains how to install the drivers by deploying a driver installation DaemonSet.

Limitations

Support for NVIDIA GPUs on Kubernetes Engine has the following limitations:

  • GPUs are only supported for the Container-Optimized OS node image.
  • You cannot add GPUs to existing node pools.
  • GPU nodes cannot be live migrated during maintenance events.
  • GPU nodes run the NVIDIA GPU device plugin system addon and have the DevicePlugins Kubernetes alpha feature enabled. Kubernetes Engine automatically manages this device plugin, but Google does not provide support for any third-party device plugins.

Availability

GPUs are available in specific regions. For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.

To see a list of accelerator types supported in each zone, run the following command:

gcloud beta compute accelerator-types list
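
For example, to limit the list to a single zone (us-central1-a shown here), you can add a filter:

gcloud beta compute accelerator-types list --filter="zone:us-central1-a"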

Pricing

For GPU pricing information, refer to the pricing table on the Google Cloud Platform GPU page.

Creating a cluster with GPUs

You create a cluster that runs GPUs in its default node pool using GCP Console or the gcloud command-line tool.

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

  2. Click Create cluster.

  3. Configure your cluster as desired. Then, from Machine type, click Customize.
  4. Click GPUs.
  5. From the Number of GPUs drop-down menu, select the desired number of GPUs to run in the default node pool.
  6. From the GPU type drop-down menu, select the desired GPU type.
  7. Acknowledge the warning by selecting I understand the limitations.
  8. Click Create.

gcloud

To create a cluster with GPUs attached to the default node pool, run the following command:

gcloud beta container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--zone [COMPUTE_ZONE] --cluster-version [CLUSTER_VERSION]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-p100 or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to attach to every node in the default node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-a.
  • [CLUSTER_VERSION] is Kubernetes Engine version 1.9.0 or later.

For example, the following command creates a cluster, p100, with two P100 GPUs on each node:

gcloud beta container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-a --cluster-version 1.9.2-gke.1

Installing NVIDIA GPU device drivers

After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers on the nodes. Google provides a DaemonSet that you can deploy to install the drivers.

To deploy the installation DaemonSet, run the following command:

kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/k8s-1.9/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The installation takes several minutes to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
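
To verify that the drivers are installed and that GPU capacity is visible to Kubernetes, you can, for example, look for the nvidia.com/gpu resource in the nodes' capacity:

kubectl describe nodes | grep nvidia.com/gpu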

Configuring Pods to consume GPUs

Below is an example of a Pod specification that consumes GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: [IMAGE]   # your CUDA application image
    resources:
      limits:
        nvidia.com/gpu: 2
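
For example, assuming you save the manifest above as my-gpu-pod.yaml (the filename is arbitrary), you can create the Pod and check that it is scheduled onto a GPU node:

kubectl create -f my-gpu-pod.yaml
kubectl get pod my-gpu-pod -o wide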

If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. Kubernetes Engine attaches a unique node label to GPU nodes, which you can use as a node selector to place GPU workloads on nodes with specific GPU types:

  • Key: cloud.google.com/gke-accelerator
  • Value: nvidia-tesla-k80 or nvidia-tesla-p100

You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: [IMAGE]   # your CUDA application image
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-p100 # or nvidia-tesla-k80

CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.
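
For example, a quick sanity check (assuming a running Pod named my-gpu-pod, as in the examples above) is to run the bundled nvidia-smi utility inside the container:

kubectl exec my-gpu-pod -- /usr/local/nvidia/bin/nvidia-smi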

CUDA applications running in Pods consuming NVIDIA GPUs need to dynamically discover CUDA libraries. This requires including /usr/local/nvidia/lib64 in the LD_LIBRARY_PATH environment variable.
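
If your container image does not already set this variable, a minimal way to set it (sketched here as an env entry added under the container in the Pod specification above) is:

    env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64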

You should use Ubuntu-based CUDA Docker base images for CUDA applications in Kubernetes Engine, where LD_LIBRARY_PATH is already set appropriately. The latest supported CUDA version is 9.0.

Creating an autoscaling GPU node pool

To use GPUs on Kubernetes Engine cost-effectively and to take advantage of cluster autoscaling, we recommend creating separate GPU node pools in your clusters.

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, Kubernetes Engine automatically taints the GPU nodes with the following node taint:

  • Key: nvidia.com/gpu
  • Effect: NoSchedule

Additionally, Kubernetes Engine automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.

This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
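
The effect is roughly equivalent to each GPU-requesting Pod carrying a toleration like the following; this is shown for illustration only, and you do not need to add it yourself:

tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule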

You create a GPU node pool in an existing cluster using GCP Console or the gcloud command-line tool.

Console

  1. Visit the Kubernetes Engine menu in GCP Console.

  2. Select the desired cluster.

  3. Click Edit.
  4. From Node pools, click Add node pool.
  5. Optionally, from the Autoscaling drop-down menu, select On.
  6. Configure your node pool as desired. Then, from Machine type, click Customize.
  7. From the Number of GPUs drop-down menu, select the desired number of GPUs to run in the node pool.
  8. From the GPU type drop-down menu, select the desired GPU type.
  9. Acknowledge the warning by selecting the I understand the limitations checkbox.
  10. Click Save.

gcloud

To create a GPU node pool in an existing cluster, run the following command:

gcloud beta container node-pools create [POOL_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] [--num-nodes 3 --min-nodes 2 --max-nodes 5 \
--enable-autoscaling]

where:

  • [POOL_NAME] is the name you choose for the node pool.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-p100 or nvidia-tesla-k80.
  • [AMOUNT] is the number of GPUs to attach to nodes in the node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-a.
  • [CLUSTER_NAME] is the name of the cluster in which to create the node pool.
  • --num-nodes specifies the initial number of nodes to be created.
  • --min-nodes specifies the minimum number of nodes to run at any given time.
  • --max-nodes specifies the maximum number of nodes that can run.
  • --enable-autoscaling allows the node pool to autoscale when workload demand changes.

For example, the following command creates an autoscaling node pool, p100, with two P100 GPUs, in the cluster p100-cluster:

gcloud beta container node-pools create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-a --cluster p100-cluster \
--num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling
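
To confirm the node pool's autoscaling configuration afterwards, you can, for example, describe the node pool:

gcloud beta container node-pools describe p100 \
--cluster p100-cluster --zone us-central1-a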

Monitoring GPU nodes

Kubernetes Engine exposes the following Stackdriver Monitoring metrics for containers using GPUs. You can use these metrics to monitor how your GPU workloads perform:

  • container/accelerator/duty_cycle
  • container/accelerator/memory_total
  • container/accelerator/memory_used

For more information about monitoring your clusters and their resources, refer to Monitoring.
