GPUs

This page shows you how to use NVIDIA® graphics processing unit (GPU) hardware accelerators in your Google Kubernetes Engine clusters' nodes.

Overview

With GKE, you can create node pools equipped with NVIDIA Tesla® K80, P100, P4, V100, and T4 GPUs. GPUs provide compute power to drive deep-learning tasks such as image recognition and natural language processing, as well as other compute-intensive tasks such as video transcoding and image processing.

You can also use GPUs with preemptible VMs if your workloads can tolerate frequent node disruptions. Using preemptible VMs reduces the price of running GPUs. To learn more, refer to GPUs on preemptible instances.
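For example, you can create a preemptible GPU node pool by adding the --preemptible flag to the node pool creation command described later on this page. The following is a minimal sketch; the cluster name my-cluster, the zone, and the GPU type are placeholders:

gcloud container node-pools create preemptible-gpu-pool \
--cluster my-cluster --zone us-central1-c \
--preemptible --accelerator type=nvidia-tesla-k80,count=1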

Requirements

GPUs on GKE have the following requirements:

Kubernetes version
For node pools using the Container-Optimized OS node image, GPU nodes are available in GKE version 1.9 or higher. For node pools using the Ubuntu node image, GPU nodes are available in GKE version 1.11.3 or higher.
GPU quota

You must have Compute Engine GPU quota in your desired zone before you can create GPU nodes. To ensure that you have enough GPU quota in your project, refer to Quotas in Google Cloud Platform Console.

If you require additional GPU quota, you must request GPU quota in GCP Console. If you have an established billing account, your project should automatically receive quota after you submit the quota request.

NVIDIA GPU drivers

You must manually install NVIDIA GPU drivers on your nodes. This page explains how to install the drivers.

Limitations

Before using GPUs on GKE, keep in mind the following limitations:

  • You cannot add GPUs to existing node pools.
  • GPU nodes cannot be live migrated during maintenance events.

Availability

GPUs are available in specific regions and zones. When you request GPU quota, consider the regions in which you intend to run your clusters.

For a complete list of applicable regions and zones, refer to GPUs on Compute Engine.

You can also see GPUs available in your zone using the gcloud command-line tool.

gcloud

To see a list of all GPU accelerator types supported in each zone, run the following command:

gcloud compute accelerator-types list
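You can also narrow the output to a single zone with a filter. For example (the zone us-central1-c is only illustrative, and the exact filter expression is an assumption):

gcloud compute accelerator-types list --filter="zone:us-central1-c"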

Pricing

For GPU pricing information, refer to the pricing table on the GCP GPU page.

GPU quota

Your GPU quota is the total number of GPUs that can run in your GCP project. To create clusters with GPUs, your project must have sufficient GPU quota.

Your GPU quota should be at least equivalent to the total number of GPUs you intend to run in your cluster. If you enable cluster autoscaling, you should request GPU quota at least equivalent to your cluster's maximum number of nodes multiplied by the number of GPUs per node.

For example, if you create a cluster with three nodes that runs two GPUs per node, your project requires a GPU quota of at least six.

Requesting GPU quota

You request GPU quota using GCP Console.

Console

Perform the following steps:

  1. Visit the Cloud IAM Quotas menu in GCP Console.

    Visit the Quotas menu

  2. From the Metrics drop-down menu, click None, then enter "gpus" in the search field.

  3. From the search results, select the desired GPUs.

  4. Close the Metrics drop-down menu.

Submitting quota request

Console

Perform the following steps:

  1. From the list of GPU quotas, select the quotas for your desired regions, such as us-central1.
  2. Click Edit Quotas. A request form opens on the right side of GCP Console.
  3. Fill in the New quota limit field for each quota request.
  4. Fill in the Request description field with details about your request.
  5. Click Done.
  6. Click Submit request.

Running GPUs

The following sections explain how to run GPUs in GKE clusters.

Creating an autoscaling GPU node pool

To use GPUs on GKE cost-effectively and to take advantage of cluster autoscaling, we recommend creating separate GPU node pools in your clusters.

When you add a GPU node pool to an existing cluster that already runs a non-GPU node pool, GKE automatically taints the GPU nodes with the following node taint:

  • Key: nvidia.com/gpu
  • Effect: NoSchedule

Additionally, GKE automatically applies the corresponding tolerations to Pods requesting GPUs by running the ExtendedResourceToleration admission controller.

This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
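For reference, the toleration that GKE applies is equivalent to adding the following to a Pod specification's spec section (a sketch based on the taint described above):

  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"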

You create a GPU node pool in an existing cluster using GCP Console or the gcloud command-line tool.

gcloud

To create a node pool with GPUs, run the following command:

gcloud container node-pools create [POOL_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] --zone [COMPUTE_ZONE] \
--cluster [CLUSTER_NAME] [--num-nodes 3 --min-nodes 0 --max-nodes 5 \
--enable-autoscaling]

where:

  • [POOL_NAME] is the name you choose for the node pool.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
  • [AMOUNT] is the number of GPUs to attach to nodes in the node pool.
  • [COMPUTE_ZONE] is the compute zone in which to create the node pool, such as us-central1-c. The cluster must already run in the zone specified.
  • [CLUSTER_NAME] is the name of the cluster in which to create the node pool.
  • --num-nodes specifies the initial number of nodes to be created.
  • --min-nodes specifies the minimum number of nodes to run at any given time.
  • --max-nodes specifies the maximum number of nodes that can run.
  • --enable-autoscaling allows the node pool to autoscale when workload demand changes.

For example, the following command creates an autoscaling node pool, p100, with two P100 GPUs, in the cluster p100-cluster:

gcloud container node-pools create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c --cluster p100-cluster \
--num-nodes 3 --min-nodes 0 --max-nodes 5 --enable-autoscaling

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

    Visit the Google Kubernetes Engine menu

  2. Select the desired cluster.

  3. Click Edit.

  4. From Node pools, click Add node pool.

  5. Optionally, from the Autoscaling drop-down menu, select On.

  6. Configure your node pool as desired. Then, from Machine type, click Customize.

  7. From the Number of GPUs drop-down menu, select the desired number of GPUs to run per node.

  8. From the GPU type drop-down menu, select the desired GPU type.

  9. Acknowledge the warning by selecting the I understand the limitations checkbox.

  10. Click Save.

Creating a new zonal cluster with GPUs

You create a zonal cluster that runs GPUs using GCP Console or the gcloud command-line tool.

gcloud

To create a zonal cluster with GPUs running in its default node pool, run the following command:

gcloud container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--zone [COMPUTE_ZONE]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type, either nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
  • [AMOUNT] is the number of GPUs to run in the default node pool.
  • [COMPUTE_ZONE] is the cluster's compute zone, such as us-central1-c.

For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node:

gcloud container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--zone us-central1-c

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

    Visit the Google Kubernetes Engine menu

  2. Click Create cluster.

  3. Choose the GPU Accelerated Computing cluster template.

  4. Configure your cluster as desired. Then customize the GPU Node Pool or add additional GPU Node Pools.

  5. Acknowledge the warning by selecting I understand the limitations.

  6. Click Create.

Creating a new regional cluster with GPUs

By default, regional clusters create nodes in three zones of a region. However, no GCP region has GPUs in all three zones. When you create a regional cluster with GPUs, you must manually specify the zones in which GPUs are available. To learn which zones have GPUs, see Availability.

You create a regional GPU cluster using the gcloud command-line tool.

gcloud

To create a regional cluster with GPUs, run the following command:

gcloud container clusters create [CLUSTER_NAME] \
--accelerator type=[GPU_TYPE],count=[AMOUNT] \
--region [REGION] --node-locations [ZONE],[ZONE]

where:

  • [CLUSTER_NAME] is the name you choose for the cluster.
  • [GPU_TYPE] is the GPU type: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.
  • [AMOUNT] is the number of GPUs to run per node.
  • [REGION] is the cluster's region, such as us-central1.
  • [ZONE] is a compute zone within the region, such as us-central1-c. Zone(s) must have the GPU types you specify.

For example, the following command creates a cluster, p100, with three nodes (the default when --num-nodes is omitted) and two P100 GPUs per node, in two zones within us-central1:

gcloud container clusters create p100 \
--accelerator type=nvidia-tesla-p100,count=2 \
--region us-central1 \
--node-locations us-central1-a,us-central1-c

Console

  1. Visit the Google Kubernetes Engine menu in GCP Console.

    Visit the Google Kubernetes Engine menu

  2. Click Create cluster.

  3. Choose the GPU Accelerated Computing cluster template.

  4. From Location type, choose Regional. From Region, select your desired region.

  5. Configure your cluster as desired. Then customize the GPU Node Pool or add additional GPU Node Pools.

  6. Acknowledge the warning by selecting I understand the limitations.

  7. Click Create.

Installing NVIDIA GPU device drivers

After adding GPU nodes to your cluster, you need to install NVIDIA's device drivers to the nodes. Google provides a DaemonSet that automatically installs the drivers for you.

Refer to the section below for installation instructions for Container-Optimized OS (COS) and Ubuntu nodes.

COS

To deploy the installation DaemonSet, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/stable/nvidia-driver-installer/cos/daemonset-preloaded.yaml

The installation takes several minutes to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.
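One way to confirm that the GPU capacity is visible is to inspect your nodes with kubectl, for example:

kubectl describe nodes | grep "nvidia.com/gpu"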

The following table lists the GKE version from which each NVIDIA driver version is supported:

GKE version      NVIDIA driver version
1.11.5+          410.79
1.10.5-gke.4+    396.46
1.10.2-gke.3+    390.46

Ubuntu

Note that GPU support on Ubuntu nodes requires GKE version 1.11.3 or higher.

To deploy the installation DaemonSet, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml

The installation takes several seconds to complete. Once installed, the NVIDIA GPU device plugin surfaces NVIDIA GPU capacity via Kubernetes APIs.

The following table lists the GKE version from which each NVIDIA driver version is supported:

GKE version                      NVIDIA driver version
1.11.8-gke.4+ / 1.12.6-gke.6+    410.104
Other 1.11 and 1.12 versions     384.111

Configuring Pods to consume GPUs

You use a resource limit to configure Pods to consume GPUs. You specify a resource limit in a Pod specification using the following key-value pair:

  • Key: nvidia.com/gpu
  • Value: Number of GPUs to consume

Below is an example of a Pod specification that consumes GPUs:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    resources:
      limits:
        nvidia.com/gpu: 2
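Assuming you save this specification as my-gpu-pod.yaml, you could create the Pod and then check that the GPUs are visible from inside the container with the nvidia-smi utility, which the driver installer makes available under /usr/local/nvidia/bin:

kubectl apply -f my-gpu-pod.yaml
kubectl exec my-gpu-pod -- /usr/local/nvidia/bin/nvidia-smi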

Consuming multiple GPU types

If you want to use multiple GPU accelerator types per cluster, you must create multiple node pools, each with its own accelerator type. GKE attaches a unique node selector to GPU nodes to help place GPU workloads on nodes with specific GPU types:

  • Key: cloud.google.com/gke-accelerator
  • Value: nvidia-tesla-k80, nvidia-tesla-p100, nvidia-tesla-p4, nvidia-tesla-v100, or nvidia-tesla-t4.

You can target particular GPU types by adding this node selector to your workload's Pod specification. For example:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:10.0-runtime-ubuntu18.04
    resources:
      limits:
        nvidia.com/gpu: 2
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-k80 # or nvidia-tesla-p100 or nvidia-tesla-p4 or nvidia-tesla-v100 or nvidia-tesla-t4

About the CUDA libraries

CUDA® is NVIDIA's parallel computing platform and programming model for GPUs. The NVIDIA device drivers you install in your cluster include the CUDA libraries.

CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.

CUDA applications running in Pods consuming NVIDIA GPUs need to dynamically discover CUDA libraries. This requires including /usr/local/nvidia/lib64 in the LD_LIBRARY_PATH environment variable.
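If your image does not already set this path, you can set the variable in the container spec. For example (a sketch, to be merged into the containers entry of a Pod specification such as the ones above):

    env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64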

You should use Ubuntu-based CUDA Docker base images for CUDA applications in GKE, where LD_LIBRARY_PATH is already set appropriately. The latest supported CUDA version is 10.0 on both COS (1.11.5+) and Ubuntu (1.11.8-gke.4+, 1.12.6-gke.6+).

Monitoring GPU nodes

GKE exposes the following Stackdriver Monitoring metrics for containers using GPUs. You can use these metrics to monitor your GPU workloads' performance:

  • Duty Cycle (container/accelerator/duty_cycle): Percentage of time over the past sample period (10 seconds) during which the accelerator was actively processing. Between 1 and 100.
  • Memory Usage (container/accelerator/memory_used): Amount of accelerator memory allocated in bytes.
  • Memory Capacity (container/accelerator/memory_total): Total accelerator memory in bytes.

These metrics are made available in Stackdriver.

For more information about monitoring your clusters and their resources, refer to Monitoring.

View usage metrics

You view your workloads' GPU usage metrics from the Workloads dashboard in GCP Console.

Console

To view your workloads' GPU usage, perform the following steps:

  1. Visit the Workloads menu in GCP Console.

    Visit the Workloads menu

  2. Select the desired workload.

The Workloads dashboard displays charts for GPU memory usage and capacity, and GPU duty cycle.

What's next
