This page shows you how to let multiple workloads get time-shared access to a single NVIDIA® GPU hardware accelerator in your Google Kubernetes Engine (GKE) nodes. To learn more about how time-sharing works, as well as limitations and examples of when you should use time-sharing GPUs, refer to Time-sharing GPUs on GKE.
Who should use this guide
The instructions in this topic apply to you if you are one of the following:
- Platform administrator: Creates and manages a GKE cluster, plans infrastructure and resourcing requirements, and monitors the cluster's performance.
- Application developer: Designs and deploys workloads on GKE clusters. If you want instructions for requesting time-shared GPUs, refer to Deploy workloads that use time-shared GPUs.
Requirements and limitations
- You can enable time-sharing GPUs on GKE Standard clusters and node pools running GKE version 1.23.7-gke.1400 and later.
- You can't update existing clusters or node pools to enable time-sharing GPUs. Instead, create a new node pool with time-sharing enabled.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI.
- Ensure that you have sufficient NVIDIA Tesla GPU quota. If you need more quota, refer to Requesting an increase in quota.
- Ensure that your Google Cloud CLI components are at version 384.0.0 or later.
- Plan your time-sharing GPU capacity based on the resource needs of the workloads and the capacity of the underlying GPU.
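For example, as a purely illustrative planning assumption (these numbers aren't from this guide): if each workload needs roughly 5 GB of GPU memory and the attached GPU has 16 GB, setting max-shared-clients-per-gpu to 3 keeps the combined demand within the physical GPU's capacity. Time-sharing doesn't partition GPU memory between containers, so this sizing is up to you.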
Enable time-sharing GPUs on GKE clusters and node pools
As a platform administrator, you must enable time-sharing GPUs on a GKE Standard cluster before developers can deploy workloads to use the GPUs. To enable time-sharing, you must do the following:
- Enable time-sharing GPUs on a GKE cluster.
- Install NVIDIA GPU device drivers.
- Verify the GPU resources available on your nodes.
Enable time-sharing GPUs on a GKE cluster
You can enable time-sharing GPUs when you create GKE Standard clusters. The default node pool that's created with the cluster has the feature enabled. However, you still need to enable time-sharing GPUs when you manually create new node pools in that cluster.
gcloud container clusters create CLUSTER_NAME \
--region=COMPUTE_REGION \
--cluster-version=CLUSTER_VERSION \
--machine-type=MACHINE_TYPE \
--accelerator=type=GPU_TYPE,count=GPU_QUANTITY,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=CLIENTS_PER_GPU
Replace the following:
- CLUSTER_NAME: the name of your new cluster.
- COMPUTE_REGION: the Compute Engine region for your new cluster. For zonal clusters, specify --zone=COMPUTE_ZONE.
- CLUSTER_VERSION: the GKE version for the cluster control plane and nodes. Use GKE version 1.23.7-gke.1400 or later. Alternatively, specify a release channel with that GKE version by using the --release-channel=RELEASE_CHANNEL flag.
- MACHINE_TYPE: the Compute Engine machine type for your nodes. For A100 GPUs, use an A2 machine type. For all other GPUs, use an N1 machine type.
- GPU_TYPE: the GPU type, which must be an NVIDIA Tesla GPU platform such as nvidia-tesla-v100.
- GPU_QUANTITY: the number of physical GPUs to attach to each node in the default node pool.
- CLIENTS_PER_GPU: the maximum number of containers that can share each physical GPU.
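For example, the following command uses illustrative values that you should adapt to your environment. It creates a regional cluster whose default node pool attaches one T4 GPU to each node and lets up to three containers share that GPU:

gcloud container clusters create timeshare-cluster \
--region=us-central1 \
--cluster-version=1.23.7-gke.1400 \
--machine-type=n1-standard-8 \
--accelerator=type=nvidia-tesla-t4,count=1,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=3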
Enable time-sharing GPUs on a GKE node pool
You can enable time-sharing GPUs when you manually create new node pools in a GKE cluster.
gcloud container node-pools create NODEPOOL_NAME \
--cluster=CLUSTER_NAME \
--machine-type=MACHINE_TYPE \
--region=COMPUTE_REGION \
--accelerator=type=GPU_TYPE,count=GPU_QUANTITY,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=CLIENTS_PER_GPU
Replace the following:
- NODEPOOL_NAME: the name of your new node pool.
- CLUSTER_NAME: the name of your cluster, which must run GKE version 1.23.7-gke.1400 or later.
- COMPUTE_REGION: the Compute Engine region of your cluster. For zonal clusters, specify --zone=COMPUTE_ZONE.
- MACHINE_TYPE: the Compute Engine machine type for your nodes. For A100 GPUs, use an A2 machine type. For all other GPUs, use an N1 machine type.
- GPU_TYPE: the GPU type, which must be an NVIDIA Tesla GPU platform such as nvidia-tesla-v100.
- GPU_QUANTITY: the number of physical GPUs to attach to each node in the node pool.
- CLIENTS_PER_GPU: the maximum number of containers that can share each physical GPU.
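For example, the following command (again with illustrative values) adds a time-sharing node pool to an existing cluster, with two T4 GPUs per node and up to three containers sharing each GPU:

gcloud container node-pools create timeshare-pool \
--cluster=timeshare-cluster \
--machine-type=n1-standard-8 \
--region=us-central1 \
--accelerator=type=nvidia-tesla-t4,count=2,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=3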
Install NVIDIA GPU device drivers
Before you proceed, connect to your cluster by running the following command:
gcloud container clusters get-credentials CLUSTER_NAME
After you create a new cluster or node pool and enable time-sharing GPUs, you need to install the GPU device drivers from NVIDIA that manage the time-sharing division of the physical GPUs. To install the drivers, you deploy a GKE installation DaemonSet that sets the drivers up.
For instructions, refer to Installing NVIDIA GPU device drivers.
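As a sketch, for GPU nodes that use the default Container-Optimized OS image, installation typically comes down to applying the driver installer DaemonSet that Google publishes. Verify the exact manifest path for your node image in the instructions linked above:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml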
If you plan to use node auto-provisioning in your cluster, you must also configure node auto-provisioning with the scopes that allow GKE to install the GPU device drivers for you. For instructions, refer to Using node auto-provisioning with GPUs.
Verify the GPU resources available on your nodes
To verify that the number of GPUs visible in your nodes matches the number you specified when you enabled time-sharing, describe your nodes:
kubectl describe nodes NODE_NAME
The output is similar to the following:
...
Capacity:
...
nvidia.com/gpu: 3
Allocatable:
...
nvidia.com/gpu: 3
In this example output, the number of GPU resources on the node is 3 because the value that was specified for max-shared-clients-per-gpu was 3 and the count of physical GPUs to attach to the node was 1. As another example, if the count of physical GPUs was 2, the output would show 6 allocatable GPU resources, three on each physical GPU.
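If you don't know which nodes to describe, one way to find them is to filter nodes on the time-sharing node label that's described in the next section (a quick sketch, assuming your time-sharing node pool has already been created):

kubectl get nodes -l cloud.google.com/gke-gpu-sharing-strategy=time-sharing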
Deploy workloads that use time-shared GPUs
As an application operator who is deploying GPU workloads, you can select time-shared GPU nodes by specifying the appropriate node labels in a nodeSelector in your manifests. When planning your requests, review the request limits to ensure that GKE doesn't reject your deployments.
To deploy a workload to consume time-sharing GPUs, you need to do the following:
- Add a nodeSelector to your Pod manifest for the following labels:
  - cloud.google.com/gke-gpu-sharing-strategy: time-sharing, which selects nodes that use time-sharing GPUs.
  - cloud.google.com/gke-max-shared-clients-per-gpu: "CLIENTS_PER_GPU", which selects nodes that allow a specific number of containers to share the underlying GPU.
- Add the nvidia.com/gpu=1 GPU resource request to your container specification, in spec.containers.resources.limits.
For example, the following steps show you how to deploy three Pods to a time-sharing GPU node pool. GKE allocates a time-shared GPU to each container. The containers print the UUID of the GPU that's attached to that container.
Save the following manifest as gpu-timeshare.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cuda-simple
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cuda-simple
  template:
    metadata:
      labels:
        app: cuda-simple
    spec:
      nodeSelector:
        cloud.google.com/gke-gpu-sharing-strategy: time-sharing
        cloud.google.com/gke-max-shared-clients-per-gpu: "3"
      containers:
      - name: cuda-simple
        image: nvidia/cuda:11.0.3-base-ubi7
        command:
        - bash
        - -c
        - |
          /usr/local/nvidia/bin/nvidia-smi -L; sleep 300
        resources:
          limits:
            nvidia.com/gpu: 1
Apply the manifest:
kubectl apply -f gpu-timeshare.yaml
Check that all Pods are running:
kubectl get pods -l=app=cuda-simple
Check the logs for any Pod to view the UUID of the GPU:
kubectl logs POD_NAME
The output is similar to the following:
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-0771302b-eb3a-6756-7a23-0adcae8efd47)
If your nodes have one physical GPU attached, check the logs for any other Pod on the same node to verify that the GPU UUID is the same:
kubectl logs POD2_NAME
The output is similar to the following:
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-0771302b-eb3a-6756-7a23-0adcae8efd47)
Use time-sharing GPUs with multi-instance GPUs
As a platform administrator, you might want to combine multiple GKE GPU features. Time-sharing GPUs work with multi-instance GPUs, which partition a single physical GPU into up to seven slices. These partitions are isolated from each other. You can configure time-sharing GPUs for each multi-instance GPU partition.
For example, if you set the gpu-partition-size to 1g.5gb, the underlying GPU would be split into seven partitions. If you also set max-shared-clients-per-gpu to 3, each partition would support up to three containers, for a total of 21 time-shared GPU devices available to allocate. To learn about how the gpu-partition-size converts to actual partitions, refer to Multi-instance GPU partitions.
To create a node pool with time-shared, multi-instance GPUs, run the following command:
gcloud container node-pools create NODEPOOL_NAME \
--cluster=CLUSTER_NAME \
--machine-type=MACHINE_TYPE \
--region=COMPUTE_REGION \
--accelerator=type=nvidia-tesla-a100,count=GPU_QUANTITY,gpu-partition-size=PARTITION_SIZE,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=CLIENTS_PER_GPU
Replace PARTITION_SIZE with the multi-instance GPU partition size that you want, such as 1g.5gb. The other placeholders are the same as in the earlier node pool command.
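To schedule Pods onto these time-shared partitions, your workload manifests can select the partition size in addition to the time-sharing labels shown earlier on this page. The following nodeSelector is a sketch that assumes the 1g.5gb partition size from the example above and uses cloud.google.com/gke-gpu-partition-size, the node label that the multi-instance GPU feature applies to nodes:

nodeSelector:
  cloud.google.com/gke-gpu-sharing-strategy: time-sharing
  cloud.google.com/gke-max-shared-clients-per-gpu: "3"
  cloud.google.com/gke-gpu-partition-size: 1g.5gb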
What's next
- Learn more about Time-sharing GPUs on GKE.
- Learn more about GPUs.
- Learn more about Running multi-instance GPUs.
- For more information about compute preemption for the NVIDIA GPU, refer to the NVIDIA Pascal Tuning Guide.