This page explains how to set up your Google Kubernetes Engine (GKE) infrastructure to support dynamic resource allocation (DRA). On this page, you'll create clusters that can deploy GPU or TPU workloads, and manually install the drivers that you need to enable DRA.
This page is intended for platform administrators who want to reduce the complexity and overhead of setting up infrastructure with specialized hardware devices.
About DRA
DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. For more information, see About dynamic resource allocation.
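For context only, the following sketch shows the kind of object a workload later uses to request hardware through DRA. It is not part of the setup on this page; the manifest name is hypothetical, and it assumes the gpu.nvidia.com DeviceClass that the NVIDIA DRA driver (installed later on this page) provides:

cat <<'EOF' > claim-template.yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                        # hypothetical name, for illustration
spec:
  spec:
    devices:
      requests:
      - name: gpu                         # a single device request
        deviceClassName: gpu.nvidia.com   # DeviceClass provided by the NVIDIA DRA driver
EOF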
Limitations
- Node auto-provisioning isn't supported.
- Autopilot clusters don't support DRA.
- Automatic GPU driver installation isn't supported with DRA.
- You can't use the following GPU sharing features:
  - Time-sharing GPUs
  - Multi-instance GPUs
  - Multi-Process Service (MPS)
Requirements
To use DRA, your GKE cluster must run version 1.32.1-gke.1489001 or later.
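Optionally, you can check which GKE versions are currently available by querying the server config. This is just one way to check; add your location with --zone or --region if you don't have a default configured, and note that the exact output fields can vary by gcloud version:

gcloud container get-server-config --format="yaml(validMasterVersions)"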
You should also be familiar with the requirements and limitations of the type of hardware that you want to use, such as GPUs or TPUs.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
If you're not using Cloud Shell, install the Helm CLI:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
Create a GKE Standard cluster
Create a Standard mode cluster that enables the Kubernetes beta APIs for DRA:
gcloud container clusters create CLUSTER_NAME \
--enable-kubernetes-unstable-apis="resource.k8s.io/v1beta1/deviceclasses,resource.k8s.io/v1beta1/resourceclaims,resource.k8s.io/v1beta1/resourceclaimtemplates,resource.k8s.io/v1beta1/resourceslices" \
--cluster-version=GKE_VERSION
Replace the following:
- CLUSTER_NAME: a name for your cluster.
- GKE_VERSION: the GKE version to use for the cluster and nodes. Must be 1.32.1-gke.1489001 or later.
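Optionally, after the cluster is created, you can confirm that the DRA API group is served. This check assumes you fetch cluster credentials first (add --location or --zone as needed):

gcloud container clusters get-credentials CLUSTER_NAME
kubectl api-resources --api-group=resource.k8s.io

The output should list deviceclasses, resourceclaims, resourceclaimtemplates, and resourceslices.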
Create a GKE node pool with GPUs or TPUs
On GKE, you can use DRA with both GPUs and TPUs. The node pool configuration settings—such as machine type, accelerator type, count, node operating system, and node locations—depend on your requirements.
GPU
To use DRA for GPUs, you must do the following when you create the node pool:
- Disable automatic GPU driver installation by setting gpu-driver-version=disabled.
- Disable the GPU device plugin by adding the gke-no-default-nvidia-gpu-device-plugin=true node label.
- Let the DRA driver DaemonSet run on the nodes by adding the nvidia.com/gpu.present=true node label.
To create a GPU node pool for DRA, follow these steps:
Create a node pool with the required hardware. The following example creates a node pool that has g2-standard-24 instances on Container-Optimized OS with two L4 GPUs.
gcloud container node-pools create NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --machine-type "g2-standard-24" \
    --accelerator "type=nvidia-l4,count=2,gpu-driver-version=disabled" \
    --num-nodes "1" \
    --node-labels=gke-no-default-nvidia-gpu-device-plugin=true,nvidia.com/gpu.present=true
Replace the following:
- NODEPOOL_NAME: the name for your node pool.
- CLUSTER_NAME: the name of your cluster.
Manually install the drivers on your Container-Optimized OS or Ubuntu nodes. For detailed instructions, refer to Manually install NVIDIA GPU drivers.
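Optionally, verify that the labels from the node pool creation command were applied to the new nodes. This is just a quick check with a label selector:

kubectl get nodes -l nvidia.com/gpu.present=true,gke-no-default-nvidia-gpu-device-plugin=true

Each node in the new pool should appear in the output.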
TPU
To use DRA for TPUs, you must disable the TPU device plugin by adding the gke-no-default-tpu-device-plugin=true node label.
Create a node pool that uses TPUs. The following example creates a TPU Trillium (v6e) node pool:
gcloud container node-pools create NODEPOOL_NAME \
    --cluster CLUSTER_NAME \
    --num-nodes 1 \
    --node-labels "gke-no-default-tpu-device-plugin=true,gke-no-default-tpu-dra-plugin=true" \
    --machine-type=ct6e-standard-8t
Replace the following:
- NODEPOOL_NAME: the name for your node pool.
- CLUSTER_NAME: the name of your cluster.
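Optionally, verify that the new TPU nodes carry both labels:

kubectl get nodes -l gke-no-default-tpu-device-plugin=true,gke-no-default-tpu-dra-plugin=true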
Install DRA drivers
GPU
Add and update the Helm repository that contains the NVIDIA DRA driver:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
Install version 25.3.0-rc.4 of the NVIDIA DRA driver:
helm install nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
    --version="25.3.0-rc.4" \
    --create-namespace \
    --namespace nvidia-dra-driver-gpu \
    --set nvidiaDriverRoot="/home/kubernetes/bin/nvidia/" \
    --set gpuResourcesEnabledOverride=true \
    --set resources.computeDomains.enabled=false \
    --set kubeletPlugin.priorityClassName="" \
    --set kubeletPlugin.tolerations[0].key=nvidia.com/gpu \
    --set kubeletPlugin.tolerations[0].operator=Exists \
    --set kubeletPlugin.tolerations[0].effect=NoSchedule
For Ubuntu nodes, use the nvidiaDriverRoot="/opt/nvidia" directory path.
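To confirm that the release deployed, you can list the Helm releases in the namespace:

helm list -n nvidia-dra-driver-gpu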
TPU
You can install the DRA drivers for TPUs with the provided Helm chart. To access the Helm charts, complete the following steps:
Clone the common-infra repository from the ai-on-gke organization to access the Helm charts that contain the DRA drivers for GPUs and TPUs:

git clone https://github.com/ai-on-gke/common-infra.git
Navigate to the directory that contains the charts:
cd common-infra/common/charts
Install the TPU DRA driver:
./tpu-dra-driver/install-tpu-dra-driver.sh
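Assuming the script installs into the tpu-dra-driver namespace, which the verification section below also expects, you can check that the driver's DaemonSet exists:

kubectl get daemonset -n tpu-dra-driver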
Verify that your infrastructure is ready for DRA
Verify that the DRA driver Pod is running.
GPU
kubectl get pods -n nvidia-dra-driver-gpu

The output is similar to the following:

NAME                                         READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-gpu-kubelet-plugin-52cdm   1/1     Running   0          46s
TPU
kubectl get pods -n tpu-dra-driver

The output is similar to the following:

NAME                                 READY   STATUS    RESTARTS   AGE
tpu-dra-driver-kubeletplugin-h6m57   1/1     Running   0          30s
Confirm that the ResourceSlice lists the hardware devices that you added:
kubectl get resourceslices -o yaml
If you used the example in the previous section, the ResourceSlice resembles the following, depending on the type of hardware you used:
GPU
The following example output is from a g2-standard-24 machine with two L4 GPUs.
apiVersion: v1
items:
- apiVersion: resource.k8s.io/v1beta1
  kind: ResourceSlice
  metadata:
    # lines omitted for clarity
  spec:
    devices:
    - basic:
        attributes:
          architecture:
            string: Ada Lovelace
          brand:
            string: Nvidia
          cudaComputeCapability:
            version: 8.9.0
          cudaDriverVersion:
            version: 12.9.0
          driverVersion:
            version: 575.57.8
          index:
            int: 0
          minor:
            int: 0
          productName:
            string: NVIDIA L4
          type:
            string: gpu
          uuid:
            string: GPU-4d403095-4294-6ddd-66fd-cfe5778ef56e
        capacity:
          memory:
            value: 23034Mi
      name: gpu-0
    - basic:
        attributes:
          architecture:
            string: Ada Lovelace
          brand:
            string: Nvidia
          cudaComputeCapability:
            version: 8.9.0
          cudaDriverVersion:
            version: 12.9.0
          driverVersion:
            version: 575.57.8
          index:
            int: 1
          minor:
            int: 1
          productName:
            string: NVIDIA L4
          type:
            string: gpu
          uuid:
            string: GPU-cc326645-f91d-d013-1c2f-486827c58e50
        capacity:
          memory:
            value: 23034Mi
      name: gpu-1
    driver: gpu.nvidia.com
    nodeName: gke-cluster-gpu-pool-9b10ff37-mf70
    pool:
      generation: 1
      name: gke-cluster-gpu-pool-9b10ff37-mf70
      resourceSliceCount: 1
kind: List
metadata:
  resourceVersion: ""
TPU
apiVersion: v1
items:
- apiVersion: resource.k8s.io/v1beta1
  kind: ResourceSlice
  metadata:
    # lines omitted for clarity
  spec:
    devices:
    - basic:
        attributes:
          index:
            int: 0
          tpuGen:
            string: v6e
          uuid:
            string: tpu-54de4859-dd8d-f67e-6f91-cf904d965454
      name: "0"
    - basic:
        attributes:
          index:
            int: 1
          tpuGen:
            string: v6e
          uuid:
            string: tpu-54de4859-dd8d-f67e-6f91-cf904d965454
      name: "1"
    - basic:
        attributes:
          index:
            int: 2
          tpuGen:
            string: v6e
          uuid:
            string: tpu-54de4859-dd8d-f67e-6f91-cf904d965454
      name: "2"
    - basic:
        attributes:
          index:
            int: 3
          tpuGen:
            string: v6e
          uuid:
            string: tpu-54de4859-dd8d-f67e-6f91-cf904d965454
      name: "3"
    driver: tpu.google.com
    nodeName: gke-tpu-b4d4b61b-fwbg
    pool:
      generation: 1
      name: gke-tpu-b4d4b61b-fwbg
      resourceSliceCount: 1
kind: List
metadata:
  resourceVersion: ""
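Rather than reading the full YAML, you can also summarize the advertised devices per node with a jsonpath query. This is a convenience sketch; adjust it to your needs:

kubectl get resourceslices -o jsonpath='{range .items[*]}{.spec.nodeName}{": "}{range .spec.devices[*]}{.name}{" "}{end}{"\n"}{end}'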