You can enable and manage graphics processing unit (GPU) resources on your containers. For example, you might prefer running artificial intelligence (AI) and machine learning (ML) notebooks in a GPU environment. GPU support is enabled by default in Google Distributed Cloud (GDC) air-gapped appliance.
Before you begin
To deploy GPUs to your containers, you must have the following:
- The Namespace Admin role (namespace-admin) to deploy GPU workloads in your project namespace.
- The kubeconfig path for the bare metal Kubernetes cluster. Sign in and generate the kubeconfig file if you don't have one.
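For example, you can confirm that your kubeconfig file grants access to the cluster before deploying workloads. This is a minimal sketch; the kubeconfig path is a placeholder for the file you generated:

    # Point kubectl at the generated kubeconfig file (placeholder path).
    export KUBECONFIG=/path/to/CLUSTER_KUBECONFIG

    # Listing nodes confirms that the credentials and cluster endpoint work.
    kubectl get nodes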
Configure a container to use GPU resources
To use GPUs in a container, complete the following steps:
Confirm your Kubernetes cluster nodes support your GPU resource allocation:
kubectl describe nodes NODE_NAME
Replace NODE_NAME with the node managing the GPUs you want to inspect.

The relevant output is similar to the following snippet:
Capacity:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
Allocatable:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
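If the node reports many resource types, you can filter the output down to the GPU entries. This is a convenience sketch that assumes the GPU resources use the nvidia.com/ prefix shown above:

    # Show only the GPU-related capacity and allocatable entries.
    kubectl describe nodes NODE_NAME | grep nvidia.com/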
Add the .containers.resources.requests and .containers.resources.limits fields to your container spec. Since your Kubernetes cluster is preconfigured with GPU machines, the configuration is the same for all workloads:

    ...
    containers:
    - name: CONTAINER_NAME
      image: CONTAINER_IMAGE
      resources:
        requests:
          nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
        limits:
          nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
    ...
Replace the following:
- CONTAINER_NAME: the name of the container.
- CONTAINER_IMAGE: the container image to access the GPU machines. You must include the container registry path and version of the image, such as REGISTRY_PATH/hello-app:1.0.
Containers also require additional permissions to access GPUs. For each container that requests GPUs, add the following permissions to your container spec:
    ...
    securityContext:
      seLinuxOptions:
        type: unconfined_t
    ...
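For reference, here is what the two previous snippets look like combined in a complete Pod manifest. This is a minimal sketch; the pod and container name gpu-example is a hypothetical placeholder, and the image path follows the REGISTRY_PATH/hello-app:1.0 example above:

    # Minimal example Pod combining the GPU resource request and the SELinux settings.
    # "gpu-example" is a placeholder name; substitute your own container name and image.
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-example
    spec:
      containers:
      - name: gpu-example
        image: REGISTRY_PATH/hello-app:1.0
        resources:
          requests:
            nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
          limits:
            nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
        securityContext:
          seLinuxOptions:
            type: unconfined_t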
Apply your container manifest file:
kubectl apply -f CONTAINER_MANIFEST_FILE \
    -n NAMESPACE \
    --kubeconfig CLUSTER_KUBECONFIG
Replace the following:
- CONTAINER_MANIFEST_FILE: the YAML file for your container workload custom resource.
- NAMESPACE: the project namespace in which to deploy the container workloads.
- CLUSTER_KUBECONFIG: the kubeconfig file for the bare metal Kubernetes cluster to which you're deploying container workloads.
Verify that your pods are running and are using the GPUs:
kubectl get pods -n NAMESPACE \
    --kubeconfig CLUSTER_KUBECONFIG | grep CONTAINER_NAME

To confirm that a running pod has the GPU requests and limits set, describe it with kubectl describe pod. The relevant output is similar to the following snippet:
Port:           80/TCP
Host Port:      0/TCP
State:          Running
Ready:          True
Restart Count:  0
Limits:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
Requests:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
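As an extra check, you can run nvidia-smi inside a running container to confirm that the GPU is visible to the workload. This is a sketch that assumes the container image includes the NVIDIA utilities; POD_NAME is a placeholder for the pod name reported by kubectl get pods:

    # Run nvidia-smi inside the container to confirm the GPU is visible.
    # Assumes the image ships the nvidia-smi utility (for example, a CUDA base image).
    kubectl exec POD_NAME \
        -n NAMESPACE \
        --kubeconfig CLUSTER_KUBECONFIG -- nvidia-smi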