You can enable and manage graphics processing unit (GPU) resources in your containers. For example, you might prefer to run artificial intelligence (AI) and machine learning (ML) notebooks in a GPU environment. GPU support is enabled by default in the Google Distributed Cloud (GDC) air-gapped appliance.
Before you begin
To deploy GPUs to your containers, you must have the following:
- The Namespace Admin role (`namespace-admin`) to deploy GPU workloads in your project namespace.
- The kubeconfig path for the bare metal Kubernetes cluster. Sign in and generate the kubeconfig file if you don't have one.
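To sanity-check both prerequisites at once, you can ask the cluster whether your credentials allow creating pods in your project namespace. The following is a minimal sketch, using the same `NAMESPACE` and `CLUSTER_KUBECONFIG` placeholders as the steps below:

```bash
# Returns "yes" if your role bindings permit deploying pod workloads
# in the project namespace; both values are placeholders.
kubectl auth can-i create pods \
    -n NAMESPACE \
    --kubeconfig CLUSTER_KUBECONFIG
```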
Configure a container to use GPU resources
To use GPUs in a container, complete the following steps:
1. Confirm that your Kubernetes cluster nodes support your GPU resource allocation:

```bash
kubectl describe nodes NODE_NAME
```

Replace `NODE_NAME` with the node managing the GPUs you want to inspect.

The relevant output is similar to the following snippet:

```
Capacity:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
Allocatable:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
```
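If you don't yet know which nodes have GPUs, you can list the allocatable GPU resource across all nodes in one command instead of describing each node individually. The following is a sketch using kubectl's `custom-columns` output; the backslash escapes the dot inside the resource name for kubectl's JSONPath parser:

```bash
# List each node's allocatable count for the A100 GPU resource.
# Nodes without this resource show <none> in the GPU column.
kubectl get nodes --kubeconfig CLUSTER_KUBECONFIG \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu-pod-NVIDIA_A100_80GB_PCIE'
```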
2. Add the `.containers.resources.requests` and `.containers.resources.limits` fields to your container spec. Because your Kubernetes cluster is preconfigured with GPU machines, the configuration is the same for all workloads:

```yaml
...
containers:
- name: CONTAINER_NAME
  image: CONTAINER_IMAGE
  resources:
    requests:
      nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
    limits:
      nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
...
```

Replace the following:

- `CONTAINER_NAME`: the name of the container.
- `CONTAINER_IMAGE`: the container image to access the GPU machines. You must include the container registry path and version of the image, such as `REGISTRY_PATH/hello-app:1.0`.
Containers also require additional permissions to access GPUs. For each container that requests GPUs, add the following permissions to your container spec:

```yaml
...
securityContext:
  seLinuxOptions:
    type: unconfined_t
...
```
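Putting the resource fields and the security context together, a complete manifest might look like the following sketch. The pod name `gpu-example` is illustrative, and the uppercase values are the same placeholders used throughout this procedure:

```yaml
# Illustrative Pod manifest combining the GPU resource fields and the
# SELinux security context shown above. All names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
  namespace: NAMESPACE
spec:
  containers:
  - name: CONTAINER_NAME
    image: CONTAINER_IMAGE
    securityContext:
      seLinuxOptions:
        type: unconfined_t
    resources:
      requests:
        nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
      limits:
        nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
```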
3. Apply your container manifest file:

```bash
kubectl apply -f CONTAINER_MANIFEST_FILE \
    -n NAMESPACE \
    --kubeconfig CLUSTER_KUBECONFIG
```

Replace the following:

- `CONTAINER_MANIFEST_FILE`: the YAML file for your container workload custom resource.
- `NAMESPACE`: the project namespace in which to deploy the container workloads.
- `CLUSTER_KUBECONFIG`: the kubeconfig file for the bare metal Kubernetes cluster to which you're deploying container workloads.
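For example, with hypothetical values filled in (a manifest named `gpu-pod.yaml` and a project namespace named `my-project`, both assumptions for illustration):

```bash
# Hypothetical values; substitute your own manifest, namespace,
# and kubeconfig path.
kubectl apply -f gpu-pod.yaml \
    -n my-project \
    --kubeconfig CLUSTER_KUBECONFIG
```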
4. Verify that your pods are running and are using the GPUs:

```bash
kubectl describe pods -n NAMESPACE \
    --kubeconfig CLUSTER_KUBECONFIG
```

The relevant output is similar to the following snippet:

```
Port:           80/TCP
Host Port:      0/TCP
State:          Running
Ready:          True
Restart Count:  0
Limits:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
Requests:
  nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
```
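If your container image includes the NVIDIA utilities, you can also run `nvidia-smi` inside the pod as a final check that the GPU is visible to the workload. A sketch, with `POD_NAME` standing in for the pod created from your manifest:

```bash
# Lists the GPUs visible inside the container. Requires nvidia-smi
# to be present in the container image.
kubectl exec POD_NAME \
    -n NAMESPACE \
    --kubeconfig CLUSTER_KUBECONFIG \
    -- nvidia-smi
```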