Manage GPU container workloads

You can enable and manage graphics processing unit (GPU) resources on your containers. For example, you might prefer running artificial intelligence (AI) and machine learning (ML) notebooks in a GPU environment. GPU support is enabled by default in Google Distributed Cloud (GDC) air-gapped appliance.

Before you begin

To deploy GPUs to your containers, you must have the following:

  • The Namespace Admin role (namespace-admin) to deploy GPU workloads in your project namespace.

  • The kubeconfig path for the bare metal Kubernetes cluster. Sign in and generate the kubeconfig file if you don't have one. A convenience sketch for pointing kubectl at the file follows this list.

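If you already have the kubeconfig file, one optional convenience (not required by the steps on this page) is to export its path in your shell so that kubectl picks it up without the --kubeconfig flag. This sketch assumes CLUSTER_KUBECONFIG is the path to your generated kubeconfig file:

    export KUBECONFIG=CLUSTER_KUBECONFIG
    kubectl get nodes
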
Configure a container to use GPU resources

To use GPUs in a container, complete the following steps:

  1. Confirm that your Kubernetes cluster nodes support your GPU resource allocation:

    kubectl describe nodes NODE_NAME
    

    Replace NODE_NAME with the name of the node that manages the GPUs you want to inspect.

    The relevant output is similar to the following snippet:

    Capacity:
      nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
    Allocatable:
      nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
    
  2. Add the .containers.resources.requests and .containers.resources.limits fields to your container spec. Since your Kubernetes cluster is preconfigured with GPU machines, the configuration is the same for all workloads:

     ...
     containers:
     - name: CONTAINER_NAME
       image: CONTAINER_IMAGE
       resources:
         requests:
           nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
         limits:
           nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
     ...
    

    Replace the following:

    • CONTAINER_NAME: the name of the container.
    • CONTAINER_IMAGE: the container image to access the GPU machines. You must include the container registry path and version of the image, such as REGISTRY_PATH/hello-app:1.0.
  3. Containers also require additional permissions to access GPUs. For each container that requests GPUs, add the following permissions to your container spec (a complete example manifest that combines this with the resource fields from the previous step appears after this procedure):

    ...
    securityContext:
      seLinuxOptions:
        type: unconfined_t
    ...
    
  4. Apply your container manifest file:

    kubectl apply -f CONTAINER_MANIFEST_FILE \
        -n NAMESPACE \
        --kubeconfig CLUSTER_KUBECONFIG
    

    Replace the following:

    • CONTAINER_MANIFEST_FILE: the YAML file for your container workload custom resource.
    • NAMESPACE: the project namespace in which to deploy the container workloads.
    • CLUSTER_KUBECONFIG: the kubeconfig file for the bare metal Kubernetes cluster to which you're deploying container workloads.
  5. Verify that your pods are running and using the GPUs:

    kubectl get pods -n NAMESPACE \
        --kubeconfig CLUSTER_KUBECONFIG | grep CONTAINER_NAME
    

    To confirm the GPU requests and limits, describe the pod. The relevant output is similar to the following snippet:

    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
    Ready:          True
    Restart Count:  0
    Limits:
      nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE:  1
    Requests:
      nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE:  1
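
As an optional extra check, if your container image includes the NVIDIA utilities (for example, an image built on a CUDA base image), you can run nvidia-smi inside the pod to confirm that the GPU is visible to the container. The pod name below is a placeholder for the pod created by your workload:

    kubectl exec -it POD_NAME \
        -n NAMESPACE \
        --kubeconfig CLUSTER_KUBECONFIG \
        -- nvidia-smi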
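
For reference, the following is a minimal, self-contained Pod manifest that combines the GPU resource fields from step 2 with the securityContext from step 3. The pod name is a placeholder, and CONTAINER_NAME, CONTAINER_IMAGE, and NAMESPACE have the same meanings as in the steps above:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-example-pod
      namespace: NAMESPACE
    spec:
      containers:
      - name: CONTAINER_NAME
        image: CONTAINER_IMAGE
        securityContext:
          seLinuxOptions:
            type: unconfined_t
        resources:
          requests:
            nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
          limits:
            nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1

You can apply this manifest with the same kubectl apply command shown in step 4.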