Manage GPU container workloads
You can enable and manage graphics processing unit (GPU) resources on your
containers. For example, you might prefer running artificial intelligence (AI)
and machine learning (ML) notebooks in a GPU environment. GPU support
is enabled by default in the Google Distributed Cloud (GDC) air-gapped appliance.
Before you begin
To deploy GPUs to your containers, you must have the following:

- The Namespace Admin role (`namespace-admin`) to deploy GPU workloads in
  your project namespace.
- The kubeconfig path for the bare metal Kubernetes cluster. Sign in and generate
  the kubeconfig file if you don't have one.
Configure a container to use GPU resources
To use GPUs in a container, complete the following steps:
1. Confirm that your Kubernetes cluster nodes support GPU resource allocation:

       kubectl describe nodes NODE_NAME

   Replace NODE_NAME with the node managing the GPUs you want to inspect.

   The relevant output is similar to the following snippet:

       Capacity:
         nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
       Allocatable:
         nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
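To check every node at once instead of describing them one at a time, you can parse the output of `kubectl get nodes -o json`. The following Python sketch is illustrative only; the helper name `gpu_allocatable` and the sample payload are hypothetical, assuming only the standard node JSON shape that kubectl prints:

```python
import json

# Hypothetical helper: given the JSON printed by `kubectl get nodes -o json`,
# return a map of node name -> allocatable count for one GPU resource key.
def gpu_allocatable(nodes_json: str, resource: str) -> dict:
    nodes = json.loads(nodes_json)["items"]
    result = {}
    for node in nodes:
        alloc = node.get("status", {}).get("allocatable", {})
        if resource in alloc:
            result[node["metadata"]["name"]] = int(alloc[resource])
    return result

# Sample payload shaped like `kubectl get nodes -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node-1"},
         "status": {"allocatable": {"nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE": "1"}}},
        {"metadata": {"name": "cpu-node-1"},
         "status": {"allocatable": {"cpu": "16"}}},
    ]
})

print(gpu_allocatable(sample, "nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE"))
# → {'gpu-node-1': 1}
```

In practice you would feed the helper the real command output, for example `subprocess.run(["kubectl", "get", "nodes", "-o", "json"], capture_output=True)`.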
2. Add the `.containers.resources.requests` and `.containers.resources.limits`
   fields to your container spec. Because your Kubernetes cluster is preconfigured
   with GPU machines, the configuration is the same for all workloads:

       ...
       containers:
       - name: CONTAINER_NAME
         image: CONTAINER_IMAGE
         resources:
           requests:
             nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
           limits:
             nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
       ...

   Replace the following:

   - CONTAINER_NAME: the name of the container.
   - CONTAINER_IMAGE: the container image to access the GPU machines. You must
     include the container registry path and version of the image, such as
     REGISTRY_PATH/hello-app:1.0.
3. Containers also require additional permissions to access GPUs. For each
   container that requests GPUs, add the following permissions to your
   container spec:

       ...
       securityContext:
         seLinuxOptions:
           type: unconfined_t
       ...
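Putting steps 2 and 3 together, a complete manifest might look like the following sketch. This is illustrative only: the Pod and container name `gpu-example` is a placeholder, and the image path follows the REGISTRY_PATH/hello-app:1.0 pattern shown above.

```yaml
# Illustrative sketch: a minimal Pod that combines the GPU resource
# requests/limits (step 2) with the SELinux permissions (step 3).
# gpu-example and the image path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
  - name: gpu-example
    image: REGISTRY_PATH/hello-app:1.0
    resources:
      requests:
        nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
      limits:
        nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
    securityContext:
      seLinuxOptions:
        type: unconfined_t
```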
[[["Fácil de entender","easyToUnderstand","thumb-up"],["Meu problema foi resolvido","solvedMyProblem","thumb-up"],["Outro","otherUp","thumb-up"]],[["Difícil de entender","hardToUnderstand","thumb-down"],["Informações incorretas ou exemplo de código","incorrectInformationOrSampleCode","thumb-down"],["Não contém as informações/amostras de que eu preciso","missingTheInformationSamplesINeed","thumb-down"],["Problema na tradução","translationIssue","thumb-down"],["Outro","otherDown","thumb-down"]],["Última atualização 2025-09-04 UTC."],[],[],null,["# Manage GPU container workloads\n\nYou can enable and manage graphics processing unit (GPU) resources on your\ncontainers. For example, you might prefer running artificial intelligence (AI)\nand machine learning (ML) notebooks in a GPU environment. GPU support\nis enabled by default in Google Distributed Cloud (GDC) air-gapped appliance.\n\nBefore you begin\n----------------\n\nTo deploy GPUs to your containers, you must have the following:\n\n- The Namespace Admin role (`namespace-admin`) to deploy GPU workloads in\n your project namespace.\n\n- The kubeconfig path for the bare metal Kubernetes cluster.\n [Sign in and generate](/distributed-cloud/hosted/docs/latest/appliance/platform/pa-user/iam/sign-in#kubernetes-cluster-kubeconfig)\n the kubeconfig file if you don't have one.\n\nConfigure a container to use GPU resources\n------------------------------------------\n\nTo use GPUs in a container, complete the following steps:\n\n1. Confirm your Kubernetes cluster nodes support your GPU resource allocation:\n\n kubectl describe nodes \u003cvar translate=\"no\"\u003eNODE_NAME\u003c/var\u003e\n\n Replace \u003cvar translate=\"no\"\u003eNODE_NAME\u003c/var\u003e with the node managing the GPUs\n you want to inspect.\n\n The relevant output is similar to the following snippet: \n\n Capacity:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n Allocatable:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n\n2. 
Add the `.containers.resources.requests` and `.containers.resources.limits`\n fields to your container spec. Since your Kubernetes cluster is preconfigured\n with GPU machines, the configuration is the same for all workloads:\n\n ...\n containers:\n - name: \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-l devsite-syntax-l-Scalar devsite-syntax-l-Scalar-Plain\"\u003eCONTAINER_NAME\u003c/span\u003e\u003c/var\u003e\n image: \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-l devsite-syntax-l-Scalar devsite-syntax-l-Scalar-Plain\"\u003eCONTAINER_IMAGE\u003c/span\u003e\u003c/var\u003e\n resources:\n requests:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n limits:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n ...\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCONTAINER_NAME\u003c/var\u003e: the name of the container.\n - \u003cvar translate=\"no\"\u003eCONTAINER_IMAGE\u003c/var\u003e: the container image to access the GPU machines. You must include the container registry path and version of the image, such as \u003cvar class=\"readonly\" translate=\"no\"\u003eREGISTRY_PATH\u003c/var\u003e`/hello-app:1.0`.\n3. Containers also require additional permissions to access GPUs. For each\n container that requests GPUs, add the following permissions to your\n container spec:\n\n ...\n securityContext:\n seLinuxOptions:\n type: unconfined_t\n ...\n\n4. 
Apply your container manifest file:\n\n kubectl apply -f \u003cvar translate=\"no\"\u003eCONTAINER_MANIFEST_FILE\u003c/var\u003e \\\n -n \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e \\\n --kubeconfig \u003cvar translate=\"no\"\u003eCLUSTER_KUBECONFIG\u003c/var\u003e\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCONTAINER_MANIFEST_FILE\u003c/var\u003e: the YAML file for your container workload custom resource.\n - \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e: the project namespace in which to deploy the container workloads.\n - \u003cvar translate=\"no\"\u003eCLUSTER_KUBECONFIG\u003c/var\u003e: the kubeconfig file for the bare metal Kubernetes cluster to which you're deploying container workloads.\n5. Verify that your pods are running and are using the GPUs:\n\n kubectl get pods -A | grep \u003cvar translate=\"no\"\u003eCONTAINER_NAME\u003c/var\u003e \\\n -n \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e \\\n --kubeconfig \u003cvar translate=\"no\"\u003eCLUSTER_KUBECONFIG\u003c/var\u003e\n\n The relevant output is similar to the following snippet: \n\n Port: 80/TCP\n Host Port: 0/TCP\n State: Running\n Ready: True\n Restart Count: 0\n Limits:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n Requests:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1"]]