# Manage GPU container workloads

You can enable and manage graphics processing unit (GPU) resources on your
containers. For example, you might prefer running artificial intelligence (AI)
and machine learning (ML) notebooks in a GPU environment. To run GPU container
workloads, you must have a Kubernetes cluster that supports GPU devices. GPU
support is enabled by default for Kubernetes clusters that have GPU machines
provisioned for them.
## Before you begin
To deploy GPUs to your containers, you must have the following:

- A Kubernetes cluster with a GPU machine class. Check the
  [supported GPU cards](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/create-user-cluster#supported-gpu-cards)
  section for options on what you can configure for your cluster machines.

- The User Cluster Node Viewer role (`user-cluster-node-viewer`) to check GPUs,
  and the Namespace Admin role (`namespace-admin`) to deploy GPU workloads in
  your project namespace.

- The kubeconfig path for the zonal management API server that hosts your
  Kubernetes cluster.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in)
  the kubeconfig file if you don't have one.

- The kubeconfig path for the org infrastructure cluster in the zone intended
  to host your GPUs.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in)
  the kubeconfig file if you don't have one.

- The Kubernetes cluster name. Ask your Platform Administrator for this
  information if you don't have it.

- The Kubernetes cluster kubeconfig path.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in)
  the kubeconfig file if you don't have one.
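The commands later on this page reference these prerequisites as placeholders. As a convenience, you can collect them into shell variables up front; the sketch below is illustrative only, and every file path and name in it is a hypothetical example you would replace with your own values.

```shell
# Illustrative placeholders only -- substitute your actual kubeconfig
# paths and cluster name gathered in the prerequisites above.
ORG_INFRASTRUCTURE_CLUSTER="$HOME/org-infra-kubeconfig.yaml"
KUBERNETES_CLUSTER_KUBECONFIG="$HOME/user-cluster-kubeconfig.yaml"
KUBERNETES_CLUSTER_NAME="my-gpu-cluster"

echo "Cluster: $KUBERNETES_CLUSTER_NAME"
```

With these set, you can paste the later `kubectl` commands and let the shell expand the variables instead of editing each placeholder by hand.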
## Configure a container to use GPU resources

To use these GPUs in a container, complete the following steps:
1. Verify your Kubernetes cluster has node pools that support GPUs:

    ```shell
    kubectl describe nodepoolclaims -n KUBERNETES_CLUSTER_NAME \
        --kubeconfig ORG_INFRASTRUCTURE_CLUSTER
    ```

    The relevant output is similar to the following snippet:

    ```
    Spec:
      Machine Class Name: a2-ultragpu-1g-gdc
      Node Count:         2
    ```

    For a full list of supported GPU machine types and Multi-Instance GPU (MIG)
    profiles, see
    [Cluster node machine types](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/cluster-node-machines).

2. Add the `.containers.resources.requests` and `.containers.resources.limits`
    fields to your container spec. Each resource name is different depending on
    your machine class.
    [Check your GPU resource allocation](#check-gpu-resource-allocation) to find
    your GPU resource names.

    For example, the following container spec requests three partitions of a GPU
    from an `a2-ultragpu-1g-gdc` node:

    ```yaml
    ...
    containers:
    - name: my-container
      image: "my-image"
      resources:
        requests:
          nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
        limits:
          nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
    ...
    ```

    **Note:** You can request a maximum of seven GPU partitions per pod.

3. Containers also require additional permissions to access GPUs. For each
    container that requests GPUs, add the following permissions to your
    container spec:

    ```yaml
    ...
    securityContext:
      seLinuxOptions:
        type: unconfined_t
    ...
    ```
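Taken together, the resource fields and security context from the steps above slot into a single container entry of a Pod manifest. The following is an illustrative sketch only: the pod name, container name, and image are hypothetical placeholders, and the MIG resource name assumes the `a2-ultragpu-1g-gdc` machine class used in the example above.

```yaml
# Hypothetical end-to-end manifest assembling the snippets above.
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod            # illustrative name
spec:
  containers:
  - name: my-container
    image: "my-image"         # illustrative image
    resources:
      requests:
        nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
      limits:
        nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
    securityContext:          # required for GPU access
      seLinuxOptions:
        type: unconfined_t
```

Save a manifest like this to a file and apply it in the next step.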
4. Apply your container manifest file:

    ```shell
    kubectl apply -f CONTAINER_MANIFEST_FILE \
        -n NAMESPACE \
        --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG
    ```

## Check GPU resource allocation

To check your GPU resource allocation, use the following command:

```shell
kubectl describe nodes NODE_NAME
```

Replace NODE_NAME with the name of the node managing the GPUs you want to
inspect.

The relevant output is similar to the following snippet:

```
Capacity:
  nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 7
Allocatable:
  nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 7
```

Note the resource names for your GPUs; you must specify them when configuring
a container to use GPU resources.

Last updated 2025-09-04 UTC.