# Manage GPU container workloads

You can enable and manage graphics processing unit (GPU) resources on your
containers. For example, you might prefer running artificial intelligence (AI)
and machine learning (ML) notebooks in a GPU environment. GPU support is
enabled by default in Google Distributed Cloud (GDC) air-gapped appliance.
Before you begin
----------------
To deploy GPUs to your containers, you must have the following:
- The Namespace Admin role (`namespace-admin`) to deploy GPU workloads in
  your project namespace.
- The kubeconfig path for the bare metal Kubernetes cluster.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/appliance/platform/pa-user/iam/sign-in#kubernetes-cluster-kubeconfig)
  the kubeconfig file if you don't have one.
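If you prefer not to pass the kubeconfig path to every command, you can export
it for your shell session instead; a minimal sketch, assuming a standard POSIX
shell:

    export KUBECONFIG=CLUSTER_KUBECONFIG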
Configure a container to use GPU resources
------------------------------------------
To use GPUs in a container, complete the following steps:
1. Confirm your Kubernetes cluster nodes support your GPU resource allocation:

        kubectl describe nodes NODE_NAME

    Replace NODE_NAME with the node managing the GPUs you want to inspect.

    The relevant output is similar to the following snippet:

        Capacity:
          nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
        Allocatable:
          nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
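    To scan every node at once instead of naming each one, you can filter the
    describe output for the GPU resource; a minimal sketch, assuming the same
    `nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE` resource name shown above:

        kubectl describe nodes --kubeconfig CLUSTER_KUBECONFIG \
            | grep -E 'Name:|nvidia.com/gpu'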
2. Add the `.containers.resources.requests` and `.containers.resources.limits`
    fields to your container spec. Because your Kubernetes cluster is
    preconfigured with GPU machines, the configuration is the same for all
    workloads:

        ...
        containers:
        - name: CONTAINER_NAME
          image: CONTAINER_IMAGE
          resources:
            requests:
              nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
            limits:
              nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
        ...

    Replace the following:

    - CONTAINER_NAME: the name of the container.
    - CONTAINER_IMAGE: the container image to access the GPU machines. You
      must include the container registry path and version of the image, such
      as REGISTRY_PATH/hello-app:1.0.
3. Containers also require additional permissions to access GPUs. For each
    container that requests GPUs, add the following permissions to your
    container spec:

        ...
        securityContext:
          seLinuxOptions:
            type: unconfined_t
        ...
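    For reference, here is how the resource fields and the security context
    from the previous two steps fit together in one manifest; a minimal
    sketch, in which the `gpu-hello` pod name is a hypothetical placeholder:

        apiVersion: v1
        kind: Pod
        metadata:
          name: gpu-hello  # hypothetical example name
        spec:
          containers:
          - name: CONTAINER_NAME
            image: CONTAINER_IMAGE
            resources:
              requests:
                nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
              limits:
                nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1
            securityContext:
              seLinuxOptions:
                type: unconfined_t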
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[],[],null,["# Manage GPU container workloads\n\nYou can enable and manage graphics processing unit (GPU) resources on your\ncontainers. For example, you might prefer running artificial intelligence (AI)\nand machine learning (ML) notebooks in a GPU environment. GPU support\nis enabled by default in Google Distributed Cloud (GDC) air-gapped appliance.\n\nBefore you begin\n----------------\n\nTo deploy GPUs to your containers, you must have the following:\n\n- The Namespace Admin role (`namespace-admin`) to deploy GPU workloads in\n your project namespace.\n\n- The kubeconfig path for the bare metal Kubernetes cluster.\n [Sign in and generate](/distributed-cloud/hosted/docs/latest/appliance/platform/pa-user/iam/sign-in#kubernetes-cluster-kubeconfig)\n the kubeconfig file if you don't have one.\n\nConfigure a container to use GPU resources\n------------------------------------------\n\nTo use GPUs in a container, complete the following steps:\n\n1. Confirm your Kubernetes cluster nodes support your GPU resource allocation:\n\n kubectl describe nodes \u003cvar translate=\"no\"\u003eNODE_NAME\u003c/var\u003e\n\n Replace \u003cvar translate=\"no\"\u003eNODE_NAME\u003c/var\u003e with the node managing the GPUs\n you want to inspect.\n\n The relevant output is similar to the following snippet: \n\n Capacity:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n Allocatable:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n\n2. Add the `.containers.resources.requests` and `.containers.resources.limits`\n fields to your container spec. Since your Kubernetes cluster is preconfigured\n with GPU machines, the configuration is the same for all workloads:\n\n ...\n containers:\n - name: \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-l devsite-syntax-l-Scalar devsite-syntax-l-Scalar-Plain\"\u003eCONTAINER_NAME\u003c/span\u003e\u003c/var\u003e\n image: \u003cvar translate=\"no\"\u003e\u003cspan class=\"devsite-syntax-l devsite-syntax-l-Scalar devsite-syntax-l-Scalar-Plain\"\u003eCONTAINER_IMAGE\u003c/span\u003e\u003c/var\u003e\n resources:\n requests:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n limits:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n ...\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCONTAINER_NAME\u003c/var\u003e: the name of the container.\n - \u003cvar translate=\"no\"\u003eCONTAINER_IMAGE\u003c/var\u003e: the container image to access the GPU machines. You must include the container registry path and version of the image, such as \u003cvar class=\"readonly\" translate=\"no\"\u003eREGISTRY_PATH\u003c/var\u003e`/hello-app:1.0`.\n3. Containers also require additional permissions to access GPUs. For each\n container that requests GPUs, add the following permissions to your\n container spec:\n\n ...\n securityContext:\n seLinuxOptions:\n type: unconfined_t\n ...\n\n4. 
Apply your container manifest file:\n\n kubectl apply -f \u003cvar translate=\"no\"\u003eCONTAINER_MANIFEST_FILE\u003c/var\u003e \\\n -n \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e \\\n --kubeconfig \u003cvar translate=\"no\"\u003eCLUSTER_KUBECONFIG\u003c/var\u003e\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCONTAINER_MANIFEST_FILE\u003c/var\u003e: the YAML file for your container workload custom resource.\n - \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e: the project namespace in which to deploy the container workloads.\n - \u003cvar translate=\"no\"\u003eCLUSTER_KUBECONFIG\u003c/var\u003e: the kubeconfig file for the bare metal Kubernetes cluster to which you're deploying container workloads.\n5. Verify that your pods are running and are using the GPUs:\n\n kubectl get pods -A | grep \u003cvar translate=\"no\"\u003eCONTAINER_NAME\u003c/var\u003e \\\n -n \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e \\\n --kubeconfig \u003cvar translate=\"no\"\u003eCLUSTER_KUBECONFIG\u003c/var\u003e\n\n The relevant output is similar to the following snippet: \n\n Port: 80/TCP\n Host Port: 0/TCP\n State: Running\n Ready: True\n Restart Count: 0\n Limits:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1\n Requests:\n nvidia.com/gpu-pod-NVIDIA_A100_80GB_PCIE: 1"]]
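    As a further check that the GPU is actually visible from inside the
    workload, you can run `nvidia-smi` in the container; a minimal sketch,
    assuming the container image ships the NVIDIA utilities and POD_NAME is
    the pod found in the previous listing:

        kubectl exec -it POD_NAME -n NAMESPACE \
            --kubeconfig CLUSTER_KUBECONFIG -- nvidia-smi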