# Manage GPU container workloads

Last updated: 2025-09-04 (UTC)

You can enable and manage graphics processing unit (GPU) resources on your
containers. For example, you might prefer running artificial intelligence (AI)
and machine learning (ML) notebooks in a GPU environment. To run GPU container
workloads, you must have a Kubernetes cluster that supports GPU devices. GPU
support is enabled by default for Kubernetes clusters that have GPU machines
provisioned for them.

Before you begin
----------------

To deploy GPUs to your containers, you must have the following:

- A Kubernetes cluster with a GPU machine class. Check the
  [supported GPU cards](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/create-user-cluster#supported-gpu-cards)
  section for options on what you can configure for your cluster machines.

- The User Cluster Node Viewer role (`user-cluster-node-viewer`) to check GPUs,
  and the Namespace Admin role (`namespace-admin`) to deploy GPU workloads in
  your project namespace.

- The kubeconfig path for the zonal management API server that hosts your
  Kubernetes cluster.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in)
  the kubeconfig file if you don't have one.

- The kubeconfig path for the org infrastructure cluster in the zone intended to
  host your GPUs.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in)
  the kubeconfig file if you don't have one.

- The Kubernetes cluster name. Ask your Platform Administrator for this
  information if you don't have it.

- The Kubernetes cluster kubeconfig path.
  [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in)
  the kubeconfig file if you don't have one.

Configure a container to use GPU resources
------------------------------------------

To use these GPUs in a container, complete the following steps:

1. Verify that your Kubernetes cluster has node pools that support GPUs:

       kubectl describe nodepoolclaims -n KUBERNETES_CLUSTER_NAME \
           --kubeconfig ORG_INFRASTRUCTURE_CLUSTER

   The relevant output is similar to the following snippet:

       Spec:
         Machine Class Name:  a2-ultragpu-1g-gdc
         Node Count:          2

   For a full list of supported GPU machine types and Multi-Instance GPU (MIG)
   profiles, see
   [Cluster node machine types](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/cluster-node-machines).

2. Add the `.containers.resources.requests` and `.containers.resources.limits`
   fields to your container spec. The resource name depends on your machine
   class.
   [Check your GPU resource allocation](#check-gpu-resource-allocation) to find
   your GPU resource names.

   For example, the following container spec requests three partitions of a GPU
   from an `a2-ultragpu-1g-gdc` node:

       ...
       containers:
       - name: my-container
         image: "my-image"
         resources:
           requests:
             nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
           limits:
             nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
       ...

   **Note:** You can request a maximum of seven GPU partitions per pod.

3. Containers also require additional permissions to access GPUs. For each
   container that requests GPUs, add the following permissions to your
   container spec:

       ...
       securityContext:
         seLinuxOptions:
           type: unconfined_t
       ...

4. Apply your container manifest file:

       kubectl apply -f CONTAINER_MANIFEST_FILE \
           -n NAMESPACE \
           --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG

Check GPU resource allocation
-----------------------------

- To check your GPU resource allocation, use the following command:

      kubectl describe nodes NODE_NAME

  Replace NODE_NAME with the node managing the GPUs you want to inspect.

  The relevant output is similar to the following snippet:

      Capacity:
        nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 7
      Allocatable:
        nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 7

Note the resource names for your GPUs; you must specify them when configuring
a container to use GPU resources.
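The container spec fragments in steps 2 and 3 are shown separately, so it may help to see them combined in one manifest. The following is a minimal sketch, not a definitive manifest: it assumes a plain `Pod` as the workload kind, and the file name `gpu-pod.yaml` and pod name `my-gpu-pod` are hypothetical. The GPU resource name and count are the example values from step 2; substitute the name reported by `kubectl describe nodes` for your own machine class.

```shell
#!/bin/sh
# Hypothetical names for illustration only; replace them with your own values.
MANIFEST="gpu-pod.yaml"
# Example resource name from the a2-ultragpu-1g-gdc machine class above.
GPU_RESOURCE="nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE"
# Number of GPU partitions to request (the note above gives a maximum of seven per pod).
GPU_COUNT=3

# Write a Pod manifest that combines the resource fields from step 2
# with the securityContext from step 3. Requests and limits use the same value.
cat > "$MANIFEST" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-container
    image: "my-image"
    resources:
      requests:
        ${GPU_RESOURCE}: ${GPU_COUNT}
      limits:
        ${GPU_RESOURCE}: ${GPU_COUNT}
    securityContext:
      seLinuxOptions:
        type: unconfined_t
EOF

echo "Wrote $MANIFEST"
```

You can then pass the generated file as CONTAINER_MANIFEST_FILE to the `kubectl apply` command in step 4. The script itself only writes the file and does not contact the cluster.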