Dynamically allocate devices to workloads with DRA

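This page explains how to deploy dynamic resource allocation (DRA) workloads on your Google Kubernetes Engine clusters. You create a ResourceClaimTemplate to request hardware with DRA, and then deploy a basic workload to see how Kubernetes flexibly allocates hardware to your Pods.

This page is intended for application operators and data engineers who run workloads such as AI/ML or high performance computing (HPC).

About dynamic resource allocation

DRA is a built-in Kubernetes feature that lets you flexibly request, allocate, and share hardware in your cluster among Pods and containers. For more information, see About dynamic resource allocation.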

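About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that workloads can request. A platform administrator can optionally deploy additional DeviceClasses that limit which devices specific workloads can request.

To request devices within a DeviceClass, you create one of the following objects:

- ResourceClaim: lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
- ResourceClaimTemplate: defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaim and ResourceClaimTemplate objects, see When to use ResourceClaims and ResourceClaimTemplates.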
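
To make the difference between the two objects concrete, the following sketch writes a manifest for a standalone ResourceClaim and a Pod that references it by name through the `resourceClaimName` field, instead of the `resourceClaimTemplateName` field that the examples on this page use. It assumes the `resource.k8s.io/v1beta1` API and the `gpu.nvidia.com` DeviceClass that appear on this page. The file name and the names `shared-gpu-claim` and `shared-claim-pod` are hypothetical.

```shell
# Write a standalone ResourceClaim plus a Pod that references it by name.
# All object and file names here are hypothetical.
cat > shared-claim-example.yaml <<'EOF'
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: shared-gpu-claim
spec:
  devices:
    requests:
    - name: single-gpu
      deviceClassName: gpu.nvidia.com
      allocationMode: ExactCount
      count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: shared-claim-pod
spec:
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["sleep", "infinity"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    # References an existing ResourceClaim instead of a template.
    resourceClaimName: shared-gpu-claim
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
EOF

# To try it against a DRA-enabled cluster:
#   kubectl create -f shared-claim-example.yaml
```

Unlike a claim generated from a template, a ResourceClaim that you create yourself isn't owned by a single Pod, so more than one Pod can reference the same claim.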

The examples on this page use a basic ResourceClaimTemplate to request the specified device configuration. For more details, see the ResourceClaimTemplateSpec Kubernetes documentation.

Limitations

- Node auto-provisioning isn't supported.
- Autopilot clusters don't support DRA.
- You can't use the following GPU sharing features:
  - Time-sharing GPUs
  - Multi-instance GPUs
  - Multi-Process Service (MPS)

Requirements

To use DRA, your GKE version must be 1.32.1-gke.1489001 or later.

You should also be familiar with the requirements and limitations that apply to your GPU or TPU devices.
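
The version requirement can be checked with a short script. The following is a minimal sketch that compares a GKE version string against the minimum version for DRA. It assumes GNU `sort` with the `-V` (version sort) option is available. The function name `meets_minimum` and the version string in the example call are made up for illustration.

```shell
# Minimum GKE version for DRA, taken from the requirement on this page.
MIN_VERSION="1.32.1-gke.1489001"

# Returns success (0) if the version in $1 is at least MIN_VERSION.
# Relies on the -V (version sort) option of GNU sort.
meets_minimum() {
  lowest="$(printf '%s\n%s\n' "$MIN_VERSION" "$1" | sort -V | head -n 1)"
  [ "$lowest" = "$MIN_VERSION" ]
}

# Example with a hard-coded, made-up version string.
if meets_minimum "1.33.0-gke.1000000"; then
  echo "version supports DRA"
else
  echo "version is too old for DRA"
fi
```

To check a real cluster, you could pass in the output of `gcloud container clusters describe CLUSTER_NAME --location=LOCATION --format="value(currentMasterVersion)"`, where CLUSTER_NAME and LOCATION are placeholders for your own values.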

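Before you begin

Before you start, make sure that you have performed the following tasks:

- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
  Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set compute/zone instead. Setting a default location helps you avoid gcloud CLI errors such as the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.
- Make sure that your GKE clusters are configured for DRA workloads.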
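
One way to sanity-check that a cluster is set up for DRA is to list the DeviceClass objects that the DRA drivers create and the ResourceSlice objects that advertise the devices on each node. The following sketch wraps both commands in a helper. The function name `check_dra_setup` is hypothetical, and the helper requires kubectl access to your cluster.

```shell
# Helper that lists the DRA objects a configured cluster is expected to expose.
# Requires kubectl access to the cluster; nothing runs until you call it.
check_dra_setup() {
  # DeviceClasses created by the DRA drivers (for example, gpu.nvidia.com).
  kubectl get deviceclasses
  # ResourceSlices advertise the devices that are available on each node.
  kubectl get resourceslices
}
```

Run `check_dra_setup` after the definition. If either list is empty, the DRA drivers are probably not installed or not running on your nodes.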

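Use DRA to deploy workloads

To request per-Pod device allocation, you first create a ResourceClaimTemplate that produces a ResourceClaim describing your request for GPUs or TPUs. Kubernetes uses it as a template to create a new ResourceClaim object for each Pod in the workload. When you specify the ResourceClaimTemplate in your workload, Kubernetes allocates the requested resources and schedules the Pods on the corresponding nodes.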

GPU

Save the following manifest as claim-template.yaml:
```
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: single-gpu
        deviceClassName: gpu.nvidia.com
        allocationMode: ExactCount
        count: 1
```

Create the ResourceClaimTemplate:
```
kubectl create -f claim-template.yaml
```

To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-gpu-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-gpu-example
  template:
    metadata:
      labels:
        app: dra-gpu-example
    spec:
      containers:
      - name: ctr
        image: ubuntu:22.04
        command: ["bash", "-c"]
        args: ["while [ 1 ]; do date; echo $(nvidia-smi -L || echo Waiting...); sleep 60; done"]
        resources:
          claims:
          - name: single-gpu
      resourceClaims:
      - name: single-gpu
        resourceClaimTemplateName: gpu-claim-template
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
```

Deploy the workload:
```
kubectl create -f dra-gpu-example.yaml
```

TPU

Save the following manifest as claim-template.yaml:
```
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: tpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: all-tpus
        deviceClassName: tpu.google.com
        allocationMode: All
```

This ResourceClaimTemplate requests that GKE allocate an entire TPU node pool to every ResourceClaim.

Create the ResourceClaimTemplate:
```
kubectl create -f claim-template.yaml
```

To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-tpu-example.yaml:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-tpu-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-tpu-example
  template:
    metadata:
      labels:
        app: dra-tpu-example
    spec:
      containers:
      - name: ctr
        image: ubuntu:22.04
        command:
          - /bin/sh
          - -c
          - |
            echo "Environment Variables:"
            env
            echo "Sleeping indefinitely..."
            sleep infinity
        resources:
          claims:
          - name: all-tpus
      resourceClaims:
      - name: all-tpus
        resourceClaimTemplateName: tpu-claim-template
      tolerations:
      - key: "google.com/tpu"
        operator: "Exists"
        effect: "NoSchedule"
```

Deploy the workload:
```
kubectl create -f dra-tpu-example.yaml
```

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod.

GPU

Get the ResourceClaim that's associated with the workload that you deployed:
```
kubectl get resourceclaims
```
The output is similar to the following:
```
NAME                                               STATE                AGE
dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
```
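
When a namespace contains many ResourceClaims, it can help to filter for the ones that haven't been allocated yet. The following sketch reads the `kubectl get resourceclaims --no-headers` listing on stdin and prints the names of claims whose STATE column doesn't contain `allocated`. The function name `unallocated_claims` is hypothetical, and the second sample row, including its `pending` state, is made up for illustration.

```shell
# Prints the names of ResourceClaims whose STATE column does not
# contain "allocated". Expects `kubectl get resourceclaims --no-headers`
# output on stdin (columns: NAME STATE AGE).
unallocated_claims() {
  awk '$2 !~ /allocated/ { print $1 }'
}

# Example with sample rows; the second row is made up for illustration.
printf '%s\n' \
  'dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s' \
  'dra-gpu-example-64b75dc6b-abcde-single-gpu-fghij   pending              3s' \
  | unallocated_claims
```

Against a live cluster, the equivalent call would be `kubectl get resourceclaims --no-headers | unallocated_claims`. The column positions assume a single namespace, because the `-A` flag adds a NAMESPACE column.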

To get more details about the hardware that's allocated to the Pod, run the following command:
```
kubectl describe resourceclaims RESOURCECLAIM
```
Replace RESOURCECLAIM with the full name of the ResourceClaim from the output of the previous step.

The output is similar to the following:
```
Name:         dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh
Namespace:    default
Labels:       <none>
Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
API Version:  resource.k8s.io/v1beta1
Kind:         ResourceClaim
Metadata:
  Creation Timestamp:  2025-03-31T17:11:37Z
  Finalizers:
    resource.kubernetes.io/delete-protection
  Generate Name:  dra-gpu-example-64b75dc6b-x8bd6-single-gpu-
  Owner References:
    API Version:           v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Pod
    Name:                  dra-gpu-example-64b75dc6b-x8bd6
    UID:                   cb3cb1db-e62a-4961-9967-cdc7d599105b
  Resource Version:        12953269
  UID:                     3e0c3925-e15a-40e9-b552-d03610fff040
Spec:
  Devices:
    Requests:
      Allocation Mode:    ExactCount
      Count:              1
      Device Class Name:  gpu.nvidia.com
      Name:               single-gpu
Status:
  Allocation:
    Devices:
      Results:
        Admin Access:  <nil>
        Device:        gpu-0
        Driver:        gpu.nvidia.com
        Pool:          gke-cluster-gpu-pool-11026a2e-zgt1
        Request:       single-gpu
    Node Selector:
      # lines omitted for clarity
  Reserved For:
    Name:      dra-gpu-example-64b75dc6b-x8bd6
    Resource:  pods
    UID:       cb3cb1db-e62a-4961-9967-cdc7d599105b
Events:        <none>
```

To get logs for the workload that you deployed, run the following command:
```
kubectl logs deployment/dra-gpu-example --all-pods=true | grep "GPU"
```
The output is similar to the following:
```
[pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)
```
The output of these steps shows that GKE allocated one GPU to the Pod.
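
If you only need the allocated devices rather than the full description, a JSONPath query over the claim's `status.allocation.devices.results` field might be more convenient. The following sketch defines a helper that prints one pool and device pair per line. The function name `allocated_devices` is hypothetical, and the helper requires kubectl access to your cluster.

```shell
# Prints one "pool<TAB>device" line per device allocated to a ResourceClaim.
# $1 is the ResourceClaim name. Requires kubectl access to the cluster.
allocated_devices() {
  kubectl get resourceclaim "$1" \
    -o jsonpath='{range .status.allocation.devices.results[*]}{.pool}{"\t"}{.device}{"\n"}{end}'
}
```

Call it as `allocated_devices RESOURCECLAIM`, where RESOURCECLAIM is the full name of the ResourceClaim. For a claim like the GPU sample on this page, you would expect a single line that contains the node pool name and the device gpu-0.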

TPU

Get the ResourceClaim that's associated with the workload that you deployed:
```
kubectl get resourceclaims | grep dra-tpu-example
```
The output is similar to the following:
```
NAME                                               STATE                AGE
dra-tpu-example-64b75dc6b-x8bd6-all-tpus-jwwdh     allocated,reserved   9s
```

To get more details about the hardware that's allocated to the Pod, run the following command:
```
kubectl get resourceclaims RESOURCECLAIM -o yaml
```
Replace RESOURCECLAIM with the full name of the ResourceClaim from the output of the previous step.

The output is similar to the following:
```
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  annotations:
    resource.kubernetes.io/pod-claim-name: all-tpus
  creationTimestamp: "2025-03-04T21:00:54Z"
  finalizers:
  - resource.kubernetes.io/delete-protection
  generateName: dra-tpu-example-59b8785697-k9kzd-all-tpus-
  name: dra-tpu-example-59b8785697-k9kzd-all-tpus-gnr7z
  namespace: default
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Pod
    name: dra-tpu-example-59b8785697-k9kzd
    uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
  resourceVersion: "12189603"
  uid: 279b5014-340b-4ef6-9dda-9fbf183fbb71
spec:
  devices:
    requests:
    - allocationMode: All
      deviceClassName: tpu.google.com
      name: all-tpus
status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: "0"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "1"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "2"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "3"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "4"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "5"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "6"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
      - adminAccess: null
        device: "7"
        driver: tpu.google.com
        pool: gke-tpu-2ec29193-bcc0
        request: all-tpus
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - gke-tpu-2ec29193-bcc0
  reservedFor:
  - name: dra-tpu-example-59b8785697-k9kzd
    resource: pods
    uid: c2f4fe66-9a73-4bd3-a574-4c3eea5fda3f
```

To get logs for the workload that you deployed, run the following command:
```
kubectl logs deployment/dra-tpu-example --all-pods=true | grep "TPU"
```
The output is similar to the following:
```
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_CHIPS_PER_HOST_BOUNDS=2,4,1
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_WRAP=false,false,false
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_SKIP_MDS_QUERY=true
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_RUNTIME_METRICS_PORTS=8431,8432,8433,8434,8435,8436,8437,8438
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_ID=0
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_WORKER_HOSTNAMES=localhost
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY=2x4
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_ACCELERATOR_TYPE=v6e-8
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_HOST_BOUNDS=1,1,1
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_TOPOLOGY_ALT=false
[pod/dra-tpu-example-59b8785697-tm2lc/ctr] TPU_DEVICE_0_RESOURCE_CLAIM=77e68f15-fa2f-4109-9a14-6c91da1a38d3
```
The output of these steps indicates that all of the TPUs in a node pool were allocated to the Pod.
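
To confirm the number of allocated TPUs without counting entries by hand, you could count the `device:` keys in the ResourceClaim YAML. The following is a rough sketch that reads the YAML on stdin. It uses a plain text match rather than a YAML parser, and the function name `count_allocated_devices` is hypothetical. The inline sample is a shortened, two-device excerpt in the same shape as the TPU output on this page.

```shell
# Counts the devices listed under status.allocation.devices.results in a
# ResourceClaim. Expects `kubectl get resourceclaims NAME -o yaml` on stdin.
# This is a plain text match on "device:" keys, not a YAML parser.
count_allocated_devices() {
  grep -cE '^[[:space:]]*(- )?device:'
}

# Example with a shortened, two-device excerpt.
count_allocated_devices <<'EOF'
status:
  allocation:
    devices:
      results:
      - adminAccess: null
        device: "0"
        driver: tpu.google.com
      - adminAccess: null
        device: "1"
        driver: tpu.google.com
EOF
```

Against a live cluster, you would pipe in `kubectl get resourceclaims RESOURCECLAIM -o yaml` instead. For the v6e-8 sample on this page, the expected count is 8.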

What's next

Explore additional resources about AI/ML orchestration on GKE