Deploy a model to an endpoint

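You must deploy a model to an endpoint before you can use that model to serve online predictions. Deploying a model associates physical resources with the model so that it can serve online predictions with low latency.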

This page describes the steps you must follow to deploy a model to an endpoint using Online Prediction.

Before you begin

Before you deploy your model to an endpoint, export your model artifacts for prediction and make sure that you meet all the prerequisites listed on that page.

Create a resource pool

A ResourcePool custom resource gives you fine-grained control over the behavior of your model. You can define settings such as the following:

- Autoscaling configuration.
- Machine type, which defines the CPU and memory requirements.
- Accelerator options, such as GPU resources.

The machine type is required in the node pool specification request that you send to create the prediction cluster.

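For a deployed model's resource pool, the accelerator count and type determine GPU usage, while the machine type only dictates the requested CPU and memory resources. For this reason, when you include GPU accelerators in the ResourcePool specification, the machineType field controls the CPU and memory requirements of the model and the acceleratorType field controls the GPU. The acceleratorCount field controls the number of GPU slices.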
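As a rough worked example, the GPU-based example on this page sets acceleratorCount: 2, and its comments state that each count corresponds to one-seventh of an 80 GB GPU. Under that assumption, the GPU memory requested by that resource pool is approximately:

```latex
\text{GPU memory} \approx \texttt{acceleratorCount} \times \frac{80\ \text{GB}}{7}
  = 2 \times \frac{80\ \text{GB}}{7} = \frac{160}{7}\ \text{GB} \approx 22.9\ \text{GB}
```

This figure covers only the GPU slices; the CPU and memory requested through machineType are separate and not included in it.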

Follow these steps to create a ResourcePool custom resource:

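Create a YAML file that defines the ResourcePool custom resource. The following examples contain YAML files for a resource pool with GPU accelerators (GPU-based models) and a resource pool without GPU accelerators (CPU-based models):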

GPU-based models

  apiVersion: prediction.aiplatform.gdc.goog/v1
  kind: ResourcePool
  metadata:
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
  spec:
    resourcePoolID: RESOURCE_POOL_NAME
    enableContainerLogging: false
    dedicatedResources:
      machineSpec:
        # The system adds computing overhead to the nodes for mandatory components.
        # Choose a machineType value that allocates fewer CPU and memory resources
        # than those used by the nodes in the prediction cluster.
        machineType: a2-highgpu-1g-gdc
        acceleratorType: nvidia-a100-80gb
        # The accelerator count is a slice of the requested virtualized GPUs.
        # The value corresponds to one-seventh of 80 GB of GPUs for each count.
        acceleratorCount: 2
      autoscaling:
        minReplica: 2
        maxReplica: 10

CPU-based models

  apiVersion: prediction.aiplatform.gdc.goog/v1
  kind: ResourcePool
  metadata:
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
  spec:
    resourcePoolID: RESOURCE_POOL_NAME
    enableContainerLogging: false
    dedicatedResources:
      machineSpec:
        # The system adds computing overhead to the nodes for mandatory components.
        # Choose a machineType value that allocates fewer CPU and memory resources
        # than those used by the nodes in the prediction cluster.
        machineType: n2-highcpu-8-gdc
      autoscaling:
        minReplica: 2
        maxReplica: 10

Replace the following:

- RESOURCE_POOL_NAME: the name you want to give the ResourcePool definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.

Modify the values in the dedicatedResources fields according to your resource needs and what is available in your prediction cluster.
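If you prefer not to edit the placeholders by hand, the CPU-based manifest can be rendered from shell variables. The following is a minimal sketch, assuming a Bash shell: the default values example-pool and example-project are hypothetical, and the replica check is only a local sanity check, not the validation rules the webhook service applies.

```shell
#!/usr/bin/env bash
# Render the CPU-based ResourcePool example from shell variables.
# The default values below are hypothetical; override them through the environment.
set -euo pipefail

RESOURCE_POOL_NAME="${RESOURCE_POOL_NAME:-example-pool}"
PROJECT_NAMESPACE="${PROJECT_NAMESPACE:-example-project}"
MACHINE_TYPE="${MACHINE_TYPE:-n2-highcpu-8-gdc}"
MIN_REPLICA="${MIN_REPLICA:-2}"
MAX_REPLICA="${MAX_REPLICA:-10}"

# Local sanity check only (an assumption of this sketch, not the webhook's rules).
if (( MIN_REPLICA > MAX_REPLICA )); then
  echo "minReplica ($MIN_REPLICA) must not exceed maxReplica ($MAX_REPLICA)" >&2
  exit 1
fi

# Write RESOURCE_POOL_NAME.yaml, the file name used in the kubectl apply step.
# The unquoted EOF delimiter lets the shell expand the ${...} variables.
cat > "${RESOURCE_POOL_NAME}.yaml" <<EOF
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: ${RESOURCE_POOL_NAME}
  namespace: ${PROJECT_NAMESPACE}
spec:
  resourcePoolID: ${RESOURCE_POOL_NAME}
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      machineType: ${MACHINE_TYPE}
    autoscaling:
      minReplica: ${MIN_REPLICA}
      maxReplica: ${MAX_REPLICA}
EOF

echo "Wrote ${RESOURCE_POOL_NAME}.yaml"
```

For example, running the script with RESOURCE_POOL_NAME=my-pool PROJECT_NAMESPACE=my-project writes my-pool.yaml with those values filled in. For a GPU-based pool, add the acceleratorType and acceleratorCount lines under machineSpec as in the GPU example.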

Apply the ResourcePool definition file to the prediction cluster:
```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f RESOURCE_POOL_NAME.yaml
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the prediction cluster's kubeconfig file.
- RESOURCE_POOL_NAME: the name of the ResourcePool definition file.

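When you create the ResourcePool custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The prediction operator provisions and reserves your resources from the resource pool when you deploy your models to an endpoint.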
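To see validation errors before anything is created, you can send the manifest as a server-side dry run first. The following is a minimal sketch, assuming Bash and a kubectl version that supports --dry-run=server; the function name apply_resource_pool and the KUBECTL override (used to print the commands instead of running them) are hypothetical conveniences of this sketch. Admission webhooks are only called during a dry run if they declare that they support it, so the dry run may not exercise every check.

```shell
#!/usr/bin/env bash
# Dry-run, apply, and read back a ResourcePool manifest.
# apply_resource_pool and the KUBECTL override are hypothetical helpers.
set -euo pipefail

apply_resource_pool() {
  local kubeconfig="$1"                # PREDICTION_CLUSTER_KUBECONFIG
  local manifest="$2"                  # RESOURCE_POOL_NAME.yaml
  local kubectl_bin="${KUBECTL:-kubectl}"

  # Server-side dry run: the API server validates the request and runs
  # dry-run-capable admission webhooks, but persists nothing.
  "$kubectl_bin" --kubeconfig "$kubeconfig" apply --dry-run=server -f "$manifest"

  # Create or update the ResourcePool for real.
  "$kubectl_bin" --kubeconfig "$kubeconfig" apply -f "$manifest"

  # Read the object back to confirm that it exists.
  "$kubectl_bin" --kubeconfig "$kubeconfig" get -f "$manifest"
}
```

Usage: apply_resource_pool PREDICTION_CLUSTER_KUBECONFIG RESOURCE_POOL_NAME.yaml. Because set -e is enabled, a failed dry run stops the function before the real apply runs.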