Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE) with TPUs

Standard

This guide shows how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) using TPUs, Ray Serve, and the Ray Operator add-on.

This guide is intended for generative AI customers, new or existing GKE users, ML engineers, MLOps (DevOps) engineers, and platform administrators who want to use Kubernetes container orchestration to serve models with Ray.

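About Ray and Ray Serve

Ray is an open-source, scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray that is used to scale and serve models in a distributed environment. For more information, see Ray Serve in the Ray documentation.

About TPUs

Tensor Processing Units (TPUs) are specialized hardware accelerators designed to significantly speed up the training and inference of large-scale machine learning models. Using Ray with TPUs lets you scale high-performance ML applications. For more information about TPUs, see Introduction to Cloud TPU in the Cloud TPU documentation.

About the KubeRay TPU initialization webhook

As part of the Ray Operator add-on, GKE provides validating and mutating webhooks that handle TPU Pod scheduling and the TPU environment variables that frameworks like JAX require for container initialization. The KubeRay TPU webhook mutates Pods that have the app.kubernetes.io/name: kuberay label and request TPUs, adding the following properties: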

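TPU_WORKER_ID: A unique integer for each worker Pod in the TPU slice.
TPU_WORKER_HOSTNAMES: A list of DNS hostnames for all TPU workers that need to communicate with each other within the slice. This variable is injected only for TPU Pods in a multi-host group.
replicaIndex: A Pod label that contains a unique identifier for the worker group replica that the Pod belongs to. This is useful for multi-host worker groups, where multiple worker Pods can belong to the same replica, and Ray uses it to enable multi-host autoscaling.
TPU_NAME: A string that represents the GKE TPU PodSlice that this Pod belongs to, set to the same value as the replicaIndex label.
podAffinity: Ensures that GKE schedules TPU Pods with matching replicaIndex labels on the same node pool. This lets GKE scale multi-host TPUs atomically by node pool rather than by single node.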
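
As a rough illustration of how a worker process might consume these injected values, the following Python sketch reads them from the environment. The helper name `read_tpu_worker_info`, the comma-separated hostname format, and the fallback worker ID of 0 are assumptions made for this example; they are not part of the webhook itself.

```python
import os
from typing import Mapping, Optional


def read_tpu_worker_info(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the TPU variables described above from an environment mapping.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # TPU_WORKER_HOSTNAMES is injected only for multi-host groups, so it may
    # be absent. A comma-separated format is assumed here.
    raw_hosts = env.get("TPU_WORKER_HOSTNAMES", "")
    return {
        # Falling back to 0 when the variable is missing is an assumption.
        "worker_id": int(env.get("TPU_WORKER_ID", "0")),
        "hostnames": [host for host in raw_hosts.split(",") if host],
        "slice_name": env.get("TPU_NAME"),
    }
```

Passing an explicit mapping instead of relying on `os.environ` keeps the helper easy to exercise outside a TPU Pod.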

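Objectives

Create a GKE cluster with a TPU node pool.
Deploy a Ray cluster with TPUs.
Deploy a RayService custom resource.
Interact with the Stable Diffusion model server.

Costs

This document uses billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.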

Before you begin

Cloud Shell comes preinstalled with the software that you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, install the gcloud CLI.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

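Note: If you installed the gcloud CLI previously, make sure that you have the latest version by running gcloud components update.

If you use an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: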

gcloud init

Create or select a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the GKE API:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

gcloud services enable container.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
```
Replace the following:
- PROJECT_ID: Your project ID.
- USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
- ROLE: The IAM role that you grant to your user account.

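Ensure sufficient quota

Ensure that your Google Cloud project has sufficient TPU quota in your Compute Engine region or zone. For more information, see Ensure sufficient TPU and GKE quotas in the Cloud TPU documentation. You might also need to increase your quotas for the following:

Persistent Disk SSD (GB)
In-use IP addresses

Prepare your environment

To prepare your environment, follow these steps:

Start a Cloud Shell session from the Google Cloud console by clicking Activate Cloud Shell. The session launches in the bottom pane of the Google Cloud console.
Set the environment variables: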
```
export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=ray-cluster
export COMPUTE_REGION=us-central2-b
export CLUSTER_VERSION=CLUSTER_VERSION
```
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
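
Before creating the cluster, it can help to sanity-check the version value locally. The sketch below compares only the leading major.minor.patch part of a version string against the 1.30.1 minimum stated above. The function name `meets_minimum` and the sample version strings are hypothetical.

```python
import re


def meets_minimum(version: str, minimum: tuple = (1, 30, 1)) -> bool:
    """Return True if a version string is at or above the given minimum.

    Only the leading "major.minor.patch" digits are compared; any suffix
    after them is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups()) >= minimum
```

For example, `meets_minimum("1.29.9")` is False, while `meets_minimum("1.30.1")` is True.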

Clone the GitHub repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Change to the working directory:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

Create a cluster with a TPU node pool

Create a Standard GKE cluster with a TPU node pool as follows.

Create a Standard mode cluster with the Ray operator enabled:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --machine-type=n1-standard-8 \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Create a single-host TPU node pool:

gcloud container node-pools create tpu-pool \
    --location=${COMPUTE_REGION} \
    --cluster=${CLUSTER_NAME} \
    --machine-type=ct4p-hightpu-4t \
    --num-nodes=1

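To use TPUs in Standard mode, you must select the following:

A Compute Engine location with capacity for TPU accelerators
A machine type that is compatible with the TPU
The physical topology of the TPU PodSlice

Configure a RayCluster resource with TPUs

Configure your RayCluster manifest to prepare the TPU workload as follows.

Configure the TPU `nodeSelector`

GKE uses Kubernetes nodeSelectors to ensure that TPU workloads are scheduled on the appropriate TPU topology and accelerator. For more information about selecting TPU nodeSelectors, see Deploy TPU workloads in GKE Standard.

Update the ray-cluster.yaml manifest to schedule your Pod on a v4 TPU podslice with a 2x2x1 topology: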

nodeSelector:
  cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
  cloud.google.com/gke-tpu-topology: 2x2x1

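Configure TPU container resources

To use a TPU accelerator, you must specify the number of TPU chips that GKE should allocate to each Pod. To do so, configure the google.com/tpu resource limits and requests in the TPU container field of the workerGroupSpecs in your RayCluster manifest.

Update the ray-cluster.yaml manifest with resource limits and requests: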

resources:
  limits:
    cpu: "1"
    ephemeral-storage: 10Gi
    google.com/tpu: "4"
    memory: "2G"
  requests:
    cpu: "1"
    ephemeral-storage: 10Gi
    google.com/tpu: "4"
    memory: "2G"
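
A small check like the following can catch a mismatched TPU count before the manifest is applied. It is a sketch that operates on the `resources` block after it has been loaded into a Python dict. The function name is hypothetical, and the equality check reflects the general Kubernetes rule that requests and limits for extended resources must match.

```python
TPU_RESOURCE = "google.com/tpu"


def tpu_chips_requested(resources: dict) -> int:
    """Return the TPU chip count from a container `resources` mapping.

    Raises ValueError if the TPU resource is missing from either section or
    if the two values differ.
    """
    limit = resources.get("limits", {}).get(TPU_RESOURCE)
    request = resources.get("requests", {}).get(TPU_RESOURCE)
    if limit is None or request is None:
        raise ValueError(f"{TPU_RESOURCE} must appear in both limits and requests")
    if int(limit) != int(request):
        raise ValueError(f"{TPU_RESOURCE} limits and requests must match")
    return int(limit)
```

With the values shown in the manifest fragment above, the function returns 4.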

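Configure the worker group `numOfHosts`

KubeRay v1.1.0 adds a numOfHosts field to the RayCluster custom resource. This field specifies the number of TPU hosts to create per worker group replica. For multi-host worker groups, a replica is treated as a PodSlice rather than as an individual worker, and numOfHosts worker nodes are created per replica.

Update the ray-cluster.yaml manifest with the following: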

workerGroupSpecs:
  # Several lines omitted
  numOfHosts: 1 # the number of "hosts" or workers per replica
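
To see how a topology relates to numOfHosts, the following sketch multiplies the topology dimensions to get the chip count and divides by the chips attached to each host. The default of 4 chips per host is an assumption that matches the ct4p-hightpu-4t machine type used in this guide. The function names are hypothetical.

```python
from math import prod


def chips_in_topology(topology: str) -> int:
    """Multiply the dimensions of a topology string such as "2x2x1"."""
    return prod(int(dim) for dim in topology.split("x"))


def hosts_per_replica(topology: str, chips_per_host: int = 4) -> int:
    """Estimate numOfHosts for a topology.

    chips_per_host=4 is an assumption based on a 4-chip TPU machine type.
    """
    chips = chips_in_topology(topology)
    if chips % chips_per_host != 0:
        raise ValueError(
            f"{chips} chips cannot be split evenly across hosts with {chips_per_host} chips"
        )
    return chips // chips_per_host
```

Under these assumptions, a 2x2x1 topology has 4 chips and fits on one host, which agrees with numOfHosts: 1 above, while a 2x2x2 topology would need two hosts.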

Create a RayService custom resource

Create a RayService custom resource as follows.

Review the following manifest:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion-tpu
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion_tpu:deployment
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/refs/heads/main.zip"
          pip:
            - diffusers==0.7.2
            - flax
            - jax[tpu]==0.4.11
            - -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
            - fastapi
  rayClusterConfig:
    rayVersion: '2.9.0'
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray-ml:2.9.0-py310
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                memory: "8G"
              requests:
                cpu: "2"
                memory: "8G"
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      numOfHosts: 1
      groupName: tpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray-ml:2.9.0-py310
            resources:
              limits:
                cpu: "100"
                ephemeral-storage: 20Gi
                google.com/tpu: "4"
                memory: 200G
              requests:
                cpu: "100"
                ephemeral-storage: 20Gi
                google.com/tpu: "4"
                memory: 200G
          nodeSelector:
            cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
            cloud.google.com/gke-tpu-topology: 2x2x1

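This manifest describes a RayService custom resource that creates a RayCluster resource with one head node and a TPU worker group with a 2x2x1 topology, which means that each worker node has four v4 TPU chips.

The TPU node belongs to a single v4 TPU podslice with a 2x2x1 topology. To create a multi-host worker group, replace the gke-tpu nodeSelector values, the google.com/tpu container limits and requests, and the numOfHosts value with your multi-host configuration. For more information about TPU multi-host topologies, see System architecture in the Cloud TPU documentation.

Apply the manifest to your cluster: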
```
kubectl apply -f ray-service-tpu.yaml
```
Verify that the RayService resource is running:
```
kubectl get rayservices
```
The output is similar to the following:
```
NAME                   SERVICE STATUS   NUM SERVE ENDPOINTS
stable-diffusion-tpu   Running          2
```
In this output, Running in the SERVICE STATUS column indicates that the RayService resource is ready.
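
If you want to script this check, one option is to parse the captured text of the command. The sketch below assumes the three-column layout shown above and a status value without spaces. The function name `service_status` is hypothetical.

```python
from typing import Optional


def service_status(table: str, name: str) -> Optional[str]:
    """Return the SERVICE STATUS value for a named RayService, if present.

    Assumes the first line is the header and that each row has the form
    "<name> <status> <endpoints>" with a single-word status.
    """
    rows = [line for line in table.splitlines() if line.strip()]
    for row in rows[1:]:  # skip the header row
        fields = row.split()
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None
```

A caller could poll until the function returns "Running" for stable-diffusion-tpu before sending any prompts.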

(Optional) View the Ray Dashboard

You can view your Ray Serve deployment and the relevant logs from the Ray Dashboard.

Establish a port-forwarding session to the Ray Dashboard from the Ray head service:
```
kubectl port-forward svc/stable-diffusion-tpu-head-svc 8265:8265
```
In a web browser, go to http://localhost:8265/.
Click the Serve tab.
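
The same information can also be read programmatically while the port-forwarding session is open. The following sketch assumes that the dashboard exposes the Ray Serve REST endpoint at /api/serve/applications/ and that the response holds an "applications" mapping whose entries carry a "status" field. Verify both against the documentation for your Ray version; the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_serve_details(dashboard_url: str = "http://localhost:8265") -> dict:
    """Fetch Serve details from the dashboard (endpoint path is an assumption)."""
    with urlopen(f"{dashboard_url}/api/serve/applications/", timeout=10) as response:
        return json.load(response)


def unhealthy_applications(details: dict) -> list:
    """List application names whose status is not RUNNING.

    Assumes the shape {"applications": {name: {"status": ...}}}.
    """
    applications = details.get("applications", {})
    return sorted(
        name for name, app in applications.items() if app.get("status") != "RUNNING"
    )
```

Calling `unhealthy_applications(fetch_serve_details())` returns an empty list when every application reports RUNNING.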

Send prompts to the model server

Establish a port-forwarding session to the Serve endpoint through the stable-diffusion-tpu-serve-svc Service:
```
kubectl port-forward svc/stable-diffusion-tpu-serve-svc 8000
```
Open a new Cloud Shell session.
Submit a text-to-image prompt to the Stable Diffusion model server:
```
python stable_diffusion_tpu_req.py --save_pictures
```
The results of the Stable Diffusion inference are saved to a file named diffusion_results.png.
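
For readers who prefer to call the endpoint directly, the sketch below shows one way a request could be built with the Python standard library. The /imagine route and the prompt query parameter are assumptions about the sample application and are not confirmed by this guide. Check stable_diffusion_tpu_req.py in the cloned repository for the actual request format. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_prompt_url(
    prompt: str,
    base_url: str = "http://localhost:8000",
    route: str = "/imagine",  # assumed route; confirm in the sample app
) -> str:
    """Build a GET URL that carries the prompt as a query parameter."""
    query = urlencode({"prompt": prompt})  # assumed parameter name
    return f"{base_url.rstrip('/')}{route}?{query}"


def fetch_image(prompt: str, timeout: float = 600.0) -> bytes:
    """Send the prompt through the port-forwarded endpoint and return raw bytes."""
    with urlopen(build_prompt_url(prompt), timeout=timeout) as response:
        return response.read()
```

The generous timeout is a guess that allows for model compilation on the first request; tune it for your environment.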

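Clean up

Delete the project

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, deleting it also deletes any other work that you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.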

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

Delete individual resources

To delete the cluster, run the following command:

gcloud container clusters delete ${CLUSTER_NAME} \
    --location=${COMPUTE_REGION}

What's next

Learn about Ray on Kubernetes.
Explore the KubeRay documentation.
Explore reference architectures, diagrams, and best practices about Google Cloud in the Cloud Architecture Center.
