Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

This guide shows how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE), using Ray Serve and the Ray Operator add-on as an example implementation.

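About Ray and Ray Serve

Ray is an open source, scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray that is used to scale and serve models in a distributed environment. For more information, see Ray Serve in the Ray documentation.

You can deploy Ray Serve applications with either a RayCluster or a RayService resource. Use a RayService resource in production for the following reasons:

- In-place updates for RayService applications
- Zero-downtime upgrades for RayCluster resources
- Highly available Ray Serve applications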

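Objectives

This guide is intended for generative AI customers, new or existing GKE users, ML engineers, MLOps (DevOps) engineers, and platform administrators who are interested in using Kubernetes container orchestration capabilities to serve models with Ray.

- Create a GKE cluster with a GPU node pool.
- Create a Ray cluster using the RayCluster custom resource.
- Run a Ray Serve application.
- Deploy a RayService custom resource.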

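Costs

In this document, you use billable components of Google Cloud.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

Cloud Shell comes preinstalled with the software that you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, you must install the gcloud CLI.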

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

To initialize the gcloud CLI, run the following command:

gcloud init

Note: If you installed the gcloud CLI previously, make sure you have the latest version by running

gcloud components update

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the GKE API:

gcloud services enable container.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
```
- Replace PROJECT_ID with your project ID.
- Replace USER_IDENTIFIER with the identifier for your user account. For example, myemail@example.com (the command already adds the user: prefix).
- Replace ROLE with each individual role.
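
Because the binding command has to be run once per role, it can help to generate the invocations instead of editing one line by hand. The following is a minimal sketch that only builds and prints the command lines; it runs nothing. The function names are hypothetical, and the project ID and email are example values that you would replace with your own.

```python
import shlex

# The two roles named in the step above.
ROLES = ["roles/container.clusterAdmin", "roles/container.admin"]


def iam_binding_commands(project_id: str, user_email: str) -> list[list[str]]:
    """Return one `gcloud projects add-iam-policy-binding` argv list per role."""
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=user:{user_email}",
            f"--role={role}",
        ]
        for role in ROLES
    ]


def render(commands: list[list[str]]) -> str:
    """Render the argv lists as shell-quoted lines for review before running."""
    return "\n".join(shlex.join(argv) for argv in commands)
```

Calling `print(render(iam_binding_commands("my-project", "myemail@example.com")))` prints two command lines, one per role, that you can review and then paste into your shell.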

Install Ray Serve.

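Prepare your environment

To prepare your environment, follow these steps:

Launch a Cloud Shell session from the Google Cloud console by clicking Activate Cloud Shell. This launches a session in the bottom pane of the Google Cloud console.

Set environment variables: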

export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`

Replace the following:

- PROJECT_ID: your Google Cloud project ID.
- CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
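
The minimum-version requirement can be checked before you create the cluster. The sketch below compares the numeric `major.minor.patch` prefix of a version string against 1.30.1. It assumes that CLUSTER_VERSION is written in full `x.y.z` form, optionally followed by a `-gke.N` suffix; aliases such as a bare minor version are not handled, and the function names are hypothetical.

```python
import re


def parse_gke_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from '1.30.1' or '1.30.1-gke.<build>'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_minimum(version: str, minimum: str = "1.30.1") -> bool:
    """Return True if `version` is at least `minimum` (tuple comparison)."""
    return parse_gke_version(version) >= parse_gke_version(minimum)
```

Comparing integer tuples rather than strings matters here: as plain strings, "1.9.0" would sort after "1.30.1".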

Clone the GitHub repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Change to the working directory:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

Create a cluster with a GPU node pool

Create an Autopilot or Standard GKE cluster with a GPU node pool:

Autopilot

Create an Autopilot cluster:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Standard

Create a Standard cluster:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION}  \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=2 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

Deploy a RayCluster resource

To deploy a RayCluster resource, follow these steps:

Review the following manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: stable-diffusion-cluster
spec:
  rayVersion: '2.9.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray-ml:2.9.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          resources:
            limits:
              cpu: "2"
              memory: "8Gi"
            requests:
              cpu: "2"
              memory: "8Gi"
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 4
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray-ml:2.9.0
          resources:
            limits:
              cpu: 4
              memory: "16Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: 3
              memory: "16Gi"
              nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4

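This manifest describes a RayCluster resource with one head Pod and a GPU worker group in which each worker requests one NVIDIA L4 GPU.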
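
To see what the manifest asks of the cluster, you can add up its resource requests. The following sketch copies the `requests` values from the manifest above and totals them for one head Pod plus a given number of workers. It is a rough planning aid under the assumption that only these Pods matter; it ignores node overhead and system Pods, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requests:
    cpu: int
    memory_gib: int
    gpu: int = 0


# Values copied from the `requests` stanzas in the manifest above.
HEAD = Requests(cpu=2, memory_gib=8)
WORKER = Requests(cpu=3, memory_gib=16, gpu=1)
MIN_REPLICAS, MAX_REPLICAS = 1, 4


def footprint(workers: int) -> Requests:
    """Total requests for one head Pod plus `workers` GPU worker Pods."""
    return Requests(
        cpu=HEAD.cpu + workers * WORKER.cpu,
        memory_gib=HEAD.memory_gib + workers * WORKER.memory_gib,
        gpu=HEAD.gpu + workers * WORKER.gpu,
    )
```

With these values, `footprint(MIN_REPLICAS)` is 5 CPUs, 24 GiB of memory, and 1 GPU, and `footprint(MAX_REPLICAS)` is 14 CPUs, 72 GiB, and 4 GPUs. Running the worker group at maxReplicas would therefore need four L4 GPUs of node capacity.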

Apply the manifest to your cluster:
```
kubectl apply -f ray-cluster.yaml
```

Verify that the RayCluster resource is ready:

kubectl get raycluster

The output is similar to the following:

NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s

In this output, ready in the STATUS column indicates that the RayCluster resource is ready.
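
If you want to check the STATUS column from a script instead of reading it by eye, the table can be parsed as text. The sketch below splits each line on runs of two or more spaces so that multi-word headers such as DESIRED WORKERS stay intact. It assumes that every cell is non-empty, as in the sample output above; an empty cell would shift the columns. The function names are hypothetical.

```python
import re


def parse_table(text: str) -> list[dict[str, str]]:
    """Parse column-aligned kubectl output into one dict per data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = re.split(r"\s{2,}", lines[0].strip())
    return [
        dict(zip(header, re.split(r"\s{2,}", line.strip())))
        for line in lines[1:]
    ]


def is_ready(text: str, name: str) -> bool:
    """Return True if the row for `name` has STATUS 'ready'."""
    return any(
        row.get("NAME") == name and row.get("STATUS") == "ready"
        for row in parse_table(text)
    )
```

You could feed it the captured output of `kubectl get raycluster` and call `is_ready(output, "stable-diffusion-cluster")`.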

Connect to the RayCluster resource

To connect to the RayCluster resource, follow these steps:

Verify that GKE created the RayCluster service:

kubectl get svc stable-diffusion-cluster-head-svc

The output is similar to the following:

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s

Establish port-forwarding sessions to the Ray head:

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
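
Because both port-forward commands run in the background with their standard output discarded, it is easy to miss a session that failed to start. The following sketch tries a TCP connection to each forwarded local port. A successful connection only shows that something is listening locally; it does not prove that the Ray head behind it is healthy. The function names are hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def forwarded_ports_status(ports=(8265, 10001), host: str = "localhost") -> dict[int, bool]:
    """Check each forwarded port; the defaults are the two ports used above."""
    return {port: port_is_open(host, port) for port in ports}
```

A result such as `{8265: True, 10001: False}` from `forwarded_ports_status()` would suggest that the second port-forward session needs to be restarted.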

Verify that the Ray client can connect to the Ray cluster through localhost:

ray list nodes --address http://localhost:8265

The output is similar to the following:

======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted

Run a Ray Serve application

To run a Ray Serve application, follow these steps:

Run the Stable Diffusion Ray Serve application:

serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1"]}' --address ray://localhost:10001

The output is similar to the following:

2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.

Establish a port-forwarding session to the Ray Serve port (8000):

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &

Run the Python script:
```
python generate_image.py
```
The script generates an image and writes it to the output.png file.
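
For orientation, a client for the forwarded Serve port might look like the sketch below. It is not the repository's generate_image.py. It assumes that the application answers `GET /imagine?prompt=...` with PNG bytes, as the Ray Serve Stable Diffusion example does; confirm the route and parameter name against stable_diffusion.py before you rely on it. The function names and the timeout value are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the Serve app is reachable through the port-forward on 8000
# and exposes GET /imagine?prompt=... returning PNG bytes.
BASE_URL = "http://localhost:8000"


def build_url(prompt: str, base_url: str = BASE_URL) -> str:
    """URL-encode the prompt into the (assumed) /imagine query string."""
    query = urllib.parse.urlencode({"prompt": prompt})
    return f"{base_url}/imagine?{query}"


def generate(prompt: str, path: str = "output.png",
             base_url: str = BASE_URL, timeout: float = 300.0) -> int:
    """Request an image for `prompt`, write it to `path`, return the byte count."""
    with urllib.request.urlopen(build_url(prompt, base_url), timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as image_file:
        image_file.write(data)
    return len(data)
```

Encoding the prompt with `urlencode` keeps spaces and characters such as `&` from breaking the query string. The generous timeout allows for a first request that has to wait while the model loads.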

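Deploy a RayService

The RayService custom resource manages the lifecycle of a RayCluster resource and a Ray Serve application.

For more information about RayService, see Deploy Ray Serve Applications and Production Guide in the Ray documentation.

To deploy a RayService resource, follow these steps:

Review the following manifest: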

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
          pip: ["diffusers==0.12.1"]
  rayClusterConfig:
    rayVersion: '2.9.0'
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray-ml:2.9.0
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                memory: "8Gi"
              requests:
                cpu: "2"
                memory: "8Gi"
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 4
      groupName: gpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray-ml:2.9.0
            resources:
              limits:
                cpu: 4
                memory: "16Gi"
                nvidia.com/gpu: 1
              requests:
                cpu: 3
                memory: "16Gi"
                nvidia.com/gpu: 1
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayService custom resource.
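
The `import_path` value in `serveConfigV2` combines a dotted module path with an attribute name, separated by a colon. The sketch below shows how such a value maps to a source file relative to the root of the `working_dir` archive, which can help when an application fails to import. It illustrates the naming convention only and is not Ray's own resolution code; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_import_path(import_path: str) -> tuple[str, str]:
    """Split 'pkg.module:attr' into (relative source file, attribute name)."""
    module, _, attribute = import_path.partition(":")
    if not module or not attribute:
        raise ValueError(f"expected 'module:attribute', got {import_path!r}")
    # Each dot-separated part is a directory, except the last, which is the file.
    source_file = PurePosixPath(*module.split(".")).with_suffix(".py")
    return str(source_file), attribute
```

For the manifest above, the result is the file `ai-ml/gke-ray/rayserve/stable-diffusion/stable_diffusion.py` and the attribute `entrypoint`. That directory is the same one that you changed into after cloning the repository.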

Apply the manifest to your cluster:
```
kubectl apply -f ray-service.yaml
```

Verify that the Service is ready:

kubectl get svc stable-diffusion-serve-svc

The output is similar to the following:

NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m
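
The Service might not exist immediately after you apply the manifest, so repeating the check until it succeeds can be more convenient than rerunning the command by hand. The following is a minimal polling sketch. It treats a zero exit code from `kubectl get svc` as "the Service exists", which says nothing about whether the model behind it has finished loading. The function names and the timeout and interval defaults are hypothetical.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 600.0,
               interval: float = 5.0) -> bool:
    """Call `check` every `interval` seconds until it is True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


def service_exists(name: str = "stable-diffusion-serve-svc") -> bool:
    """Return True if `kubectl get svc NAME` exits with status 0."""
    result = subprocess.run(["kubectl", "get", "svc", name], capture_output=True)
    return result.returncode == 0
```

Calling `wait_until(service_exists)` returns True as soon as the Service appears, or False if it has not appeared when the timeout is reached.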

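Configure port forwarding to the Ray Serve Service. If the port-forwarding session to port 8000 from the previous section is still running, stop it first, because both sessions bind the same local port:

kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &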

Run the Python script from the previous section:
```
python generate_image.py
```
The script generates an image similar to the one generated in the previous section.

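Clean up

Delete the project

Caution: Deleting a project has the following effects:

- Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
- Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.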

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

Delete individual resources

To delete the cluster, run the following command:

gcloud container clusters delete ${CLUSTER_NAME}

What's next

Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at the Cloud Architecture Center.
