在 Google Kubernetes Engine (GKE) 上,部署搭載 Stable Diffusion 模型的 Ray Serve 應用程式


本指南提供範例,說明如何使用 Ray ServeRay Operator 外掛程式,在 Google Kubernetes Engine (GKE) 上部署及提供 Stable Diffusion 模型。

關於 Ray 和 Ray Serve

Ray 是開放原始碼的可擴充運算架構,適用於 AI/機器學習應用程式。Ray Serve 是 Ray 適用的模型服務程式庫,用於在分散式環境中擴充及提供模型。詳情請參閱 Ray 說明文件中的「Ray Serve」。

您可以使用 RayCluster 或 RayService 資源部署 Ray Serve 應用程式。在實際工作環境中,您應使用 RayService 資源,原因如下:

  • RayService 應用程式的就地更新
  • RayCluster 資源升級時完全不必停機
  • 高可用性的 Ray Serve 應用程式

目標

本指南適用於生成式 AI 客戶、GKE 新手或現有使用者、機器學習工程師、MLOps (DevOps) 工程師,或是有意使用 Kubernetes 容器協調功能,透過 Ray 服務模型的平台管理員。

  • 建立具有 GPU 節點集區的 GKE 叢集。
  • 使用 RayCluster 自訂資源建立 Ray 叢集。
  • 執行 Ray Serve 應用程式。
  • 部署 RayService 自訂資源。

費用

在本文件中,您會使用 Google Cloud的下列計費元件:

如要根據預測用量估算費用,請使用 Pricing Calculator

初次使用 Google Cloud 的使用者可能符合免費試用資格。

完成本文所述工作後,您可以刪除已建立的資源,避免繼續計費。詳情請參閱清除所用資源一節。

事前準備

Cloud Shell 已預先安裝本教學課程所需的軟體,包括 kubectlgcloud CLI。如果您未使用 Cloud Shell,則必須安裝 gcloud CLI。

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.

  3. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  4. To initialize the gcloud CLI, run the following command:

    gcloud init
  5. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  6. Make sure that billing is enabled for your Google Cloud project.

  7. Enable the GKE API:

    gcloud services enable container.googleapis.com
  8. Install the Google Cloud CLI.

  9. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  10. To initialize the gcloud CLI, run the following command:

    gcloud init
  11. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  12. Make sure that billing is enabled for your Google Cloud project.

  13. Enable the GKE API:

    gcloud services enable container.googleapis.com
  14. Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
    • Replace PROJECT_ID with your project ID.
    • Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.

    • Replace ROLE with each individual role.

準備環境

如要準備環境,請按照下列步驟操作:

  1. 在 Google Cloud 控制台中,按一下Cloud Shell 啟用圖示Google Cloud 控制台中的「啟用 Cloud Shell」,即可啟動 Cloud Shell 工作階段。系統會在 Google Cloud 控制台的底部窗格啟動工作階段。

  2. 設定環境變數:

    export PROJECT_ID=PROJECT_ID
    export CLUSTER_NAME=rayserve-cluster
    export COMPUTE_REGION=us-central1
    export COMPUTE_ZONE=us-central1-c
    export CLUSTER_VERSION=CLUSTER_VERSION
    export TUTORIAL_HOME=`pwd`
    

    更改下列內容:

    • PROJECT_ID:您的 Google Cloud 專案 ID
    • CLUSTER_VERSION:要使用的 GKE 版本。必須為 1.30.1 或之後。
  3. 複製 GitHub 存放區:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    
  4. 變更為工作目錄:

    cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion
    
  5. 建立 Python 虛擬環境:

    venv

    python -m venv myenv && \
    source myenv/bin/activate
    

    Conda

    1. 安裝 Conda

    2. 執行下列指令:

      conda create -c conda-forge python=3.9.19 -n myenv && \
      conda activate myenv
      

    使用 serve run 部署 Serve 應用程式時,Ray 會要求本機用戶端的 Python 版本與 Ray 叢集使用的版本相符。rayproject/ray:2.37.0 映像檔使用 Python 3.9。如果您執行的是其他用戶端版本,請選取適當的 Ray 映像檔

  6. 安裝執行 Serve 應用程式所需的依附元件:

    pip install ray[serve]==2.37.0
    pip install torch
    pip install requests
    

建立具有 GPU 節點集區的叢集

建立具有 GPU 節點集區的 Autopilot 或 Standard GKE 叢集:

Autopilot

建立 Autopilot 叢集:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

標準

  1. 建立標準叢集:

    gcloud container clusters create ${CLUSTER_NAME} \
        --addons=RayOperator \
        --cluster-version=${CLUSTER_VERSION}  \
        --machine-type=c3d-standard-8 \
        --location=${COMPUTE_ZONE} \
        --num-nodes=1
    
  2. 建立 GPU 節點集區:

    gcloud container node-pools create gpu-pool \
        --cluster=${CLUSTER_NAME} \
        --machine-type=g2-standard-8 \
        --location=${COMPUTE_ZONE} \
        --num-nodes=1 \
        --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest
    

部署 RayCluster 資源

如要部署 RayCluster 資源:

  1. 請查看下列資訊清單:

    apiVersion: ray.io/v1
    kind: RayCluster
    metadata:
      name: stable-diffusion-cluster
    spec:
      rayVersion: '2.37.0'
      headGroupSpec:
        rayStartParams:
          dashboard-host: '0.0.0.0'
        template:
          metadata:
          spec:
            containers:
            - name: ray-head
              image: rayproject/ray:2.37.0
              ports:
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
              resources:
                limits:
                  cpu: "2"
                  ephemeral-storage: "15Gi"
                  memory: "8Gi"
                requests:
                  cpu: "2"
                  ephemeral-storage: "15Gi"
                  memory: "8Gi"
            nodeSelector:
              cloud.google.com/machine-family: c3d
      workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 4
        groupName: gpu-group
        rayStartParams: {}
        template:
          spec:
            containers:
            - name: ray-worker
              image: rayproject/ray:2.37.0-gpu
              resources:
                limits:
                  cpu: 4
                  memory: "16Gi"
                  nvidia.com/gpu: 1
                requests:
                  cpu: 3
                  memory: "16Gi"
                  nvidia.com/gpu: 1
            nodeSelector:
              cloud.google.com/gke-accelerator: nvidia-l4

    這個資訊清單說明 RayCluster 資源。

  2. 將資訊清單套用至叢集:

    kubectl apply -f ray-cluster.yaml
    
  3. 確認 RayCluster 資源已準備就緒:

    kubectl get raycluster
    

    輸出結果會與下列內容相似:

    NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
    stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s
    

    在這個輸出內容中,STATUS 資料欄中的 ready 表示 RayCluster 資源已準備就緒。

連線至 RayCluster 資源

如要連線至 RayCluster 資源,請按照下列步驟操作:

  1. 確認 GKE 是否已建立 RayCluster 服務:

    kubectl get svc stable-diffusion-cluster-head-svc
    

    輸出結果會與下列內容相似:

    NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
    pytorch-mnist-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s
    
  2. 建立通訊埠轉送工作階段至 Ray 標頭:

    kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
    kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
    
  3. 確認 Ray 用戶端可以使用 localhost 連線至 Ray 叢集:

    ray list nodes --address http://localhost:8265
    

    輸出結果會與下列內容相似:

    ======== List: 2024-06-19 15:15:15.707336 ========
    Stats:
    ------------------------------
    Total: 3
    
    Table:
    ------------------------------
        NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
    0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
    # Several lines of output omitted
    

執行 Ray Serve 應用程式

如要執行 Ray Serve 應用程式,請按照下列步驟操作:

  1. 執行 Stable Diffusion Ray Serve 應用程式:

    serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001
    
    

    輸出結果會與下列內容相似:

    2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
    2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
    2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
    2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
    2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
    2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
    2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
    2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
    2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
    
  2. 建立通訊埠轉送工作階段至 Ray Serve 通訊埠 (8000):

    kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &
    
  3. 執行 Python 指令碼:

    python generate_image.py
    

    這項指令碼會將圖片產生至名為 output.png 的檔案。如下圖所示:

    日落時分的海灘。Stable Diffusion 生成的圖片。

部署 RayService

RayService 自訂資源可管理 RayCluster 資源和 Ray Serve 應用程式的生命週期。

如要進一步瞭解 RayService,請參閱 Ray 說明文件中的「Deploy Ray Serve Applications」和「Production Guide」。

如要部署 RayService 資源,請按照下列步驟操作:

  1. 請查看下列資訊清單:

    apiVersion: ray.io/v1
    kind: RayService
    metadata:
      name: stable-diffusion
    spec:
      serveConfigV2: |
        applications:
          - name: stable_diffusion
            import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
            runtime_env:
              working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
              pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
      rayClusterConfig:
        rayVersion: '2.37.0'
        headGroupSpec:
          rayStartParams:
            dashboard-host: '0.0.0.0'
          template:
            spec:
              containers:
              - name: ray-head
                image:  rayproject/ray:2.37.0
                ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
                resources:
                  limits:
                    cpu: "2"
                    ephemeral-storage: "15Gi"
                    memory: "8Gi"
                  requests:
                    cpu: "2"
                    ephemeral-storage: "15Gi"
                    memory: "8Gi"
              nodeSelector:
                cloud.google.com/machine-family: c3d
        workerGroupSpecs:
        - replicas: 1
          minReplicas: 1
          maxReplicas: 4
          groupName: gpu-group
          rayStartParams: {}
          template:
            spec:
              containers:
              - name: ray-worker
                image: rayproject/ray:2.37.0-gpu
                resources:
                  limits:
                    cpu: 4
                    memory: "16Gi"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "16Gi"
                    nvidia.com/gpu: 1
              nodeSelector:
                cloud.google.com/gke-accelerator: nvidia-l4

    這個資訊清單說明 RayService 自訂資源。

  2. 將資訊清單套用至叢集:

    kubectl apply -f ray-service.yaml
    
  3. 確認服務已準備就緒:

    kubectl get svc stable-diffusion-serve-svc
    

    輸出結果會與下列內容相似:

    NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
    
    stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m
    
  4. 設定通訊埠轉送至 Ray Serve 服務:

    kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &
    
  5. 執行上一節的 Python 指令碼:

    python generate_image.py
    

    這個指令碼會產生類似上一節中生成的圖片。

清除所用資源

刪除專案

    Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

刪除個別資源

如要刪除叢集,請輸入:

gcloud container clusters delete ${CLUSTER_NAME}

後續步驟

  • 探索 Google Cloud 的參考架構、圖表和最佳做法。 歡迎瀏覽我們的雲端架構中心