
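Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

This guide shows by example how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) using Ray Serve and the Ray Operator add-on.

About Ray and Ray Serve

Ray is an open-source, scalable compute framework for AI/ML applications. Ray Serve is Ray's model-serving library, used to scale and serve models in a distributed environment. For details, see "Ray Serve" in the Ray documentation.

You can deploy a Ray Serve application using either a RayCluster or a RayService resource. In production, use a RayService resource, because it provides the following:

In-place updates for RayService applications
Zero-downtime upgrades for RayCluster resources
Highly available Ray Serve applications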

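Prepare your environment

To prepare your environment, follow these steps:

In the Google Cloud console, click "Activate Cloud Shell" to start a Cloud Shell session. The session opens in the bottom pane of the Google Cloud console.

Set the environment variables: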

export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`

Replace the following:

PROJECT_ID: your Google Cloud project ID.
CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
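
The minimum-version requirement above can be checked before creating the cluster. The following is a minimal sketch, not part of the tutorial's sample code: the helper names are hypothetical, and the "-gke.N" suffix in the example strings is only an illustration of how GKE version strings are commonly written.

```python
import re


def parse_gke_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '1.30.1-gke.1329000'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: str = "1.30.1") -> bool:
    """Return True if `version` is at least `minimum` (the tutorial requires 1.30.1)."""
    # Tuples compare element by element, so (1, 31, 0) >= (1, 30, 1) is True.
    return parse_gke_version(version) >= parse_gke_version(minimum)
```

For example, meets_minimum("1.29.6-gke.1038001") is False, so that value would not be a suitable CLUSTER_VERSION for this guide.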

Clone the GitHub repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Change to the working directory:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

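Create a Python virtual environment:

venv

```
python -m venv myenv && \
source myenv/bin/activate
```

Conda

1. Install Conda.
2. Run the following command:
  conda create -c conda-forge python=3.9.19 -n myenv && \
  conda activate myenv

When you deploy a Serve application with serve run, Ray requires the Python version of the local client to match the version used by the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If your client runs a different version, select an appropriate Ray image.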
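
The version requirement above can be checked locally before running serve run. This is a minimal sketch under the assumption that matching the major and minor version (3.9) is what matters; the function name is hypothetical and not part of Ray.

```python
import sys

# From the tutorial: the rayproject/ray:2.37.0 image uses Python 3.9.
CLUSTER_PYTHON = "3.9"


def same_minor_version(local: tuple, cluster: str = CLUSTER_PYTHON) -> bool:
    """Compare (major, minor) of the local interpreter with the cluster's version string."""
    major, minor = (int(part) for part in cluster.split(".")[:2])
    return (local[0], local[1]) == (major, minor)


def describe_local_python() -> str:
    """Build a short message about the interpreter that is running this script."""
    local = tuple(sys.version_info[:3])
    verdict = "matches" if same_minor_version(local) else "does NOT match"
    return f"Local Python {local[0]}.{local[1]}.{local[2]} {verdict} cluster Python {CLUSTER_PYTHON}"
```

Calling print(describe_local_python()) inside the activated virtual environment shows whether the client interpreter lines up with the image.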

Install the dependencies required to run the Serve application:

pip install ray[serve]==2.37.0
pip install torch
pip install requests

Create a cluster with a GPU node pool

Create an Autopilot or Standard GKE cluster with a GPU node pool:

Autopilot

Create an Autopilot cluster:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Standard

Create a Standard cluster:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION}  \
    --machine-type=c3d-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1

Create a GPU node pool:

gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

Deploy a RayCluster resource

To deploy a RayCluster resource, follow these steps:

Review the following manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: stable-diffusion-cluster
spec:
  rayVersion: '2.37.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.37.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          resources:
            limits:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
            requests:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
        nodeSelector:
          cloud.google.com/machine-family: c3d
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 4
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.37.0-gpu
          resources:
            limits:
              cpu: 4
              memory: "16Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: 3
              memory: "16Gi"
              nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4

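This manifest describes a RayCluster resource: one head Pod scheduled on a C3D node, and a GPU worker group (1 replica, with a minimum of 1 and a maximum of 4) scheduled on nodes with NVIDIA L4 accelerators.

Apply the manifest to your cluster: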
```
kubectl apply -f ray-cluster.yaml
```

Verify that the RayCluster resource is ready:

kubectl get raycluster

The output is similar to the following:

NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s

In this output, ready in the STATUS column indicates that the RayCluster resource is ready.
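
The same readiness check can be scripted. The sketch below assumes that the STATUS column is backed by the status.state field of the RayCluster object, which may differ between KubeRay versions; the function names are hypothetical.

```python
import json
import subprocess


def raycluster_state(resource: dict) -> str:
    """Return status.state from a RayCluster object, or '' if it is not set yet."""
    status = resource.get("status") or {}
    return status.get("state") or ""


def is_ready(resource: dict) -> bool:
    """True when the reported state is 'ready' (assumed to back the STATUS column)."""
    return raycluster_state(resource).lower() == "ready"


def fetch_raycluster(name: str) -> dict:
    """Fetch the resource as JSON with kubectl; requires access to the cluster."""
    result = subprocess.run(
        ["kubectl", "get", "raycluster", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

With cluster access, is_ready(fetch_raycluster("stable-diffusion-cluster")) can be polled until it returns True.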

Connect to the RayCluster resource

To connect to the RayCluster resource, follow these steps:

Verify that GKE created the RayCluster service:

kubectl get svc stable-diffusion-cluster-head-svc

The output is similar to the following:

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s

Create port-forwarding sessions to the Ray head:

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
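
Because the port-forward commands run in the background, a following command can start before the local ports are listening. The sketch below, using only the standard library, waits until a local TCP port accepts connections. The function name and default timings are hypothetical. An open port only shows that the forwarding process is listening, not that the Ray head behind it is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (for example, connection refused)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For this tutorial, wait_for_port("localhost", 8265) and wait_for_port("localhost", 10001) correspond to the two forwarded ports.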

Verify that the Ray client can connect to the Ray cluster through localhost:

ray list nodes --address http://localhost:8265

The output is similar to the following:

======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted

Run the Ray Serve application

To run the Ray Serve application, follow these steps:

Run the Stable Diffusion Ray Serve application:

serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001

The output is similar to the following:

2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.

Create a port-forwarding session to the Ray Serve port (8000):

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &

Run the Python script:
```
python generate_image.py
```
This script generates an image and writes it to a file named output.png.
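
This page does not show the contents of generate_image.py. As a rough sketch of what such a client might look like, the code below sends a prompt to the forwarded Serve port and writes the response body to a file. The /imagine path and the prompt query parameter are assumptions for illustration; check stable_diffusion.py and generate_image.py in the sample repository for the actual route.

```python
import urllib.parse
import urllib.request


def build_imagine_url(prompt: str, base_url: str = "http://127.0.0.1:8000") -> str:
    """Build the request URL; '/imagine' and 'prompt' are assumed names, not confirmed here."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urllib.parse.urlencode({"prompt": prompt}, quote_via=urllib.parse.quote)
    return f"{base_url.rstrip('/')}/imagine?{query}"


def save_image(prompt: str, path: str = "output.png", timeout_s: float = 300.0) -> int:
    """Request an image for `prompt`, write the raw bytes to `path`, and return the byte count."""
    with urllib.request.urlopen(build_imagine_url(prompt), timeout=timeout_s) as response:
        data = response.read()
    with open(path, "wb") as image_file:
        image_file.write(data)
    return len(data)
```

The generous timeout reflects that the first request may wait for the model to load; the value itself is arbitrary.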

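Deploy a RayService

The RayService custom resource manages the lifecycle of both the RayCluster resource and the Ray Serve application.

To learn more about RayService, see "Deploy Ray Serve Applications" and "Production Guide" in the Ray documentation.

To deploy a RayService resource, follow these steps:

Review the following manifest: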

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
          pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
  rayClusterConfig:
    rayVersion: '2.37.0'
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.37.0
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
              requests:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
          nodeSelector:
            cloud.google.com/machine-family: c3d
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 4
      groupName: gpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.37.0-gpu
            resources:
              limits:
                cpu: 4
                memory: "16Gi"
                nvidia.com/gpu: 1
              requests:
                cpu: 3
                memory: "16Gi"
                nvidia.com/gpu: 1
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4

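This manifest describes a RayService custom resource: serveConfigV2 defines the Ray Serve application, and rayClusterConfig defines the RayCluster that runs it.

Apply the manifest to your cluster: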
```
kubectl apply -f ray-service.yaml
```

Verify that the Service is ready:

kubectl get svc stable-diffusion-serve-svc

The output is similar to the following:

NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m

Set up port forwarding to the Ray Serve service:

kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &

Run the Python script from the previous section:
```
python generate_image.py
```
This script generates an image similar to the one generated in the previous section.
