在 Google Kubernetes Engine (GKE) 上使用 Stable Diffusion 模型部署 Ray Serve 应用


本指南通过使用 Ray ServeRay Operator 插件作为示例实现,来举例说明如何在 Google Kubernetes Engine (GKE) 上部署和应用 Stable Diffusion 模型。

Ray 和 Ray Serve 简介

Ray 是一个适用于 AI/机器学习应用的开源可伸缩计算框架。Ray Serve 是 Ray 的模型部署库,用于在分布式环境中扩缩和应用模型。如需了解详情,请参阅 Ray 文档中的 Ray Serve

您可以使用 RayCluster 或 RayService 资源部署 Ray Serve 应用。出于以下原因,您应该在生产环境中使用 RayService 资源:

  • RayService 应用的就地更新
  • RayCluster 资源的零停机时间升级
  • 高可用性 Ray Serve 应用

目标

本指南适用于对通过 Kubernetes 容器编排功能来使用 Ray 应用模型感兴趣的生成式 AI 客户、GKE 的新用户或现有用户、机器学习工程师、MLOps (DevOps) 工程师或平台管理员。

  • 创建具有 GPU 节点池的 GKE 集群。
  • 使用 RayCluster 自定义资源创建 Ray 集群。
  • 运行 Ray Serve 应用。
  • 部署 RayService 自定义资源。

费用

在本文档中,您将使用 Google Cloud 的以下收费组件:

您可使用价格计算器根据您的预计使用情况来估算费用。 Google Cloud 新用户可能有资格申请免费试用

完成本文档中描述的任务后,您可以通过删除所创建的资源来避免继续计费。如需了解详情,请参阅清理

准备工作

Cloud Shell 中预安装了本教程所需的软件,包括 kubectlgcloud CLI。如果您不使用 Cloud Shell,则必须安装 gcloud CLI。

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.
  3. To initialize the gcloud CLI, run the following command:

    gcloud init
  4. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  5. Make sure that billing is enabled for your Google Cloud project.

  6. Enable the GKE API:

    gcloud services enable container.googleapis.com
  7. Install the Google Cloud CLI.
  8. To initialize the gcloud CLI, run the following command:

    gcloud init
  9. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID

      Replace PROJECT_ID with a name for the Google Cloud project you are creating.

    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID

      Replace PROJECT_ID with your Google Cloud project name.

  10. Make sure that billing is enabled for your Google Cloud project.

  11. Enable the GKE API:

    gcloud services enable container.googleapis.com
  12. Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin

    gcloud projects add-iam-policy-binding PROJECT_ID --member="USER_IDENTIFIER" --role=ROLE
    • Replace PROJECT_ID with your project ID.
    • Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.

    • Replace ROLE with each individual role.
  13. 安装 RayServe

准备环境

如需准备您的环境,请按以下步骤操作:

  1. 点击 Google Cloud 控制台中的 Cloud Shell 激活图标 激活 Cloud Shell,从 Google Cloud 控制台启动 Cloud Shell 会话。此操作会在 Google Cloud 控制台的底部窗格中启动会话。

  2. 设置环境变量:

    export PROJECT_ID=PROJECT_ID
    export CLUSTER_NAME=rayserve-cluster
    export COMPUTE_REGION=us-central1
    export COMPUTE_ZONE=us-central1-c
    export CLUSTER_VERSION=CLUSTER_VERSION
    export TUTORIAL_HOME=`pwd`
    

    替换以下内容:

    • PROJECT_ID:您的 Google Cloud 项目 ID
    • CLUSTER_VERSION:要使用的 GKE 版本。必须为 1.30.1 或更高版本。
  3. 克隆 GitHub 代码库:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    
  4. 切换到工作目录:

    cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion
    

创建具有 GPU 节点池的集群

创建具有 GPU 节点池的 Autopilot 或 Standard GKE 集群:

Autopilot

创建 Autopilot 集群:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

标准

创建 Standard 集群:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION}  \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=2 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

部署 RayCluster 资源

如需部署 RayCluster 资源,请执行以下操作:

  1. 请查看以下清单:

    apiVersion: ray.io/v1
    kind: RayCluster
    metadata:
      name: stable-diffusion-cluster
    spec:
      rayVersion: '2.9.0'
      headGroupSpec:
        rayStartParams:
          dashboard-host: '0.0.0.0'
        template:
          metadata:
          spec:
            containers:
            - name: ray-head
              image: rayproject/ray-ml:2.9.0
              ports:
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
              resources:
                limits:
                  cpu: "2"
                  memory: "8Gi"
                requests:
                  cpu: "2"
                  memory: "8Gi"
      workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 4
        groupName: gpu-group
        rayStartParams: {}
        template:
          spec:
            containers:
            - name: ray-worker
              image: rayproject/ray-ml:2.9.0
              resources:
                limits:
                  cpu: 4
                  memory: "16Gi"
                  nvidia.com/gpu: 1
                requests:
                  cpu: 3
                  memory: "16Gi"
                  nvidia.com/gpu: 1
            nodeSelector:
              cloud.google.com/gke-accelerator: nvidia-l4

    此清单描述了一个 RayCluster 资源。

  2. 将清单应用到您的集群:

    kubectl apply -f ray-cluster.yaml
    
  3. 验证 RayCluster 资源是否已准备就绪:

    kubectl get raycluster
    

    输出类似于以下内容:

    NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
    stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s
    

    在此输出中,STATUS 列中的 ready 表示 RayCluster 资源已准备就绪。

连接到 RayCluster 资源

如需连接到 RayCluster 资源,请执行以下操作:

  1. 验证 GKE 是否已创建 RayCluster 服务:

    kubectl get svc stable-diffusion-cluster-head-svc
    

    输出类似于以下内容:

    NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
    pytorch-mnist-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s
    
  2. 建立到 Ray 头节点的端口转发会话:

    kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
    kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &
    
  3. 验证 Ray 客户端是否可以使用 localhost 连接到 Ray 集群:

    ray list nodes --address http://localhost:8265
    

    输出类似于以下内容:

    ======== List: 2024-06-19 15:15:15.707336 ========
    Stats:
    ------------------------------
    Total: 3
    
    Table:
    ------------------------------
        NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
    0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
    # Several lines of output omitted
    

运行 Ray Serve 应用

如需运行 Ray Serve 应用,请执行以下操作:

  1. 运行 Stable Diffusion Ray Serve 应用:

    serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1"]}' --address ray://localhost:10001
    

    输出类似于以下内容:

    2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
    2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
    2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
    2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
    2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
    2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
    2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
    2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
    2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
    
  2. 建立到 Ray Serve 端口 (8000) 的端口转发会话:

    kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &
    
  3. 运行 Python 脚本:

    python generate_image.py
    

    该脚本会向名为 output.png 的文件生成图片。图片类似于以下内容:

    落日下的海滩。由 Stable Diffusion 生成的图片。

部署 RayService

RayService 自定义资源会管理 RayCluster 资源和 Ray Serve 应用的生命周期。

如需详细了解 RayService,请参阅 Ray 文档中的部署 Ray Serve 应用生产指南

如需部署 RayService 资源,请按照以下步骤操作:

  1. 请查看以下清单:

    apiVersion: ray.io/v1
    kind: RayService
    metadata:
      name: stable-diffusion
    spec:
      serveConfigV2: |
        applications:
          - name: stable_diffusion
            import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
            runtime_env:
              working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
              pip: ["diffusers==0.12.1"]
      rayClusterConfig:
        rayVersion: '2.9.0'
        headGroupSpec:
          rayStartParams:
            dashboard-host: '0.0.0.0'
          template:
            spec:
              containers:
              - name: ray-head
                image: rayproject/ray-ml:2.9.0
                ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
                resources:
                  limits:
                    cpu: "2"
                    memory: "8Gi"
                  requests:
                    cpu: "2"
                    memory: "8Gi"
        workerGroupSpecs:
        - replicas: 1
          minReplicas: 1
          maxReplicas: 4
          groupName: gpu-group
          rayStartParams: {}
          template:
            spec:
              containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.9.0
                resources:
                  limits:
                    cpu: 4
                    memory: "16Gi"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "16Gi"
                    nvidia.com/gpu: 1
              nodeSelector:
                cloud.google.com/gke-accelerator: nvidia-l4

    此清单描述了一个 RayService 自定义资源。

  2. 将清单应用到您的集群:

    kubectl apply -f ray-service.yaml
    
  3. 验证 Service 是否已准备就绪:

    kubectl get svc stable-diffusion-serve-svc
    

    输出类似于以下内容:

    NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE 
    
    stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m 
    
  4. 配置到 Ray Serve 服务的端口转发:

    kubectl port-forward stable-diffusion-serve-svc 8000:8000 
    
  5. 运行上一部分中的 Python 脚本:

    python generate_image.py
    

    该脚本会生成与上一部分中生成的图片类似的图片。

清理

删除项目

    Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

删除各个资源

如需删除集群,请输入以下命令:

gcloud container clusters delete ${CLUSTER_NAME}

后续步骤

  • 探索有关 Google Cloud 的参考架构、图表和最佳做法。查看我们的 Cloud 架构中心