Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

Autopilot Standard

This guide shows how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) using Ray Serve and the Ray Operator add-on, as a reference implementation.

About Ray and Ray Serve

Ray is an open-source, scalable compute framework for AI/ML applications. Ray Serve is Ray's model serving library, used to scale and serve models in a distributed environment. For more information, see Ray Serve in the Ray documentation.

You can deploy a Ray Serve application with either a RayCluster or a RayService resource. In production, use a RayService resource for the following reasons:

- In-place updates for RayService applications
- Zero-downtime upgrades for RayCluster resources
- Highly available Ray Serve applications

Objectives

This guide is intended for generative AI customers, new or existing GKE users, ML engineers, MLOps (DevOps) engineers, and platform administrators who are interested in using Kubernetes container orchestration capabilities to serve models with Ray.

- Create a GKE cluster with a GPU node pool.
- Create a Ray cluster using the RayCluster custom resource.
- Run a Ray Serve application.
- Deploy a RayService custom resource.

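Costs

This document uses billable components of Google Cloud, such as GKE.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks described in this document, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

Cloud Shell comes preinstalled with the software you need for this tutorial, including kubectl and the gcloud CLI. If you don't use Cloud Shell, you must install the gcloud CLI.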

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

Note: If you already installed the gcloud CLI, run gcloud components update to make sure that you have the latest version.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the GKE API:

gcloud services enable container.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
```
- Replace PROJECT_ID with your project ID.
- Replace USER_IDENTIFIER with the identifier for your user account. For example, myemail@example.com. The command already supplies the user: prefix.
- Replace ROLE with each individual role.
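
Because the command runs once per role, it can help to generate both invocations up front. The following Python sketch only builds and prints the commands; it does not call gcloud. The helper name build_role_binding_commands and the sample project ID and email address are illustrative assumptions, not part of the tutorial repository.

```python
import shlex

# The two IAM roles named in this guide.
ROLES = ("roles/container.clusterAdmin", "roles/container.admin")


def build_role_binding_commands(project_id, user_email, roles=ROLES):
    """Return one gcloud argument list per role (hypothetical helper)."""
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=user:{user_email}",
            f"--role={role}",
        ]
        for role in roles
    ]


if __name__ == "__main__":
    # "my-project" and the email address are placeholder values.
    for argv in build_role_binding_commands("my-project", "myemail@example.com"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before pasting them into a shell.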

Prepare your environment

To prepare your environment, follow these steps:

In the Google Cloud console, click Activate Cloud Shell to launch a Cloud Shell session. The session opens in the bottom pane of the Google Cloud console.
Set the environment variables:
```
export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`
```
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
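
The version requirement can be checked before you create the cluster. The sketch below assumes that the version string starts with a major.minor.patch triple, optionally followed by a suffix such as -gke.1329003; the function names and the sample strings are illustrative.

```python
import re

MIN_VERSION = (1, 30, 1)  # minimum GKE version stated in this guide


def parse_gke_version(version):
    """Extract (major, minor, patch) from a string such as '1.30.1-gke.1329003'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True if the version is at or above the minimum."""
    return parse_gke_version(version) >= minimum


if __name__ == "__main__":
    # Sample strings only; substitute the value of CLUSTER_VERSION.
    for candidate in ("1.30.1-gke.1329003", "1.29.9-gke.1"):
        print(candidate, meets_minimum(candidate))
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "1.9" would sort after "1.30".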

Clone the GitHub repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Change to the working directory:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

Create a Python virtual environment:
venv
```
python -m venv myenv && \
source myenv/bin/activate
```
Conda
1. Install Conda.
2. Run the following command:
  conda create -c conda-forge python=3.9.19 -n myenv && conda activate myenv
When you deploy a Serve application with serve run, Ray expects the Python version of the local client to match the version used in the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If you run a different client version, select a Ray image that matches it.
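
A quick local check can catch a version mismatch before serve run fails. This sketch compares only the major and minor version, which is an assumption about how strict the match needs to be; the helper name is illustrative.

```python
import sys

# Python version used by the rayproject/ray:2.37.0 image, per this guide.
CLUSTER_PYTHON = (3, 9)


def matches_cluster_python(local=None, cluster=CLUSTER_PYTHON):
    """Compare the major.minor version of the local interpreter with the cluster's."""
    if local is None:
        local = sys.version_info[:2]
    return tuple(local[:2]) == tuple(cluster)


if __name__ == "__main__":
    if matches_cluster_python():
        print("Local Python matches the cluster image.")
    else:
        found = ".".join(str(n) for n in sys.version_info[:2])
        print(f"Local Python is {found}; the cluster image uses 3.9.")
```

Run it inside the activated virtual environment so that it reports the interpreter that serve run will use.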
Install the dependencies required to run the Serve application:
```
pip install "ray[serve]==2.37.0"
pip install torch
pip install requests
```

Create a cluster and GPU node pool

Create an Autopilot or Standard GKE cluster with a GPU node pool.

Autopilot

Create an Autopilot cluster:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Standard

Create a Standard cluster:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION}  \
    --machine-type=c3d-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1

Create a GPU node pool:

gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

Deploy a RayCluster resource

To deploy a RayCluster resource:

Review the following manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: stable-diffusion-cluster
spec:
  rayVersion: '2.37.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.37.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          resources:
            limits:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
            requests:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
        nodeSelector:
          cloud.google.com/machine-family: c3d
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 4
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.37.0-gpu
          resources:
            limits:
              cpu: 4
              memory: "16Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: 3
              memory: "16Gi"
              nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayCluster resource.
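
For quota planning, it can be useful to see what the gpu-group worker group would request at its replica bounds. The following sketch is plain arithmetic over the request values in the manifest (3 CPUs, 16Gi of memory, and 1 GPU per worker, with minReplicas 1 and maxReplicas 4); the names are illustrative.

```python
# Per-worker requests copied from the gpu-group in the manifest.
GPU_WORKER_REQUESTS = {"cpu": 3, "memory_gi": 16, "gpus": 1}
MIN_REPLICAS, MAX_REPLICAS = 1, 4


def total_requests(per_worker, replicas):
    """Multiply each per-worker request by the number of replicas."""
    return {name: amount * replicas for name, amount in per_worker.items()}


if __name__ == "__main__":
    for replicas in (MIN_REPLICAS, MAX_REPLICAS):
        print(replicas, total_requests(GPU_WORKER_REQUESTS, replicas))
```

At maxReplicas the group would request 4 GPUs in total, which is worth comparing against the GPU quota of the project before scaling up.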

Apply the manifest to your cluster:
```
kubectl apply -f ray-cluster.yaml
```

Verify that the RayCluster resource is ready:

kubectl get raycluster

The output is similar to the following:

NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s

In this output, ready in the STATUS column indicates that the RayCluster resource is ready.
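
The same readiness check can be scripted against the JSON form of the resource. This is a minimal sketch that assumes the state is exposed at .status.state, which appears to be the field behind the STATUS column; verify the field name against your KubeRay version. The sample document is hand-written, not captured output.

```python
import json


def is_raycluster_ready(resource_json):
    """Return True if a RayCluster JSON document reports the state 'ready'.

    Assumes the state lives at .status.state (an assumption to verify).
    """
    resource = json.loads(resource_json)
    return resource.get("status", {}).get("state") == "ready"


if __name__ == "__main__":
    # Trimmed, hand-written sample standing in for the output of:
    #   kubectl get raycluster stable-diffusion-cluster -o json
    sample = (
        '{"metadata": {"name": "stable-diffusion-cluster"},'
        ' "status": {"state": "ready"}}'
    )
    print(is_raycluster_ready(sample))
```

A script could call this in a loop with a delay between attempts until it returns True.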

Connect to the RayCluster resource

To connect to the RayCluster resource:

Verify that GKE created the RayCluster Service:

kubectl get svc stable-diffusion-cluster-head-svc

The output is similar to the following:

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s

Establish port-forwarding sessions to the Ray head:

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &

Verify that the Ray client can connect to the Ray cluster through localhost:

ray list nodes --address http://localhost:8265

The output is similar to the following:

======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted
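
If ray list nodes cannot connect, a first thing to check is whether the port-forwarding sessions are still listening. The following sketch uses only the Python standard library to test the two forwarded ports; the helper name is illustrative.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8265 is the dashboard port and 10001 is the Ray client port.
    for port in (8265, 10001):
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```

A closed port usually means that the corresponding kubectl port-forward process exited and needs to be restarted.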

Run a Ray Serve application

To run a Ray Serve application:

Run the Stable Diffusion Ray Serve application:

serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001

The output is similar to the following:

2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
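
The --runtime-env-json value in the serve run command is easy to mistype by hand. As a sketch, the same JSON can be generated from a Python dictionary so that quoting is handled by json.dumps and shlex.join. The values mirror the command in this guide; the helper name is illustrative.

```python
import json
import shlex

# Same runtime environment as the serve run command in this guide.
runtime_env = {
    "pip": [
        "torch",
        "torchvision",
        "diffusers==0.12.1",
        "huggingface_hub==0.25.2",
        "transformers",
        "fastapi==0.113.0",
    ],
    # Keep the local virtual environment out of the uploaded working directory.
    "excludes": ["myenv"],
}


def build_serve_run_command(env, address="ray://localhost:10001"):
    """Return the serve run invocation as an argument list."""
    return [
        "serve", "run", "stable_diffusion:entrypoint",
        "--working-dir=.",
        f"--runtime-env-json={json.dumps(env)}",
        "--address", address,
    ]


if __name__ == "__main__":
    # Prints a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_serve_run_command(runtime_env)))
```

Keeping the environment in one dictionary also makes it simpler to keep the package pins consistent with the RayService manifest.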

Establish a port-forwarding session to the Ray Serve port (8000):

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &

Run the Python script:
```
python generate_image.py
```
This script generates an image and saves it to a file named output.png.
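
The contents of generate_image.py are not reproduced in this guide. As a rough sketch of what such a client might do, the following code sends a prompt to the forwarded Serve port and writes the response body to output.png. The /imagine path and the prompt query parameter are assumptions; check stable_diffusion.py in the repository for the actual route before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(prompt, base_url="http://localhost:8000", path="/imagine"):
    """Build the request URL; '/imagine' and 'prompt' are assumed names."""
    return f"{base_url}{path}?{urlencode({'prompt': prompt})}"


def save_image(url, filename="output.png", timeout=300):
    """Fetch the URL and write the response body to a file."""
    with urlopen(url, timeout=timeout) as response, open(filename, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    url = build_request_url("a cute cat dancing on the grass")
    print(url)
    # Uncomment once the port-forwarding session to port 8000 is running:
    # save_image(url)
```

The generous timeout is a guess that allows for slow first requests while the model loads.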

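Deploy a RayService

The RayService custom resource manages the lifecycle of a RayCluster resource and a Ray Serve application.

For more information about RayService, see Deploy Ray Serve Applications and Production Guide in the Ray documentation.

To deploy a RayService resource, follow these steps:

Review the following manifest: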

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
          pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
  rayClusterConfig:
    rayVersion: '2.37.0'
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image:  rayproject/ray:2.37.0
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
              requests:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
          nodeSelector:
            cloud.google.com/machine-family: c3d
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 4
      groupName: gpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.37.0-gpu
            resources:
              limits:
                cpu: 4
                memory: "16Gi"
                nvidia.com/gpu: 1
              requests:
                cpu: 3
                memory: "16Gi"
                nvidia.com/gpu: 1
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayService custom resource.
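
The import_path in serveConfigV2 points into the repository archive named in working_dir. To illustrate how it lines up with the directory used earlier in this guide, the sketch below splits the value into a file path and an attribute name. It assumes that dots map to directory separators, as in an ordinary Python module path; the helper name is illustrative.

```python
def import_path_to_file(import_path):
    """Split 'package.module:attribute' into (relative file path, attribute)."""
    module, _, attribute = import_path.partition(":")
    return module.replace(".", "/") + ".py", attribute


if __name__ == "__main__":
    path, attribute = import_path_to_file(
        "ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint"
    )
    print(path)       # file inside the repository archive
    print(attribute)  # object that Ray Serve deploys
```

The resulting path matches the ai-ml/gke-ray/rayserve/stable-diffusion working directory, with stable_diffusion.py as the module that defines entrypoint.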

Apply the manifest to your cluster:
```
kubectl apply -f ray-service.yaml
```

Verify that the Service is ready:

kubectl get svc stable-diffusion-serve-svc

The output is similar to the following:

NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m

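Configure port forwarding to the Ray Serve Service. If the port-forwarding session to port 8000 from the previous section is still running, stop it first, because both sessions bind local port 8000.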

kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &

Run the Python script from the previous section:
```
python generate_image.py
```
This script generates an image similar to the one generated in the previous section.

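Clean up

Delete the project

Caution: Deleting a project has the following effects:

- Everything in the project is deleted. If you used an existing project for the tasks in this document, deleting it also deletes any other work you've done in the project.
- Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.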

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

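Delete individual resources

To delete the cluster, run the following command:

gcloud container clusters delete ${CLUSTER_NAME}

If gcloud has no default location configured, add --location=${COMPUTE_REGION} for an Autopilot cluster or --location=${COMPUTE_ZONE} for a Standard cluster.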

What's next

Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at the Cloud Architecture Center.
