Questa pagina è stata tradotta dall'API Cloud Translation.

Esegui il deployment di un'applicazione Ray Serve con un modello Stable Diffusion su Google Kubernetes Engine (GKE)

Autopilot Standard

Questa guida fornisce un esempio di come eseguire il deployment e pubblicare un modello Stable Diffusion su Google Kubernetes Engine (GKE) utilizzando Ray Serve e il componente aggiuntivo Ray Operator come implementazione di esempio.

Informazioni su Ray e Ray Serve

Ray è un framework di calcolo scalabile open source per applicazioni AI/ML. Ray Serve è una libreria di distribuzione di modelli per Ray utilizzata per scalare e distribuire modelli in un ambiente distribuito. Per ulteriori informazioni, consulta Ray Serve nella documentazione di Ray.

Puoi utilizzare una risorsa RayCluster o RayService per eseguire il deployment delle tue applicazioni Ray Serve. Devi utilizzare una risorsa RayService in produzione per i seguenti motivi:

Aggiornamenti sul posto per le applicazioni RayService
Upgrade senza tempi di inattività per le risorse RayCluster
Applicazioni Ray Serve ad alta affidabilità

Obiettivi

Questa guida è destinata ai clienti di Generative AI, agli utenti nuovi o esistenti di GKE, agli ingegneri ML, agli ingegneri MLOps (DevOps) o agli amministratori della piattaforma interessati a utilizzare le funzionalità di orchestrazione dei container Kubernetes per la gestione dei modelli utilizzando Ray.

Crea un cluster GKE con un pool di nodi GPU.
Crea un cluster Ray utilizzando la risorsa personalizzata RayCluster.
Esegui un'applicazione Ray Serve.
Esegui il deployment di una risorsa personalizzata RayService.

Costi

In questo documento, utilizzi i seguenti componenti fatturabili di Google Cloud:

Per generare una stima dei costi in base all'utilizzo previsto, utilizza il calcolatore prezzi.

I nuovi Google Cloud utenti potrebbero avere diritto a una prova gratuita.

Al termine delle attività descritte in questo documento, puoi evitare l'addebito di ulteriori costi eliminando le risorse che hai creato. Per ulteriori informazioni, vedi Pulizia.

Prima di iniziare

Cloud Shell è preinstallato con il software necessario per questo tutorial, tra cui kubectl e gcloud CLI. Se non utilizzi Cloud Shell, devi installare gcloud CLI.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the GKE API:

gcloud services enable container.googleapis.com

Install the Google Cloud CLI.

Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the GKE API:

gcloud services enable container.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/container.clusterAdmin, roles/container.admin
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
```
- Replace PROJECT_ID with your project ID.
- Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.
- Replace ROLE with each individual role.

prepara l'ambiente

Per preparare l'ambiente:

Avvia una sessione di Cloud Shell dalla console Google Cloud facendo clic su Attiva Cloud Shell nella consoleGoogle Cloud . Viene avviata una sessione nel riquadro inferiore della console Google Cloud .

Imposta le variabili di ambiente:

export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`

Sostituisci quanto segue:

PROJECT_ID: il tuo Google Cloud ID progetto.
CLUSTER_VERSION: la versione di GKE da utilizzare. Deve essere 1.30.1 o successiva.

Clona il repository GitHub:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Passa alla directory di lavoro:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

Crea un ambiente virtuale Python:
venv
```
python -m venv myenv && \
source myenv/bin/activate
```
Conda
1. Installa Conda.
2. Esegui questi comandi:
  conda create -c conda-forge python=3.9.19 -n myenv && \ conda activate myenv
Quando esegui il deployment di un'applicazione Serve con serve run, Ray si aspetta che la versione Python del client locale corrisponda a quella utilizzata nel cluster Ray. L'immagine rayproject/ray:2.37.0 utilizza Python 3.9. Se utilizzi una versione diversa del client, seleziona l'immagine Ray appropriata.

Installa le dipendenze richieste per eseguire l'applicazione Serve:

pip install ray[serve]==2.37.0
pip install torch
pip install requests

Crea un cluster con un pool di nodi GPU

Crea un cluster GKE Autopilot o Standard con un pool di nodi GPU:

Autopilot

Crea un cluster Autopilot:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Standard

Crea un cluster Standard:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION}  \
    --machine-type=c3d-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1

Crea un pool di nodi GPU:

gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

Esegui il deployment di una risorsa RayCluster

Per eseguire il deployment di una risorsa RayCluster:

Esamina il seguente manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: stable-diffusion-cluster
spec:
  rayVersion: '2.37.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.37.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          resources:
            limits:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
            requests:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
        nodeSelector:
          cloud.google.com/machine-family: c3d
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 4
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.37.0-gpu
          resources:
            limits:
              cpu: 4
              memory: "16Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: 3
              memory: "16Gi"
              nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4

Questo manifest descrive una risorsa RayCluster.

Applica il manifest al cluster:
```
kubectl apply -f ray-cluster.yaml
```

Verifica che la risorsa RayCluster sia pronta:

kubectl get raycluster

L'output è simile al seguente:

NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s

In questo output, ready nella colonna STATUS indica che la risorsa RayCluster è pronta.

Connettiti alla risorsa RayCluster

Per connetterti alla risorsa RayCluster:

Verifica che GKE abbia creato il servizio RayCluster:

kubectl get svc stable-diffusion-cluster-head-svc

L'output è simile al seguente:

NAME                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
pytorch-mnist-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s

Stabilisci sessioni di port forwarding all'intestazione Ray:

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &

Verifica che il client Ray possa connettersi al cluster Ray utilizzando localhost:

ray list nodes --address http://localhost:8265

L'output è simile al seguente:

======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted

Esegui un'applicazione Ray Serve

Per eseguire un'applicazione Ray Serve:

Esegui l'applicazione Stable Diffusion Ray Serve:

serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001

L'output è simile al seguente:

2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.

Stabilisci una sessione di port forwarding alla porta Ray Serve (8000):

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &

Esegui lo script Python:
```
python generate_image.py
```
Lo script genera un'immagine in un file denominato output.png. L'immagine è simile alla seguente:

Esegui il deployment di un RayService

La risorsa personalizzata RayService gestisce il ciclo di vita di una risorsa RayCluster e dell'applicazione Ray Serve.

Per ulteriori informazioni su RayService, consulta Deploy Ray Serve Applications e Production Guide nella documentazione di Ray.

Per eseguire il deployment di una risorsa RayService:

Esamina il seguente manifest:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
          pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
  rayClusterConfig:
    rayVersion: '2.37.0'
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image:  rayproject/ray:2.37.0
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
              requests:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
          nodeSelector:
            cloud.google.com/machine-family: c3d
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 4
      groupName: gpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.37.0-gpu
            resources:
              limits:
                cpu: 4
                memory: "16Gi"
                nvidia.com/gpu: 1
              requests:
                cpu: 3
                memory: "16Gi"
                nvidia.com/gpu: 1
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4

Questo manifest descrive una risorsa personalizzata RayService.

Applica il manifest al cluster:
```
kubectl apply -f ray-service.yaml
```

Verifica che il servizio sia pronto:

kubectl get svc stable-diffusion-serve-svc

L'output è simile al seguente:

NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE

stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m

Configura il port forwarding al servizio Ray Serve:

kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &

Esegui lo script Python della sezione precedente:
```
python generate_image.py
```
Lo script genera un'immagine simile a quella generata nella sezione precedente.

Esegui la pulizia

Elimina il progetto

Attenzione: l'eliminazione di un progetto ha i seguenti effetti:

L'intero contenuto del progetto viene eliminato. Se hai utilizzato un progetto esistente per le attività descritte in questo documento, quando lo elimini, elimini anche tutto il lavoro che hai svolto nel progetto.
Gli ID progetto personalizzati non sono più disponibili. Quando hai creato questo progetto, potresti aver creato un ID progetto personalizzato che vuoi utilizzare in futuro. Per conservare gli URL che utilizzano l'ID progetto, ad esempio un URL appspot.com, elimina le risorse selezionate all'interno del progetto anziché eliminare l'intero progetto.

Se intendi esplorare più architetture, tutorial o guide rapide, puoi riutilizzare i progetti ed evitare così di superare i limiti di quota.

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

Elimina singole risorse

Per eliminare il cluster, digita:

gcloud container clusters delete ${CLUSTER_NAME}

Passaggi successivi

Esplora architetture di riferimento, diagrammi e best practice su Google Cloud. Consulta il nostro Cloud Architecture Center.

Esegui il deployment di un'applicazione Ray Serve con un modello Stable Diffusion su Google Kubernetes Engine (GKE) Mantieni tutto organizzato con le raccolte Salva e classifica i contenuti in base alle tue preferenze.

Informazioni su Ray e Ray Serve

Obiettivi

Costi

Prima di iniziare

prepara l'ambiente

venv

Conda

Crea un cluster con un pool di nodi GPU

Autopilot

Standard

Esegui il deployment di una risorsa RayCluster

Connettiti alla risorsa RayCluster

Esegui un'applicazione Ray Serve

Esegui il deployment di un RayService

Esegui la pulizia

Elimina il progetto

Elimina singole risorse

Passaggi successivi

Esegui il deployment di un'applicazione Ray Serve con un modello Stable Diffusion su Google Kubernetes Engine (GKE)