This tutorial shows you how to fine-tune a Gemma 3 model by using the Ray framework on a multi-node GKE cluster. The cluster uses two A4 virtual machine (VM) instances, each with eight NVIDIA B200 GPUs attached.
The content of this tutorial is divided into two parts:
- Preparing a Ray cluster on top of a GKE Autopilot cluster.
- Running a distributed training job across two A4 instances, each with eight B200 GPUs.
This tutorial is intended for machine learning (ML) engineers, researchers, platform administrators and operators, and for data and AI specialists who are interested in distributing an AI workload across multiple nodes and GPUs.
Objectives
Access a Gemma 3 model by using Hugging Face.
Prepare your environment.
Create a GKE Autopilot cluster with the Ray Operator installed on it.
Configure the Ray Cluster on the GKE cluster to accept Ray Jobs.
Configure and run a Ray Job that tunes the Gemma 3 model based on visual input.
Monitor your workload.
Clean up.
Costs
In this document, you use billable components of Google Cloud. To generate a cost estimate based on your projected usage, use the pricing calculator.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
  gcloud init
- Create or select a Google Cloud project.
  Roles required to select or create a project:
  - Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
  To create a Google Cloud project, run:
  gcloud projects create PROJECT_ID
  Replace PROJECT_ID with a name for the Google Cloud project you are creating.
  To select the Google Cloud project that you created, run:
  gcloud config set project PROJECT_ID
  Replace PROJECT_ID with your Google Cloud project name.
- Verify that billing is enabled for your Google Cloud project.
- Enable the required APIs:
  Roles required to enable APIs: you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
  gcloud services enable compute.googleapis.com logging.googleapis.com cloudresourcemanager.googleapis.com servicenetworking.googleapis.com container.googleapis.com
- Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/compute.admin, roles/iam.serviceAccountUser, roles/file.editor, roles/storage.admin, roles/container.clusterAdmin, roles/serviceusage.serviceUsageAdmin
  gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
  Replace the following:
  - PROJECT_ID: your project ID.
  - USER_IDENTIFIER: the identifier for your user account—for example, myemail@example.com.
  - ROLE: the IAM role that you grant to your user account.
- Enable the default service account for your Google Cloud project:
  gcloud iam service-accounts enable PROJECT_NUMBER-compute@developer.gserviceaccount.com \
      --project=PROJECT_ID
  Replace PROJECT_NUMBER with your project number. To review your project number, see Get an existing project.
- Grant the Editor role (roles/editor) to the default service account:
  gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
      --role=roles/editor
- Create local authentication credentials for your user account:
  gcloud auth application-default login
- Sign in to or create a Hugging Face account.
Access Gemma 3 by using Hugging Face
To use Hugging Face to access Gemma 3, sign the license consent agreement for Gemma on Hugging Face, and then generate a read access token.

Copy and save the read access token value. You use it later in this tutorial.
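Optionally, you can verify that the token works before wiring it into the cluster. The following command is only a suggested check; it assumes the public Hugging Face Hub whoami-v2 REST endpoint and should return your account details if the token is valid. Replace HF_TOKEN with the read access token that you copied:

curl -s -H "Authorization: Bearer HF_TOKEN" https://huggingface.co/api/whoami-v2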
Prepare your environment
Prepare your environment by configuring the necessary settings and environment variables.
Run the following:
export PROJECT_ID=$(gcloud config get-value project)
gcloud config set billing/quota_project $PROJECT_ID
export RESERVATION=RESERVATION_URL
export REGION=REGION
export CLUSTER_NAME=CLUSTER_NAME
export HF_TOKEN=HF_TOKEN
export NETWORK=default
export GCS_BUCKET=GCS_BUCKET
Replace the following:
- RESERVATION_URL: the URL of the reservation that you want to use to create your cluster. Based on the project in which the reservation exists, specify one of the following values:
  - The reservation exists in your project: RESERVATION_NAME
  - The reservation exists in a different project, and your project can use the reservation: projects/RESERVATION_PROJECT_ID/reservations/RESERVATION_NAME. Both full and partial URLs are accepted.
- REGION: the region where you want to create your GKE cluster. You can only create the cluster in the region where your reservation exists.
- CLUSTER_NAME: the name of the GKE cluster to create.
- HF_TOKEN: the Hugging Face token that you created in an earlier step.
- GCS_BUCKET: the name of the Cloud Storage bucket where you store the results from the training checkpoint.
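For example, a filled-in configuration might look like the following. All values shown here are hypothetical placeholders, not values defined by this tutorial:

export RESERVATION=projects/my-reservation-project/reservations/my-a4-reservation
export REGION=us-west1
export CLUSTER_NAME=gemma3-ray-cluster
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxx
export NETWORK=default
export GCS_BUCKET=my-gemma3-training-bucket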
Create a GKE cluster in Autopilot mode
To create a GKE cluster in Autopilot mode, run the following:
gcloud container clusters create-auto $CLUSTER_NAME \
--enable-ray-operator \
--enable-ray-cluster-monitoring \
--enable-ray-cluster-logging \
--location=$REGION
Creating the GKE cluster might take some time. To verify whether Google Cloud has finished creating your cluster, go to the Kubernetes clusters page in the Google Cloud console.
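Alternatively, you can check the provisioning status from the command line. This is a standard gcloud call and should print RUNNING when the cluster is ready:

gcloud container clusters describe $CLUSTER_NAME \
    --region=$REGION \
    --format="value(status)"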
Create a Kubernetes secret for Hugging Face credentials
In Cloud Shell, to create a Kubernetes secret for Hugging Face credentials, do the following:
Configure kubectl to connect to your cluster:

gcloud container clusters get-credentials $CLUSTER_NAME \
    --region=$REGION

Create a Kubernetes secret to store your Hugging Face token:

kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=${HF_TOKEN} \
    --dry-run=client -o yaml | kubectl apply -f -
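Optionally, confirm that the secret exists. The following command lists the secret without revealing the token value:

kubectl get secret hf-secret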
Create the Google Cloud Storage bucket
If you want to use a new bucket to store your training artifacts, run the following:
gcloud storage buckets create gs://$GCS_BUCKET --location=$REGION
If you want to use an existing bucket, you can skip this step. However, you must ensure that your bucket is in the same region as your cluster.
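If you're unsure where an existing bucket is located, you can check it with a standard gcloud storage command; the printed location should match the region of your cluster:

gcloud storage buckets describe gs://$GCS_BUCKET --format="value(location)"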
Save your training code as a ConfigMap
To avoid the need to embed your training script into a container image, you store it as a ConfigMap in your cluster. This ConfigMap is mounted onto the Pod file systems, which lets you update the training script without having to recreate the entire Ray cluster.
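For reference, the mounting pattern looks like the following excerpt from the Ray cluster manifest that you create later in this tutorial: the ConfigMap is exposed as a volume named job-code and mounted at /code/ in each container. This excerpt is only illustrative; the full manifest appears in the Configure Ray Cluster section.

volumes:
- name: job-code
  configMap:
    name: ray-job-cm
containers:
- name: ray-worker
  volumeMounts:
  - name: job-code
    mountPath: /code/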
Navigate to the code folder and create a new file.

Copy the following code into this new file, code/vision_train.py:

import argparse
import datetime

import ray
import ray.train.huggingface.transformers
import torch
from PIL import Image
from datasets import load_dataset
from peft import LoraConfig
from ray.train import ScalingConfig, RunConfig
from ray.train.torch import TorchTrainer
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig
from trl import SFTConfig
from trl import SFTTrainer

# System message for the assistant
system_message = "You are an expert product description writer for Amazon."

# User prompt that combines the user query and the schema
user_prompt = """Create a Short Product description based on the provided <PRODUCT> and <CATEGORY> and image.
Only return description. The description should be SEO optimized and for a better mobile search experience.

<PRODUCT>
{product}
</PRODUCT>

<CATEGORY>
{category}
</CATEGORY>
"""


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_id", type=str, default="google/gemma-3-4b-it", help="Hugging Face model ID")
    # parser.add_argument("--hf_token", type=str, default=None, help="Hugging Face token for private models")
    parser.add_argument("--dataset_name", type=str, default="philschmid/amazon-product-descriptions-vlm", help="Hugging Face dataset name")
    parser.add_argument("--output_dir", type=str, default="gemma-3-4b-seo-optimized", help="Directory to save model checkpoints")
    parser.add_argument("--gcs_bucket", type=str, required=True, help="GCS bucket name used to synchronize tasks and save checkpoints")
    parser.add_argument("--push_to_hub", help="Push model to Hugging Face hub", action="store_true")

    # LoRA arguments
    parser.add_argument("--lora_r", type=int, default=16, help="LoRA attention dimension")
    parser.add_argument("--lora_alpha", type=int, default=16, help="LoRA alpha scaling factor")
    parser.add_argument("--lora_dropout", type=float, default=0.05, help="LoRA dropout probability")

    # SFTConfig arguments
    parser.add_argument("--max_seq_length", type=int, default=512, help="Maximum sequence length")
    parser.add_argument("--num_train_epochs", type=int, default=3, help="Number of training epochs")
    parser.add_argument("--per_device_train_batch_size", type=int, default=1, help="Batch size per device during training")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=4, help="Gradient accumulation steps")
    parser.add_argument("--learning_rate", type=float, default=2e-4, help="Learning rate")
    parser.add_argument("--logging_steps", type=int, default=10, help="Log every X steps")
    parser.add_argument("--save_strategy", type=str, default="epoch", help="Checkpoint save strategy")
    parser.add_argument("--save_steps", type=int, default=100, help="Save checkpoint every X steps")

    return parser.parse_args()


# Convert dataset to OAI messages
def format_data(sample):
    return {
        "messages": [
            {
                "role": "system",
                "content": [{"type": "text", "text": system_message}],
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": user_prompt.format(
                            product=sample["Product Name"],
                            category=sample["Category"],
                        ),
                    },
                    {
                        "type": "image",
                        "image": sample["image"],
                    },
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["description"]}],
            },
        ],
    }


def process_vision_info(messages: list[dict]) -> list[Image.Image]:
    image_inputs = []
    # Iterate through each conversation
    for msg in messages:
        # Get content (ensure it's a list)
        content = msg.get("content", [])
        if not isinstance(content, list):
            content = [content]
        # Check each content element for images
        for element in content:
            if isinstance(element, dict) and ("image" in element or element.get("type") == "image"):
                # Get the image and convert to RGB
                if "image" in element:
                    image = element["image"]
                else:
                    image = element
                image_inputs.append(image.convert("RGB"))
    return image_inputs


def train(args):
    # Load dataset from the hub
    dataset = load_dataset(args.dataset_name, split="train", streaming=True)
    # Convert dataset to OAI messages
    # need to use list comprehension to keep Pil.Image type, .map converts image to bytes
    dataset = [format_data(sample) for sample in dataset]

    # Hugging Face model id
    model_id = args.model_id

    # Check if GPU benefits from bfloat16
    if torch.cuda.get_device_capability()[0] < 8:
        raise ValueError("GPU does not support bfloat16, please use a GPU that supports bfloat16.")

    # Define model init arguments
    model_kwargs = dict(
        attn_implementation="eager",  # Use "flash_attention_2" when running on Ampere or newer GPU
        torch_dtype=torch.bfloat16,  # What torch dtype to use, defaults to auto
        # device_map="auto",  # Let torch decide how to load the model
    )

    # BitsAndBytesConfig int-4 config
    model_kwargs["quantization_config"] = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=model_kwargs["torch_dtype"],
        bnb_4bit_quant_storage=model_kwargs["torch_dtype"],
    )

    # Load model and tokenizer
    model = AutoModelForImageTextToText.from_pretrained(model_id, **model_kwargs)
    processor = AutoProcessor.from_pretrained(model_id, use_fast=True)

    peft_config = LoraConfig(
        lora_alpha=args.lora_alpha,
        lora_dropout=args.lora_dropout,
        r=args.lora_r,
        bias="none",
        target_modules="all-linear",
        task_type="CAUSAL_LM",
        modules_to_save=[
            "lm_head",
            "embed_tokens",
        ],
    )

    args = SFTConfig(
        output_dir=args.output_dir,  # directory to save and repository id
        num_train_epochs=args.num_train_epochs,  # number of training epochs
        per_device_train_batch_size=args.per_device_train_batch_size,  # batch size per device during training
        gradient_accumulation_steps=args.gradient_accumulation_steps,  # number of steps before performing a backward/update pass
        gradient_checkpointing=True,  # use gradient checkpointing to save memory
        optim="adamw_torch_fused",  # use fused adamw optimizer
        logging_steps=args.logging_steps,  # log every N steps
        save_strategy=args.save_strategy,  # save checkpoint every epoch
        learning_rate=args.learning_rate,  # learning rate, based on QLoRA paper
        bf16=True,  # use bfloat16 precision
        max_grad_norm=0.3,  # max gradient norm based on QLoRA paper
        warmup_ratio=0.03,  # warmup ratio based on QLoRA paper
        lr_scheduler_type="constant",  # use constant learning rate scheduler
        push_to_hub=args.push_to_hub,  # push model to hub
        report_to="tensorboard",  # report metrics to tensorboard
        gradient_checkpointing_kwargs={
            "use_reentrant": False
        },  # use reentrant checkpointing
        dataset_text_field="",  # need a dummy field for collator
        dataset_kwargs={"skip_prepare_dataset": True},  # important for collator
    )
    args.remove_unused_columns = False  # important for collator

    # Create a data collator to encode text and image pairs
    def collate_fn(examples):
        texts = []
        images = []
        for example in examples:
            image_inputs = process_vision_info(example["messages"])
            text = processor.apply_chat_template(
                example["messages"], add_generation_prompt=False, tokenize=False
            )
            texts.append(text.strip())
            images.append(image_inputs)

        # Tokenize the texts and process the images
        batch = processor(text=texts, images=images, return_tensors="pt", padding=True)

        # The labels are the input_ids, and we mask the padding tokens and image tokens in the loss computation
        labels = batch["input_ids"].clone()

        # Mask image tokens
        image_token_id = [
            processor.tokenizer.convert_tokens_to_ids(
                processor.tokenizer.special_tokens_map["boi_token"]
            )
        ]
        # Mask tokens for not being used in the loss computation
        labels[labels == processor.tokenizer.pad_token_id] = -100
        labels[labels == image_token_id] = -100
        labels[labels == 262144] = -100

        batch["labels"] = labels
        return batch

    trainer = SFTTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        peft_config=peft_config,
        processing_class=processor,
        data_collator=collate_fn,
    )

    callback = ray.train.huggingface.transformers.RayTrainReportCallback()
    trainer.add_callback(callback)
    trainer = ray.train.huggingface.transformers.prepare_trainer(trainer)

    # Start training, the model will be automatically saved to the Hub and the output directory
    trainer.train()

    # Save the final model again to the Hugging Face Hub
    trainer.save_model()


if __name__ == "__main__":
    args = get_args()
    print("Starting training task!")
    training_name = f"gemma_vision_train_{datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')}"
    gcs_bucket = args.gcs_bucket
    if not gcs_bucket.startswith("gs://"):
        gcs_bucket = "gs://" + gcs_bucket
    run_config = RunConfig(
        storage_path=gcs_bucket,
        name=training_name,
    )
    scaling_config = ScalingConfig(num_workers=16, use_gpu=True, accelerator_type="B200")
    ray_trainer = TorchTrainer(
        train,
        train_loop_config=args,
        scaling_config=scaling_config,
        run_config=run_config,
    )
    print("Commencing training!")
    result = ray_trainer.fit()
Save the file.
Create a ConfigMap object in your cluster:
kubectl create cm ray-job-cm --from-file=code -o yaml --dry-run=client | kubectl apply -f -
To update the training script, you rerun the preceding command. It might take a minute before any changes propagate to all the pods.
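To confirm what the cluster currently holds, you can print the ConfigMap back out; the script appears as the vision_train.py key under data:

kubectl get configmap ray-job-cm -o yaml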
Configure Ray Cluster
To create a Ray cluster in your GKE cluster, save the following YAML as a file named ray_cluster.yaml:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: gemma3-tuning
spec:
  rayVersion: '2.48.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata: {}
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.48.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          resources:
            limits:
              cpu: "24"
              ephemeral-storage: "9Gi"
              memory: "64Gi"
            requests:
              cpu: "24"
              ephemeral-storage: "9Gi"
              memory: "64Gi"
          env:
          - name: HF_TOKEN
            valueFrom:
              secretKeyRef:
                name: hf-secret
                key: hf_api_token
          volumeMounts:
          - name: job-code
            mountPath: /code/
          - mountPath: /mnt/local-ssd/
            name: local-storage
        volumes:
        - name: job-code
          configMap:
            name: ray-job-cm
        - name: local-storage
          emptyDir: {}
  workerGroupSpecs:
  - replicas: 2
    minReplicas: 1
    maxReplicas: 5
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.48.0-gpu
          resources:
            limits:
              nvidia.com/gpu: "8"
            requests:
              nvidia.com/gpu: "8"
          env:
          - name: HF_TOKEN
            valueFrom:
              secretKeyRef:
                name: hf-secret
                key: hf_api_token
          volumeMounts:
          - name: job-code
            mountPath: /code/
          - mountPath: /mnt/local-ssd/
            name: local-storage
        volumes:
        - name: job-code
          configMap:
            name: ray-job-cm
        - name: local-storage
          emptyDir: {}
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-b200
          cloud.google.com/reservation-name: $RESERVATION
          cloud.google.com/reservation-affinity: "specific"
          cloud.google.com/gke-gpu-driver-version: latest
Apply this YAML definition to your cluster using the following command:
envsubst < ray_cluster.yaml | kubectl apply -f -
The $RESERVATION placeholder is automatically replaced with the value that you configured as an environment variable.

The Ray operator creates the Ray Pods, which in turn triggers autoscaling of the cluster to provision the appropriate nodes for those Pods. Three Pods are created in your cluster: one head node and two worker nodes. The worker nodes are equipped with the B200 GPUs.
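You can also inspect the RayCluster resource itself. Assuming the KubeRay CRDs installed by the Ray operator add-on, the following command shows the desired and ready worker counts:

kubectl get raycluster gemma3-tuning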
To verify that all three of the pods are ready, run the following:
kubectl get pods
The pod list of a ready Ray Cluster is similar to the following:
NAME                                   READY   STATUS    RESTARTS   AGE
gemma3-tuning-gpu-group-worker-s4h8f   2/2     Running   0          16m
gemma3-tuning-gpu-group-worker-stg5f   2/2     Running   0          5m34s
gemma3-tuning-head-zbdvp               2/2     Running   0          16m
Schedule a training job
Save the following as a ray_job.yaml file:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: test-ray-job
spec:
  entrypoint: python /code/vision_train.py --gcs_bucket $GCS_BUCKET
  runtimeEnvYAML: |
    pip:
      - torch==2.8.0
      - torchvision==0.23.0
      - ray==2.48.0
      - transformers==4.55.2
      - datasets==4.0.0
      - evaluate==0.4.5
      - accelerate==1.10.0
      - pillow==11.3.0
      - bitsandbytes==0.47.0
      - trl==0.21.0
      - peft==0.17.0
  clusterSelector:
    ray.io/cluster: gemma3-tuning
Submit the RayJob definition to your RayCluster:
envsubst < ray_job.yaml | kubectl apply -f -
Check that a new Pod is in your cluster:
kubectl get pods
Make a note of the full name of the test-ray-job- Pod that you see in the output. This name is unique to your job.

Inspect the progress of your training. Replace TEST_RAY_JOB_POD_NAME with the unique Pod name that you noted in the previous step:

kubectl logs -f TEST_RAY_JOB_POD_NAME
The output that you see is similar to the following:
2025-08-20 08:29:34,966 INFO cli.py:41 -- Job submission server address: http://gemma3-tuning-head-svc.default.svc.cluster.local:8265 2025-08-20 08:29:34,991 SUCC cli.py:65 -- ----------------------------------------------- 2025-08-20 08:29:34,991 SUCC cli.py:66 -- Job 'test-ray-job-82mm7' submitted successfully 2025-08-20 08:29:34,991 SUCC cli.py:67 -- ----------------------------------------------- 2025-08-20 08:29:34,992 INFO cli.py:291 -- Next steps 2025-08-20 08:29:34,992 INFO cli.py:292 -- Query the logs of the job: 2025-08-20 08:29:34,992 INFO cli.py:294 -- ray job logs test-ray-job-82mm7 2025-08-20 08:29:34,992 INFO cli.py:296 -- Query the status of the job: 2025-08-20 08:29:34,992 INFO cli.py:298 -- ray job status test-ray-job-82mm7 2025-08-20 08:29:34,992 INFO cli.py:300 -- Request the job to be stopped: 2025-08-20 08:29:34,992 INFO cli.py:302 -- ray job stop test-ray-job-82mm7 2025-08-20 08:29:35,003 INFO cli.py:312 -- Tailing logs until the job exits (disable with --no-wait): 2025-08-20 08:29:34,982 INFO job_manager.py:531 -- Runtime env is setting up. Starting training task! Commencing training! 2025-08-20 08:30:08,498 INFO worker.py:1606 -- Using address 10.76.0.17:6379 set in the environment variable RAY_ADDRESS 2025-08-20 08:30:08,506 INFO worker.py:1747 -- Connecting to existing Ray cluster at address: 10.76.0.17:6379... 2025-08-20 08:30:08,527 INFO worker.py:1918 -- Connected to Ray cluster. View the dashboard at 10.76.0.17:8265 2025-08-20 08:30:08,701 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `<FrameworkTrainer>(...)`. 2025-08-20 08:30:08,951 WARNING tune_controller.py:2132 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (519 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent pending trials. 2025-08-20 08:30:08,953 WARNING tune_controller.py:2132 -- The maximum number of pending trials has been automatically set to the number of available cluster CPUs, which is high (519 CPUs/pending trials). If you're running an experiment with a large number of trials, this could lead to scheduling overhead. In this case, consider setting the `TUNE_MAX_PENDING_TRIALS_PG` environment variable to the desired maximum number of concurrent pending trials. 
View detailed results here: YOUR_GCS_BUCKET/gemma_vision_train_2025_08_20_08_30_07 To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2025-08-20_04-43-14_215096_1/artifacts/2025-08-20_08-30-08/gemma_vision_train_2025_08_20_08_30_07/driver_artifacts` Training started with configuration: ╭──────────────────────────────────────────────────────────────────────╮ │ Training config │ ├──────────────────────────────────────────────────────────────────────┤ │ train_loop_config/dataset_name ...-descriptions-vlm │ │ train_loop_config/gcs_bucket ...-bucket-yooo-west │ │ train_loop_config/gradient_accumulation_steps 4 │ │ train_loop_config/learning_rate 0.0002 │ │ train_loop_config/logging_steps 10 │ │ train_loop_config/lora_alpha 16 │ │ train_loop_config/lora_dropout 0.05 │ │ train_loop_config/lora_r 16 │ │ train_loop_config/max_seq_length 512 │ │ train_loop_config/model_id google/gemma-3-4b-it │ │ train_loop_config/num_train_epochs 3 │ │ train_loop_config/output_dir ...-4b-seo-optimized │ │ train_loop_config/per_device_train_batch_size 1 │ │ train_loop_config/push_to_hub False │ │ train_loop_config/save_steps 100 │ │ train_loop_config/save_strategy epoch │ ╰──────────────────────────────────────────────────────────────────────╯ (RayTrainWorker pid=45455, ip=10.76.0.71) Setting up process group for: env:// [rank=0, world_size=16] (TorchTrainer pid=45197, ip=10.76.0.71) Started distributed worker processes: (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45455) world_rank=0, local_rank=0, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45450) world_rank=1, local_rank=1, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45454) world_rank=2, local_rank=2, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45448) world_rank=3, local_rank=3, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45453) world_rank=4, local_rank=4, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45452) world_rank=5, local_rank=5, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45451) world_rank=6, local_rank=6, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=4c934ab2f646a578b03cc335586f30b943e811b645526a74c50bfca1, ip=10.76.0.71, pid=45449) world_rank=7, local_rank=7, node_rank=0 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45729) world_rank=8, local_rank=0, node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45726) world_rank=9, local_rank=1, node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45728) world_rank=10, local_rank=2, node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45727) world_rank=11, local_rank=3, 
node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45725) world_rank=12, local_rank=4, node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45724) world_rank=13, local_rank=5, node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45723) world_rank=14, local_rank=6, node_rank=1 (TorchTrainer pid=45197, ip=10.76.0.71) - (node_id=c0db52b44f891f3d6a1cedcbea4c6beb2c8434c66ef414dc15e65743, ip=10.76.0.135, pid=45722) world_rank=15, local_rank=7, node_rank=1 ... Training finished iteration 3 at 2025-08-20 08:40:43. Total running time: 10min 34s ╭─────────────────────────────────────────╮ │ Training result │ ├─────────────────────────────────────────┤ │ checkpoint_dir_name checkpoint_000002 │ │ time_this_iter_s 152.6374 │ │ time_total_s 525.88585 │ │ training_iteration 3 │ │ epoch 2.75294 │ │ grad_norm 47.27161 │ │ learning_rate 0.0002 │ │ loss 22.5275 │ │ mean_token_accuracy 0.90325 │ │ num_tokens 1583017. │ │ step 60 │ ╰─────────────────────────────────────────╯ ... Training completed after 3 iterations at 2025-08-20 08:40:52. Total running time: 10min 43s 2025-08-20 08:40:53,113 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to 'YOUR_GCS_BUCKET/gemma_vision_train_2025_08_20_08_30_07' in 0.1663s. 2025-08-20 08:40:58,304 SUCC cli.py:65 -- ---------------------------------- 2025-08-20 08:40:58,305 SUCC cli.py:66 -- Job 'test-ray-job-82mm7' succeeded 2025-08-20 08:40:58,305 SUCC cli.py:67 -- ----------------------------------
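Besides tailing the logs, you can query the RayJob resource directly; its job status field should change to SUCCEEDED when training finishes:

kubectl get rayjob test-ray-job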
Monitor your workload
You can use the dashboard in Ray to monitor the workloads that are scheduled in your cluster.
To access this dashboard, you need to set up port-forwarding to your cluster by running the following command in a new terminal window:
kubectl port-forward service/gemma3-tuning-head-svc 8265:8265 > fwd.log 2>&1 &
Open the following link in your browser: http://localhost:8265.

Optionally, if you're using Cloud Shell, after you run the command in the previous step, you can click the Web Preview button. Select the Change port option, enter 8265, and then click Change and Preview. The Ray Dashboard opens in a new tab.
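With the port-forward from the earlier step still running, you can also query jobs from the Ray CLI if you have it installed locally (for example, with pip install "ray[default]"):

ray job list --address http://localhost:8265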
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete your project
Delete a Google Cloud project:
gcloud projects delete PROJECT_ID
Delete your resources
To delete the Ray Cluster and release the GPU-powered node, run the following:
kubectl delete -f ray_cluster.yaml
GKE automatically scales down your cluster and releases the A4 machines used by Ray.
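If you want to watch the GPU nodes being released, you can list the cluster's nodes and see the worker nodes disappear as GKE scales down:

kubectl get nodes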
To delete the entire GKE cluster, run the following:
gcloud container clusters delete $CLUSTER_NAME \
    --region=$REGION