Develop an application on the Ray on Vertex AI cluster

You can connect to a Ray on Vertex AI cluster and develop an application using the following methods:

  • Connect to the Ray on Vertex AI cluster using the Ray on Vertex AI SDK. Use this option if you prefer an interactive Python development environment.

    • Use the Ray on Vertex AI SDK within the Colab Enterprise notebook in the console.

    • Use the Ray on Vertex AI SDK within a Python session, shell, or Jupyter notebook.

  • Write a Python script and submit the script to the Ray on Vertex AI cluster using the Ray Jobs API. Use this option if you would rather submit jobs programmatically.

Before you begin, make sure to read the Ray on Vertex AI overview and set up all the prerequisite tools you need.

Develop an application using the Ray on Vertex AI SDK

To connect to the Ray on Vertex AI cluster using the Ray on Vertex AI SDK, the connecting environment must be on the same peered VPC network.

Console

  1. In the Google Cloud console, go to the Ray on Vertex AI page.


  2. In the row for the cluster you created, click Open in Colab Enterprise.

  3. The Colab Enterprise notebook opens. Follow the instructions on how to use the Ray on Vertex AI SDK to connect to the Ray on Vertex AI cluster.

    • If a pop-up screen asks you to enable APIs, click Enable.

    • Click Connect if you're connecting to the cluster for the first time, or Re-connect if you're reconnecting. The notebook takes a few minutes to connect to the runtime.

    • Run the Getting started code cell to import the Vertex AI SDK and connect to the Ray on Vertex AI cluster.

Python

From an interactive Python environment:

import ray

# Necessary even if aiplatform.* symbol is not directly used in your program.
from google.cloud import aiplatform

# The CLUSTER_RESOURCE_NAME is the one returned from vertex_ray.create_ray_cluster.
CLUSTER_RESOURCE_NAME='projects/{}/locations/{}/persistentResources/{}'.format(PROJECT_NUMBER, REGION, CLUSTER_NAME)

ray.init('vertex_ray://{}'.format(CLUSTER_RESOURCE_NAME))

Where:

  • REGION: The region you specified for your Ray on Vertex AI cluster.

  • PROJECT_NUMBER: Your Google Cloud project number.

  • CLUSTER_NAME: The name of your Ray on Vertex AI cluster, specified when you created the cluster.

You should get output similar to the following:

Python version:  3.10.12
Ray version: 2.4
Vertex SDK version: 1.34.0
Dashboard: xxxx-dot-us-central1.aiplatform-training.googleusercontent.com
Interactive Terminal Uri: yyyy-dot-us-central1.aiplatform-training.googleusercontent.com
Cluster Name: ray-cluster-zzzz

You can use the Dashboard URL to access the Ray dashboard from a browser. The URL has the format https://xxxx-dot-us-central1.aiplatform-training.googleusercontent.com/. The dashboard shows submitted jobs, the number of GPUs or CPUs, and the disk space of each machine in the cluster.
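The address and dashboard URL formats described above can be captured in two small helpers. This is a minimal sketch: the function names cluster_address and dashboard_url are hypothetical and not part of any SDK, and the example values are made up. It only formats strings and does not contact a cluster.

```python
def cluster_address(project_number: str, region: str, cluster_name: str) -> str:
    """Build the vertex_ray:// address from the three placeholder values."""
    for label, value in (
        ("project_number", project_number),
        ("region", region),
        ("cluster_name", cluster_name),
    ):
        # Each part is a single path segment, so it must not be empty or contain '/'.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty string without '/': {value!r}")
    resource_name = (
        f"projects/{project_number}/locations/{region}"
        f"/persistentResources/{cluster_name}"
    )
    return f"vertex_ray://{resource_name}"


def dashboard_url(dashboard_host: str) -> str:
    """Turn the host printed on the 'Dashboard:' line into a browser URL."""
    host = dashboard_host.strip().rstrip("/")
    if host.startswith("https://"):
        return host + "/"
    return f"https://{host}/"


# Example values are invented for illustration only.
print(cluster_address("123456789", "us-central1", "ray-cluster-zzzz"))
print(dashboard_url("xxxx-dot-us-central1.aiplatform-training.googleusercontent.com"))
```

The string returned by cluster_address is the same value that the connection example passes to ray.init.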

Once you're connected to the Ray on Vertex AI cluster, you can develop a Ray program the same way you would for a standard open source Ray backend.

@ray.remote
def square(x):
  print(x)
  return x * x

# Launch four parallel square tasks.
futures = [square.remote(i) for i in range(4)]

print(ray.get(futures))
# Returns [0, 1, 4, 9]

Develop an application using the Ray Jobs API

This section describes how to submit a Python program to the Ray on Vertex AI cluster using the Ray Jobs API.

Write a Python script

Develop your application as a Python script in any text editor. For example, place the following script in a my_script.py file:

import ray
import time

@ray.remote
def hello_world():
    return "hello world"

@ray.remote
def square(x):
    print(x)
    time.sleep(100)
    return x * x

ray.init()  # No need to specify address="vertex_ray://...."
print(ray.get(hello_world.remote()))
print(ray.get([square.remote(i) for i in range(4)]))

Submit a Ray job using the Ray Jobs API

You can submit a Ray job using Python, the Ray Jobs CLI, or the public Ray dashboard address.

Python

From within the peered VPC network, submit a Ray job using a Python environment:

import ray
from ray.job_submission import JobSubmissionClient
from google.cloud import aiplatform  # Necessary even if aiplatform.* symbol is not directly used in your program.

CLUSTER_RESOURCE_NAME = 'projects/{}/locations/{}/persistentResources/{}'.format(PROJECT_NUMBER, REGION, CLUSTER_NAME)

client = JobSubmissionClient("vertex_ray://{}".format(CLUSTER_RESOURCE_NAME))

job_id = client.submit_job(
  # Entrypoint shell command to execute
  entrypoint="python my_script.py",
  # Path to the local directory that contains the my_script.py file.
  runtime_env={
    "working_dir": "./directory-containing-my-script",
    "pip": ["numpy",
            "xgboost==1.7.6", # specific versions can be pinned or bounded
            "ray==2.4.0", # pin the Ray version to prevent it from being overwritten
           ]
  }
)

# Ensure that the Ray job has been created.
print(job_id)

Where:

  • REGION: The region you specified for your Ray on Vertex AI cluster.

  • PROJECT_NUMBER: Your Google Cloud project number.

  • CLUSTER_NAME: The name of your Ray on Vertex AI cluster, specified when you created the cluster.
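After submitting a job as in the Python example above, you typically want to wait for it to finish. The following is a minimal sketch of a polling loop, not part of the SDK: wait_for_job is a hypothetical helper that takes any status-returning callable, so it can be tried locally with a fake status source. It assumes the terminal states are SUCCEEDED, FAILED, and STOPPED, as reported by the Ray Jobs API.

```python
import time

# Terminal job states reported by the Ray Jobs API (assumed here as plain names).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, job_id, timeout_s=600.0, poll_s=5.0, sleep=time.sleep):
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is any callable that returns a status for a job ID; with a
    real cluster this would be client.get_job_status.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        # Accept either an enum with a .value attribute or a plain string.
        name = str(getattr(status, "value", status))
        if name in TERMINAL_STATES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {name} after {timeout_s}s")
        sleep(poll_s)


# Local demonstration with a fake status source (no cluster needed).
_fake_statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda _job_id: next(_fake_statuses), "job-123", poll_s=0.0))
```

With the client and job_id from the submission example, the call would look like wait_for_job(client.get_job_status, job_id), followed by print(client.get_job_logs(job_id)) to inspect the output.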

Ray Jobs CLI

Note that you can only use the Ray Jobs CLI commands within the peered VPC network.

$ ray job submit --working-dir ./ --address vertex_ray://{CLUSTER_RESOURCE_NAME} -- python my_script.py
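If you prefer to launch the same CLI command from a Python program, the argument list can be assembled as sketched below. The helper name ray_job_submit_argv is hypothetical and the resource name is an invented example. The sketch only builds and prints the command; running it still requires the Ray CLI and access to the peered VPC network.

```python
import shlex


def ray_job_submit_argv(cluster_resource_name, entrypoint, working_dir="./"):
    """Return the argument list for the ray job submit command shown above."""
    return [
        "ray", "job", "submit",
        "--working-dir", working_dir,
        "--address", f"vertex_ray://{cluster_resource_name}",
        "--",
        # Split the entrypoint the way a shell would, so each word is one argument.
        *shlex.split(entrypoint),
    ]


argv = ray_job_submit_argv(
    "projects/123456789/locations/us-central1/persistentResources/ray-cluster-zzzz",
    "python my_script.py",
)
print(shlex.join(argv))
# To execute the command: subprocess.run(argv, check=True)
```

Passing a list of arguments rather than a single shell string avoids quoting problems when the working directory or script name contains spaces.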

Ray dashboard

The Ray dashboard address is accessible from outside the VPC, including the public internet. Note that vertex_ray is required to obtain authentication automatically.

from ray.job_submission import JobSubmissionClient
import vertex_ray

DASHBOARD_ADDRESS = "DASHBOARD_ADDRESS"  # Replace with your cluster's dashboard address.

client = JobSubmissionClient(
  "vertex_ray://{}".format(DASHBOARD_ADDRESS),
)

job_id = client.submit_job(
  # Entrypoint shell command to execute
  entrypoint="python my_script.py",
  # Path to the local directory that contains the my_script.py file
  runtime_env={
    "working_dir": "./directory-containing-my-script",
    "pip": ["numpy",
            "xgboost==1.7.6", # specific versions can be pinned or bounded
            "ray==2.4.0", # pin the Ray version to prevent it from being overwritten
           ]
  }
)
print(job_id)

Where:

  • DASHBOARD_ADDRESS: The Ray dashboard address for your cluster. You can find the dashboard address using the Vertex AI SDK.
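Both submission examples pin the Ray version in the pip list so that installing other packages does not overwrite it. One way to make that hard to forget is a small builder for the runtime_env dictionary. This is a sketch under assumptions: build_runtime_env is a hypothetical helper, and the version you pass should match the Ray version your cluster reports.

```python
import re


def build_runtime_env(working_dir, pip_packages, ray_version):
    """Return a runtime_env dict whose pip list always pins Ray."""

    def is_ray(requirement):
        # Take the package name before any version specifier, extras, or marker.
        name = re.split(r"[=<>!~\[\s;]", requirement.strip(), maxsplit=1)[0]
        return name.lower() == "ray"

    # Drop any existing ray requirement so the explicit pin below is the only one.
    packages = [p for p in pip_packages if not is_ray(p)]
    packages.append(f"ray=={ray_version}")
    return {"working_dir": working_dir, "pip": packages}


env = build_runtime_env(
    "./directory-containing-my-script",
    ["numpy", "xgboost==1.7.6", "ray"],
    "2.4.0",
)
print(env["pip"])
```

The resulting dictionary can be passed as the runtime_env argument of client.submit_job.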

What's next