Create a Ray cluster on Vertex AI

You can create a Ray cluster on Vertex AI using the Google Cloud console or the Vertex AI SDK for Python.

Before you begin, make sure to read the Ray on Vertex AI overview and set up all the prerequisite tools you need.

Console

  1. In the Google Cloud console, go to the Ray on Vertex AI page.

  2. Click Create Cluster to open the Create Cluster panel.

  3. For each step in the Create Cluster panel, review or replace the default cluster information. Click Continue to complete each step:

    1. For Name and region, specify a Name and choose a Region for your cluster.

    2. For Compute settings, specify the configuration of the head node of the Ray cluster on Vertex AI, including its machine type, accelerator type and count, disk type and size, and replica count. Under Advanced options, you can specify the encryption key.

    3. For Networking, specify the VPC peering network you want to use with Ray on Vertex AI.

      If you haven't already set up a private services access connection for your VPC network, click Set up connection. In the Create a private services access connection panel, complete each of the following steps, clicking Continue after each one:

      1. Enable the Service Networking API.

      2. For Allocate an IP range, you can select, create, or allow Google to automatically allocate an IP range.

      3. For Create a connection, review the Network and Allocated IP Range information.

      4. Click Create connection.

  4. Click Create.

Ray on Vertex AI SDK

From an interactive Python environment within the VPC network, use the following code to create the Ray cluster on Vertex AI:

import ray
import vertex_ray
from google.cloud import aiplatform
from vertex_ray import Resources

# Define a default CPU cluster: machine type n1-standard-8, 1 head node, and 1 worker node.
head_node_type = Resources()
worker_node_types = [Resources()]

# Or define a GPU cluster.
head_node_type = Resources(
  machine_type="n1-standard-8",
  node_count=1,
)

worker_node_types = [Resources(
  machine_type="n1-standard-8",
  node_count=2,  # Can be > 1
  accelerator_type="NVIDIA_TESLA_K80",
  accelerator_count=1,
)]

# Initialize Vertex AI to retrieve projects for downstream operations.
aiplatform.init()

# Create the Ray cluster on Vertex AI.
CLUSTER_RESOURCE_NAME = vertex_ray.create_ray_cluster(
  head_node_type=head_node_type,
  network=NETWORK,
  worker_node_types=worker_node_types,
  python_version="3.10",  # Optional
  ray_version="2.9",  # Optional
  cluster_name=CLUSTER_NAME,
)

Where:

  • CLUSTER_NAME is a name for the Ray cluster on Vertex AI. It must be unique within your project.

  • NETWORK is the full name of your peered VPC network, in the format of projects/PROJECT_NUMBER/global/networks/VPC_NAME.

  • PROJECT_NUMBER is your Google Cloud project number.
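The NETWORK format described above can be assembled from its two parts. The following is a minimal sketch, not part of the SDK: build_network_name is a hypothetical helper, and the project number and VPC name shown are placeholder values.

```python
def build_network_name(project_number: str, vpc_name: str) -> str:
    """Return a network name of the form
    projects/PROJECT_NUMBER/global/networks/VPC_NAME.

    Hypothetical helper for illustration; it is not a vertex_ray function.
    """
    project_number = str(project_number).strip()
    vpc_name = vpc_name.strip()
    # The format uses the numeric project number, not the project ID.
    if not project_number.isdigit():
        raise ValueError("project_number must be numeric")
    if not vpc_name:
        raise ValueError("vpc_name must not be empty")
    return f"projects/{project_number}/global/networks/{vpc_name}"


# Placeholder values; substitute your own project number and VPC name.
NETWORK = build_network_name("123456789012", "my-vpc")
print(NETWORK)
```

Checking that the project number is numeric is a design choice here: it catches the common mistake of passing a project ID where the project number is expected.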

While the cluster is provisioning, output similar to the following appears until the state changes to RUNNING:

[Ray on Vertex AI]: Cluster State = State.PROVISIONING
Waiting for cluster provisioning; attempt 1; sleeping for 0:02:30 seconds
...
[Ray on Vertex AI]: Cluster State = State.RUNNING
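The output above reflects a poll-and-sleep pattern: check the state, wait a fixed interval, and check again. The following sketch illustrates that pattern only. It is not the SDK's implementation. The names wait_until_running and get_state are hypothetical, and treating any state other than PROVISIONING or RUNNING as an error is an assumption.

```python
import time
from datetime import timedelta
from typing import Callable


def wait_until_running(
    get_state: Callable[[], str],
    interval: timedelta = timedelta(seconds=150),
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_state() until it returns "RUNNING".

    Returns the number of polls that were made. get_state is a
    hypothetical callable supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        state = get_state()
        print(f"Cluster State = {state}")
        if state == "RUNNING":
            return attempt
        # Assumption: any other non-provisioning state is a failure.
        if state != "PROVISIONING":
            raise RuntimeError(f"Unexpected cluster state: {state}")
        print(f"Waiting; attempt {attempt}; sleeping for {interval}")
        sleep(interval.total_seconds())
    raise TimeoutError(f"Cluster not running after {max_attempts} attempts")


# Usage with a simulated state sequence and a no-op sleep.
fake_states = iter(["PROVISIONING", "PROVISIONING", "RUNNING"])
polls = wait_until_running(lambda: next(fake_states), sleep=lambda seconds: None)
print(f"Reached RUNNING after {polls} polls")
```

Passing sleep as a parameter keeps the sketch easy to exercise without waiting for real time to elapse.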

Note the following:

  • The first node is used as the head node.

  • TPU machine types are not supported.

What's next