Scale Ray clusters on Vertex AI

As workloads on your Ray clusters on Vertex AI increase or decrease, you can manually scale the number of replicas to match demand. For example, if you have excess capacity, you can scale down your worker pools to save costs. This page describes how to change the number of replicas for existing worker pools.


When you scale clusters, you can change only the number of replicas in your existing worker pools. You can't, for example, add or remove worker pools from your cluster or change the machine type of your worker pools. Also, the number of replicas for your worker pools can't be lower than one.
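
The rules above can be checked locally before you send an update. The following is a minimal sketch in plain Python, not part of the Vertex AI SDK. The function name `validate_replica_counts` and its list arguments are hypothetical and used only for illustration.

```python
def validate_replica_counts(current_counts, requested_counts):
    """Check a scaling request against the rules described above.

    Both arguments are lists of replica counts, one entry per worker pool,
    in the same order. They are hypothetical inputs, not SDK objects.
    """
    # Worker pools can't be added or removed when scaling.
    if len(requested_counts) != len(current_counts):
        raise ValueError("the number of worker pools can't change")
    # Every worker pool must keep at least one replica.
    for index, count in enumerate(requested_counts):
        if count < 1:
            raise ValueError(f"worker pool {index} needs at least one replica")
    return True


# Two existing pools: scale the first down to 1 and the second up to 8.
print(validate_replica_counts([2, 4], [1, 8]))
```

A request such as `[0, 8]` or `[1, 8, 2]` raises `ValueError` in this sketch, because it would drop a pool below one replica or change the number of pools.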

If you use a VPC peering connection to connect to your clusters, the maximum number of nodes is limited and depends on the number of nodes the cluster had when it was created. This maximum counts your head node as well as your worker pools. For more information, see Max number of nodes calculation. If you use the default network configuration, the number of nodes can't exceed the maximums described in the create clusters documentation.

Max number of nodes calculation

If you're using private services access (VPC peering) to connect to your nodes, use the following formulas to check that you don't exceed the maximum number of nodes (M), assuming f(x) = min(29, 32 - ceiling(log2(x))):

  • f(2 * M) = f(2 * N)
  • f(64 * M) = f(64 * N)
  • f(max(32, 16 + M)) = f(max(32, 16 + N))

The maximum total number of nodes in the Ray on Vertex AI cluster you can scale up to (M) depends on the initial total number of nodes you set up (N). After you create the Ray on Vertex AI cluster, you can scale the total number of nodes to any amount between P and M inclusive, where P is the number of pools in your cluster.
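
To make the calculation concrete, the following sketch evaluates the three conditions as they are written on this page and searches upward from N for the largest M that still satisfies them. It is plain Python with no SDK calls. The function names `within_limit` and `max_total_nodes` are hypothetical, and the total node count is assumed to include one head node plus all worker replicas.

```python
def f(x):
    """Return min(29, 32 - ceiling(log2(x))) for a positive integer x."""
    # For integers x >= 1, (x - 1).bit_length() equals ceiling(log2(x)).
    return min(29, 32 - (x - 1).bit_length())


def within_limit(m, n):
    """Return True if a total of m nodes satisfies all three conditions for n."""
    return (
        f(2 * m) == f(2 * n)
        and f(64 * m) == f(64 * n)
        and f(max(32, 16 + m)) == f(max(32, 16 + n))
    )


def max_total_nodes(n):
    """Return the largest m >= n that satisfies the conditions.

    f is non-increasing, so the valid values of m form one contiguous
    range that contains n, and a linear scan upward is sufficient.
    """
    m = n
    while within_limit(m + 1, n):
        m += 1
    return m


# n is the initial total node count: the head node plus all worker replicas.
for initial_total in (3, 10, 40):
    print(initial_total, max_total_nodes(initial_total))
```

Under these formulas, a cluster created with 10 total nodes can grow to 16 total nodes, and a cluster created with 40 can grow to 48.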

Update replica count

You can use the Google Cloud console or Vertex AI SDK for Python to update your worker pool's replica count. If your cluster includes multiple worker pools, you can individually change each of their replica counts in a single request.


Console

  1. In the Google Cloud console, go to the Ray on Vertex AI page.

  2. From the list of clusters, click the cluster to modify.

  3. On the Cluster details page, click Edit cluster.

  4. In the Edit cluster pane, select the worker pool to update and then modify the replica count.

  5. Click Update.

    Wait a few minutes for your cluster to update. When the update is complete, you can see the updated replica count on the Cluster details page.

Ray on Vertex AI SDK

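import vertexai
import vertex_ray

# Initialize the Vertex AI SDK with your environment's default project and location.
vertexai.init()

cluster = vertex_ray.get_ray_cluster("CLUSTER_NAME")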

# Get the resource name.
cluster_resource_name = cluster.cluster_resource_name

# Build the updated worker pool list with the new replica count.
new_worker_node_types = []
for worker_node_type in cluster.worker_node_types:
    worker_node_type.node_count = REPLICA_COUNT  # new worker pool size
    new_worker_node_types.append(worker_node_type)

# Make update call
updated_cluster_resource_name = vertex_ray.update_ray_cluster(