Ray is an open-source framework for scaling AI and Python applications. Ray provides the infrastructure to perform distributed computing and parallel processing for your machine learning (ML) workflow.
If you already use Ray, you can use the same open source Ray code to write programs and develop applications on Vertex AI with minimal changes. You can then use Vertex AI's integrations with other Google Cloud services such as Vertex AI Prediction and BigQuery as part of your machine learning workflow.
If you already use Vertex AI and need a simpler way to manage compute resources, you can use Ray code to scale training.
Workflow for using Ray on Vertex AI
Use Colab Enterprise and the Vertex AI SDK for Python to connect to the Ray cluster.
Steps | Description
--- | ---
1. Set up for Ray on Vertex AI | Set up your Google Cloud project, install the version of the Vertex AI SDK for Python that includes the Ray Client functionality, and optionally set up a VPC peering network. |
2. Create a Ray cluster on Vertex AI | Create a Ray cluster on Vertex AI. The Vertex AI Administrator role is required. |
3. Develop a Ray application on Vertex AI | Connect to a Ray cluster on Vertex AI and develop an application. The Vertex AI User role is required. |
4. (Optional) Use Ray on Vertex AI with BigQuery | Read, write, and transform data with BigQuery. |
5. (Optional) Deploy a model on Vertex AI and get predictions | Deploy a model to a Vertex AI online endpoint and get predictions. |
6. Monitor your Ray cluster on Vertex AI | Monitor generated logs in Cloud Logging and metrics in Cloud Monitoring. |
7. Delete a Ray cluster on Vertex AI | Delete a Ray cluster on Vertex AI to avoid unnecessary billing. |
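The cluster lifecycle in steps 2 and 7 can be driven from the Vertex AI SDK for Python. The following is a minimal sketch, assuming the `google-cloud-aiplatform[ray]` extra is installed; the project, region, and machine shapes are illustrative placeholders, and the `vertex_ray` API surface may vary between SDK versions:

```python
# Sketch: create and later delete a Ray cluster on Vertex AI.
# Assumes `pip install "google-cloud-aiplatform[ray]"`. The project, region,
# and machine types below are placeholders, not recommendations.
from google.cloud import aiplatform
import vertex_ray
from vertex_ray import Resources

aiplatform.init(project="my-project", location="us-central1")

# Step 2: create the cluster (requires the Vertex AI Administrator role).
cluster_resource_name = vertex_ray.create_ray_cluster(
    head_node_type=Resources(machine_type="n1-standard-16", node_count=1),
    worker_node_types=[Resources(machine_type="n1-standard-16", node_count=2)],
)
print(cluster_resource_name)  # projects/.../persistentResources/...

# ...develop, train, and get predictions (steps 3-6)...

# Step 7: delete the cluster when finished to avoid unnecessary billing.
vertex_ray.delete_ray_cluster(cluster_resource_name)
```

Because the cluster persists until you delete it, repeated jobs reuse the same provisioned machines instead of waiting for new ones.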
Overview
Ray clusters are designed to ensure capacity availability for critical ML workloads and during peak seasons. Unlike custom jobs, where the training service releases the resources after job completion, Ray clusters remain available until deleted.
Note: Use long-running Ray clusters in these scenarios:
- You submit the same Ray job multiple times and can benefit from the data and image caching that comes from running the jobs on the same long-running cluster.
- You run many short-lived Ray jobs whose actual processing time is shorter than the job startup time.
Ray clusters on Vertex AI can be set up either with public or private connectivity. The following diagrams show the architecture and workflow for Ray on Vertex AI. See Public or private connectivity for more information.
Architecture with public connectivity
1. Create the Ray cluster on Vertex AI using one of the following options:
   a. Use the Google Cloud console to create the Ray cluster on Vertex AI.
   b. Use the Vertex AI SDK for Python to create the Ray cluster on Vertex AI.
2. Connect to the Ray cluster on Vertex AI for interactive development using one of the following options:
   a. Use Colab Enterprise in the Google Cloud console for a seamless connection.
   b. Use any Python environment with access to the public internet.
3. Develop your application and train your model on the Ray cluster on Vertex AI using one of the following options:
   a. Use the Vertex AI SDK for Python in your preferred environment (Colab Enterprise or any Python notebook).
   b. Write a Python script using your preferred environment, then submit a Ray job to the Ray cluster on Vertex AI using the Vertex AI SDK for Python, the Ray Job CLI, or the Ray Job Submission API.
4. Deploy the trained model to an online Vertex AI endpoint for live predictions.
5. Use BigQuery to manage your data.
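An interactive session following these steps might look like the sketch below. The cluster resource name, project, and BigQuery table are placeholders, and `ray.data.read_bigquery` assumes a Ray version that ships the BigQuery datasource:

```python
# Sketch: connect to an existing Ray cluster on Vertex AI from any Python
# environment with public internet access. The resource name, project, and
# table below are placeholders.
import ray
import vertex_ray  # registers the vertex_ray:// address scheme with Ray

CLUSTER = "projects/123456789/locations/us-central1/persistentResources/my-cluster"
ray.init(address=f"vertex_ray://{CLUSTER}")

# Optionally load data from BigQuery into a Ray Dataset (assumes a Ray
# version that includes ray.data.read_bigquery).
ds = ray.data.read_bigquery(
    project_id="my-project",
    dataset="my_dataset.my_table",
)
print(ds.count())

# Ordinary Ray remote tasks run on the cluster's workers.
@ray.remote
def square(x: int) -> int:
    return x * x

print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
```

Once connected, standard open source Ray code runs unchanged; only the `ray.init` address is specific to Vertex AI.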
Architecture with VPC
The following diagram shows the architecture and workflow for Ray on Vertex AI after you set up your Google Cloud project and optional VPC network:
1. Set up your (a) Google Cloud project and (b) VPC network.
2. Create the Ray cluster on Vertex AI using one of the following options:
   a. Use the Google Cloud console to create the Ray cluster on Vertex AI.
   b. Use the Vertex AI SDK for Python to create the Ray cluster on Vertex AI.
3. Connect to the Ray cluster on Vertex AI through a VPC-peered network using one of the following options:
   a. Use Colab Enterprise in the Google Cloud console.
   b. Use a Vertex AI Workbench notebook.
4. Develop your application and train your model on the Ray cluster on Vertex AI using one of the following options:
   a. Use the Vertex AI SDK for Python in your preferred environment (Colab Enterprise or a Vertex AI Workbench notebook).
   b. Write a Python script using your preferred environment, then submit a Ray job to the Ray cluster on Vertex AI using the Vertex AI SDK for Python, the Ray Job CLI, or the Ray dashboard.
5. Deploy the trained model to an online Vertex AI endpoint for predictions.
6. Use BigQuery to manage your data.
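For the non-interactive path, a script can be submitted as a Ray job using Ray's standard `JobSubmissionClient` against the cluster's dashboard address. In this sketch, the dashboard URL and the `train.py` entrypoint are placeholders:

```python
# Sketch: submit a script as a Ray job to the cluster's dashboard address.
# The dashboard URL and the train.py entrypoint are placeholders.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("https://<your-cluster-dashboard-address>")

job_id = client.submit_job(
    entrypoint="python train.py",
    # Upload the current directory so train.py and its files are available
    # to the job on the cluster.
    runtime_env={"working_dir": "."},
)
print(job_id)
```

The same submission can be made from the shell with the Ray Job CLI (`ray job submit`) pointed at the same address.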
Pricing
Pricing for Ray on Vertex AI is calculated as follows:
The compute resources you use are charged based on the machine configuration you select when creating your Ray cluster on Vertex AI. For Ray on Vertex AI pricing, see the pricing page.
For Ray clusters, you are charged only while the cluster is in the RUNNING or UPDATING state; no other states incur charges. The amount charged is based on the actual cluster size at that time.
When you perform tasks using the Ray cluster on Vertex AI, logs are automatically generated and charged based on Cloud Logging pricing.
If you deploy your model to an endpoint for online predictions, see the "Prediction and explanation" section of the Vertex AI pricing page.
If you use BigQuery with Ray on Vertex AI, see BigQuery pricing.