This page describes how to provision graphics processing unit (GPU) resources for containers so you can run artificial intelligence (AI) and machine learning (ML) workloads. It also walks you through enabling the Vertex AI pre-trained APIs on Google Distributed Cloud (GDC) air-gapped so you can start implementing Vertex AI capabilities.
Most tasks to configure GPU resources and enable or deactivate Vertex AI pre-trained APIs require administrator access. If you lack the necessary permissions, ask your administrator to enable GPUs and the Vertex AI pre-trained APIs on your behalf.
Vertex AI on Distributed Cloud includes three APIs, one for each of its pre-trained models. To learn more about these pre-trained models, see the following documentation:
- Optical Character Recognition (OCR): Learn about character recognition features.
- Speech-to-Text: Learn about speech recognition features.
- Vertex AI Translation: Learn about translation features.
Vertex AI on GDC also includes the following services, which provide their own APIs:
- Online Prediction: Learn about online predictions.
- Vertex AI Workbench: Learn about Vertex AI Workbench.
Use the GDC console to enable, deactivate, and view the endpoints of the Vertex AI pre-trained APIs.
Before you begin
To get the permissions that you need to provision GPUs and enable pre-trained APIs, ask your Organization IAM Admin or Project IAM Admin to grant you the following roles in your project namespace:
- To create a Kubernetes cluster with GPUs, obtain the User Cluster Admin (user-cluster-admin) role.
- To enable Vertex AI pre-trained APIs, obtain the AI Platform Admin (ai-platform-admin) role in the project namespace.
For information about these roles, see Prepare IAM permissions. To learn how to grant permissions to a subject, see Grant and revoke access.
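As an illustration only, an IAM admin could grant a role such as AI Platform Admin with a standard Kubernetes RoleBinding, assuming the role is exposed as a cluster role named ai-platform-admin; the namespace and user email below are placeholders, and your organization's actual binding mechanism may differ:

```yaml
# Hypothetical RoleBinding granting the ai-platform-admin role in a
# project namespace. The namespace (my-project) and the user email
# are placeholders, not real values from this document.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ai-platform-admin-binding
  namespace: my-project
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ai-platform-admin
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: alice@example.com
```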
After you have the required roles, complete the following steps to provision GPUs and prepare your environment before enabling the pre-trained APIs:
- Set up the GDC domain name system (DNS). If you haven't set up the DNS, work with your Infrastructure Operator (IO) to complete this prerequisite.
- Set up a project to use Vertex AI.
- Ensure that your project has the adequate ingress communication configured. For more information, see Configure a project network policy.
- Create a cluster that supports GPU container workloads. GPU resources on a Kubernetes cluster let developers run AI and ML models.
- Allocate GPU machines for the correct cluster types.
- Configure your containers to use GPU resources for your workloads.
- Sign in to the GDC console. If you can't sign in, see Connect to an identity provider.
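As a minimal sketch of the container configuration step, a workload can request GPU resources through a standard Kubernetes Pod spec. The nvidia.com/gpu resource name is the conventional GPU resource exposed by the NVIDIA device plugin for Kubernetes; the image, namespace, and GPU count below are assumptions for illustration:

```yaml
# Hypothetical Pod spec requesting one GPU for an AI/ML workload.
# The image and namespace are placeholders; nvidia.com/gpu is the
# standard Kubernetes GPU resource name provided by the NVIDIA
# device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference
  namespace: my-project
spec:
  containers:
  - name: inference
    image: my-registry/inference:latest
    resources:
      limits:
        nvidia.com/gpu: 1
```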
Enable pre-trained APIs
You can enable the OCR, Speech-to-Text, and Vertex AI Translation pre-trained APIs using the GDC console.
After meeting the prerequisites, follow these steps to enable the pre-trained APIs:
- Sign in to the GDC console.
- In the navigation menu, click Vertex AI > Pre-trained APIs.
- On the Pre-trained APIs page, click Enable on a specific service to enable that API.
- In the confirmation dialog, click Enable. A progress message displays.

The enablement duration varies. It might take between 15 and 45 minutes to finish, depending on the state of the cluster.

To check the status of the pre-trained APIs, view the service status and endpoints.
The VAI-A0001 alert (Enabling State Time Limit Reached) triggers if the services take too long to enable. In this case, your IO must review the VAI-R0001 runbook for details.
Deactivate pre-trained APIs
You can deactivate the OCR, Speech-to-Text, and Vertex AI Translation pre-trained APIs using the GDC console.
After meeting the prerequisites, follow these steps to deactivate the pre-trained APIs:
- Sign in to the GDC console.
- In the navigation menu, click Vertex AI > Pre-trained APIs.
- On the Pre-trained APIs page, click Disable on a specific service to deactivate that API.
- In the confirmation dialog, enter disable in the text field to confirm that you want to take that action. Then, click Disable. A progress message displays.
To check the status of the pre-trained APIs, view the service status and endpoints.