Overview of getting predictions on Vertex AI

A prediction is the output of a trained machine learning model. This page provides an overview of the workflow for getting predictions from your models on Vertex AI.

Vertex AI offers two methods for getting predictions:

Online predictions are synchronous requests made to a model that is deployed to an endpoint. Before sending a request, you must deploy the Model resource to an endpoint, which associates compute resources with the model so that it can serve online predictions with low latency. Use online predictions when you make requests in response to application input or in other situations that require timely inference.
Batch predictions are asynchronous requests made to a model that isn't deployed to an endpoint. You send the request (as a BatchPredictionJob resource) directly to the Model resource. Use batch predictions when you don't require an immediate response and want to process accumulated data in a single request.
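To make the difference concrete, the following minimal sketch builds the REST request for each method using only the Python standard library. It doesn't send anything. The project, location, IDs, and Cloud Storage paths are placeholder assumptions, the instance format depends on your model, and the batch body omits optional fields (for example, machine resources, which custom-trained models typically need).

```python
API_VERSION = "v1"


def online_predict_request(project, location, endpoint_id, instances):
    """Build the URL and JSON body for a synchronous predict call on an endpoint."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/{API_VERSION}"
        f"/projects/{project}/locations/{location}"
        f"/endpoints/{endpoint_id}:predict"
    )
    # The instance schema is model-specific; the caller supplies it.
    return url, {"instances": instances}


def batch_prediction_job_request(project, location, model_id, source_uri, output_prefix):
    """Build the URL and JSON body that creates a BatchPredictionJob resource."""
    parent = f"projects/{project}/locations/{location}"
    url = (
        f"https://{location}-aiplatform.googleapis.com/{API_VERSION}"
        f"/{parent}/batchPredictionJobs"
    )
    body = {
        "displayName": "example-batch-job",  # hypothetical job name
        # The job references the Model resource directly, not an endpoint.
        "model": f"{parent}/models/{model_id}",
        "inputConfig": {
            "instancesFormat": "jsonl",
            "gcsSource": {"uris": [source_uri]},
        },
        "outputConfig": {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_prefix},
        },
    }
    return url, body
```

The online request targets an endpoint and returns predictions in the response. The batch request targets the Model resource, and the results are written to the output location after the job finishes.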

Get predictions from custom trained models

To get predictions, you must first import your model. After it's imported, it becomes a Model resource that is visible in Vertex AI Model Registry.

Then, read the following documentation to learn how to get predictions:

- Get batch predictions
- Deploy a model to an endpoint and get online predictions
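As a rough illustration of this workflow, the sketch below uses the Vertex AI SDK for Python (the `google-cloud-aiplatform` package) to import a model and then follow either path. The function names, display names, machine type, and all argument values are illustrative assumptions, not values from this page. The SDK import is deferred into `import_model` so that the helper functions can be read and exercised without the package installed.

```python
def import_model(project, location, display_name, artifact_uri, container_image_uri):
    """Import trained artifacts so they become a Model resource in Model Registry."""
    # Assumes google-cloud-aiplatform is installed and credentials are configured.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    return aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,  # Cloud Storage directory with the model files
        serving_container_image_uri=container_image_uri,
    )


def get_online_predictions(model, instances, machine_type="n1-standard-2"):
    """Deploy the model to an endpoint, then send a synchronous request."""
    endpoint = model.deploy(machine_type=machine_type)  # machine type is an assumption
    return endpoint.predict(instances=instances).predictions


def get_batch_predictions(model, gcs_source, gcs_destination_prefix,
                          machine_type="n1-standard-2"):
    """Run a batch prediction job against the Model resource, with no endpoint."""
    return model.batch_predict(
        job_display_name="example-batch-job",  # hypothetical job name
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination_prefix,
        machine_type=machine_type,
    )
```

Deploying a model provisions compute resources that remain allocated until you undeploy it, so the batch path is usually the simpler choice for one-off processing.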

Get predictions from AutoML models

Unlike custom trained models, AutoML models are automatically imported into the Vertex AI Model Registry after training.

Otherwise, the workflow for AutoML models is similar to the workflow for custom trained models, but it varies slightly based on your data type and model objective. The documentation for getting AutoML predictions is located alongside the other AutoML documentation and is organized by data type:

Image

Learn how to get predictions from the following types of image AutoML models:

Tabular

Learn how to get predictions from the following types of tabular AutoML models:

Tabular classification and regression models
- Online predictions
- Batch predictions
Tabular forecasting models (batch predictions only)

Text

Learn how to get predictions from the following types of text AutoML models:

Video

Learn how to get predictions from the following types of video AutoML models:

Video action recognition models (batch predictions only)
Video classification models (batch predictions only)
Video object tracking models (batch predictions only)

Get predictions from BigQuery ML models

There are two ways to get predictions from BigQuery ML models:

You can request batch predictions directly from the model in BigQuery ML.
You can register the models directly with the Model Registry, without exporting them from BigQuery ML or importing them into the Model Registry. After a model is registered, you can get predictions from it through Vertex AI.
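The two options can be sketched as SQL statements built in Python. This is an illustration under assumptions: the model, table, and Vertex AI model ID names are hypothetical, `ML.PREDICT` applies only to model types that support it, and `run_query` assumes the `google-cloud-bigquery` package and configured credentials.

```python
def ml_predict_sql(model, input_table):
    """Batch prediction inside BigQuery ML: score every row of input_table."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{input_table}`)"


def register_in_vertex_sql(model, vertex_ai_model_id):
    """Register an existing BigQuery ML model with Vertex AI Model Registry."""
    return (
        f"ALTER MODEL `{model}` "
        f"SET OPTIONS (vertex_ai_model_id = '{vertex_ai_model_id}')"
    )


def run_query(sql):
    """Run a statement in BigQuery and return the result rows as a list."""
    # Deferred import: only needed when a query is actually executed.
    from google.cloud import bigquery

    return list(bigquery.Client().query(sql).result())
```

For example, `run_query(ml_predict_sql("my_dataset.my_model", "my_dataset.new_rows"))` would return the scored rows, assuming both the model and the table exist.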