Online predictions
Online predictions are synchronous requests made to a model endpoint. Use online predictions when you are making requests in response to application input or in situations that require timely inference.
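As a rough illustration of what a synchronous request to an endpoint looks like, the following sketch builds the URL and JSON body for the Vertex AI REST `predict` method without sending anything. The project ID, region, endpoint ID, and feature names are placeholder assumptions, and the helper `build_online_predict_request` is hypothetical rather than part of any SDK. The shape of each instance depends on your model.

```python
import json


def build_online_predict_request(project, location, endpoint_id, instances):
    """Return (url, body) for a synchronous predict call to an endpoint.

    Hypothetical helper for illustration; it only assembles the request
    and performs no network I/O or authentication.
    """
    if not instances:
        raise ValueError("instances must contain at least one item")
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"endpoints/{endpoint_id}:predict"
    )
    # The request body wraps the inputs in an "instances" list.
    body = json.dumps({"instances": instances})
    return url, body


# Placeholder values (assumptions, not real resources).
url, body = build_online_predict_request(
    project="my-project",
    location="us-central1",
    endpoint_id="1234567890",
    instances=[{"feature_a": 1.0, "feature_b": "x"}],
)
print(url)
print(body)
```

An application would send this body in an authenticated POST request and wait for the response, which is what makes the request synchronous.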
Model deployment
You must deploy a model to an endpoint before that model can be used to serve online predictions. Deploying a model associates physical resources with the model so it can serve online predictions with low latency.
You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see Considerations for deploying models.
To learn how to deploy an AutoML model, see the Get predictions from AutoML models section of this page and select the page that's relevant to your model.
To learn how to deploy a custom trained model, see Get predictions from a custom trained model.
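To make the relationship between models, endpoints, and physical resources more concrete, the following sketch assembles the body of a REST `deployModel` request and reuses it to deploy one model to two endpoints. All IDs, the machine type, and the replica count are placeholder assumptions, and `build_deploy_model_request` is a hypothetical helper. The sketch assumes dedicated resources and routes all of an endpoint's traffic to the newly deployed model, which may not be what you want when an endpoint already serves other models.

```python
def build_deploy_model_request(project, location, endpoint_id, model_id,
                               machine_type="n1-standard-2", replicas=1):
    """Return (url, body) for deploying a model to an endpoint.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    parent = f"projects/{project}/locations/{location}"
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"{parent}/endpoints/{endpoint_id}:deployModel"
    )
    body = {
        "deployedModel": {
            "model": f"{parent}/models/{model_id}",
            "displayName": f"model-{model_id}-on-{endpoint_id}",
            # The physical resources associated with the model (assumed values).
            "dedicatedResources": {
                "machineSpec": {"machineType": machine_type},
                "minReplicaCount": replicas,
                "maxReplicaCount": replicas,
            },
        },
        # Key "0" refers to the model being deployed by this request.
        "trafficSplit": {"0": 100},
    }
    return url, body


# The same model deployed to two different endpoints (placeholder IDs).
requests_to_send = [
    build_deploy_model_request("my-project", "us-central1", ep, "987")
    for ep in ("111", "222")
]
for deploy_url, _ in requests_to_send:
    print(deploy_url)
```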
Batch predictions
Batch predictions are asynchronous requests. You request batch predictions directly from the model resource without needing to deploy the model to an endpoint. Use batch predictions when you don't require an immediate response and want to process accumulated data by using a single request.
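The contrast with online predictions can be sketched as follows: a batch request names the model resource directly, with no endpoint involved, and points at accumulated input data. This example assembles the body of a REST request that creates a batch prediction job. The bucket paths, IDs, and the JSON Lines format are placeholder assumptions, and `build_batch_prediction_request` is a hypothetical helper; the supported input and output formats depend on the model type.

```python
def build_batch_prediction_request(project, location, model_id,
                                   gcs_input_uris, gcs_output_prefix,
                                   display_name="example-batch-job"):
    """Return (url, body) for creating a batch prediction job.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    parent = f"projects/{project}/locations/{location}"
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"{parent}/batchPredictionJobs"
    )
    body = {
        "displayName": display_name,
        # The model resource itself, not an endpoint.
        "model": f"{parent}/models/{model_id}",
        "inputConfig": {
            "instancesFormat": "jsonl",  # assumed input format
            "gcsSource": {"uris": list(gcs_input_uris)},
        },
        "outputConfig": {
            "predictionsFormat": "jsonl",  # assumed output format
            "gcsDestination": {"outputUriPrefix": gcs_output_prefix},
        },
    }
    return url, body


# Placeholder values (assumptions, not real resources).
batch_url, batch_body = build_batch_prediction_request(
    project="my-project",
    location="us-central1",
    model_id="987",
    gcs_input_uris=["gs://my-bucket/input/instances.jsonl"],
    gcs_output_prefix="gs://my-bucket/output/",
)
print(batch_url)
```

Because the job runs asynchronously, the response to this request describes the job rather than the predictions; results are written to the output location once the job finishes.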
Get predictions from AutoML models
You can get online or batch predictions from AutoML models by using the Google Cloud console or the Vertex AI API. The instructions vary slightly depending on your data type and model objective:
Image
Learn how to get predictions from the following types of image AutoML models:
Tabular
Learn how to get predictions from the following types of tabular AutoML models:
- Tabular classification/regression models
- Tabular forecasting models (batch predictions only)
Text
Learn how to get predictions from the following types of text AutoML models:
Video
Learn how to get predictions from the following types of video AutoML models:
- Video action recognition models (batch predictions only)
- Video classification models (batch predictions only)
- Video object tracking models (batch predictions only)
Get predictions from custom trained models
The instructions on how to get online and batch predictions from your custom trained model are the same, regardless of your data type or model objective.
For details, see Get predictions from a custom trained model.