Model inference overview

This document describes the types of inference that BigQuery ML supports, which include:

Batch prediction
Online prediction

Machine learning inference is the process of feeding data points into a machine learning model to calculate an output, such as a single numerical score. This process is also referred to as "operationalizing a machine learning model" or "putting a machine learning model into production."

Batch prediction

The following sections describe the available ways of performing prediction in BigQuery ML.

Inference using BigQuery ML trained models

Prediction in BigQuery ML is supported not only for supervised learning models but also for unsupervised learning models.

BigQuery ML supports prediction through the ML.PREDICT function for the following models:

Model Category	Model Types	What `ML.PREDICT` does
Supervised Learning	Linear and logistic regression; boosted trees; random forest; deep neural networks (DNN); wide-and-deep; AutoML Tables	Predict the label: a numerical value for regression tasks or a categorical value for classification tasks.
Unsupervised Learning	K-means	Assign the entity to a cluster.
Unsupervised Learning	PCA	Apply dimensionality reduction to the entity by projecting it into the space spanned by the eigenvectors (principal components).
Unsupervised Learning	Autoencoder	Transform the entity into the embedded (latent) space.
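ML.PREDICT computes these outputs inside BigQuery. To make the two unsupervised cases concrete, the following pure-Python sketch (not BigQuery ML's implementation) shows the underlying arithmetic under simplifying assumptions: Euclidean distance for k-means, and mean-centering without any feature scaling that BigQuery ML may apply before PCA. The helper names `assign_cluster` and `project_pca` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def assign_cluster(point: Vector, centroids: Sequence[Vector]) -> Tuple[int, float]:
    """K-means inference: return the 1-based ID of the nearest centroid and its distance.

    Hypothetical helper; assumes Euclidean distance and already-preprocessed features.
    """
    best_id, best_dist = 0, math.inf
    for centroid_id, centroid in enumerate(centroids, start=1):
        dist = math.dist(point, centroid)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_id, best_dist = centroid_id, dist
    return best_id, best_dist


def project_pca(point: Vector, mean: Vector, components: Sequence[Vector]) -> List[float]:
    """PCA inference: center the point, then take its dot product with each eigenvector.

    Hypothetical helper; assumes mean-centering only (no feature scaling).
    """
    centered = [x - m for x, m in zip(point, mean)]
    return [sum(c * w for c, w in zip(centered, component)) for component in components]


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    # Nearest centroid is ID 2, at distance sqrt(5).
    print(assign_cluster([9.0, 8.0], centroids))
    # With identity components, the projection is just the centered point: [2.0, 3.0].
    print(project_pca([3.0, 4.0], mean=[1.0, 1.0], components=[[1.0, 0.0], [0.0, 1.0]]))
```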

Inference using imported models

With this approach, you create and train a model outside of BigQuery, import it by using the CREATE MODEL statement, and then run inference on it by using the ML.PREDICT function. All inference processing occurs in BigQuery, using data from BigQuery. Imported models can perform supervised or unsupervised learning.
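The import-then-predict workflow comes down to two GoogleSQL statements. The following minimal Python sketch only assembles their text so that it can be inspected or passed to a client library; it does not contact BigQuery. It assumes a TensorFlow SavedModel (one of the supported formats listed below) stored under a Cloud Storage directory, and the dataset, model, table, and bucket names are hypothetical.

```python
def import_tensorflow_model_sql(model: str, saved_model_dir: str) -> str:
    """Return a CREATE MODEL statement that imports a TensorFlow SavedModel.

    `saved_model_dir` is a Cloud Storage directory such as 'gs://my-bucket/saved_model'
    (hypothetical); the path is emitted in the 'gs://.../*' form used for SavedModel dirs.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (MODEL_TYPE = 'TENSORFLOW',\n"
        f"         MODEL_PATH = '{saved_model_dir.rstrip('/')}/*')"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Return an ML.PREDICT query that scores every row of `input_table`."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"                (SELECT * FROM `{input_table}`))"
    )


if __name__ == "__main__":
    # Hypothetical names; replace them with your own dataset, model, table, and bucket.
    print(import_tensorflow_model_sql("mydataset.imported_tf_model", "gs://my-bucket/saved_model/"))
    print(predict_sql("mydataset.imported_tf_model", "mydataset.new_rows"))
```

With the `google-cloud-bigquery` package, each resulting string could be run with `bigquery.Client().query(sql).result()`.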

BigQuery ML supports the following types of imported models:

Open Neural Network Exchange (ONNX) for models trained in PyTorch, scikit-learn, and other popular ML frameworks.
TensorFlow
TensorFlow Lite
XGBoost

Use this approach to run custom models developed with a range of ML frameworks while taking advantage of BigQuery ML's inference speed and co-location with your data.

To learn more, try one of the following tutorials:

Inference using remote models

With this approach, you can create a reference to a model hosted in Vertex AI Prediction by using the CREATE MODEL statement, and then run inference on it by using the ML.PREDICT function. All inference processing occurs in Vertex AI, using data from BigQuery. Remote models can perform supervised or unsupervised learning.

Use this approach to run inference against large models that require the GPU hardware support provided by Vertex AI. If most of your models are hosted by Vertex AI, this also lets you run inference against these models by using SQL, without having to manually build data pipelines to take data to Vertex AI and bring prediction results back to BigQuery.

For step-by-step instructions, see Make predictions with remote models on Vertex AI.

Online prediction

The built-in inference capability of BigQuery ML is optimized for large-scale use cases, such as batch prediction. Although BigQuery ML returns low-latency results when the input data is small, you can achieve faster online prediction through BigQuery ML's integration with Vertex AI.

You can manage BigQuery ML models within the Vertex AI environment, which eliminates the need to export models from BigQuery ML before deploying them to Vertex AI endpoints. By managing models within Vertex AI, you get access to all of the Vertex AI MLOps capabilities and to features such as Vertex AI Feature Store.
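One way to bring a BigQuery ML model under Vertex AI management is to register it in the Vertex AI Model Registry when you train it, by setting the MODEL_REGISTRY and VERTEX_AI_MODEL_ID options of CREATE MODEL. The minimal Python sketch below only builds that statement; it assumes a linear regression model, and the model, table, label column, and registry ID values are hypothetical. Deploying the registered model to an endpoint then happens in Vertex AI.

```python
def registered_model_sql(model: str, training_table: str, label_col: str,
                         vertex_ai_model_id: str) -> str:
    """Return a CREATE MODEL statement that trains a linear regression model and
    registers it in the Vertex AI Model Registry under `vertex_ai_model_id`."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (MODEL_TYPE = 'LINEAR_REG',\n"
        f"         INPUT_LABEL_COLS = ['{label_col}'],\n"
        "         MODEL_REGISTRY = 'VERTEX_AI',\n"
        f"         VERTEX_AI_MODEL_ID = '{vertex_ai_model_id}')\n"
        f"AS SELECT * FROM `{training_table}`"
    )


if __name__ == "__main__":
    # Hypothetical names; replace them with your own model, table, label, and registry ID.
    print(registered_model_sql("mydataset.price_model", "mydataset.training_rows",
                               "price", "price_model"))
```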

You can also export BigQuery ML models to Cloud Storage so that you can serve them on other model hosting platforms.
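Exporting uses the EXPORT MODEL statement with a Cloud Storage destination URI. The sketch below, a hypothetical helper, builds that statement and rejects destinations that are not `gs://` URIs; the model name and bucket path are hypothetical, and the format of the exported files depends on the model type.

```python
def export_model_sql(model: str, gcs_uri: str) -> str:
    """Return an EXPORT MODEL statement that writes the model files to Cloud Storage."""
    # Only Cloud Storage destinations are considered here.
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a Cloud Storage URI (gs://...), got {gcs_uri!r}")
    return f"EXPORT MODEL `{model}`\nOPTIONS (URI = '{gcs_uri}')"


if __name__ == "__main__":
    # Hypothetical model and bucket names.
    print(export_model_sql("mydataset.mymodel", "gs://my-bucket/exported_model"))
```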

What's next

For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.
For information about supported model types and SQL functions for each type of inference, see the End-to-end user journey for each model.