Model inference overview

This document describes the types of batch inference that BigQuery ML supports: prediction, forecasting, recommendation, and anomaly detection.

Machine learning inference is the process of passing data points to a machine learning model to calculate an output, such as a single numerical score. This process is also referred to as "operationalizing a machine learning model" or "putting a machine learning model into production."


Prediction

The following sections describe the available ways of performing prediction in BigQuery ML.

Inference using BigQuery ML trained models

In BigQuery ML, the term prediction applies to both supervised and unsupervised learning models, but only to models trained with independent and identically distributed (IID) data. For time series data, which is not IID, the term forecasting is used instead. See the forecasting section below.

BigQuery ML supports prediction functionalities through the ML.PREDICT function, with the following models:

| Model category | Model types | What ML.PREDICT does |
| --- | --- | --- |
| Supervised learning | Linear and logistic regression, boosted trees, random forest, deep neural networks, AutoML Tables | Predict the label, either a numerical value for regression tasks or a categorical value for classification tasks. |
| Unsupervised learning | K-means | Assign the cluster to the entity. |
| Unsupervised learning | PCA | Apply dimensionality reduction to the entity by transforming it into the space spanned by the eigenvectors. |
| Unsupervised learning | Autoencoder | Transform the entity into the embedded space. |

For Matrix Factorization model inference, see Recommendation.
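
As a rough illustration of how an ML.PREDICT call is shaped, the following Python sketch only assembles a GoogleSQL query string; it does not connect to BigQuery. The helper name `build_predict_query` and the model and table names are hypothetical placeholders, not names from this document.

```python
def build_predict_query(model: str, table: str) -> str:
    """Return a GoogleSQL statement that runs ML.PREDICT over a whole table.

    Both arguments are qualified names such as "project.dataset.name".
    """
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{table}`)"
    )


# Hypothetical names, used only for illustration.
predict_query = build_predict_query(
    "my_project.my_dataset.my_model",
    "my_project.my_dataset.new_data",
)
print(predict_query)
```

The same query shape applies to the supervised and unsupervised model types in the table above; only the output columns differ by model type.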

Inference using imported models

With this approach, you create and train a model outside of BigQuery, import it by using the CREATE MODEL statement, and then run inference on it by using the ML.PREDICT function. All inference processing occurs in BigQuery, using data from BigQuery. Imported models can perform supervised or unsupervised learning.

BigQuery ML supports the following types of imported models:

Use this approach to run custom models developed with a range of ML frameworks while taking advantage of BigQuery ML's inference speed and co-location with data.
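
The import step might look like the following sketch, which builds a CREATE MODEL statement as a Python string without running it. It assumes a TensorFlow model that has already been exported to Cloud Storage; the helper name, the dataset and model names, and the bucket path are all hypothetical.

```python
def build_import_model_statement(model: str, model_path: str) -> str:
    """Return a CREATE MODEL statement that imports an externally trained model.

    Assumption: the model is a TensorFlow model stored under `model_path`
    in Cloud Storage. Other frameworks would need a different MODEL_TYPE.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (MODEL_TYPE = 'TENSORFLOW', MODEL_PATH = '{model_path}')"
    )


# Hypothetical model name and Cloud Storage path.
import_statement = build_import_model_statement(
    "my_dataset.imported_model",
    "gs://my-bucket/exported_model/*",
)
print(import_statement)
```

After the model is imported, inference uses the same ML.PREDICT function as for models trained in BigQuery ML.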

Inference using remote models

With this approach, you can create a reference to a model hosted in Vertex AI Prediction by using the CREATE MODEL statement, and then run inference on it by using the ML.PREDICT function. All inference processing occurs in Vertex AI, using data from BigQuery. Remote models can perform supervised or unsupervised learning.

Use this approach to run inference against large models that require the GPU hardware support provided by Vertex AI. If most of your models are hosted by Vertex AI, this also lets you run inference against these models by using SQL, without having to manually build data pipelines to take data to Vertex AI and bring prediction results back to BigQuery.

For step-by-step instructions, see Make predictions with remote models on Vertex AI.
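
To make the idea of a model reference concrete, the sketch below assembles a CREATE MODEL statement for a remote model as a Python string. It is only an outline under stated assumptions: the helper names, the connection name, the endpoint URL, and the input and output fields are hypothetical, and the statement is not executed.

```python
from typing import Dict


def _schema(fields: Dict[str, str]) -> str:
    """Format {"name": "TYPE"} pairs as a comma-separated column list."""
    return ", ".join(f"{name} {sql_type}" for name, sql_type in fields.items())


def build_remote_model_statement(
    model: str,
    connection: str,
    endpoint: str,
    inputs: Dict[str, str],
    outputs: Dict[str, str],
) -> str:
    """Return a CREATE MODEL statement that references a Vertex AI endpoint."""
    return "\n".join([
        f"CREATE OR REPLACE MODEL `{model}`",
        f"INPUT ({_schema(inputs)})",
        f"OUTPUT ({_schema(outputs)})",
        f"REMOTE WITH CONNECTION `{connection}`",
        f"OPTIONS (ENDPOINT = '{endpoint}')",
    ])


# Every value below is a hypothetical placeholder.
remote_statement = build_remote_model_statement(
    model="my_dataset.remote_model",
    connection="my_project.us.my_connection",
    endpoint=(
        "https://us-central1-aiplatform.googleapis.com/v1/projects/"
        "my_project/locations/us-central1/endpoints/1234567890"
    ),
    inputs={"feature_a": "FLOAT64", "feature_b": "STRING"},
    outputs={"score": "FLOAT64"},
)
print(remote_statement)
```

Once such a reference exists, the data stays in BigQuery and each ML.PREDICT call sends it to the Vertex AI endpoint for processing.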


Forecasting

Forecasting is a technique that uses historical data as input to make informed estimates about the future. In BigQuery ML, forecasting is applied to time series data. For IID data, see prediction.

BigQuery ML supports forecasting functionalities through the ML.FORECAST function, with the ARIMA_PLUS and ARIMA_PLUS_XREG models. The time series model is not a single model, but a time series modeling pipeline that includes multiple models and algorithms. See the time series modeling pipeline for more details.
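
A forecasting query could be put together along the lines of this sketch, which builds an ML.FORECAST call as a Python string and does not run it. It covers only the ARIMA_PLUS form of the call. The helper name and model name are hypothetical, and the argument checks reflect an assumption about sensible values rather than documented limits.

```python
def build_forecast_query(model: str, horizon: int, confidence_level: float) -> str:
    """Return a GoogleSQL statement that forecasts `horizon` future points."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of time points")
    # Assumption: the confidence level is a probability in [0, 1).
    if not 0 <= confidence_level < 1:
        raise ValueError("confidence_level must be in [0, 1)")
    return (
        "SELECT *\n"
        f"FROM ML.FORECAST(MODEL `{model}`, "
        f"STRUCT({horizon} AS horizon, {confidence_level} AS confidence_level))"
    )


# Hypothetical model name; forecast 30 points with 90% prediction intervals.
forecast_query = build_forecast_query("my_dataset.my_arima_model", 30, 0.9)
print(forecast_query)
```

Unlike ML.PREDICT, this form of the call takes no input table: the forecast is driven by the history the time series model was trained on.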


Recommendation

Recommender systems are one of the most successful and widespread applications of machine learning technologies for businesses. A recommendation system helps users find compelling content in a large body of work. For example, Google Play Store provides millions of apps, while YouTube provides billions of videos, with more apps and videos added every day. Users can search to find compelling new content, but a recommendation engine can display content that users might not have thought to search for on their own. See the Recommendation Systems Overview for more information.

Machine learning algorithms in recommender systems are typically classified into two categories: content-based and collaborative filtering methods.

Content-based filtering: Uses similarity between items to recommend items similar to what the user likes. If user A watches two cute cat videos, then the system can recommend cute animal videos to that user.

Collaborative filtering: Uses similarities between queries and items simultaneously to provide recommendations. If user A is similar to user B, and user B likes video 1, then the system can recommend video 1 to user A (even if user A hasn't seen any videos similar to video 1).

The Matrix Factorization model is widely used as a collaborative filtering method for recommendation systems. BigQuery ML supports the ML.RECOMMEND function to facilitate using Matrix Factorization for recommendation purposes. For more information about applying Matrix Factorization to recommendation, see Matrix Factorization.
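
The core idea behind matrix factorization can be sketched in a few lines of plain Python: each user and each item is represented by a short vector of learned factors, and the predicted rating is their dot product. This is a conceptual toy, not how ML.RECOMMEND is implemented; the function names and all factor values are made up for illustration.

```python
from typing import Dict, List, Sequence


def predicted_rating(user_factors: Sequence[float], item_factors: Sequence[float]) -> float:
    """Predicted rating is the dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))


def recommend(user_factors: Sequence[float],
              items: Dict[str, Sequence[float]],
              top_n: int) -> List[str]:
    """Return the names of the top_n items ranked by predicted rating."""
    ranked = sorted(
        items,
        key=lambda name: predicted_rating(user_factors, items[name]),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up two-dimensional factors for one user and three videos.
user_a = [1.0, 0.5]
item_vectors = {
    "video_1": [2.0, 0.0],  # predicted rating 2.0
    "video_2": [0.0, 1.0],  # predicted rating 0.5
    "video_3": [1.0, 1.0],  # predicted rating 1.5
}
top_items = recommend(user_a, item_vectors, top_n=2)
print(top_items)
```

In a trained model the factor vectors are learned from observed ratings or interactions, so users with similar histories end up with similar vectors and receive similar recommendations.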

In modern recommendation engines, deep neural network (DNN) models, including Wide-and-Deep models, are widely used. They can be viewed as an extension of Matrix Factorization based collaborative filtering, and they can incorporate query features and item features to improve the relevance of recommendations. For more background, read Recommendation using Deep Neural Network Models, Deep Neural Networks for YouTube Recommendations, or Wide & Deep Learning for Recommender Systems. Note also that any supervised learning model can be used for recommendation tasks.

Anomaly detection

Anomaly detection is a step in data mining that identifies data points, events, and observations that deviate from a dataset's normal behavior. Anomalous data can indicate critical incidents such as technical issues or opportunities like changes in consumer behavior.

One challenge with anomaly detection is identifying and defining the anomaly. If you have labeled data with known anomalies, you can choose among the supervised machine learning model types that are already supported in BigQuery ML. Without either a known anomaly type or labeled data, you can still use unsupervised machine learning to help detect anomalies. Depending on whether your training data is time series data, you can detect anomalies in training data or in new input data by using the ML.DETECT_ANOMALIES function with the following models:

| Data type | Model types | What ML.DETECT_ANOMALIES does |
| --- | --- | --- |
| Time series | ARIMA_PLUS | Detect the anomalies in the time series. |
| Independent and identically distributed random variables (IID) | K-means | Detect anomalies based on the shortest distance among the normalized distances from the input data to each cluster centroid. For a definition of normalized distances, see ML.DETECT_ANOMALIES. |
| Independent and identically distributed random variables (IID) | Autoencoder | Detect anomalies based upon the reconstruction loss in terms of mean squared error. For more information, see ML.RECONSTRUCTION_LOSS. ML.RECONSTRUCTION_LOSS can retrieve all types of reconstruction loss. |
| Independent and identically distributed random variables (IID) | PCA | Detect anomalies based upon the reconstruction loss in terms of mean squared error. |
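
The reconstruction-loss criterion used for autoencoder and PCA models can be illustrated with a small Python sketch: a data point is compared with the model's reconstruction of it, and a large mean squared error suggests an anomaly. This is a conceptual illustration rather than BigQuery ML's implementation; the function names, the sample values, and the fixed threshold are hypothetical.

```python
from typing import Sequence


def mean_squared_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Average of the squared differences between input and reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("original and reconstructed must have the same length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def is_anomaly(original: Sequence[float],
               reconstructed: Sequence[float],
               threshold: float) -> bool:
    """Flag the point when its reconstruction loss exceeds the threshold."""
    return mean_squared_error(original, reconstructed) > threshold


# Hypothetical points and reconstructions, with a made-up threshold of 1.0.
well_reconstructed = is_anomaly([1.0, 2.0], [1.0, 2.0], threshold=1.0)
poorly_reconstructed = is_anomaly([1.0, 2.0], [1.0, 4.0], threshold=1.0)
print(well_reconstructed, poorly_reconstructed)
```

The intuition is that a model trained mostly on normal data reconstructs normal points well, so points it reconstructs poorly are candidates for anomalies.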

Online predictions

Only batch inference can be performed in BigQuery ML. To make models available for online predictions, you can manage BigQuery ML models in Vertex AI. This lets you deploy models as Vertex AI endpoints without having to export them from BigQuery ML. Managing models in Vertex AI lets you use Vertex AI MLOps capabilities, and gives you access to Vertex AI features such as Vertex AI Feature Store. You can also export models to Cloud Storage to make them available to other model hosting platforms.

What's next