BigQuery ML model evaluation overview
This document describes how BigQuery ML supports machine learning (ML) model evaluation.
Overview of model evaluation
You can use ML model evaluation metrics for the following purposes:
- To assess the quality of the fit between the model and the data.
- To compare different models.
- To predict how accurately you can expect each model to perform on a specific dataset, which helps with model selection.
Supervised and unsupervised learning model evaluations work differently:
- For supervised learning models, model evaluation is well-defined. An evaluation set, which is data that the model hasn't seen during training, is typically held out from the training set and then used to measure model performance. Don't evaluate on the training set: a model can fit its training data closely yet generalize poorly to new data, an outcome known as overfitting, so metrics computed on the training data overstate real-world performance.
- For unsupervised learning models, model evaluation is less well-defined and typically varies from one model type to another. Because unsupervised learning models don't reserve a separate evaluation set, the evaluation metrics are calculated over the entire input dataset.
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.
Model evaluation offerings
BigQuery ML provides the following functions to calculate evaluation metrics for ML models:
Model category | Model types | Model evaluation functions | What the function does |
---|---|---|---|
Supervised learning | Linear regression, Boosted trees regressor, Random forest regressor, DNN regressor, Wide-and-deep regressor, AutoML Tables regressor | ML.EVALUATE | Reports the following metrics: mean absolute error, mean squared error, mean squared log error, median absolute error, R² score, and explained variance. |
Supervised learning | Logistic regression, Boosted trees classifier, Random forest classifier, DNN classifier, Wide-and-deep classifier, AutoML Tables classifier | ML.EVALUATE | Reports the following metrics: precision, recall, accuracy, F1 score, log loss, and ROC AUC. |
Supervised learning | The classification models listed above | ML.CONFUSION_MATRIX | Reports the confusion matrix. |
Supervised learning | The classification models listed above | ML.ROC_CURVE | Reports metrics for different threshold values, including recall, false positive rate, true positives, false positives, true negatives, and false negatives. Only applies to binary-class classification models. |
Unsupervised learning | K-means | ML.EVALUATE | Reports the Davies-Bouldin index, and the mean squared distance between data points and the centroids of the assigned clusters. |
Unsupervised learning | Matrix factorization | ML.EVALUATE | For explicit feedback-based models, reports mean absolute error, mean squared error, mean squared log error, median absolute error, R² score, and explained variance. For implicit feedback-based models, reports mean average precision, mean squared error, normalized discounted cumulative gain (NDCG), and average rank. |
Unsupervised learning | PCA | ML.EVALUATE | Reports the total explained variance ratio. |
Unsupervised learning | Autoencoder | ML.EVALUATE | Reports the following metrics: mean absolute error, mean squared error, and mean squared log error. |
Time series | ARIMA_PLUS | ML.EVALUATE | Reports the following metrics: mean absolute error, mean squared error, root mean squared error, mean absolute percentage error, and symmetric mean absolute percentage error. This function requires new data as input. |
Time series | ARIMA_PLUS | ML.ARIMA_EVALUATE | Reports log-likelihood, AIC, and variance for all ARIMA candidate models characterized by different (p, d, q, has_drift) tuples. It also reports other information about seasonality, holiday effects, and spikes-and-dips outliers. This function doesn't require new data as input. |
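For example, for a binary classification model, the three classification functions can be called as follows. This is a minimal sketch; the model name mydataset.my_classifier and evaluation table mydataset.eval_data are hypothetical placeholders:

```sql
-- Standard evaluation metrics: precision, recall, accuracy, F1 score,
-- log loss, and ROC AUC.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.my_classifier`,
  (SELECT * FROM `mydataset.eval_data`));

-- The confusion matrix for the same model and data.
SELECT *
FROM ML.CONFUSION_MATRIX(
  MODEL `mydataset.my_classifier`,
  (SELECT * FROM `mydataset.eval_data`));

-- Per-threshold metrics; applies to binary-class classification models only.
SELECT *
FROM ML.ROC_CURVE(
  MODEL `mydataset.my_classifier`,
  (SELECT * FROM `mydataset.eval_data`));
```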
Automatic evaluation in CREATE MODEL statements
BigQuery ML supports automatic evaluation during model creation. Depending on the model type, the data split training options, and whether you use hyperparameter tuning, the evaluation metrics are calculated against the reserved evaluation dataset, the reserved test dataset, or the entire input dataset.
For k-means, PCA, autoencoder, and ARIMA_PLUS models, BigQuery ML uses all of the input data as training data, and evaluation metrics are calculated against the entire input dataset.
For linear and logistic regression, boosted tree, random forest, DNN, Wide-and-deep, and matrix factorization models, evaluation metrics are calculated against the dataset that's specified by the following CREATE MODEL options:
- DATA_SPLIT_METHOD
- DATA_SPLIT_EVAL_FRACTION
- DATA_SPLIT_COL

When you train these types of models by using hyperparameter tuning, the DATA_SPLIT_TEST_FRACTION option also helps define the dataset that the evaluation metrics are calculated against. For more information, see Data split.

For AutoML Tables models, see how data splits are used for training and evaluation.
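For example, the following CREATE MODEL sketch reserves a random 20% of the input data as the evaluation dataset; the model name, table name, and label column are hypothetical placeholders:

```sql
CREATE OR REPLACE MODEL `mydataset.my_classifier`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['label'],
  -- Randomly hold out 20% of the input rows for evaluation.
  data_split_method = 'RANDOM',
  data_split_eval_fraction = 0.2
) AS
SELECT * FROM `mydataset.training_data`;
```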
To get the evaluation metrics that were calculated during model creation, use evaluation functions such as ML.EVALUATE on the model with no input data specified. For an example, see ML.EVALUATE with no input data specified.
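For illustration, a minimal sketch using the hypothetical model from the earlier examples:

```sql
-- With no input data specified, ML.EVALUATE returns the metrics that
-- were calculated during model creation.
SELECT *
FROM ML.EVALUATE(MODEL `mydataset.my_classifier`);
```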
Evaluation with a new dataset
After model creation, you can specify new datasets for evaluation. To provide a new dataset, use evaluation functions such as ML.EVALUATE on the model with input data specified. For an example, see ML.EVALUATE with a custom threshold and input data.
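For illustration, a minimal sketch that evaluates the hypothetical model from the earlier examples against a new table, and replaces the default classification threshold of 0.5 with 0.55:

```sql
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.my_classifier`,
  -- New evaluation data that the model hasn't seen; hypothetical table.
  (SELECT * FROM `mydataset.new_data`),
  -- Classify as positive when the predicted probability exceeds 0.55.
  STRUCT(0.55 AS threshold));
```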