The ML.EVALUATE function

This document describes the ML.EVALUATE function, which lets you evaluate model metrics.
Syntax

```sql
ML.EVALUATE(
  MODEL `project_id.dataset.model`
  [, { TABLE `project_id.dataset.table` | (query_statement) }]
  [, STRUCT(
      [threshold_value AS threshold]
      [, perform_aggregation AS perform_aggregation]
      [, horizon_value AS horizon]
      [, confidence_level AS confidence_level]
      [, trial_id AS trial_id])])
```
Arguments
ML.EVALUATE takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model`: the name of the model.
- `table`: the name of the input table that contains the evaluation data. If `table` is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the `input_label_cols` option. If `input_label_cols` is unspecified, the column named `label` in the training data is used.
- `query_statement`: a GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax of the `query_statement` clause in GoogleSQL, see Query syntax. If `query_statement` is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the `input_label_cols` option. If `input_label_cols` is unspecified, the column named `label` in the training data is used. Extra columns are ignored. If you used the `TRANSFORM` clause in the `CREATE MODEL` statement that created the model, then only the input columns present in the `TRANSFORM` clause must appear in `query_statement`.
- `threshold`: a `FLOAT64` value that specifies a custom threshold for the binary-class classification model to use for evaluation. The default value is `0.5`. A `0` value for precision or recall means that the selected threshold produced no true positive labels. A `NaN` value for precision means that the selected threshold produced no positive labels, neither true positives nor false positives. If both `table` and `query_statement` are unspecified, you can't use a threshold. You can only use `threshold` with binary-class classification models.
- `perform_aggregation`: a `BOOL` value that indicates the level of evaluation for forecasting accuracy. If you specify `TRUE`, the forecasting accuracy is on the time series level. If you specify `FALSE`, the forecasting accuracy is on the timestamp level. The default value is `TRUE`.
- `horizon`: an `INT64` value that specifies the number of forecasted time points against which the evaluation metrics are computed. The default value is the horizon value specified in the `CREATE MODEL` statement for the time series model, or `1000` if unspecified. When evaluating multiple time series at the same time, this parameter applies to each time series. You can only use `horizon` when the model type is `ARIMA_PLUS` and either `table` or `query_statement` is specified.
- `confidence_level`: a `FLOAT64` value that specifies the percentage of the future values that fall in the prediction interval. The default value is `0.95`. The valid input range is `[0, 1)`. You can only use `confidence_level` when the model type is `ARIMA_PLUS`, either `table` or `query_statement` is specified, and `perform_aggregation` is set to `FALSE`. The value of `confidence_level` affects the `upper_bound` and `lower_bound` values in the output.
- `trial_id`: an `INT64` value that identifies the hyperparameter tuning trial that you want the function to evaluate. The function uses the optimal trial by default. Only specify this argument if you ran hyperparameter tuning when creating the model.
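As an illustrative sketch of the `trial_id` argument (the model and table names here are hypothetical placeholders), the following query evaluates a specific hyperparameter tuning trial instead of the optimal one:

```sql
-- Hypothetical names: my_tuned_model and eval_table are placeholders.
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `mydataset.my_tuned_model`,
    TABLE `mydataset.eval_table`,
    STRUCT(2 AS trial_id))
```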
Output
ML.EVALUATE returns a single row containing metrics applicable to the type of model specified.

The `precision`, `recall`, `f1_score`, `log_loss`, and `roc_auc` metrics are macro-averaged for all of the class labels. For a macro-average, metrics are calculated for each label and then an unweighted average is taken of those values. `accuracy` is computed as a global total, or micro-average. For a micro-average, the metric is calculated globally by counting the total number of correctly predicted rows.
Regression models
Regression models include the following:
- Linear regression
- Boosted tree regressor
- Random forest regressor
- Deep neural network (DNN) regressor
- Wide & Deep regressor
- AutoML Tables regressor
ML.EVALUATE returns the following columns for regression models:

- `trial_id`: an `INT64` value that identifies the hyperparameter tuning trial. This column is only returned if you ran hyperparameter tuning when creating the model. This column doesn't apply to AutoML Tables models.
- `mean_absolute_error`: a `FLOAT64` value that contains the mean absolute error for the model.
- `mean_squared_error`: a `FLOAT64` value that contains the mean squared error for the model.
- `mean_squared_log_error`: a `FLOAT64` value that contains the mean squared logarithmic error for the model. The mean squared logarithmic error measures the distance between the actual and predicted values.
- `median_absolute_error`: a `FLOAT64` value that contains the median absolute error for the model.
- `r2_score`: a `FLOAT64` value that contains the R2 score for the model.
- `explained_variance`: a `FLOAT64` value that contains the explained variance for the model.
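Because ML.EVALUATE returns these metrics as named columns, you can select only the ones you need. A minimal sketch, using hypothetical model and table names:

```sql
-- Hypothetical names; select only the regression metrics of interest.
SELECT
  mean_squared_error,
  r2_score
FROM
  ML.EVALUATE(
    MODEL `mydataset.my_regression_model`,
    TABLE `mydataset.eval_table`)
```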
Classification models
Classification models include the following:
- Logistic regressor
- Boosted tree classifier
- Random forest classifier
- DNN classifier
- Wide & Deep classifier
- AutoML Tables classifier
ML.EVALUATE returns the following columns for classification models:

- `trial_id`: an `INT64` value that identifies the hyperparameter tuning trial. This column is only returned if you ran hyperparameter tuning when creating the model. This column doesn't apply to AutoML Tables models.
- `precision`: a `FLOAT64` value that contains the precision for the model.
- `recall`: a `FLOAT64` value that contains the recall for the model.
- `accuracy`: a `FLOAT64` value that contains the accuracy for the model.
- `f1_score`: a `FLOAT64` value that contains the F1 score for the model.
- `log_loss`: a `FLOAT64` value that contains the logistic loss for the model.
- `roc_auc`: a `FLOAT64` value that contains the area under the receiver operating characteristic curve for the model.
K-means models
ML.EVALUATE returns the following columns for k-means models:

- `trial_id`: an `INT64` value that identifies the hyperparameter tuning trial. This column is only returned if you ran hyperparameter tuning when creating the model.
- `davies_bouldin_index`: a `FLOAT64` value that contains the Davies-Bouldin index for the model.
- `mean_squared_distance`: a `FLOAT64` value that contains the mean squared distance for the model, which is the average of the distances from the training data points to their closest centroid.
Matrix factorization models
ML.EVALUATE returns the following columns for matrix factorization models with implicit feedback:

- `trial_id`: an `INT64` value that identifies the hyperparameter tuning trial. This column is only returned if you ran hyperparameter tuning when creating the model.
- `recall`: a `FLOAT64` value that contains the recall for the model.
- `mean_squared_error`: a `FLOAT64` value that contains the mean squared error for the model.
- `normalized_discounted_cumulative_gain`: a `FLOAT64` value that contains the normalized discounted cumulative gain for the model.
- `average_rank`: a `FLOAT64` value that contains the average rank for the model.
ML.EVALUATE returns the following columns for matrix factorization models with explicit feedback:

- `trial_id`: an `INT64` value that identifies the hyperparameter tuning trial. This column is only returned if you ran hyperparameter tuning when creating the model.
- `mean_absolute_error`: a `FLOAT64` value that contains the mean absolute error for the model.
- `mean_squared_error`: a `FLOAT64` value that contains the mean squared error for the model.
- `mean_squared_log_error`: a `FLOAT64` value that contains the mean squared logarithmic error for the model. The mean squared logarithmic error measures the distance between the actual and predicted values.
- `median_absolute_error`: a `FLOAT64` value that contains the median absolute error for the model.
- `r2_score`: a `FLOAT64` value that contains the R2 score for the model.
- `explained_variance`: a `FLOAT64` value that contains the explained variance for the model.
PCA models
ML.EVALUATE returns the following column for PCA models:

- `total_explained_variance_ratio`: a `FLOAT64` value that contains the percentage of the cumulative variance explained by all the returned principal components. For more information, see the ML.PRINCIPAL_COMPONENT_INFO function.
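For example, a minimal sketch (with a hypothetical model name) that reads the single PCA metric:

```sql
-- Hypothetical model name; PCA evaluation returns one metric column.
SELECT
  total_explained_variance_ratio
FROM
  ML.EVALUATE(MODEL `mydataset.my_pca_model`)
```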
Time series models
ML.EVALUATE returns the following columns for ARIMA_PLUS or ARIMA_PLUS_XREG models when input data is provided and perform_aggregation is FALSE:

- `time_series_id_col` or `time_series_id_cols`: a value that contains the identifiers of a time series. `time_series_id_col` can be an `INT64` or `STRING` value. `time_series_id_cols` can be an `ARRAY<INT64>` or `ARRAY<STRING>` value. Only present when forecasting multiple time series at once. The column names and types are inherited from the `TIME_SERIES_ID_COL` option as specified in the `CREATE MODEL` statement. `ARIMA_PLUS_XREG` models don't support this column.
- `time_series_timestamp_col`: a `STRING` value that contains the timestamp column for a time series. The column name and type are inherited from the `TIME_SERIES_TIMESTAMP_COL` option as specified in the `CREATE MODEL` statement.
- `time_series_data_col`: a `STRING` value that contains the data column for a time series. The column name and type are inherited from the `TIME_SERIES_DATA_COL` option as specified in the `CREATE MODEL` statement.
- `forecasted_time_series_data_col`: a `STRING` value that contains the same data as `time_series_data_col` but with `forecasted_` prefixed to the column name.
- `lower_bound`: a `FLOAT64` value that contains the lower bound of the prediction interval.
- `upper_bound`: a `FLOAT64` value that contains the upper bound of the prediction interval.
- `absolute_error`: a `FLOAT64` value that contains the absolute value of the difference between the forecasted value and the actual data value.
- `absolute_percentage_error`: a `FLOAT64` value that contains the absolute value of the absolute error divided by the actual value.
ML.EVALUATE returns the following columns for ARIMA_PLUS or ARIMA_PLUS_XREG models when input data is provided and perform_aggregation is TRUE:

- `time_series_id_col` or `time_series_id_cols`: the identifiers of a time series. Only present when forecasting multiple time series at once. The column names and types are inherited from the `TIME_SERIES_ID_COL` option as specified in the `CREATE MODEL` statement. `ARIMA_PLUS_XREG` models don't support this column.
- `mean_absolute_error`: a `FLOAT64` value that contains the mean absolute error for the model.
- `mean_squared_error`: a `FLOAT64` value that contains the mean squared error for the model.
- `root_mean_squared_error`: a `FLOAT64` value that contains the root mean squared error for the model.
- `mean_absolute_percentage_error`: a `FLOAT64` value that contains the mean absolute percentage error for the model.
- `symmetric_mean_absolute_percentage_error`: a `FLOAT64` value that contains the symmetric mean absolute percentage error for the model.
ML.EVALUATE returns the following columns for an ARIMA_PLUS model when input data isn't provided:

- `time_series_id_col` or `time_series_id_cols`: the identifiers of a time series. Only present when forecasting multiple time series at once. The column names and types are inherited from the `TIME_SERIES_ID_COL` option as specified in the `CREATE MODEL` statement.
- `non_seasonal_p`: an `INT64` value that contains the order for the autoregressive model. For more information, see Autoregressive integrated moving average.
- `non_seasonal_d`: an `INT64` value that contains the degree of differencing for the non-seasonal model. For more information, see Autoregressive integrated moving average.
- `non_seasonal_q`: an `INT64` value that contains the order for the moving-average model. For more information, see Autoregressive integrated moving average.
- `has_drift`: a `BOOL` value that indicates whether the model includes a linear drift term.
- `log_likelihood`: a `FLOAT64` value that contains the log likelihood for the model.
- `aic`: a `FLOAT64` value that contains the Akaike information criterion for the model.
- `variance`: a `FLOAT64` value that measures how far the observed value differs from the predicted value mean.
- `seasonal_periods`: a `STRING` value that contains the seasonal period for the model.
- `has_holiday_effect`: a `BOOL` value that indicates whether the model includes any holiday effects.
- `has_spikes_and_dips`: a `BOOL` value that indicates whether the model performs automatic spikes and dips detection and cleanup.
- `has_step_changes`: a `BOOL` value that indicates whether the model has step changes.
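Because these columns describe the fitted ARIMA model itself, evaluating without input data is a quick way to inspect model quality. A minimal sketch with a hypothetical model name:

```sql
-- Hypothetical model name; inspect the fitted ARIMA order and AIC.
SELECT
  non_seasonal_p,
  non_seasonal_d,
  non_seasonal_q,
  aic
FROM
  ML.EVALUATE(MODEL `mydataset.my_arima_model`)
```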
Autoencoder models
ML.EVALUATE returns the following columns for autoencoder models:

- `mean_absolute_error`: a `FLOAT64` value that contains the mean absolute error for the model.
- `mean_squared_error`: a `FLOAT64` value that contains the mean squared error for the model.
- `mean_squared_log_error`: a `FLOAT64` value that contains the mean squared logarithmic error for the model. The mean squared logarithmic error measures the distance between the actual and predicted values.
Remote models
ML.EVALUATE returns the following column for remote models:

- `remote_eval_metrics`: a `JSON` column containing appropriate metrics for the model type.
Limitations
ML.EVALUATE is subject to the following limitations:

- ML.EVALUATE doesn't support imported TensorFlow models.
- For remote models, ML.EVALUATE fetches the evaluation result from the Vertex AI endpoint and doesn't take any input data.
Examples
The following examples show how to use ML.EVALUATE.

ML.EVALUATE with no input data specified

The following query evaluates a model with no input data specified:

```sql
SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`)
```
ML.EVALUATE with a custom threshold and input data

The following query evaluates a model with input data and a custom threshold of 0.55:

```sql
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `mydataset.mymodel`,
    (
      SELECT custom_label, column1, column2
      FROM `mydataset.mytable`
    ),
    STRUCT(0.55 AS threshold))
```
ML.EVALUATE to calculate forecasting accuracy of a time series

The following query evaluates the 30-point forecasting accuracy for a time series model:

```sql
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `mydataset.my_arima_model`,
    (
      SELECT timeseries_date, timeseries_metric
      FROM `mydataset.mytable`
    ),
    STRUCT(TRUE AS perform_aggregation, 30 AS horizon))
```
ML.EVALUATE to calculate ARIMA_PLUS forecasting accuracy for each forecasted timestamp

The following query evaluates the forecasting accuracy for each of the 30 forecasted points of a time series model. It also computes the prediction interval based on a confidence level of 0.9:

```sql
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `mydataset.my_arima_model`,
    (
      SELECT timeseries_date, timeseries_metric
      FROM `mydataset.mytable`
    ),
    STRUCT(FALSE AS perform_aggregation, 0.9 AS confidence_level, 30 AS horizon))
```
ML.EVALUATE to calculate ARIMA_PLUS_XREG forecasting accuracy for each forecasted timestamp

The following query evaluates the forecasting accuracy for each of the 30 forecasted points of a time series model. It also computes the prediction interval based on a confidence level of 0.9. Note that you need to include the side features in the evaluation data:

```sql
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `mydataset.my_arima_xreg_model`,
    (
      SELECT timeseries_date, timeseries_metric, feature1, feature2
      FROM `mydataset.mytable`
    ),
    STRUCT(FALSE AS perform_aggregation, 0.9 AS confidence_level, 30 AS horizon))
```
What's next
- For information about model evaluation, see BigQuery ML model evaluation overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.