The ML.EVALUATE function

Use the ML.EVALUATE function to evaluate model metrics.

For information about model evaluation in BigQuery ML, see Model evaluation overview.

For information about supported model types of each SQL statement and function, and all supported SQL statements and functions for each model type, read End-to-end user journey for each model.

ML.EVALUATE syntax

ML.EVALUATE(MODEL model_name
           [, {TABLE table_name | (query_statement)}]
           [, STRUCT<threshold FLOAT64,
                     perform_aggregation BOOL,
                     horizon INT64,
                     confidence_level FLOAT64> settings])

model_name

model_name is the name of the model you're evaluating. If you do not have a default project configured, prepend the project ID to the model name in the following format: `[PROJECT_ID].[DATASET].[MODEL]` (including the backticks); for example, `myproject.mydataset.mymodel`.

table_name

(Optional) table_name is the name of the input table that contains the evaluation data. If you do not have a default project configured, prepend the project ID to the table name in the following format: `[PROJECT_ID].[DATASET].[TABLE]` (including the backticks); for example, `myproject.mydataset.mytable`.

If table_name is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used.
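For example, a TABLE-based evaluation might look like the following sketch, where the model name and the table `mydataset.eval_data` are placeholders and the table's columns are assumed to match the model's input and label columns:

```sql
SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`,
    TABLE `mydataset.eval_data`)
```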

query_statement

(Optional) The query_statement clause specifies the standard SQL query that is used to generate the evaluation data. See the Standard SQL Query Syntax page for the supported SQL syntax of the query_statement clause.

If query_statement is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. The input must have a column that matches the label column name provided during training. This value is provided using the input_label_cols option. If input_label_cols is unspecified, the column named "label" in the training data is used. The extra columns are ignored.

If the TRANSFORM clause was present in the CREATE MODEL statement that created model_name, then only the input columns present in the TRANSFORM clause must appear in query_statement.
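As a sketch of this behavior, suppose a model was created with a TRANSFORM clause (all model, table, and column names here are hypothetical):

```sql
CREATE MODEL `mydataset.mymodel`
  TRANSFORM(ML.STANDARD_SCALER(f1) OVER () AS f1_scaled,
            label)
  OPTIONS(model_type = 'linear_reg',
          input_label_cols = ['label'])
AS SELECT f1, label FROM `mydataset.training_data`;
```

The evaluation query then supplies only the raw columns referenced in the TRANSFORM clause (`f1` and `label`); the scaling is applied automatically before evaluation:

```sql
SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`,
    (SELECT f1, label FROM `mydataset.eval_data`))
```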

threshold

(Optional) threshold is a custom threshold to use when evaluating a binary classification model. The default value is 0.5. The threshold value must be of type FLOAT64 and is part of the settings STRUCT.

A zero value for precision or recall means that the selected threshold produced no true positive labels. A NaN value for precision means that the selected threshold produced no positive labels, neither true positives nor false positives.

If both table_name and query_statement are unspecified, you cannot use a threshold. Also, threshold can only be used with binary classification models.

perform_aggregation

(Optional) perform_aggregation identifies the level of evaluation for forecasting accuracy. If the value is true, the forecasting accuracy is on the level of time series. If the value is false, the forecasting accuracy is on the level of timestamp. The default value is true. The perform_aggregation value is part of the settings STRUCT.

horizon

(Optional) horizon is the number of forecasted time points against which the evaluation metrics are computed. The horizon value is type INT64 and is part of the settings STRUCT. The default value is the horizon value specified in the CREATE MODEL statement for time-series models, or 1000 if unspecified. When evaluating multiple time series at the same time, this parameter applies to each time series.

confidence_level

(Optional) The percentage of the future values that fall in the prediction interval. The confidence_level value is type FLOAT64 and is part of the settings STRUCT. The default value is 0.95. The valid input range is [0, 1).

ML.EVALUATE output

The output of the ML.EVALUATE function is a single row containing common metrics applicable to the type of model supplied.

ML.EVALUATE returns the following columns for a regression model, which includes linear regression models, boosted tree regressor, random forest regressor, DNN regressor, deep-and-wide regressor, and AutoML Tables regressor:

  • mean_absolute_error
  • mean_squared_error
  • mean_squared_log_error
  • median_absolute_error
  • r2_score
  • explained_variance

ML.EVALUATE returns the following columns for a classification model, which includes logistic regression models, boosted tree classifier, random forest classifier, DNN classifier, deep-and-wide classifier, and AutoML Tables classifier:

  • precision
  • recall
  • accuracy
  • f1_score
  • log_loss
  • roc_auc

ML.EVALUATE returns the following columns for a k-means model:

  • davies_bouldin_index
  • mean_squared_distance

ML.EVALUATE returns the following columns for a matrix factorization model with implicit feedback:

  • mean_average_precision
  • mean_squared_error
  • normalized_discounted_cumulative_gain
  • average_rank

ML.EVALUATE returns the following columns for a matrix factorization model with explicit feedback:

  • mean_absolute_error
  • mean_squared_error
  • r2_score
  • explained_variance

ML.EVALUATE returns the following column for a PCA model:

  • total_explained_variance_ratio: the percentage of the cumulative variance explained by all the returned principal components. For more information, see the ML.PRINCIPAL_COMPONENT_INFO function.
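As a sketch (the model name is a placeholder), evaluating a PCA model returns this single metric:

```sql
SELECT
  total_explained_variance_ratio
FROM
  ML.EVALUATE(MODEL `mydataset.my_pca_model`)
```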

The ML.EVALUATE function returns the following columns for a trained time-series ARIMA_PLUS model when the input data is provided and perform_aggregation is FALSE:

  • time_series_id_col or time_series_id_cols: the identifiers of a time series. Only present when forecasting multiple time series at once. The column names and types are inherited from the TIME_SERIES_ID_COL option as specified in the model creation query.
  • time_series_timestamp_col: The column name and type are inherited from the TIME_SERIES_TIMESTAMP_COL option as specified in the model creation query.
  • time_series_data_col: The column name and type are inherited from the TIME_SERIES_DATA_COL option as specified in the model creation query.
  • forecasted_time_series_data_col: Same as time_series_data_col but with "forecasted_" prefix in the column name.
  • lower_bound: The lower bound of the prediction interval.
  • upper_bound: The upper bound of the prediction interval.
  • absolute_error: The absolute value of the difference between the forecasted value and the actual data value.
  • absolute_percentage_error: The absolute error divided by the absolute value of the actual data value.

The ML.EVALUATE function returns the following columns for a trained time-series ARIMA_PLUS model when the input data is provided and perform_aggregation is TRUE:

  • time_series_id_col or time_series_id_cols: the identifiers of a time series. Only present when forecasting multiple time series at once. The column names and types are inherited from the TIME_SERIES_ID_COL option as specified in the model creation query.
  • mean_absolute_error
  • mean_squared_error
  • root_mean_squared_error
  • mean_absolute_percentage_error
  • symmetric_mean_absolute_percentage_error

The ML.EVALUATE function returns the following columns for a trained time-series ARIMA_PLUS model when the input data is not provided:

  • time_series_id_col or time_series_id_cols: the identifiers of a time series. Only present when forecasting multiple time series at once. The column names and types are inherited from the TIME_SERIES_ID_COL option as specified in the model creation query.
  • non_seasonal_p
  • non_seasonal_d
  • non_seasonal_q
  • has_drift
  • log_likelihood
  • AIC
  • variance
  • seasonal_periods
  • has_holiday_effect
  • has_spikes_and_dips
  • has_step_changes

ML.EVALUATE returns the following columns for an autoencoder model:

  • mean_absolute_error
  • mean_squared_error
  • mean_squared_log_error

ML.EVALUATE limitations

The ML.EVALUATE function is subject to limitations that vary by model type; see the documentation for each model type for details.

ML.EVALUATE examples

ML.EVALUATE with no input data specified

The following query evaluates a model without specifying input data. In this case, ML.EVALUATE uses the evaluation data that was reserved during training.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`)

ML.EVALUATE with a custom threshold and input data

The following query evaluates the model by specifying input data and a custom threshold of 0.55.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.mymodel`,
    (
    SELECT
      custom_label,
      column1,
      column2
    FROM
      `mydataset.mytable`),
    STRUCT(0.55 AS threshold))

ML.EVALUATE to calculate forecasting accuracy of a time series

The following query evaluates the 30-point forecasting accuracy for the given time series.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_arima_model`,
    (
    SELECT
      timeseries_date,
      timeseries_metric
    FROM
      `mydataset.mytable`),
    STRUCT(TRUE AS perform_aggregation, 30 AS horizon))

ML.EVALUATE to calculate forecasting accuracy for each forecasted timestamp

The following query evaluates the forecasting accuracy for each of the 30 forecasted points of the given time series. It also computes the prediction interval based on a confidence level of 0.9.

SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_arima_model`,
    (
    SELECT
      timeseries_date,
      timeseries_metric
    FROM
      `mydataset.mytable`),
    STRUCT(FALSE AS perform_aggregation, 0.9 AS confidence_level,
    30 AS horizon))