The ML.EXPLAIN_FORECAST function
This document describes the ML.EXPLAIN_FORECAST
function, which lets you
generate forecasts that are based on a trained
time series model. It only works on ARIMA_PLUS
models with the
training option decompose_time_series
enabled or on ARIMA_PLUS_XREG
models.
The ML.EXPLAIN_FORECAST
function encompasses the ML.FORECAST
function
because its output is a superset of the results of ML.FORECAST
.
Syntax
# `ARIMA_PLUS` models: ML.EXPLAIN_FORECAST( MODEL `project_id.dataset.model_name`, STRUCT( [horizon AS horizon] [, confidence_level AS confidence_level])) # `ARIMA_PLUS_XREG` model: ML.EXPLAIN_FORECAST( MODEL `project_id.dataset.model_name`, STRUCT( [horizon AS horizon] [, confidence_level AS confidence_level]), { TABLE `project_id.dataset.table` | (query_statement) })
Arguments
ML.EXPLAIN_FORECAST
takes the following arguments:
project_id
: Your project ID.dataset
: The BigQuery dataset that contains the model.model
: The name of the model.table
: The name of the input table that contains the data to be evaluated.If
table
is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.If there are unused columns from the table, they are ignored.
The
table
argument is required for theARIMA_PLUS_XREG
model.query_statement
: The GoogleSQL query that is used to generate the evaluation data. For the supported SQL syntax for thequery_statement
clause in GoogleSQL, see Query syntax.If
query_statement
is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules.horizon
: anINT64
value that specifies the number of time points to forecast. The maximum value is the horizon value specified in theCREATE MODEL
statement for the time series model, or1000
if unspecified. The default value is3
. When forecasting multiple time series at the same time, this parameter applies to each time series.confidence_level
: aFLOAT64
value that specifies the percentage of the future values that fall in the prediction interval. The valid input range is[0, 1)
. The default value is0.95
.
Output
The ML.EXPLAIN_FORECAST
function returns the following columns:
time_series_id_col
ortime_series_id_cols
: a value that contains the identifiers of a time series.time_series_id_col
can be anINT64
orSTRING
value.time_series_id_cols
can be anARRAY<INT64>
orARRAY<STRING>
value. Only present when forecasting multiple time series at once. The column names and types are inherited from theTIME_SERIES_ID_COL
option as specified in theCREATE MODEL
statement.time_series_timestamp
: aTIMESTAMP
value that contains the timestamp of the time series. This column has a type ofTIMESTAMP
regardless of the type of the inputtime_series_timestamp_col
. For each time series, the output rows are sorted in chronological order by thetime_series_timestamp
value.time_series_type
: aSTRING
value that contains eitherhistory
orforecast
. The rows that have a value ofhistory
in this column are used in training, either directly from the training table, or from interpolation using the training data.time_series_data
: aFLOAT64
value that contains the data of the time series. For rows that have a value ofhistory
in thetime_series_type
column,time_series_data
is either the training data or the interpolated value using the training data. For rows that have a value offorecast
in thetime_series_type
column,time_series_data
is the forecast value.time_series_adjusted_data
: aFLOAT64
value that contains the adjusted data of the time series. For rows that have a value ofhistory
in thetime_series_type
column, this is the value after cleaning spikes and dips, adjusting the step changes, and removing the residuals. It is the aggregation of all the valid components: holiday effect, seasonal components, and trend. For rows that have a value offorecast
in thetime_series_type
column, this is the forecast value, which is the same as the value oftime_series_data
.standard_error
: aFLOAT64
value that contains the standard error of the residuals during the ARIMA fitting. The values are the same for all rows that have a value ofhistory
in thetime_series_type
column. For rows that have a value offorecast
in thetime_series_type
column, this value increases with time, as the forecast values become less reliable.confidence_level
: aFLOAT64
value that contains the user-specified confidence level or, if unspecified, the default value. This value is the same for all rows that have a value ofhistory
in thetime_series_type
column. This value isNULL
for all rows that have a value offorecast
in thetime_series_type
column.prediction_interval_lower_bound
: aFLOAT64
value that contains the lower bound of the prediction result. Only rows that have a value offorecast
in thetime_series_type
column have values other thanNULL
in this column.prediction_interval_upper_bound
: aFLOAT64
value that contains the upper bound of the prediction result. Only rows that have a value offorecast
in thetime_series_type
column have values other thanNULL
in this column.trend
: aFLOAT64
value that contains the long-term increase or decrease in the time series data.seasonal_period_yearly
: aFLOAT64
value that contains the time series data value affected by the time of the year. This value isNULL
if no yearly effect is found.seasonal_period_quarterly
: aFLOAT64
value that contains the time series data value affected by the time of the quarter. This value isNULL
if no quarterly effect is found.seasonal_period_monthly
: aFLOAT64
value that contains the time series data value affected by the time of the month. This value isNULL
if no monthly effect is found.seasonal_period_weekly
: aFLOAT64
value that contains the time series data value affected by the time of the week. This value isNULL
if no weekly effect is found.seasonal_period_daily
: aFLOAT64
value that contains the time series data value affected by the time of the day. This value isNULL
if no daily effect is found.holiday_effect
: aFLOAT64
value that contains the time series data value affected by different holidays. This is the sum of the maximum positive individual holiday effect and the minimum negative individual holiday effect. This is shown in the following formula, where \(H\) is the overall holiday effect and \(h(i)\) is the individual holiday effect:\[H=\max\limits_{h(i)>0} h(i) + \min\limits_{h(i)<0} h(i)\]
This value is
NULL
if no holiday effect is found.spikes_and_dips
: aFLOAT64
value that contains the unexpectedly high or low values of the time series. For rows that have a value ofhistory
in thetime_series_type
column, the value isNULL
if no spike or dip is found. For rows that have a value offorecast
in thetime_series_type
column, this value isNULL
.step_changes
: aFLOAT64
value that contains the abrupt or structural change in the distributional properties of the time series. For rows that have a value ofhistory
in thetime_series_type
column, this value isNULL
if no step change is found. For rows that have a value offorecast
in thetime_series_type
column, this value isNULL
.residual
: aFLOAT64
value that contains the difference between the actual time series and the fitted time series after model training. The residual value is only meaningful for historical data. For rows that have a value offorecast
in thetime_series_type
column, theresidual
value isNULL
.holiday_effect_holiday_name
: aFLOAT64
value that contains the time series data value affected by the holiday that's identified in holiday_name. If no holiday effect is found, this value isNULL
.There is one
holiday_effect_holiday_name
column for each holiday that's included in the model.attribution_feature_name
: aFLOAT64
value that contains the attribution of each feature to the final forecast. This only applies toARIMAX_PLUS_XREG
models. The value is calculated by multiplying the weight of the feature with the feature value. This is shown in the following formula, where \(\beta_{fn}\) is the weight of featurefn
in the linear regression and \(X_{fn}\) is the numericalized feature value:\[attribution_{fn}=\beta_{fn} * X_{fn}\]
Mathematical explanation
The mathematical relationship of the output columns is described in the following sections.
time_series_data
The time_series_data
value is decomposed into several components to get better
explainability. For ARIMA_PLUS
models, the component list includes the
following components for better explainability:
trend
seasonal_period_yearly
seasonal_period_quarterly
seasonal_period_monthly
seasonal_period_weekly
seasonal_period_daily
holiday_effect
spikes_and_dips
step_changes
residual
For ARIMA_PLUS_XREG
models, this list also includes the feature contribution
attribution_feature_name
. For future data,
the spikes_and_dips
, step_changes
, and residuals
values aren't applicable.
The following formulas show what components make up the time_series_data
value for historical and forecast data for time series models
For
ARIMA_PLUS
models:- Historical data:
time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + spikes_and_dips + step_changes + residual
- Forecast data:
time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect
For
ARIMA_PLUS_XREG
models:- Historical data:
time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + spikes_and_dips + step_changes + residual + (attribution_feature_1 + ... + attribution_feature_n)
- Forecast data:
time_series_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + (attribution_feature_1 + ... + attribution_feature_n)
time_series_adjusted_data
The time_series_adjusted_data
value is the value that remains after cleaning
spikes and dips, adjusting the step changes, and removing the residuals. Its
formula is the same for both historical and forecast data.
For
ARIMA_PLUS
models:time_series_adjusted_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect
For
ARIMA_PLUS_XREG
models:time_series_adjusted_data = trend + seasonal_period_yearly + seasonal_period_quarterly + seasonal_period_monthly + seasonal_period_weekly + seasonal_period_daily + holiday_effect + (attribution_feature_1 + ... + attribution_feature_n)
holiday_effect
The holiday_effect_holiday_name
value is a subcomponent
. The holiday_effect
value is the sum of all the
holiday_effect_holiday_name
values. For example, if you
specify holidays xmas
and mlk
, the formula is
holiday_effect = holiday_effect_xmas + holiday_effect_mlk
.
ARIMA_PLUS
example
The following example forecasts 30 time points with
a confidence level of 0.8
:
SELECT * FROM ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`, STRUCT(30 AS horizon, 0.8 AS confidence_level))
ARIMA_PLUS_XREG
example
The following example forecasts 30 time points with a
confidence level of 0.8
with future features:
SELECT * FROM ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`, STRUCT(30 AS horizon, 0.8 AS confidence_level), (SELECT * FROM `mydataset.mytable`))
What's next
- For information about Explainable AI, see BigQuery Explainable AI overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.