The ML.EXPLAIN_FORECAST function

The ML.EXPLAIN_FORECAST function generates forecasts from a trained ARIMA_PLUS time-series model. It works only on ARIMA_PLUS models that were trained with the decompose_time_series option enabled. ML.EXPLAIN_FORECAST encompasses ML.FORECAST because its output is a superset of the ML.FORECAST results.

For information about Explainable AI, see Explainable AI Overview.

For information about the model types that each SQL statement and function supports, and the SQL statements and functions that each model type supports, see End-to-end user journey for each model.

ML.EXPLAIN_FORECAST syntax

ML.EXPLAIN_FORECAST(MODEL model_name
                   [, STRUCT<horizon INT64, confidence_level FLOAT64> settings])

model_name

model_name is the name of the model that you're using for forecasting. If you do not have a default project configured, then prepend the project ID to the model name in the following format: `[project_id].[dataset].[model]` (including the backticks); for example, `myproject.mydataset.mymodel`.
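As a minimal sketch of the fully qualified form, the following query calls the function without the optional settings STRUCT, so the defaults apply. The project, dataset, and model names are placeholders taken from the example above, not real resources.

```sql
-- Placeholder names: replace myproject, mydataset, and mymodel with your own.
-- No settings STRUCT is passed, so the default horizon and confidence_level apply.
SELECT
  *
FROM
  ML.EXPLAIN_FORECAST(MODEL `myproject.mydataset.mymodel`)
```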

horizon

(Optional) The number of time points to forecast. The horizon value is type INT64 and is part of the settings STRUCT. The default value is 3. The maximum value is the horizon value that's specified in the CREATE MODEL statement for time-series models, or 1000 if no horizon was specified there. When forecasting multiple time series at the same time, this parameter applies to each time series.
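To illustrate the horizon setting, the following sketch requests 7 forecast points and keeps only the forecast rows. It assumes a hypothetical model `mydataset.mymodel` that was trained with a horizon of at least 7 and with decompose_time_series enabled.

```sql
-- Hypothetical model name; the requested horizon must not exceed the
-- horizon that was set when the model was created.
SELECT
  time_series_timestamp,
  time_series_type,
  time_series_data
FROM
  ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`,
                      STRUCT(7 AS horizon))
WHERE
  time_series_type = 'forecast'  -- drop the history rows
ORDER BY
  time_series_timestamp
```

For a model trained on multiple time series, the horizon applies per series, so each series would contribute its own 7 forecast rows.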

confidence_level

(Optional) The percentage of the future values that fall in the prediction interval. The confidence_level value is type FLOAT64 and is part of the settings STRUCT. The default value is 0.95. The valid input range is [0, 1).
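One way to see the effect of confidence_level is to look at the width of the prediction interval on the forecast rows. The sketch below assumes a hypothetical model `mydataset.mymodel`; the interval_width column is an illustrative alias, not part of the function's output.

```sql
-- Hypothetical model name. A higher confidence_level is expected to
-- produce a wider interval for the same forecast point.
SELECT
  time_series_timestamp,
  time_series_data AS forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound,
  prediction_interval_upper_bound - prediction_interval_lower_bound
    AS interval_width  -- illustrative alias
FROM
  ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`,
                      STRUCT(3 AS horizon, 0.9 AS confidence_level))
WHERE
  time_series_type = 'forecast'
ORDER BY
  time_series_timestamp
```

Running the same query with a different confidence_level, such as 0.5, and comparing interval_width is a quick way to confirm how the setting behaves for a given model.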

ML.EXPLAIN_FORECAST output

The ML.EXPLAIN_FORECAST function returns the following columns:

  • time_series_id_col or time_series_id_cols: The identifiers of a time series. This column is only present when forecasting multiple time series in one model creation query by specifying the TIME_SERIES_ID_COL option. The column names and types are inherited from the TIME_SERIES_ID_COL option.

  • time_series_timestamp (TIMESTAMP): The timestamp of the time series. This column has a type of TIMESTAMP, regardless of the type of the input time_series_timestamp_col. For each time series, the output rows are sorted in chronological order of time_series_timestamp.

  • time_series_type (STRING): A value of either history or forecast. The rows with history in this column are used in training, either directly from the training table, or from interpolation using the training data.

  • time_series_data (FLOAT64): The data of the time series. For history rows, time_series_data is either the training data or the interpolated value using the training data. For forecast rows, time_series_data is the forecast value.

  • time_series_adjusted_data (FLOAT64): The adjusted data of the time series. For history rows, this is the value after cleaning spikes and dips, adjusting the step changes, and removing the residuals. It is the aggregation of all the valid components: holiday effect, seasonal components, and trend. For forecast rows, this is the forecast value, which is the same as the value of time_series_data.

  • standard_error (FLOAT64): The standard error of the residuals during the ARIMA fitting. The values are the same for all history rows. For forecast rows, this value increases with time, as the forecast values become less reliable.

  • confidence_level (FLOAT64): The user-specified confidence level or, if unspecified, the default value. This value is the same for all forecast rows and is NULL for history rows.

  • prediction_interval_lower_bound (FLOAT64): The lower bound of the prediction result. Only forecast rows have values other than NULL in this column.

  • prediction_interval_upper_bound (FLOAT64): The upper bound of the prediction result. Only forecast rows have values other than NULL in this column.

  • trend (FLOAT64): The long-term increase or decrease in the time series data.

  • seasonal_period_yearly (FLOAT64): The time series data value affected by the time of the year. This value is NULL if no yearly effect is found.

  • seasonal_period_quarterly (FLOAT64): The time series data value affected by the time of the quarter. This value is NULL if no quarterly effect is found.

  • seasonal_period_monthly (FLOAT64): The time series data value affected by the time of the month. This value is NULL if no monthly effect is found.

  • seasonal_period_weekly (FLOAT64): The time series data value affected by the time of the week. This value is NULL if no weekly effect is found.

  • seasonal_period_daily (FLOAT64): The time series data value affected by the time of the day. This value is NULL if no daily effect is found.

  • holiday_effect (FLOAT64): The time series data value affected by different holidays. This is the aggregation value of all the holiday effects. This value is NULL if no holiday effect is found.

  • spikes_and_dips (FLOAT64): The unexpectedly high or low values of the time series. For history rows, the value is NULL if no spike or dip is found. This value is NULL for forecast rows.

  • step_changes (FLOAT64): The abrupt or structural change in the distributional properties of the time series. For history rows, this value is NULL if no step change is found. This value is NULL for forecast rows.
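The relationship between time_series_adjusted_data and the decomposed components can be explored with a query like the following. This is an exploratory sketch, not a documented identity: it assumes that the "aggregation" of trend, seasonal components, and holiday effect is a plain sum, and it treats NULL components as 0. The model name, the CTE name explained, and the component_sum and difference aliases are all hypothetical.

```sql
-- Assumption: adjusted data = trend + seasonal components + holiday effect.
-- A nonzero difference means the assumption does not hold for this model.
WITH explained AS (
  SELECT
    *,
    IFNULL(trend, 0)
      + IFNULL(seasonal_period_yearly, 0)
      + IFNULL(seasonal_period_quarterly, 0)
      + IFNULL(seasonal_period_monthly, 0)
      + IFNULL(seasonal_period_weekly, 0)
      + IFNULL(seasonal_period_daily, 0)
      + IFNULL(holiday_effect, 0) AS component_sum
  FROM
    ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`,
                        STRUCT(30 AS horizon, 0.8 AS confidence_level))
)
SELECT
  time_series_timestamp,
  time_series_type,
  time_series_adjusted_data,
  component_sum,
  time_series_adjusted_data - component_sum AS difference
FROM
  explained
ORDER BY
  time_series_timestamp
```

The ORDER BY assumes a single time series; for a model trained with TIME_SERIES_ID_COL, the identifier column would need to come first in the ordering.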

ML.EXPLAIN_FORECAST example

The following query uses ML.EXPLAIN_FORECAST to forecast 30 time points with a confidence level of 0.8.

SELECT
  *
FROM
  ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`,
                      STRUCT(30 AS horizon, 0.8 AS confidence_level))
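Because the output also contains the history rows, the same call can be filtered to inspect where the model detected anomalies in the training data. The following sketch reuses the hypothetical `mydataset.mymodel` and returns only the history rows that have a spike, dip, or step change.

```sql
-- Hypothetical model name. spikes_and_dips and step_changes are NULL on
-- forecast rows, so only history rows can match the anomaly filter.
SELECT
  time_series_timestamp,
  time_series_data,
  time_series_adjusted_data,
  spikes_and_dips,
  step_changes
FROM
  ML.EXPLAIN_FORECAST(MODEL `mydataset.mymodel`,
                      STRUCT(30 AS horizon, 0.8 AS confidence_level))
WHERE
  time_series_type = 'history'
  AND (spikes_and_dips IS NOT NULL OR step_changes IS NOT NULL)
ORDER BY
  time_series_timestamp
```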