The ML.FORECAST function
This document describes the ML.FORECAST function, which you can use to forecast a time series based on a trained ARIMA_PLUS or ARIMA_PLUS_XREG model.
Syntax
# ARIMA_PLUS models:
ML.FORECAST(
  MODEL `project_id.dataset.model`,
  STRUCT(
    [horizon AS horizon]
    [, confidence_level AS confidence_level]))

# ARIMA_PLUS_XREG model:
ML.FORECAST(
  MODEL `project_id.dataset.model`,
  STRUCT(
    horizon AS horizon,
    confidence_level AS confidence_level),
  { TABLE `project_id.dataset.table` | (query_statement) })
Arguments
ML.FORECAST takes the following arguments:
- project_id: Your project ID.
- dataset: The BigQuery dataset that contains the model.
- model: The name of the model.
- horizon: an INT64 value that specifies the number of time points to forecast. The default value is 3, and the maximum value is the value of the HORIZON option specified in the CREATE MODEL statement for time-series models, or 1000 if that option isn't specified. When forecasting multiple time series at the same time, this parameter applies to each time series.
- confidence_level: a FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1).
- table: The name of the input table that contains the features.
  If table is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the table, they are ignored.
- query_statement: The GoogleSQL query that is used to generate the features. See the GoogleSQL query syntax page for the supported SQL syntax of the query_statement clause.
  If query_statement is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the query, they are ignored.
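As a minimal sketch of how the optional arguments behave, the following query sets only horizon and omits confidence_level, so the default of 0.95 described above should apply. The model name `mydataset.mymodel` is a placeholder, and the horizon of 7 is an arbitrary illustrative value that is assumed not to exceed the HORIZON option the model was created with.

```sql
# Forecast 7 time points for each time series in the model.
# confidence_level is omitted, so the documented default (0.95) is used.
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(7 AS horizon))
```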
Output
ML.FORECAST returns the following columns:
- time_series_id_col or time_series_id_cols: a value that contains the identifiers of a time series. time_series_id_col can be an INT64 or STRING value. time_series_id_cols can be an ARRAY<INT64> or ARRAY<STRING> value. Only present when forecasting multiple time series at once. The column names and types are inherited from the TIME_SERIES_ID_COL option as specified in the CREATE MODEL statement.
- forecast_timestamp: a TIMESTAMP value that contains the timestamps of a time series.
- forecast_value: a FLOAT64 value that contains the average of the prediction_interval_lower_bound and prediction_interval_upper_bound values.
- standard_error: a FLOAT64 value that contains the amount of variability in the estimated results.
- confidence_level: a FLOAT64 value that contains the confidence_level value you specified in the function input, or 0.95 if you didn't specify a confidence_level value. It is the same across all rows.
- prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
- prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
- confidence_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
- confidence_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
The output of ML.FORECAST has the following properties:
- For each time series, the output rows are sorted in the chronological order of forecast_timestamp.
- forecast_timestamp always has a type of TIMESTAMP, regardless of the type of the column specified in the TIME_SERIES_TIMESTAMP_COL option of the CREATE MODEL statement.
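The output columns listed above can be selected by name like any other table columns. The following sketch picks the forecast and its prediction interval, and converts forecast_timestamp to a DATE, which may be convenient when the model was trained on a DATE column. The model name `mydataset.mymodel` and the alias forecast_date are placeholders chosen for illustration.

```sql
SELECT
  forecast_timestamp,
  # forecast_timestamp is always a TIMESTAMP; DATE() converts it
  # using the default time zone (UTC).
  DATE(forecast_timestamp) AS forecast_date,
  forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level))
```

For very large models, even a light conversion like this counts as additional computation on the result columns, so the memory limitation described in the Limitation section can apply.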
ARIMA_PLUS example
The following example forecasts 30 time points with a confidence level of 0.8:
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level))
ARIMA_PLUS_XREG example
The following example forecasts 30 time points with a confidence level of 0.8, using future feature values supplied by a query:
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level),
    (SELECT * FROM `mydataset.mytable`))
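The syntax also allows the future features to be passed as a table instead of a query. The following sketch shows the same forecast using the TABLE form. It assumes that `mydataset.mytable`, the placeholder table from the example above, already contains columns whose names match the feature columns of the model.

```sql
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level),
    # Pass the feature table directly; unused columns are ignored.
    TABLE `mydataset.mytable`)
```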
Limitation
Applying any additional computation on top of ML.FORECAST's result columns might lead to an out of memory error if the model size is too large. If this happens, you might see errors like Resources exceeded during query execution: The query could not be executed in the allotted memory.
Examples of operations that might cause this issue are calculating minimum or maximum values, or adding to or subtracting from a particular column. If you are trying to filter on the forecasted value, we recommend that you use the forecast with limit option instead, because the algorithm it uses is less likely to cause an issue. If you keep getting out of memory errors, you can try working around this issue by creating a new table for the ML.FORECAST result, and then applying other computations in a different query that uses data from the new table.
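The two-step workaround described above might look like the following sketch. The table name `mydataset.forecast_result` and the aliases min_forecast and max_forecast are hypothetical, and the aggregate in the second statement is only one example of a computation that can be moved out of the forecasting query.

```sql
# Step 1: materialize the raw ML.FORECAST output without any
# additional computation on its columns.
CREATE OR REPLACE TABLE `mydataset.forecast_result` AS
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level));

# Step 2: run the computation against the stored table instead of
# directly against the ML.FORECAST result.
SELECT
  MIN(forecast_value) AS min_forecast,
  MAX(forecast_value) AS max_forecast
FROM
  `mydataset.forecast_result`;
```

Splitting the work this way keeps the forecasting query free of extra computation, at the cost of storing an intermediate table.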
What's next
- For information about model inference, see Model inference overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.