The ML.FORECAST function
This document describes the ML.FORECAST function, which you can use to
forecast a time series based on a trained ARIMA_PLUS or ARIMA_PLUS_XREG
model.
If you don't want to manage your own time series forecasting model, you can
use the AI.FORECAST function with BigQuery ML's built-in TimesFM time series
model (Preview) to perform forecasting.
Syntax
# ARIMA_PLUS models:
ML.FORECAST(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  STRUCT(
    [, HORIZON AS horizon]
    [, CONFIDENCE_LEVEL AS confidence_level])
)

# ARIMA_PLUS_XREG model:
ML.FORECAST(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  [{ TABLE `PROJECT_ID.DATASET.TABLE` | (QUERY_STATEMENT) } ,]
  STRUCT(
    HORIZON AS horizon,
    CONFIDENCE_LEVEL AS confidence_level)
)
Arguments
ML.FORECAST takes the following arguments:
- PROJECT_ID: the project that contains the resource.
- DATASET: the dataset that contains the resource.
- MODEL_NAME: the name of the model.
- HORIZON: an INT64 value that specifies the number of time points to forecast. The default value is 3, and the maximum value is the value of the HORIZON option specified in the CREATE MODEL statement for time-series models, or 1000 if that option isn't specified. When forecasting multiple time series at the same time, this parameter applies to each time series.
- CONFIDENCE_LEVEL: a FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1).
- TABLE: the name of the input table that contains the features.
  If TABLE is specified, the input column names in the table must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the table, they are ignored.
- QUERY_STATEMENT: the GoogleSQL query that is used to generate the features. See the GoogleSQL query syntax page for the supported SQL syntax of the QUERY_STATEMENT clause.
  If QUERY_STATEMENT is specified, the input column names from the query must match the column names in the model, and their types should be compatible according to BigQuery implicit coercion rules. If there are unused columns from the query, they are ignored.
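As a minimal sketch of how the optional arguments behave for an ARIMA_PLUS model, the following query sets only the horizon and relies on the default confidence level of 0.95. The model name `mydataset.mymodel` is a placeholder, and the query assumes the model was trained with a HORIZON of at least 7, or with that option left unset.

```sql
-- Forecast the next 7 time points for a placeholder ARIMA_PLUS model.
-- confidence_level is omitted, so the documented default of 0.95 applies.
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,  -- placeholder model name
    STRUCT(7 AS horizon))
```

If the model forecasts multiple time series, the same horizon of 7 applies to each series, so the result contains 7 rows per series.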
Output
ML.FORECAST returns the following columns:
- time_series_id_col or time_series_id_cols: a value that contains the identifiers of a time series. time_series_id_col can be an INT64 or STRING value. time_series_id_cols can be an ARRAY<INT64> or ARRAY<STRING> value. Only present when forecasting multiple time series at once. The column names and types are inherited from the TIME_SERIES_ID_COL option as specified in the CREATE MODEL statement.
- forecast_timestamp: a TIMESTAMP value that contains the timestamps of a time series.
- forecast_value: a FLOAT64 value that contains the average of the prediction_interval_lower_bound and prediction_interval_upper_bound values.
- standard_error: a FLOAT64 value that contains the amount of variability in the estimated results.
- confidence_level: a FLOAT64 value that contains the confidence_level value you specified in the function input, or 0.95 if you didn't specify a confidence_level value. It is the same across all rows.
- prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
- prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
- confidence_interval_lower_bound: a FLOAT64 value that contains the lower bound of the confidence interval for each forecasted point.
- confidence_interval_upper_bound: a FLOAT64 value that contains the upper bound of the confidence interval for each forecasted point.
The output of ML.FORECAST has the following properties:
- For each time series, the output rows are sorted in the chronological order of forecast_timestamp.
- forecast_timestamp always has a type of TIMESTAMP, regardless of the type of the column specified in the TIME_SERIES_TIMESTAMP_COL option of the CREATE MODEL statement.
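Because forecast_timestamp is always a TIMESTAMP, a model trained on a DATE column returns timestamps rather than dates. The following sketch shows one way to convert them back. It assumes a placeholder model `mydataset.mymodel` trained on daily data, and it uses the GoogleSQL DATE function, which interprets the timestamp in UTC unless you pass a time zone.

```sql
-- Return one row per forecasted day, with the timestamp converted to a DATE.
-- `mydataset.mymodel` is a placeholder for a model trained on daily data.
SELECT
  DATE(forecast_timestamp) AS forecast_date,
  forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level))
ORDER BY forecast_date
```

For very large models, the caveat in the Limitation section applies to this kind of post-processing as well.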
ARIMA_PLUS example
The following example forecasts 30 time points with a
confidence level of 0.8:
SELECT * FROM ML.FORECAST(MODEL `mydataset.mymodel`, STRUCT(30 AS horizon, 0.8 AS confidence_level))
ARIMA_PLUS_XREG example
The following example forecasts 30 time points with a
confidence level of 0.8, using future feature values supplied by a query:
SELECT * FROM ML.FORECAST(MODEL `mydataset.mymodel`, STRUCT(30 AS horizon, 0.8 AS confidence_level), (SELECT * FROM `mydataset.mytable`))
Limitation
Applying any additional computation on top of the ML.FORECAST result
columns might lead to an out-of-memory error if the model is too large. If this happens, you might see an error like the following:
Resources exceeded during query execution: The query could not be executed in the allotted memory.
Examples of operations that might cause this issue are calculating minimum or maximum values, or adding to or subtracting
from a particular column. If you are trying to filter on the forecasted value,
we recommend that you use the forecast with limit option instead, because the algorithm it uses is less likely to cause an issue. If you keep getting out-of-memory errors, you can work around the issue by
writing the ML.FORECAST result to a new table, and then applying the other computations in a separate query that reads from the new table.
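The workaround described above can be sketched as two separate queries. The table name `mydataset.forecast_result` and the model name `mydataset.mymodel` are placeholders chosen for illustration, and the aggregation in the second query is only one example of a computation you might defer.

```sql
-- Query 1: write the raw forecast to a table without any further computation.
CREATE OR REPLACE TABLE `mydataset.forecast_result` AS
SELECT *
FROM
  ML.FORECAST(
    MODEL `mydataset.mymodel`,
    STRUCT(30 AS horizon, 0.8 AS confidence_level));

-- Query 2: run the aggregation against the stored rows instead of the model.
SELECT
  MIN(forecast_value) AS min_forecast,
  MAX(forecast_value) AS max_forecast
FROM `mydataset.forecast_result`;
```

Keeping the two steps apart means the second query reads plain table rows, so the computation no longer runs on top of the ML.FORECAST call itself.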
What's next
- For information about model inference, see Model inference overview.
- For more information about supported SQL statements and functions for time series forecasting models, see End-to-end user journeys for time series forecasting models.