The AI.FORECAST function
This document describes the AI.FORECAST function, which lets you forecast a time series by using BigQuery ML's built-in TimesFM model.
Using the AI.FORECAST function with the built-in TimesFM model lets you perform forecasting without having to create and train your own model, so you can avoid the need for model management.
Syntax
SELECT *
FROM AI.FORECAST(
  { TABLE TABLE | (QUERY_STATEMENT) },
  data_col => 'DATA_COL',
  timestamp_col => 'TIMESTAMP_COL'
  [, model => 'MODEL']
  [, id_cols => ID_COLS]
  [, horizon => HORIZON]
  [, confidence_level => CONFIDENCE_LEVEL]
  [, context_window => CONTEXT_WINDOW]
)
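As a rough illustration of the syntax, the following sketch sets only the two required named arguments, so the optional arguments fall back to their defaults. The table `mydataset.mytable` and the column names `event_time` and `total_sales` are hypothetical placeholders, not names defined by the function.

```sql
-- Minimal call: only the required arguments are set, so the documented
-- defaults apply (a horizon of 10 points and a confidence level of 0.95).
-- `mydataset.mytable`, event_time, and total_sales are hypothetical names.
SELECT *
FROM AI.FORECAST(
  TABLE `mydataset.mytable`,
  data_col => 'total_sales',
  timestamp_col => 'event_time');
```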
Arguments
AI.FORECAST takes the following arguments:
TABLE: the name of the table that contains the data that you want to forecast. For example, `mydataset.mytable`.
If the table is in a different project, then you must prepend the project ID to the table name in the following format, including backticks:
`[PROJECT_ID].[DATASET].[TABLE]`
For example, `myproject.mydataset.mytable`.
To prevent query errors, we recommend providing the fully qualified table name, including backticks. This is especially important if the project name contains characters other than letters, numbers, and underscores.
QUERY_STATEMENT: the GoogleSQL query that generates the data that you want to forecast. See the GoogleSQL query syntax page for the supported SQL syntax of the QUERY_STATEMENT clause.
DATA_COL: a STRING value that specifies the name of the data column. The data column contains the data to forecast. The data column must use one of the following data types:
- INT64
- NUMERIC
- BIGNUMERIC
- FLOAT64
TIMESTAMP_COL: a STRING value that specifies the name of the timestamp column. The timestamp column must use one of the following data types:
- TIMESTAMP
- DATE
- DATETIME
MODEL: a STRING value that specifies the name of the model to use. TimesFM 2.0 is the only supported value, and is the default value.
ID_COLS: an ARRAY<STRING> value that specifies the names of one or more ID columns. Each unique combination of IDs identifies a unique time series to forecast. Specify one or more values for this argument in order to forecast multiple time series using a single query. The columns that you specify must use one of the following data types:
- STRING
- INT64
- ARRAY<STRING>
- ARRAY<INT64>
HORIZON: an INT64 value that specifies the number of time series data points to forecast. The default value is 10. The valid input range is [1, 10,000].
CONFIDENCE_LEVEL: a FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1).
CONTEXT_WINDOW: an INT64 value that specifies the context window length used by BigQuery ML's built-in TimesFM model. The context window length determines how many of the most recent data points from the input time series are used by the model. For example, if your time series date range is March 1 to April 15, data points are selected starting at April 15 and working backwards. Valid values are as follows:
- 64
- 128
- 256
- 512
- 1024
- 2048
If you don't specify a CONTEXT_WINDOW value, the AI.FORECAST function automatically chooses the smallest possible context window length that is still large enough to cover the number of time series data points in your input data. The following table shows the mapping between the number of time series data points in the input data and the selected context window length:

Number of time series data points    Context window length
(1, 64]                              64
(65, 128]                            128
(129, 256]                           256
(257, 512]                           512
(513, 1024]                          1024
(1025, 2048]                         2048
>2048                                2048

The maximum number of time series data points passed to the model is 2,048. Any additional time series data points in the input data are ignored.
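The default context window selection described above can be sketched as a plain GoogleSQL CASE expression. This is only an illustration of the documented mapping, not the function's internal logic; the series lengths are made-up sample values, and treating each bucket as inclusive of its upper bound is an assumption about how the table's boundaries are meant to be read.

```sql
-- Illustrative only: mirrors the documented mapping from the number of
-- input data points to the default context window length.
WITH series_lengths AS (
  -- Hypothetical series lengths chosen to hit several buckets.
  SELECT num_points
  FROM UNNEST([3, 64, 100, 600, 5000]) AS num_points
)
SELECT
  num_points,
  CASE
    WHEN num_points <= 64 THEN 64
    WHEN num_points <= 128 THEN 128
    WHEN num_points <= 256 THEN 256
    WHEN num_points <= 512 THEN 512
    WHEN num_points <= 1024 THEN 1024
    ELSE 2048  -- also covers series longer than 2,048 points
  END AS default_context_window
FROM series_lengths
ORDER BY num_points;
```

Under this reading, a series of 600 points maps to a window of 1024, and a series of 5000 points maps to 2048, with only the most recent 2,048 points used.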
Output
AI.FORECAST returns the following columns:
id_cols: one or more values that contain the identifiers of a time series. id_cols can be an INT64, STRING, ARRAY<INT64>, or ARRAY<STRING> value. The column names and types are inherited from the ID_COLS argument value specified in the function input.
confidence_level: a FLOAT64 value that contains the confidence_level value that you specified in the function input, or 0.95 if you didn't specify a confidence_level value. This value is the same across all rows.
prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
ai_forecast_status: a STRING value that contains the forecast status. This value is empty if the operation was successful. If the operation wasn't successful, the value is the error string. A common error is "The time series data is too short." This error indicates that there wasn't enough historical data in the time series to generate a forecast. A minimum of 3 data points is required.
forecast_timestamp: a TIMESTAMP value that contains the timestamps of the time series.
forecast_value: a FLOAT64 value that contains the 50% quantile value for the forecasting output from the model. The 50% quantile value represents the median value of the forecasted data.
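One way to work with these output columns is to keep only the rows whose status is empty and select the forecast with its interval bounds. The following is a sketch under stated assumptions: the table `mydataset.mytable` and the columns `event_time`, `total_sales`, and `store_id` are hypothetical, and because the text above does not say whether an "empty" status is an empty string or NULL, the filter accepts both.

```sql
-- Keep only successfully forecasted points and project the main columns.
-- `mydataset.mytable`, event_time, total_sales, and store_id are
-- hypothetical names used for illustration.
WITH forecast AS (
  SELECT *
  FROM AI.FORECAST(
    TABLE `mydataset.mytable`,
    data_col => 'total_sales',
    timestamp_col => 'event_time',
    id_cols => ['store_id'])
)
SELECT
  store_id,  -- inherited from the id_cols argument
  forecast_timestamp,
  forecast_value,
  prediction_interval_lower_bound,
  prediction_interval_upper_bound
FROM forecast
-- Assumption: a successful row has an empty string or NULL status.
WHERE COALESCE(ai_forecast_status, '') = ''
ORDER BY store_id, forecast_timestamp;
```

Inverting the filter to `COALESCE(ai_forecast_status, '') != ''` would instead list the rows that carry an error string, such as series that are too short.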
Example
The following example forecasts the daily number of bike trips for each different user type for the next 30 days.
WITH citibike_trips AS (
  SELECT
    EXTRACT(DATE FROM starttime) AS date,
    usertype,
    COUNT(*) AS num_trips
  FROM `bigquery-public-data.new_york.citibike_trips`
  GROUP BY date, usertype
)
SELECT *
FROM AI.FORECAST(
  TABLE citibike_trips,
  data_col => 'num_trips',
  timestamp_col => 'date',
  id_cols => ['usertype'],
  horizon => 30);
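As a variation on the example above, the following sketch passes the input as a query statement instead of a table and sets the optional arguments explicitly. It forecasts a single series of total daily trips, so id_cols is omitted. The specific values chosen for horizon, confidence_level, and context_window are arbitrary picks from the documented valid ranges, not recommendations.

```sql
-- Query-statement input with explicit optional arguments.
-- The argument values below are arbitrary illustrative choices.
SELECT *
FROM AI.FORECAST(
  (
    SELECT
      EXTRACT(DATE FROM starttime) AS date,
      COUNT(*) AS num_trips
    FROM `bigquery-public-data.new_york.citibike_trips`
    GROUP BY date
  ),
  data_col => 'num_trips',
  timestamp_col => 'date',
  horizon => 7,              -- forecast the next 7 daily points
  confidence_level => 0.9,   -- must fall in [0, 1)
  context_window => 512);    -- use the 512 most recent data points
```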
Locations
AI.FORECAST and the TimesFM model are available in all supported BigQuery ML locations.
Pricing
AI.FORECAST usage is billed at the evaluation, inspection, and prediction rate documented in the BigQuery ML on-demand pricing section of the BigQuery ML pricing page.
What's next
- Try using a TimesFM model with the AI.FORECAST function.
- Evaluate forecasting results from the TimesFM model using the AI.EVALUATE function.
- For information about forecasting in BigQuery ML, see Forecasting overview.
- For more information about supported SQL statements and functions for time series forecasting models, see End-to-end user journeys for time series forecasting models.