The AI.FORECAST function

This document describes the AI.FORECAST function, which lets you forecast a time series by using BigQuery ML's built-in TimesFM model.

Because the TimesFM model is built in, you can use the AI.FORECAST function to forecast without creating, training, or managing your own model.

Syntax

SELECT
  *
FROM
  AI.FORECAST(
    { TABLE TABLE | (QUERY_STATEMENT) },
    data_col => 'DATA_COL',
    timestamp_col => 'TIMESTAMP_COL'
    [, model => 'MODEL']
    [, id_cols => ID_COLS]
    [, horizon => HORIZON]
    [, confidence_level => CONFIDENCE_LEVEL]
    [, context_window => CONTEXT_WINDOW]
  )

Arguments

AI.FORECAST takes the following arguments:

  • TABLE: the name of the table that contains the data that you want to forecast. For example, `mydataset.mytable`.

    If the table is in a different project, then you must prepend the project ID to the table name in the following format, including backticks:

    `[PROJECT_ID].[DATASET].[TABLE]`

    For example, `myproject.mydataset.mytable`.

    To prevent query errors, we recommend providing the fully qualified table name, including backticks. This is especially important if the project name contains characters other than letters, numbers, and underscores.

  • QUERY_STATEMENT: the GoogleSQL query that generates the data that you want to forecast. See the GoogleSQL query syntax page for the supported SQL syntax of the QUERY_STATEMENT clause.

  • DATA_COL: a STRING value that specifies the name of the data column. The data column contains the data to forecast. The data column must use one of the following data types:

    • INT64
    • NUMERIC
    • BIGNUMERIC
    • FLOAT64

  • TIMESTAMP_COL: a STRING value that specifies the name of the timestamp column. The timestamp column must use one of the following data types:

    • TIMESTAMP
    • DATE
    • DATETIME

  • MODEL: a STRING value that specifies the name of the model to use. TimesFM 2.0 is the only supported value, and is the default value.

  • ID_COLS: an ARRAY<STRING> value that specifies the names of one or more ID columns. Each unique combination of IDs identifies a unique time series to forecast. Specify one or more values for this argument in order to forecast multiple time series using a single query. The columns that you specify must use one of the following data types:

    • STRING
    • INT64
    • ARRAY<STRING>
    • ARRAY<INT64>

  • HORIZON: an INT64 value that specifies the number of time series data points to forecast. The default value is 10. The value must be between 1 and 10,000, inclusive.

  • CONFIDENCE_LEVEL: a FLOAT64 value that specifies the fraction of future values that are expected to fall within the prediction interval. The default value is 0.95. The valid input range is [0, 1).

  • CONTEXT_WINDOW: an INT64 value that specifies the context window length used by BigQuery ML's built-in TimesFM model. The context window length determines how many of the most recent data points from the input time series are used by the model. For example, if your time series date range is March 1 to April 15, data points are selected starting at April 15 and working backwards. Valid values are as follows:

    • 64
    • 128
    • 256
    • 512
    • 1024
    • 2048

    If you don't specify a CONTEXT_WINDOW value, the AI.FORECAST function automatically chooses the smallest context window length that is large enough to cover the number of time series data points in your input data. The following table shows the mapping between the number of time series data points in the input data and the selected context window length:

    Number of time series data points    Context window length
    (1, 64]                              64
    (64, 128]                            128
    (128, 256]                           256
    (256, 512]                           512
    (512, 1024]                          1024
    (1024, 2048]                         2048
    >2048                                2048

    2,048 is the maximum number of time series data points that are passed to the model. Any additional time series data points in the input data are ignored.
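The default context window selection described above can be sketched as a plain GoogleSQL CASE expression. This is only an illustration of the documented mapping, not the function's internal implementation; the num_points values are hypothetical series lengths chosen for the example.

```sql
-- Illustrative only: mirrors the documented mapping from the number of
-- input data points to the default context window length.
-- num_points is a hypothetical input, not an AI.FORECAST argument or column.
SELECT
  num_points,
  CASE
    WHEN num_points <= 64 THEN 64
    WHEN num_points <= 128 THEN 128
    WHEN num_points <= 256 THEN 256
    WHEN num_points <= 512 THEN 512
    WHEN num_points <= 1024 THEN 1024
    ELSE 2048  -- Longer series are truncated to the most recent 2,048 points.
  END AS default_context_window
FROM
  UNNEST([50, 100, 300, 1500, 5000]) AS num_points;
```

Under this mapping, a series with 300 data points gets a context window of 512, and series with 1,500 or 5,000 data points both get 2,048.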

Output

AI.FORECAST returns the following columns:

  • id_cols: one or more values that contain the identifiers of a time series. id_cols can be an INT64, STRING, ARRAY<INT64> or ARRAY<STRING> value. The column names and types are inherited from the ID_COLS argument value specified in the function input.
  • confidence_level: a FLOAT64 value that contains the confidence_level value that you specified in the function input, or 0.95 if you didn't specify a confidence_level value. This value is the same across all rows.
  • prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
  • prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
  • ai_forecast_status: a STRING value that contains the forecast status. This value is empty if the operation was successful. If the operation wasn't successful, the value is the error string. A common error is "The time series data is too short." This error indicates that there wasn't enough historical data in the time series to generate a forecast. A minimum of 3 data points is required.
  • forecast_timestamp: a TIMESTAMP value that contains the timestamp of each forecasted point.
  • forecast_value: a FLOAT64 value that contains the 50% quantile value for the forecasting output from the model. The 50% quantile value represents the median value of the forecasted data.
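
As a rough sketch of how these output columns might be consumed, the following query keeps only the rows whose forecast succeeded and returns the median forecast with its prediction interval. It uses the public Citi Bike trips table as sample input. Whether a successful status is reported as an empty string or as NULL is an assumption here, so the filter accepts both.

```sql
WITH
  citibike_trips AS (
    SELECT EXTRACT(DATE FROM starttime) AS date, usertype, COUNT(*) AS num_trips
    FROM `bigquery-public-data.new_york.citibike_trips`
    GROUP BY date, usertype
  )
SELECT
  usertype,                         -- inherited from id_cols
  forecast_timestamp,
  forecast_value,                   -- median (50% quantile) forecast
  prediction_interval_lower_bound,
  prediction_interval_upper_bound
FROM
  AI.FORECAST(
    TABLE citibike_trips,
    data_col => 'num_trips',
    timestamp_col => 'date',
    id_cols => ['usertype'],
    horizon => 30)
-- Assumption: a successful forecast has an empty string or NULL status.
WHERE COALESCE(ai_forecast_status, '') = ''
ORDER BY usertype, forecast_timestamp;
```

To inspect failures instead, invert the filter and select ai_forecast_status, which then holds the error string for each affected time series.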

Example

The following example forecasts the daily number of bike trips for each user type for the next 30 days.

WITH
  citibike_trips AS (
    SELECT EXTRACT(DATE FROM starttime) AS date, usertype, COUNT(*) AS num_trips
    FROM `bigquery-public-data.new_york.citibike_trips`
    GROUP BY date, usertype
  )
SELECT *
FROM
  AI.FORECAST(
    TABLE citibike_trips,
    data_col => 'num_trips',
    timestamp_col => 'date',
    id_cols => ['usertype'],
    horizon => 30);
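
A variation on the query above, sketched here to show the optional arguments: it passes a query statement instead of a table, omits id_cols so that all trips form a single time series, and sets the model, confidence level, and context window explicitly. The specific values (a 7-day horizon, a 0.9 confidence level, and a 512-point context window) are arbitrary choices for illustration.

```sql
SELECT *
FROM
  AI.FORECAST(
    (
      -- Query statement input: one row per day across all user types.
      SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
      FROM `bigquery-public-data.new_york.citibike_trips`
      GROUP BY date
    ),
    data_col => 'num_trips',
    timestamp_col => 'date',
    model => 'TimesFM 2.0',       -- the only supported value, and the default
    horizon => 7,                 -- forecast the next 7 data points
    confidence_level => 0.9,      -- must be in the range [0, 1)
    context_window => 512);       -- use the 512 most recent data points
```

Because no ID columns are specified, the output should contain one forecast row per horizon step, 7 rows in this sketch.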

Locations

AI.FORECAST and the TimesFM model are available in all supported BigQuery ML locations.

Pricing

AI.FORECAST usage is billed at the evaluation, inspection, and prediction rate documented in the BigQuery ML on-demand pricing section of the BigQuery ML pricing page.
