Reduce forecasting bias with hierarchical aggregation

This page explains what hierarchical forecasting is, what its objectives are, and shows you some training strategies that you can employ to reduce bias in your forecasting models.

For detailed instructions on how to configure hierarchical forecasting when training your forecasting model using the API, see Train a forecast model.

What is hierarchical forecasting

Time series are often structured in a nested hierarchy. For example, the entire inventory of products that a retailer sells can be divided into categories of products. The categories can be further divided into individual products. When forecasting future sales, the forecasts for the products of a category should add up to the forecast for the category itself, and so forth up the hierarchy.

A hierarchy of products and categories.

Similarly, the time dimension of a single time series can also exhibit a hierarchy. For example, forecasted sales for an individual product at the day level should add up to the product's forecasted weekly sales. The following figure shows this group and temporal hierarchy as a matrix:

A time series matrix.

Hierarchical forecasting has three objectives:

  • Reduce overall bias to improve metrics over all time series (total sales).
  • Reduce temporal bias to improve metrics over the horizon (season sales).
  • Reduce group level bias to improve metrics over a group of time series (item sales).

In Vertex AI, hierarchical forecasting takes into account the hierarchical structure of time series by incorporating additional loss terms for aggregated predictions.

Hierarchical loss = (1 x loss) +
                    (temporal total weight x temporal total loss) +
                    (group total weight x group total loss) +
                    (group temporal total weight x group temporal total loss)

For example, if the hierarchical group is "category", the prediction at the "category" level is the sum of the predictions for all "products" in the category. If the objective of the model is mean absolute error (MAE), the loss includes the MAE for predictions at both the "product" and "category" levels. This helps to improve the consistency of forecasts at different levels of the hierarchy and, in some cases, may even improve metrics at the lowest level.
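As a rough illustration of how the four terms in the formula combine, the following Python sketch computes a hierarchical MAE with NumPy. The function names, the array layout (one row per time series, one column per time step), and the integer group labels are assumptions made for this example. Vertex AI's internal computation may differ, for instance in how it normalizes each term.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(actual - predicted)))


def hierarchical_loss(actual, predicted, group_ids,
                      group_total_weight=0.0,
                      temporal_total_weight=0.0,
                      group_temporal_total_weight=0.0):
    """Illustrative hierarchical MAE (a sketch, not the Vertex AI implementation).

    actual, predicted: arrays of shape (n_series, horizon).
    group_ids: one group label per series, shape (n_series,).
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    group_ids = np.asarray(group_ids)
    groups = np.unique(group_ids)

    def by_group(x):
        # Sum the series that belong to each group, per time step.
        return np.stack([x[group_ids == g].sum(axis=0) for g in groups])

    # Individual loss: every (series, time step) cell on its own.
    loss = mae(actual, predicted)
    # Temporal total: each series summed over the horizon.
    temporal = mae(actual.sum(axis=1), predicted.sum(axis=1))
    # Group total: each group summed over its series, per time step.
    group = mae(by_group(actual), by_group(predicted))
    # Group temporal total: each group summed over series and horizon.
    group_temporal = mae(by_group(actual).sum(axis=1),
                         by_group(predicted).sum(axis=1))

    return (1.0 * loss
            + temporal_total_weight * temporal
            + group_total_weight * group
            + group_temporal_total_weight * group_temporal)


# Two products in one category, each underpredicted by 1 unit per day.
actual = [[10, 10], [10, 10]]
predicted = [[9, 9], [9, 9]]
plain = hierarchical_loss(actual, predicted, group_ids=[0, 0])
weighted = hierarchical_loss(actual, predicted, group_ids=[0, 0],
                             group_total_weight=10.0)
```

In this toy case the individual MAE is 1.0, but the category-level MAE is 2.0 because the two underpredictions add up. With a group total weight of 10.0 the combined loss becomes 1.0 + 10.0 x 2.0 = 21.0, so the aggregated error dominates the objective.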

Configure hierarchical aggregation for model training

You can configure hierarchical aggregation when training your forecast models through AutoMLForecastingTrainingJob in the Vertex AI SDK or through hierarchyConfig in the Vertex AI API.

Available parameters for AutoMLForecastingTrainingJob and hierarchyConfig include:

  • group_columns
  • group_total_weight
  • temporal_total_weight
  • group_temporal_total_weight

The parameters allow for different combinations of group and time aggregated losses. They also allow you to assign weights to increase the priority of minimizing an aggregated loss relative to the individual loss. For example, if a weight is set to 2.0, the corresponding aggregated loss is weighted twice as much as the individual loss.

group_columns

Column names in your training input table that identify the grouping for the hierarchy level. The column(s) must be time_series_attribute_columns. If no group column is set, all time series are treated as part of the same group, and the loss is aggregated over all time series.

group_total_weight

Weight of the group aggregated loss relative to the individual loss. Disabled if set to 0.0 or not set.

temporal_total_weight

Weight of the time aggregated loss relative to the individual loss. Disabled if set to 0.0 or not set.

group_temporal_total_weight

Weight of the total (group x time) aggregated loss relative to the individual loss. Disabled if set to 0.0 or not set. If no group column is set, all time series are treated as part of the same group, and the loss is aggregated over all time series.
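To make the rules for these four parameters concrete, here is a minimal Python sketch of a settings container. The class HierarchySettings and its methods are hypothetical and are not part of the Vertex AI SDK or API. They only encode what is stated above: a weight of 0.0 or an unset weight disables its loss term, and group columns must be time series attribute columns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HierarchySettings:
    """Hypothetical container mirroring the four parameters described above."""
    group_columns: List[str] = field(default_factory=list)
    group_total_weight: Optional[float] = None
    temporal_total_weight: Optional[float] = None
    group_temporal_total_weight: Optional[float] = None

    def enabled_terms(self) -> List[str]:
        """Names of the aggregated loss terms that are switched on.

        A weight that is unset (None) or 0.0 leaves its term disabled.
        """
        weights = {
            "group_total": self.group_total_weight,
            "temporal_total": self.temporal_total_weight,
            "group_temporal_total": self.group_temporal_total_weight,
        }
        return [name for name, weight in weights.items() if weight]

    def validate(self, time_series_attribute_columns: List[str]) -> None:
        """Raise ValueError if a group column is not an attribute column."""
        unknown = [c for c in self.group_columns
                   if c not in time_series_attribute_columns]
        if unknown:
            raise ValueError(
                f"group_columns must be time series attribute columns: {unknown}")


# Group-level aggregation by region, with the other weights left unset.
settings = HierarchySettings(group_columns=["region"], group_total_weight=10.0)
settings.validate(time_series_attribute_columns=["region", "category"])
active = settings.enabled_terms()
```

Here only the group total term is active. How these values are passed to a training job depends on the SDK or API version you use, so check the reference for the exact argument names.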

Strategies to reduce bias

Consider starting with one type of aggregation (group or time) with a weight of 10.0, and then halve or double the value based on the results.

Reduce overall bias

In fine-grained forecasts for distributing stock across stores, where weighted absolute percentage error (WAPE) at the product x store x date level is used as a forecasting metric, forecasts often underpredict at the aggregate levels. To compensate for this overall bias, you can try the following:

  • Set group_total_weight to 10.0.
  • Leave group_columns unset.
  • Leave other weights unset.

This aggregates over all time series and reduces overall bias.
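The following Python sketch, using made-up sales numbers, shows why a fine-grained metric alone can hide overall bias. Two hypothetical forecasts have the same cell-level WAPE, but one of them predicts only half of the total demand. The per-date error on the all-series total, which is what aggregating with group_columns unset looks at in this sketch, tells them apart.

```python
import numpy as np


def wape(actual, predicted):
    """Weighted absolute percentage error: sum |error| / sum |actual|."""
    return float(np.abs(actual - predicted).sum() / np.abs(actual).sum())


# Made-up sales: four product x store series (rows) over three dates (columns).
actual = np.array([[2, 0, 1],
                   [0, 3, 0],
                   [1, 1, 0],
                   [0, 0, 4]], dtype=float)

# Two hypothetical forecasts: one low in total, one matching total demand.
low = np.full_like(actual, 0.5)       # predicts 6 units in total
balanced = np.full_like(actual, 1.0)  # predicts 12 units in total

# The fine-grained metric cannot distinguish the two forecasts.
cell_wape_low = wape(actual, low)
cell_wape_balanced = wape(actual, balanced)

# With no group columns, all series form one group: sum over series per date.
total_error_low = float(
    np.abs(actual.sum(axis=0) - low.sum(axis=0)).sum())
total_error_balanced = float(
    np.abs(actual.sum(axis=0) - balanced.sum(axis=0)).sum())
```

Both forecasts have a cell-level WAPE of 1.0, yet the absolute error on the daily totals is 6.0 for the low forecast and 2.0 for the balanced one. Adding a weighted loss term on those totals gives the model a reason to prefer the balanced forecast.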

Reduce temporal bias

In long term planning, forecasts may be made at a product x region x week level, but the relevant metrics may be measured with respect to seasonal totals. To compensate for this temporal bias, you can try the following:

  • Set temporal_total_weight to 10.0.
  • Leave group_columns unset.
  • Leave other weights unset.

This aggregates over all dates in the horizon of a time series, and reduces temporal bias.

Reduce group level bias

For forecasts that serve multiple purposes in the replenishment process, fine-grained forecasts at the product x store x date or week level may be aggregated up to product x distribution center x date levels for distribution, or to product category x date levels for materials orders. To reduce bias at these aggregated levels, you can try the following:

  • Set group_total_weight to 10.0.
  • Set group_columns, for example, ["region"] or ["region", "category"]. Setting multiple group columns uses their combined value to define the group. For best results, use group columns with 100 or fewer distinct combined values.
  • Leave other weights unset.

This aggregates over all time series in the same group for the same date, and reduces bias at the group level.

Limits

  • Only one level of time series aggregation is supported. If more than one grouping column is specified, such as "product, store", time series are in the same group only if they share the same values of both "product" and "store".
  • We recommend using 100 or fewer groups.
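Before training, you may want to check how many groups a choice of group columns produces. The following Python sketch counts distinct combined values in a small made-up table. The helper count_groups and the sample rows are illustrative assumptions, not part of any Vertex AI tooling.

```python
def count_groups(rows, group_columns):
    """Count distinct combined values of the given group columns.

    Two rows fall in the same group only if they agree on every group column.
    """
    return len({tuple(row[col] for col in group_columns) for row in rows})


# Made-up time series attributes, one row per time series.
rows = [
    {"product": "p1", "region": "west", "category": "toys"},
    {"product": "p2", "region": "west", "category": "toys"},
    {"product": "p3", "region": "west", "category": "food"},
    {"product": "p4", "region": "east", "category": "toys"},
]

groups_by_region = count_groups(rows, ["region"])
groups_by_region_category = count_groups(rows, ["region", "category"])

# Recommended upper bound on the number of groups, taken from the limits above.
within_recommendation = groups_by_region_category <= 100
```

In this sample, grouping by region yields 2 groups, while grouping by region and category yields 3, because each distinct pair of values forms its own group.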
