Feature Attributions for Forecasting

Introduction

Vertex Explainable AI integrates feature attributions into Vertex AI. This page provides a brief conceptual overview of the feature attribution methods available with Vertex AI. For an in-depth technical discussion, refer to our AI Explanations Whitepaper.

Feature attributions for time series models indicate how much each feature in a model contributed to a prediction.

Feature attributions measure a feature's contribution to a prediction relative to an input baseline. For numerical features, such as sales, the baseline input is the median of the sales values. For categorical features, such as the product name, the baseline input is the most common product name. The sum of all attributions is not the prediction itself; it is the amount by which the prediction differs from the baseline prediction, that is, the prediction the model makes when every input is set to its baseline value.
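
This relationship can be illustrated with a small sketch. The numbers below are made up for illustration and are not output from a real model; note also that sampled attributions are approximate, so in practice the identity holds only up to sampling error.

# Hypothetical numbers: attributions explain the gap between the
# baseline prediction and the actual prediction.
baseline_prediction = 100.0  # forecast when every input is at its baseline
attributions = {
    "advertisement": 12.5,   # advertisement=TRUE raised the forecast
    "holiday": -3.0,         # the holiday value lowered it slightly
    "store": 0.5,
}

prediction = baseline_prediction + sum(attributions.values())
print(prediction)  # 110.0 = 100.0 + 12.5 - 3.0 + 0.5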

Generate local feature attributions

To generate feature attributions, set the generate_explanation parameter to true when you create a batch prediction job. All model preparation, such as generating baselines, is managed by the Vertex AI Forecast service.
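
One way to do this is with the Vertex AI SDK for Python. The following is a minimal sketch; the project, model ID, and BigQuery paths are placeholders to replace with your own values:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model("my-forecasting-model-id")

# Request feature attributions alongside the forecasts.
batch_prediction_job = model.batch_predict(
    job_display_name="forecast-with-attributions",
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://my-project.my_dataset.forecast_input",
    bigquery_destination_prefix="bq://my-project.my_dataset",
    generate_explanation=True,  # turns on feature attributions
)
batch_prediction_job.wait()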

Feature attributions are determined from forecasts made for counterfactuals. An example counterfactual is as follows: what would the forecast be if the advertisement value of TRUE on 2020-11-21 were replaced with FALSE, the most common value? The required number of counterfactuals scales with both the number of columns and the number of paths (which the service selects). The resulting number of predictions can be orders of magnitude larger than that of a normal batch prediction job, and the expected run time scales accordingly.
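
As a purely illustrative back-of-envelope calculation (the service's exact counterfactual count is not documented here), assume roughly one counterfactual forecast per attributed column per path:

# Illustrative estimate only; not the service's exact formula.
rows = 10_000  # forecasts in a normal batch prediction job
columns = 6    # attributed feature columns
paths = 50     # hypothetical path count (see pathCount below)

counterfactual_forecasts = rows * columns * paths
print(f"~{counterfactual_forecasts:,} forecasts vs. {rows:,} normally")  # ~3,000,000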

Query feature attributions

Feature attributions are written to the BigQuery prediction output table as a struct. Note that CSV output is currently not supported.

You can access attributions in a query with:

predictions.explanation.attributions[OFFSET(0)].featureAttributions.advertisement

The nested and repeated fields are standard for batch explanation jobs and are consistent with Vertex AI Tabular.
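
For example, a minimal sketch using the BigQuery client library for Python; the table path and column names follow the examples on this page:

from google.cloud import bigquery

client = bigquery.Client()

# Read the advertisement attribution for each forecasted row.
sql = """
SELECT
  product,
  store,
  date,
  explanation.attributions[OFFSET(0)].featureAttributions.advertisement
    AS attribution_advertisement
FROM
  `project.dataset.predictions`
"""

for row in client.query(sql).result():
    print(row.product, row.store, row.date, row.attribution_advertisement)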

Example 1: Determine attributions for a single prediction

Consider the following question:

How much did an advertisement for a product increase predicted sales on November 24, 2019, at a given store?

The corresponding query is as follows:

SELECT
  * EXCEPT(explanation, predicted_sales),
  ROUND(predicted_sales.value, 2) AS predicted_sales,
  ROUND(
    explanation.attributions[OFFSET(0)].featureAttributions.advertisement,
    2
  ) AS attribution_advertisement
FROM
  `project.dataset.predictions`
WHERE
  product = 'product_0'
  AND store = 'store_0'
  AND date = '2019-11-24'

Example 2: Determine global feature importance

Consider the following question:

How much did each feature contribute to predicted sales overall?

The corresponding query is as follows:

WITH

/*
* Aggregate from (id, date) level attributions to global feature importance.
*/
attributions_aggregated AS (
  SELECT
    SUM(ABS(attributions.featureAttributions.date)) AS date,
    SUM(ABS(attributions.featureAttributions.advertisement)) AS advertisement,
    SUM(ABS(attributions.featureAttributions.holiday)) AS holiday,
    SUM(ABS(attributions.featureAttributions.sales)) AS sales,
    SUM(ABS(attributions.featureAttributions.store)) AS store,
    SUM(ABS(attributions.featureAttributions.product)) AS product
  FROM
    `project.dataset.predictions`,
    UNNEST(explanation.attributions) AS attributions
),

/*
* Calculate the normalization constant for global feature importance.
*/
attributions_aggregated_with_total AS (
  SELECT
    *,
    date + advertisement + holiday + sales + store + product AS total
  FROM
    attributions_aggregated
)

/*
* Calculate the normalized global feature importance.
*/
SELECT
  ROUND(date / total, 2) AS date,
  ROUND(advertisement / total, 2) AS advertisement,
  ROUND(holiday / total, 2) AS holiday,
  ROUND(sales / total, 2) AS sales,
  ROUND(store / total, 2) AS store,
  ROUND(product / total, 2) AS product
FROM
  attributions_aggregated_with_total

View explanation metadata and parameters

The explanation parameters and metadata contain the following:

  • The baselines used to generate explanations.
    • Corresponding explanation metadata: static_value.
  • The number of paths, a factor in the amount of time it takes to generate feature attributions.
    • Corresponding explanation parameter: pathCount.
  • Columns available at forecast.
    • Corresponding explanation metadata: historical_values, prediction_values.
  • Columns unavailable at forecast.
    • Corresponding explanation metadata: historical_values.

You can view the model, including its explanation spec, by using the Vertex AI REST API.

REST & CMD LINE

Before using any of the request data, make the following replacements:

  • LOCATION: Region where your model is stored.
  • PROJECT: Your project ID.
  • MODEL_ID: The ID of the model resource.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID"

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/models/MODEL_ID" | Select-Object -Expand Content

A successful request returns the model resource for a trained AutoML model, including the explanation spec.
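
The explanation spec can also be retrieved with the Vertex AI SDK for Python. A minimal sketch, using the same placeholders as above:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT", location="LOCATION")

model = aiplatform.Model("MODEL_ID")

# The explanation spec holds the parameters (such as pathCount) and
# the metadata (such as baselines) described above.
print(model.gca_resource.explanation_spec)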

Algorithm

Vertex AI provides feature attributions using Shapley values, a cooperative game theory method that assigns credit to each player in a game for a particular outcome. Applied to machine learning models, this means that each model feature is treated as a "player" in the game, and each feature is credited in proportion to its contribution to the outcome of a particular prediction. Because structured data models are non-differentiable meta-ensembles of trees and neural networks, Vertex AI uses Sampled Shapley, a sampling approximation of exact Shapley values.

For in-depth information on how the sampled Shapley method works, read the paper Bounding the Estimation Error of Sampling-based Shapley Value Approximation.
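
To make the procedure concrete, here is a minimal, self-contained sketch of sampled Shapley. This is the textbook algorithm applied to an arbitrary model function, not the Vertex AI implementation:

import random

def sampled_shapley(f, instance, baseline, num_paths=50, seed=0):
    """Approximate Shapley attributions for f at the given instance.

    f        -- model function mapping a dict of feature values to a number
    instance -- the feature values being explained
    baseline -- baseline feature values (e.g. medians, most common values)
    """
    rng = random.Random(seed)
    features = list(instance)
    attributions = {name: 0.0 for name in features}

    for _ in range(num_paths):
        rng.shuffle(features)
        current = dict(baseline)  # start from the all-baseline input
        prev = f(current)
        for name in features:     # walk one random permutation
            current[name] = instance[name]
            value = f(current)
            attributions[name] += value - prev  # marginal contribution
            prev = value

    return {name: total / num_paths for name, total in attributions.items()}

# Toy model: sales rise with an advertisement and with a holiday.
model = lambda x: 100 + 12 * x["advertisement"] + 3 * x["holiday"]
print(sampled_shapley(
    model,
    instance={"advertisement": 1, "holiday": 1},
    baseline={"advertisement": 0, "holiday": 0},
))  # {'advertisement': 12.0, 'holiday': 3.0}

Because this toy model is additive, every sampled permutation yields the same marginal contributions, so the approximation is exact; for real meta-ensembles the estimates converge as the number of paths grows.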

What's next

The following resources provide further useful educational material: