Introduction to AI Explanations for AI Platform

AI Explanations integrates feature attributions into AI Platform Prediction. This page provides a brief conceptual overview of the feature attribution methods available with AI Platform Prediction. For an in-depth technical discussion, refer to our AI Explanations Whitepaper.

AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data.

Feature attributions

Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance. When you request predictions, you get predicted values as appropriate for your model. When you request explanations, you get the predictions along with feature attribution information.

Feature attributions work on tabular data, and include built-in visualization capabilities for image data. Consider the following examples:

  • A deep neural network is trained to predict the duration of a bike ride, based on weather data and previous ride-sharing data. If you request only predictions from this model, you get predicted bike ride durations in minutes. If you request explanations, you get the predicted bike trip duration along with an attribution score for each feature in your explanations request. The attribution scores show how much each feature affected the change in prediction value, relative to the baseline value that you specify. Choose a baseline that is meaningful for your model; in this case, the median bike ride duration is a sensible choice. You can plot the feature attribution scores to see which features contributed most strongly to the resulting prediction, as in the sketch after these examples:

    A feature attribution chart for one predicted bike ride duration

  • An image classification model is trained to predict whether a given image contains a dog or a cat. If you request predictions from this model on a new set of images, then you receive a prediction for each image ("dog" or "cat"). If you request explanations, you get the predicted class along with an overlay for the image, showing which pixels in the image contributed most strongly to the resulting prediction:

    A photo of a cat with feature attribution overlay
    A photo of a dog with feature attribution overlay
  • An image classification model is trained to predict the species of a flower in the image. If you request predictions from this model on a new set of images, then you receive a prediction for each image ("daisy" or "dandelion"). If you request explanations, you get the predicted class along with an overlay for the image, showing which areas in the image contributed most strongly to the resulting prediction:

    A photo of a daisy with feature attribution overlay
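
For instance, here is a minimal sketch of plotting attribution scores for a single bike ride prediction. The feature names and attribution values below are hypothetical placeholders, standing in for the values returned in an explanations response:

```python
import matplotlib.pyplot as plt

# Hypothetical attribution scores for one predicted ride duration,
# expressed in minutes relative to the baseline prediction.
attributions = {
    "temperature": 4.2,
    "day_of_week": -1.1,
    "precipitation": -3.5,
    "start_hour": 2.8,
}

# Sort features by attribution so the strongest contributors stand out.
features, scores = zip(*sorted(attributions.items(), key=lambda kv: kv[1]))
plt.barh(features, scores)
plt.xlabel("Attribution (minutes relative to baseline)")
plt.title("Feature attributions for one bike ride prediction")
plt.tight_layout()
plt.show()
```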

Advantages and use cases

If you inspect specific instances, and also aggregate feature attributions across your training dataset, you can get deeper insight into how your model works. Consider the following advantages and use cases:

  • Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss. For example, an image pathology model achieved suspiciously good results on a test dataset of chest X-Ray images. Feature attributions revealed that the model's high accuracy depended on the radiologist's pen marks in the image.
  • Optimizing models: You can identify and remove features that are less important, which can result in more efficient models.

Conceptual limitations

Consider the following limitations of feature attributions:

  • Attributions are specific to individual predictions. Inspecting the attributions for an individual prediction may provide good insight, but that insight may not generalize to other instances of the same class or to the model as a whole. To get more generalizable insight, aggregate attributions over subsets of your dataset, or over the entire dataset.
  • Although feature attributions can help with model debugging, they do not always indicate clearly whether an issue arises from the model or from the data that the model is trained on. Use your best judgment, and diagnose common data issues to narrow the space of potential causes.
  • Feature attributions are subject to similar adversarial attacks as predictions in complex models.

For more information about limitations, refer to the high-level limitations list and the AI Explanations Whitepaper.

Comparing feature attribution methods

AI Explanations offers three methods to use for feature attributions: sampled Shapley, integrated gradients, and XRAI.

Integrated gradients
  • Basic explanation: A gradients-based method to efficiently compute feature attributions with the same axiomatic properties as the Shapley value.
  • Recommended model types: Differentiable models, such as neural networks. Recommended especially for models with large feature spaces, and for low-contrast images, such as X-rays.
  • Example use cases: Classification and regression on tabular data; classification on image data.

XRAI (eXplanation with Ranked Area Integrals)
  • Basic explanation: Based on the integrated gradients method, XRAI assesses overlapping regions of the image to create a saliency map, which highlights relevant regions of the image rather than pixels.
  • Recommended model types: Models that accept image inputs. Recommended especially for natural images, which are any real-world scenes that contain multiple objects.
  • Example use cases: Classification on image data.

Sampled Shapley
  • Basic explanation: Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values.
  • Recommended model types: Non-differentiable models, such as ensembles of trees and neural networks.
  • Example use cases: Classification and regression on tabular data.

Understanding feature attribution methods

Each feature attribution method is based on Shapley values, a concept from cooperative game theory that assigns credit to each player in a game for a particular outcome. Applied to machine learning models, this means that each model feature is treated as a "player" in the game, and AI Explanations assigns proportional credit to each feature for the outcome of a particular prediction.

AI Explanations lets you "choose" your players, so to speak, by selecting the exact features for your explanations request.
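
As an illustration, the following sketch computes exact Shapley values for a hypothetical three-feature model by averaging each feature's marginal contribution over every ordering of the "players". The predict function, instance, and baseline are placeholder examples, not part of the AI Explanations API:

```python
import itertools

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over every possible ordering of the features."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(itertools.permutations(range(n)))
    for order in orderings:
        current = list(baseline)            # start from the baseline input
        prev = predict(current)
        for i in order:
            current[i] = instance[i]        # "add" feature i to the coalition
            value = predict(current)
            totals[i] += value - prev       # marginal contribution of feature i
            prev = value
    return [t / len(orderings) for t in totals]

# Hypothetical linear model and instance, for illustration only.
predict = lambda x: 3.0 * x[0] + 2.0 * x[1] - 1.0 * x[2]
print(shapley_values(predict, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
# -> [3.0, 4.0, -3.0]; the attributions sum to the prediction minus the baseline prediction.
```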

Sampled Shapley method

The sampled Shapley method provides a sampling approximation of exact Shapley values.
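
Here is a minimal sketch of that sampling idea, reusing the hypothetical setup above: rather than enumerating every feature ordering, it draws a fixed number of random orderings, trading a small approximation error for far less computation when a model has many features.

```python
import random

def sampled_shapley(predict, instance, baseline, num_paths=50, seed=0):
    """Approximate Shapley values from randomly sampled feature orderings."""
    rng = random.Random(seed)
    n = len(instance)
    totals = [0.0] * n
    for _ in range(num_paths):
        order = rng.sample(range(n), n)     # one random ordering of the features
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = instance[i]        # "add" feature i to the coalition
            value = predict(current)
            totals[i] += value - prev       # marginal contribution under this ordering
            prev = value
    return [t / num_paths for t in totals]

# Same hypothetical linear model as above; for a linear model every ordering
# gives the same contributions, so the estimate matches the exact values [3.0, 4.0, -3.0].
predict = lambda x: 3.0 * x[0] + 2.0 * x[1] - 1.0 * x[2]
print(sampled_shapley(predict, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```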

Integrated gradients method

In the integrated gradients method, the gradient of the prediction output is calculated with respect to the features of the input, along an integral path.

  1. The gradients are calculated at different intervals of a scaling parameter. (For image data, imagine this scaling parameter as a "slider" that is scaling all pixels of the image to black.)
  2. The gradients are "integrated":
    1. The gradients are averaged together.
    2. The element-wise product of the averaged gradients and the original input is calculated.

For an intuitive explanation of this process as applied to images, refer to the blog post "Attributing a deep network's prediction to its input features". In that post, the authors of the original integrated gradients paper (Axiomatic Attribution for Deep Networks) show what the images look like at each step of the process.
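
The following is a minimal sketch of those steps for a differentiable TensorFlow model. The small Keras model, input vector, and step count are hypothetical placeholders; this is an outline of the method rather than the service's actual implementation:

```python
import tensorflow as tf

def integrated_gradients(model, x, baseline, steps=50):
    """Approximate integrated gradients for a single input tensor x."""
    # Step 1: build inputs at several values of the scaling parameter alpha,
    # moving from the baseline (alpha = 0) to the actual input (alpha = 1).
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), [-1] + [1] * len(x.shape))
    interpolated = baseline + alphas * (x - baseline)

    # Step 2a: compute the gradients at each scaling step and average them.
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        predictions = model(interpolated)
    grads = tape.gradient(predictions, interpolated)
    avg_grads = tf.reduce_mean(grads, axis=0)

    # Step 2b: element-wise product of the averaged gradients and (input - baseline).
    return (x - baseline) * avg_grads

# Hypothetical usage with a small Keras regression model:
model = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu"),
                             tf.keras.layers.Dense(1)])
x = tf.constant([0.4, 1.2, -0.7])
attributions = integrated_gradients(model, x, baseline=tf.zeros_like(x))
```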

XRAI method

The XRAI method combines the integrated gradients method with additional steps to determine which regions of the image contribute the most to a given class prediction.

  1. Pixel-level attribution: XRAI performs pixel-level attribution for the input image. In this step, XRAI uses the integrated gradients method with a black baseline and a white baseline.
  2. Oversegmentation: Independently of pixel-level attribution, XRAI oversegments the image to create a patchwork of small regions. XRAI uses Felzenszwalb's graph-based method to create the image segments.
  3. Region selection: XRAI aggregates the pixel-level attribution within each segment to determine its attribution density. Using these values, XRAI ranks each segment and then orders the segments from most to least positive. This determines which areas of the image are most salient, or contribute most strongly to a given class prediction.

Images that show the steps of the XRAI algorithm
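
A rough outline of these three steps in code, assuming the integrated_gradients helper sketched earlier and scikit-image's felzenszwalb function; the real XRAI algorithm does considerably more (for example, it assesses overlapping regions), so treat this only as a sketch of the idea:

```python
import numpy as np
import tensorflow as tf
from skimage.segmentation import felzenszwalb

def xrai_outline(model, image, target_class, steps=50):
    """Outline of XRAI: pixel attribution -> oversegmentation -> region ranking."""
    # Score for the class being explained (model returns per-class scores).
    class_score = lambda batch: model(batch)[:, target_class]

    # 1. Pixel-level attribution with a black and a white baseline, averaged.
    attr = (integrated_gradients(class_score, image, tf.zeros_like(image), steps) +
            integrated_gradients(class_score, image, tf.ones_like(image), steps)) / 2.0
    pixel_attr = tf.reduce_sum(attr, axis=-1).numpy()   # collapse color channels

    # 2. Oversegment the image into many small regions (Felzenszwalb's method).
    segments = felzenszwalb(image.numpy(), scale=50, sigma=0.8, min_size=20)

    # 3. Rank segments by attribution density (mean attribution per pixel),
    #    from most to least positive.
    density = {s: pixel_attr[segments == s].mean() for s in np.unique(segments)}
    ranked = sorted(density, key=density.get, reverse=True)
    return ranked, segments
```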

Differentiable and non-differentiable models

In differentiable models, you can calculate the derivative of all the operations in your TensorFlow graph. This property helps to make backpropagation possible in such models. For example, neural networks are differentiable. To get feature attributions for differentiable models, use the integrated gradients method.

Non-differentiable models include non-differentiable operations in the TensorFlow graph, such as operations that perform decoding and rounding tasks. For example, a model built as an ensemble of trees and neural networks is non-differentiable. To get feature attributions for non-differentiable models, use the sampled Shapley method. Sampled Shapley also works on differentiable models, but in that case, it is more computationally expensive than necessary.
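
For example, a quick way to see the distinction in TensorFlow: gradients flow through a smooth operation such as tanh, but not through a rounding operation of the kind mentioned above.

```python
import tensorflow as tf

x = tf.Variable([1.7, -0.3, 2.4])

# Differentiable: smooth operations such as tanh propagate gradients.
with tf.GradientTape() as tape:
    y = tf.reduce_sum(tf.tanh(x))
print(tape.gradient(y, x))   # a tensor of partial derivatives

# Non-differentiable: rounding provides no useful gradient signal, so a
# gradients-based attribution method cannot attribute through it.
with tf.GradientTape() as tape:
    y = tf.reduce_sum(tf.round(x))
print(tape.gradient(y, x))   # typically None; no gradient flows through tf.round
```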

References

The implementations of sampled Shapley, integrated gradients and XRAI are based on the following references, respectively:

Learn more about the implementation of AI Explanations by reading the AI Explanations Whitepaper.

Educational resources

The following resources provide further useful educational material:

What's next