AI Explanations integrates feature attributions into AI Platform Prediction. This page provides a brief conceptual overview of the feature attribution methods available with AI Platform Prediction. For an in-depth technical discussion, refer to our AI Explanations Whitepaper.
AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data.
Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance. When you request predictions, you get predicted values as appropriate for your model. When you request explanations, you get the predictions along with feature attribution information.
Feature attributions work on tabular data, and include built-in visualization capabilities for image data. Consider the following examples:
A deep neural network is trained to predict the duration of a bike ride, based on weather data and previous ride sharing data. If you request only predictions from this model, you get predicted durations of bike rides in number of minutes. If you request explanations, you get the predicted bike trip duration, along with an attribution score for each feature in your explanations request. The attribution scores show how much each feature affected the change in prediction value, relative to the baseline value that you specify. Choose a baseline that is meaningful for your model; in this case, the median bike ride duration. You can plot the feature attribution scores to see which features contributed most strongly to the resulting prediction:
An image classification model is trained to predict whether a given image contains a dog or a cat. If you request predictions from this model on a new set of images, then you receive a prediction for each image ("dog" or "cat"). If you request explanations, you get the predicted class along with an overlay for the image, showing which pixels in the image contributed most strongly to the resulting prediction:
An image classification model is trained to predict the species of a flower in the image. If you request predictions from this model on a new set of images, then you receive a prediction for each image ("daisy" or "dandelion"). If you request explanations, you get the predicted class along with an overlay for the image, showing which areas in the image contributed most strongly to the resulting prediction:
Advantages and use cases
If you inspect specific instances, and also aggregate feature attributions across your training dataset, you can get deeper insight into how your model works. Consider the following advantages and use cases:
- Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss. For example, an image pathology model achieved suspiciously good results on a test dataset of chest X-Ray images. Feature attributions revealed that the model's high accuracy depended on the radiologist's pen marks in the image.
- Optimizing models: You can identify and remove features that are less important, which can result in more efficient models.
Consider the following limitations of feature attributions:
- Attributions are specific to individual predictions. Inspecting an attribution for an individual prediction may provide good insight, but the insight may not be generalizable to the entire class for that individual instance, or the entire model. To get more generalizable insight, you could aggregate attributions over subsets of your dataset, or over the entire dataset.
- Although feature attributions can help with model debugging, they do not always indicate clearly whether an issue arises from the model or from the data that the model is trained on. Use your best judgment, and diagnose common data issues to narrow the space of potential causes.
- Feature attributions are subject to similar adversarial attacks as predictions in complex models.
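As a rough illustration of aggregating attributions across a dataset, the following sketch averages the absolute attribution of each feature over several instances and ranks the features. The dictionary layout, the feature names, and the scores are made up for illustration; they are not the response format of AI Explanations.

```python
from statistics import mean

def aggregate_attributions(attributions):
    """Mean absolute attribution per feature across many instances.

    `attributions` is a list of dicts mapping feature name -> attribution
    score, one dict per explained instance (a hypothetical layout).
    """
    features = attributions[0].keys()
    return {f: mean(abs(row[f]) for row in attributions) for f in features}

# Made-up attribution scores for three bike-ride predictions.
per_instance = [
    {"temperature": 2.0, "wind_speed": -0.5, "start_hour": 1.0},
    {"temperature": -1.0, "wind_speed": 0.5, "start_hour": 3.0},
    {"temperature": 3.0, "wind_speed": -0.5, "start_hour": -0.5},
]

global_importance = aggregate_attributions(per_instance)
# Features ordered from most to least influential on average.
ranked = sorted(global_importance, key=global_importance.get, reverse=True)
print(ranked)
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling out; averaging signed values instead would show the typical direction of a feature's effect.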
Comparing feature attribution methods
AI Explanations offers three methods to use for feature attributions: sampled Shapley, integrated gradients, and XRAI.
|Method||Basic explanation||Recommended model types||Example use cases|
|Integrated gradients||A gradients-based method to efficiently compute feature attributions with the same axiomatic properties as the Shapley value.||Differentiable models, such as neural networks. Recommended especially for models with large feature spaces. Recommended for low-contrast images, such as X-rays.||
|XRAI (eXplanation with Ranked Area Integrals)||Based on the integrated gradients method, XRAI assesses overlapping regions of the image to create a saliency map, which highlights relevant regions of the image rather than pixels.||Models that accept image inputs. Recommended especially for natural images, which are any real-world scenes that contain multiple objects.||
|Sampled Shapley||Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values.||Non-differentiable models, such as ensembles of trees and neural networks||
Understanding feature attribution methods
Each feature attribution method is based on Shapley values, a concept from cooperative game theory that assigns credit to each player in a game for a particular outcome. Applied to machine learning models, this means that each model feature is treated as a "player" in the game, and AI Explanations assigns proportional credit to each feature for the outcome of a particular prediction.
AI Explanations lets you "choose" your players, so to speak, by selecting the exact features for your explanations request.
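To make the game-theory idea concrete, here is a minimal sketch that computes exact Shapley values for a toy three-player game by averaging each player's marginal contribution over every possible ordering. The function names and the toy payout function are hypothetical and exist only for this illustration.

```python
from itertools import permutations

def exact_shapley(value_fn, players):
    """Average each player's marginal contribution over every ordering."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value_fn(coalition)
            coalition.add(p)
            totals[p] += value_fn(coalition) - before
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical toy "model": the payout depends on which features are present.
def toy_value(coalition):
    value = 0.0
    if "a" in coalition:
        value += 10.0
    if "b" in coalition:
        value += 20.0
    if "a" in coalition and "b" in coalition:
        value += 6.0  # interaction term, split evenly between a and b
    return value

shapley = exact_shapley(toy_value, ["a", "b", "c"])
print(shapley)
```

The credits add up to the payout of the full coalition (36 here), and the player that never changes the payout receives zero credit. The number of orderings grows factorially with the number of players, which is why exact computation is impractical beyond a handful of features.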
Sampled Shapley method
The sampled Shapley method provides a sampling approximation of exact Shapley values.
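The sketch below shows one common way to build such a sampling approximation: draw random feature orderings, switch features from the baseline value to the instance value one at a time, and average the resulting changes in the prediction. It illustrates the general idea only and is not the AI Explanations implementation; `predict`, `num_paths`, and the toy model are assumptions made for this example.

```python
import random

def sampled_shapley(predict, instance, baseline, num_paths=200, seed=0):
    """Approximate Shapley values by sampling random feature orderings."""
    rng = random.Random(seed)
    n = len(instance)
    totals = [0.0] * n
    for _ in range(num_paths):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)      # start every path at the baseline
        prev = predict(current)
        for i in order:
            current[i] = instance[i]  # reveal feature i's real value
            new = predict(current)
            totals[i] += new - prev   # marginal contribution on this path
            prev = new
    return [t / num_paths for t in totals]

# Hypothetical model with one additive term and one interaction term.
def predict(x):
    return 3.0 * x[0] + 2.0 * x[1] * x[2]

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = sampled_shapley(predict, instance, baseline)
print(attributions)
```

Because each path starts at the baseline and ends at the instance, the attributions always sum to the difference between the two predictions. More paths reduce the sampling noise in the individual scores at the cost of more model evaluations.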
Integrated gradients method
In the integrated gradients method, the gradient of the prediction output is calculated with respect to the features of the input, along an integral path.
- The gradients are calculated at different intervals of a scaling parameter. (For image data, imagine this scaling parameter as a "slider" that is scaling all pixels of the image to black.)
- The gradients are "integrated":
  - The gradients are averaged together.
  - The element-wise product of the averaged gradients and the original input is calculated.
For an intuitive explanation of this process as applied to images, refer to the blog post, "Attributing a deep network's prediction to its input features". The authors of the original paper about integrated gradients (Axiomatic Attribution for Deep Networks) show in the preceding blog post what the images look like at each step of the process.
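The steps described above can be sketched in a few lines. This is an illustrative approximation, not the AI Explanations implementation: it estimates gradients numerically instead of using backpropagation, and `model`, `num_steps`, and the helper names are hypothetical. The averaged gradients are multiplied by the difference between the input and the baseline, which equals the original input when the baseline is all zeros (for example, a black image).

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def integrated_gradients(f, x, baseline, num_steps=50):
    """Average gradients along the straight path from baseline to x."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(num_steps):
        alpha = (k + 0.5) / num_steps  # scaling parameter (midpoint rule)
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = numerical_gradient(f, point)
        for i in range(n):
            avg_grad[i] += g[i] / num_steps
    # Element-wise product of averaged gradients and (input - baseline).
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

# Hypothetical differentiable model with two input features.
def model(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(model, x, baseline)
print(attributions)
```

For this toy model the attributions come out close to 7 and 3, which sum to the difference between the prediction at the input and the prediction at the baseline (10).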
XRAI method
The XRAI method combines the integrated gradients method with additional steps to determine which regions of the image contribute the most to a given class prediction.
- Pixel-level attribution: XRAI performs pixel-level attribution for the input image. In this step, XRAI uses the integrated gradients method with a black baseline and a white baseline.
- Oversegmentation: Independently of pixel-level attribution, XRAI oversegments the image to create a patchwork of small regions. XRAI uses Felzenszwalb's graph-based method to create the image segments.
- Region selection: XRAI aggregates the pixel-level attribution within each segment to determine its attribution density. Using these values, XRAI ranks each segment and then orders the segments from most to least positive. This determines which areas of the image are most salient, or contribute most strongly to a given class prediction.
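The region selection step can be illustrated with a simplified sketch that takes pixel-level attributions and a segment label map as given, computes the mean attribution per pixel in each segment, and ranks the segments. The arrays below are made up, the segments do not overlap, and the real method involves more than this single ranking pass, so treat it as an assumption-laden illustration only.

```python
def rank_segments(pixel_attr, segments):
    """Rank segments by attribution density (mean attribution per pixel)."""
    sums, counts = {}, {}
    for attr_row, seg_row in zip(pixel_attr, segments):
        for a, s in zip(attr_row, seg_row):
            sums[s] = sums.get(s, 0.0) + a
            counts[s] = counts.get(s, 0) + 1
    density = {s: sums[s] / counts[s] for s in sums}
    # Most positive density first.
    order = sorted(density, key=density.get, reverse=True)
    return order, density

# Made-up pixel attributions for a 3x4 image.
pixel_attr = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.0, -0.1],
    [-0.2, -0.3, 0.2, 0.2],
]
# Made-up segment label for each pixel (as produced by an oversegmentation).
segments = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
]

order, density = rank_segments(pixel_attr, segments)
print(order)
```

Segments near the front of the ranking correspond to the most salient areas of the image; an overlay can then highlight only the top-ranked regions.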
Differentiable and non-differentiable models
In differentiable models, you can calculate the derivative of all the operations in your TensorFlow graph. This property helps to make backpropagation possible in such models. For example, neural networks are differentiable. To get feature attributions for differentiable models, use the integrated gradients method.
Non-differentiable models include non-differentiable operations in the TensorFlow graph, such as operations that perform decoding and rounding tasks. For example, a model built as an ensemble of trees and neural networks is non-differentiable. To get feature attributions for non-differentiable models, use the sampled Shapley method. Sampled Shapley also works on differentiable models, but in that case, it is more computationally expensive than necessary.
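The following toy example hints at why gradient-based methods are a poor fit for non-differentiable operations such as rounding. Both functions are hypothetical stand-ins for a model, and the gradient is estimated numerically rather than through a TensorFlow graph.

```python
def numerical_gradient(f, x, eps=1e-4):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def smooth_model(x):
    return 2.0 * x

def rounded_model(x):
    # Rounding makes the output piecewise constant.
    return float(round(2.0 * x))

x = 1.3
grad_smooth = numerical_gradient(smooth_model, x)
grad_rounded = numerical_gradient(rounded_model, x)

# The rounded model still changes between a baseline of 0.0 and the input.
change_from_baseline = rounded_model(x) - rounded_model(0.0)
print(grad_smooth, grad_rounded, change_from_baseline)
```

The local gradient of the rounded model is zero even though its output differs between the baseline and the input. A method that compares predictions with features switched on and off, such as sampled Shapley, can still capture that change.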
The implementations of sampled Shapley, integrated gradients and XRAI are based on the following references, respectively:
- Bounding the Estimation Error of Sampling-based Shapley Value Approximation
- Axiomatic Attribution for Deep Networks
- XRAI: Better Attributions Through Regions
Learn more about the implementation of AI Explanations by reading the AI Explanations Whitepaper.
The following resources provide further useful educational material:
- Interpretable Machine Learning: Shapley values
- Ankur Taly's Integrated Gradients GitHub repository
- The SHAP (SHapley Additive exPlanations) library
- Introduction to Shapley values