Introduction to AI Explanations for AI Platform

AI Explanations integrates feature attributions into AI Platform Prediction. This page provides a brief conceptual overview of the feature attribution methods available with AI Platform Prediction. For an in-depth technical discussion, refer to our AI Explainability Whitepaper.

AI Explanations helps you understand your model's outputs for classification and regression tasks. Whenever you request explanations for a prediction on AI Platform, AI Explanations tells you how much each feature in the data contributed to the predicted result. You can then use this information to verify that the model is behaving as expected, recognize bias in your models, and get ideas for ways to improve your model and your training data.

Feature attributions

Feature attributions indicate how much each feature in your model contributed to the predictions for each given instance. When you request predictions, you get predicted values as appropriate for your model. When you request explanations, you get the predictions along with feature attribution information.

Feature attributions work on tabular data and include built-in visualization capabilities for image data. Consider the following examples:

  • A deep neural network is trained to predict the duration of a bike ride, based on weather data and previous ride-sharing data. If you request only predictions from this model, you get predicted bike ride durations in minutes. If you request explanations, you get the predicted bike trip duration, along with an attribution score for each feature in your explanations request. The attribution scores show how much each feature affected the change in prediction value, relative to the baseline value that you specify. Choose a meaningful baseline that makes sense for your model; in this case, the median bike ride duration. You can plot the feature attribution scores to see which features contributed most strongly to the resulting prediction:

    A feature attribution chart for one predicted bike ride duration

  • An image classification model is trained to predict whether a given image contains a dog or a cat. If you request predictions from this model on a new set of images, then you receive a prediction for each image ("dog" or "cat"). If you request explanations, you get the predicted class along with an overlay for the image, showing which pixels in the image contributed most strongly to the resulting prediction:

    A photo of a cat with feature attribution overlay
    A photo of a dog with feature attribution overlay
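To make "relative to the baseline" concrete, here is a minimal sketch using a hypothetical linear model of ride duration. The weights, the feature names (temperature, precipitation, distance_km), the instance, and the baseline input are all made up for illustration. For a linear model, both integrated gradients and Shapley values computed against a single baseline reduce to w_i * (x_i - baseline_i), so the attributions add up to the difference between the prediction and the prediction at the baseline.

```python
import numpy as np

# Hypothetical linear model of ride duration in minutes; weights are invented.
# Feature order: temperature, precipitation, distance_km
weights = np.array([0.8, 4.0, 3.5])
bias = 5.0

def predict(x: np.ndarray) -> float:
    """Predicted ride duration for one instance."""
    return float(weights @ x + bias)

def linear_attributions(x: np.ndarray, baseline: np.ndarray) -> np.ndarray:
    """Per-feature attribution relative to a baseline input.

    For a linear model, integrated gradients and single-baseline Shapley
    values both reduce to w_i * (x_i - baseline_i).
    """
    return weights * (x - baseline)

x = np.array([22.0, 0.0, 6.0])          # hypothetical instance to explain
baseline = np.array([15.0, 1.0, 4.0])   # hypothetical baseline input

attr = linear_attributions(x, baseline)
print("attributions:", attr)
# The attributions account for the whole change from the baseline prediction.
print("sum of attributions:", attr.sum())
print("prediction - baseline prediction:", predict(x) - predict(baseline))
```

A positive score means the feature pushed the predicted duration above the baseline prediction; a negative score means it pulled the prediction below it.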

Advantages and use cases

By inspecting specific instances and also aggregating feature attributions across your training dataset, you can get deeper insight into how your model works. Consider the following advantages and use cases:

  • Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss. For example, an image pathology model achieved suspiciously good results on a test dataset of chest X-ray images. Feature attributions revealed that the model's high accuracy depended on the radiologist's pen marks in the images.
  • Optimizing models: You can identify and remove features that are less important, which can result in more efficient models.
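One simple way to aggregate attributions, sketched below under the assumption that you have already collected per-instance attributions into a NumPy array (one row per explained prediction), is to rank features by their mean absolute attribution. The feature names and values are hypothetical. Absolute values are used so that positive and negative contributions don't cancel each other out; a feature that consistently scores low is a candidate for removal, which you would then confirm by retraining and re-evaluating.

```python
import numpy as np

def rank_features(attributions: np.ndarray, feature_names: list) -> list:
    """Rank features by mean absolute attribution across instances.

    attributions: array of shape (n_instances, n_features).
    Returns (feature_name, importance) pairs, most important first.
    """
    importance = np.abs(attributions).mean(axis=0)
    order = np.argsort(importance)[::-1]  # descending importance
    return [(feature_names[i], float(importance[i])) for i in order]

# Hypothetical attributions for three explained predictions.
names = ["temperature", "precipitation", "distance_km", "day_of_week"]
attrs = np.array([
    [ 2.0, -4.0, 7.0,  0.1],
    [-1.0,  3.0, 6.0, -0.2],
    [ 0.5, -2.0, 8.0,  0.0],
])

ranking = rank_features(attrs, names)
for name, score in ranking:
    print(f"{name:15s} {score:.2f}")
```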

Conceptual limitations

Consider the following limitations of feature attributions:

  • Attributions are specific to individual predictions. Inspecting the attribution for an individual prediction may provide good insight, but that insight may not generalize to the entire class that the instance belongs to, or to the entire model. To get more generalizable insight, aggregate attributions over subsets of your dataset or over the entire dataset.
  • Although feature attributions can help with model debugging, they do not always indicate clearly whether an issue arises from the model or from the data that the model is trained on. Use your best judgment, and diagnose common data issues to narrow the space of potential causes.
  • Feature attributions are susceptible to the same kinds of adversarial attacks as predictions in complex models.

For more information about limitations, refer to the high-level limitations list and the AI Explainability Whitepaper.

Service limitations

Consider the following limitations of the AI Explanations feature attributions in AI Platform Prediction:

See the example code showing how to save a model, and try an example notebook for more details.

Comparing feature attribution methods

AI Explanations offers two methods to use for feature attributions: sampled Shapley and integrated gradients.

Integrated gradients
  • Basic explanation: A gradients-based method to efficiently compute feature attributions with the same axiomatic properties as the Shapley value.
  • Recommended model types: Differentiable models, such as neural networks. Recommended especially for models with large feature spaces.
  • Example use cases: Classification and regression on tabular data; classification on image data.

Sampled Shapley
  • Basic explanation: Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values.
  • Recommended model types: Non-differentiable models, such as ensembles of trees and neural networks¹
  • Example use cases: Classification and regression on tabular data.

Understanding feature attribution methods

Both feature attribution methods are based on Shapley values, a concept from cooperative game theory that assigns credit to each player in a game for a particular outcome. Applied to machine learning models, each model feature is treated as a "player" in the game, and AI Explanations assigns each feature its share of the credit for the outcome of a particular prediction.
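The Shapley value can be computed exactly by averaging each player's marginal contribution over every possible ordering of the players. The sketch below does this for a small hypothetical three-player game; the payoff table is invented for illustration. Because the number of orderings grows as n!, exact computation quickly becomes impractical for models with many features, which is why sampled Shapley samples orderings instead of enumerating them.

```python
from itertools import permutations
from math import factorial

def exact_shapley(value, players):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    value: function mapping a frozenset of players to that coalition's payoff.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Credit p with the payoff it adds when joining this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

# Hypothetical 3-player game: payoff of every possible coalition.
payoffs = {
    frozenset(): 0,
    frozenset("A"): 10,
    frozenset("B"): 20,
    frozenset("C"): 30,
    frozenset("AB"): 40,
    frozenset("AC"): 50,
    frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

phi = exact_shapley(payoffs.__getitem__, ["A", "B", "C"])
print(phi)
```

This yields A = 20, B = 30, and C = 40. The three values sum to the payoff of the full coalition (90), a property known as efficiency.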

AI Explanations lets you "choose" your players, so to speak, by selecting the exact features for your explanations request.

In the integrated gradients method, the gradient of the prediction output is calculated with respect to the features of the input, along an integral path.

  1. The gradients are calculated at different intervals of a scaling parameter. (For image data, imagine this scaling parameter as a "slider" that is scaling all pixels of the image to black.)
  2. The gradients are "integrated":
    1. The gradients are averaged together.
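    2. The element-wise product of the averaged gradients and the original input is calculated. (Strictly, the averaged gradients are multiplied by the difference between the input and the baseline; with an all-zero baseline, such as a black image with pixel values of 0, that difference is the original input.)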
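The steps above can be sketched with NumPy for a hypothetical model F(x) = tanh(w · x) whose gradient is known in closed form, so no deep learning framework is needed. This is a generic midpoint Riemann-sum approximation of integrated gradients, not the internal implementation used by AI Explanations; the weights, the input, and the `steps` value are assumptions for illustration.

```python
import numpy as np

# Hypothetical differentiable model F(x) = tanh(w . x) with an analytic gradient.
w = np.array([0.5, -1.0, 2.0])

def model(x):
    return np.tanh(x @ w)

def model_grad(x):
    # dF/dx = (1 - tanh(w . x)^2) * w
    return (1.0 - np.tanh(x @ w) ** 2) * w

def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    # Step 1: evaluate gradients at evenly spaced values of the scaling
    # parameter alpha, along the straight path from the baseline to x.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    # Step 2a: average the gradients computed along the path.
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    # Step 2b: element-wise product with (input - baseline).
    return (x - baseline) * avg_grad

x = np.array([1.0, 0.5, 0.3])
baseline = np.zeros_like(x)  # analogous to an all-black image
attr = integrated_gradients(x, baseline, model_grad)
print("attributions:", attr)
print("sum:", attr.sum(), "F(x) - F(baseline):", model(x) - model(baseline))
```

A useful sanity check is completeness: the attributions should sum to approximately F(x) - F(baseline), and the approximation improves as `steps` increases.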

For an intuitive explanation of this process as applied to images, refer to the blog post "Attributing a deep network's prediction to its input features". In it, the authors of the original integrated gradients paper (Axiomatic Attribution for Deep Networks) show what the images look like at each step of the process.

Differentiable and non-differentiable models

In differentiable models, you can calculate the derivative of all the operations in your TensorFlow graph. This property helps to make backpropagation possible in such models. For example, neural networks are differentiable. To get feature attributions for differentiable models, use the integrated gradients method.

Non-differentiable models include non-differentiable operations in the TensorFlow graph, such as operations that perform decoding and rounding tasks. For example, a model built as an ensemble of trees and neural networks is non-differentiable. To get feature attributions for non-differentiable models, use the sampled Shapley method. Sampled Shapley also works on differentiable models, but in that case, it is more computationally expensive than necessary.
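Sampled Shapley needs only forward passes through the model, so it works even when gradients are unavailable. The following is a minimal sketch of the standard permutation-sampling estimator, applied to a hypothetical non-differentiable model that uses rounding and a hard threshold. The model, the `num_paths` parameter, and the inputs are assumptions for illustration, not the AI Explanations implementation.

```python
import random

def sampled_shapley(model, x, baseline, num_paths=200, seed=0):
    """Monte Carlo approximation of Shapley values for one instance.

    Each path is a random permutation of the features. Features are switched
    from their baseline value to their actual value one at a time, and each
    feature is credited with the change in model output that its switch causes.
    """
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    for _ in range(num_paths):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        prev_out = model(current)
        for i in order:
            current[i] = x[i]
            out = model(current)
            totals[i] += out - prev_out
            prev_out = out
    return [t / num_paths for t in totals]

# Hypothetical non-differentiable model: rounding plus a hard threshold.
def model(features):
    a, b, c = features
    return round(2 * a) + (5.0 if b > 0.5 else 0.0) + a * c

x = [1.2, 0.9, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = sampled_shapley(model, x, baseline)
print("estimates:", phi)
print("sum:", sum(phi), "model(x) - model(baseline):", model(x) - model(baseline))
```

Because each path's contributions telescope, the estimates always sum to model(x) - model(baseline); more paths reduce the variance of the individual per-feature estimates.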


The implementations of sampled Shapley and integrated gradients are based on the following references, respectively:

Learn more about the implementation of AI Explanations by reading the AI Explainability Whitepaper.

Educational resources

The following resources provide further useful educational material:

What's next