As you consider the explanations returned from the service, keep the following high-level limitations in mind. For an in-depth explanation, refer to the AI Explainability Whitepaper.
Meaning and scope of feature attributions
Consider the following when analyzing feature attributions provided by AI Explanations:
- Each attribution only shows how much the feature affected the prediction for that particular example. A single attribution might not reflect the overall behavior of the model. To understand approximate model behavior on an entire dataset, aggregate attributions over the entire dataset.
- The attributions depend entirely on the model and the data used to train it. They can only reveal the patterns the model found in the data; they cannot detect fundamental relationships in the data. So a strong attribution to a feature, or the lack of one, doesn't tell you whether a relationship exists between that feature and the target. It shows only whether the model uses the feature in its predictions.
- Attributions alone cannot tell if your model is fair, unbiased, or of sound quality. Carefully evaluate your training dataset, procedure, and evaluation metrics in addition to the attributions.
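The aggregation mentioned in the first point can be sketched as follows. This is a minimal illustration, not the service's own tooling: the function name `mean_abs_attributions`, the list-of-dicts layout, and the attribution values are all assumptions made for the example. It averages the absolute attribution of each feature over every example, which is one common way to summarize approximate model behavior on a dataset.

```python
def mean_abs_attributions(attributions):
    """Average the absolute attribution of each feature over all examples.

    `attributions` is a list of dicts, one per example, mapping a feature
    name to that example's attribution value (a hypothetical layout).
    A feature missing from an example is treated as contributing zero.
    """
    if not attributions:
        return {}
    totals = {}
    for example in attributions:
        for feature, value in example.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n = len(attributions)
    return {feature: total / n for feature, total in totals.items()}


# Made-up attributions for three examples.
per_example = [
    {"age": 0.5, "income": -0.25},
    {"age": -0.5, "income": 0.25},
    {"age": 0.25, "income": 0.0},
]

global_importance = mean_abs_attributions(per_example)
# Feature names ordered from most to least influential on average.
ranked = sorted(global_importance, key=global_importance.get, reverse=True)
```

Absolute values are used so that positive and negative attributions do not cancel out. In the made-up data, the signed mean for `income` is zero even though the model clearly uses that feature.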
Improving feature attributions
The following factors have the highest impact on feature attributions:
The attribution methods approximate the Shapley value. You can increase the precision of the approximation by:
- Increasing the number of integral steps for the integrated gradients or XRAI methods.
- Increasing the number of integral paths for the sampled Shapley method.
Be aware that the attributions can change dramatically when you raise these settings.
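To see why the number of integral steps matters, consider this minimal sketch of integrated gradients on a toy function. It is an illustration under stated assumptions, not the service's implementation: the model `x0**2 * x1`, its hand-written gradient, and the midpoint Riemann sum are all chosen for the example. The attributions of an exact path integral sum to the difference between the prediction and the baseline prediction, so the remaining gap is a convenient measure of approximation error.

```python
def integrated_gradients(grad_fn, x, baseline, steps):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        # Midpoint of the k-th segment on the straight path baseline -> x.
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def model(p):
    """Toy nonlinear model (an assumption for this example)."""
    return p[0] ** 2 * p[1]


def model_grad(p):
    """Analytic gradient of the toy model."""
    return [2 * p[0] * p[1], p[0] ** 2]


x, baseline = [1.0, 2.0], [0.0, 0.0]
target = model(x) - model(baseline)

# Gap between the summed attributions and the prediction difference.
gaps = {}
for steps in (5, 50, 500):
    attributions = integrated_gradients(model_grad, x, baseline, steps)
    gaps[steps] = abs(sum(attributions) - target)
```

For this toy function the gap shrinks as the step count grows, at the cost of one extra gradient evaluation per step.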
The attributions only express how much the feature affected the change in prediction value, relative to the baseline value. Take care to choose a meaningful baseline, relevant to the question you're asking of the model. Attribution values and their interpretation might change significantly as you switch baselines.
For integrated gradients and XRAI, using two baselines can improve your results. For example, you can specify baselines that represent an entirely black image and an entirely white image.
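The effect of the baseline can be illustrated with a minimal sketch. It assumes a linear model, for which the integrated gradients attribution of feature i has the closed form w_i * (x_i - b_i). The weights and pixel values are made up, and averaging the attributions from the two baselines is an assumption about how multiple baselines might be combined, not a statement about the service's behavior.

```python
def linear_attributions(weights, x, baseline):
    """Exact integrated gradients for a linear model: w_i * (x_i - b_i)."""
    return [w * (xi - b) for w, xi, b in zip(weights, x, baseline)]


# A made-up three-pixel "image" with intensities in [0, 1].
weights = [2.0, -1.0, 0.5]
pixels = [0.0, 1.0, 0.5]

black = [0.0, 0.0, 0.0]  # entirely black baseline
white = [1.0, 1.0, 1.0]  # entirely white baseline

from_black = linear_attributions(weights, pixels, black)
from_white = linear_attributions(weights, pixels, white)

# Assumed combination rule: average the attributions per pixel.
averaged = [(a + b) / 2 for a, b in zip(from_black, from_white)]
```

Against the black baseline, the first pixel receives zero attribution because it is black itself, even though it carries the largest weight. Against the white baseline, the second pixel receives zero attribution instead. Combining both baselines gives each of these pixels a nonzero attribution.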
Limitations for image data
The two attribution methods that support image data are integrated gradients and XRAI.
Integrated gradients is a pixel-based attribution method that highlights important areas in the image regardless of contrast, making this method ideal for non-natural images such as X-rays. However, the granular output can make it difficult to assess the relative importance of areas. The default output highlights areas in the image that have high positive attributions by drawing outlines, but these outlines are not ranked and may span across objects.
XRAI works best on natural, higher-contrast images containing multiple objects. Because XRAI produces region-based attributions, it yields a smoother, more human-readable heatmap of the regions that are most salient for a given image classification.
Currently, XRAI does not work well on the following types of image input:
- Low-contrast images that are all one shade, such as X-rays.
- Very tall or very wide images, such as panoramas.
- Very large images, which may slow down overall runtime.
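One way to act on this list is to screen images before requesting XRAI explanations. The sketch below is purely illustrative: the function name `xrai_input_warnings` and all three thresholds are hypothetical and are not values documented by the service. It takes a grayscale image as a nested list of intensities in [0, 1] and reports which of the three conditions above it appears to match.

```python
def xrai_input_warnings(image, min_contrast=0.2, max_aspect=3.0,
                        max_pixels=1_000_000):
    """Return warnings for inputs that XRAI may handle poorly.

    `image` is a non-empty list of rows of grayscale values in [0, 1].
    All thresholds are hypothetical defaults chosen for illustration.
    """
    height = len(image)
    width = len(image[0])
    values = [v for row in image for v in row]

    warnings = []
    # Low contrast: the intensity range is narrow.
    if max(values) - min(values) < min_contrast:
        warnings.append("low contrast")
    # Very tall or very wide: the long side dwarfs the short side.
    if max(height, width) / min(height, width) > max_aspect:
        warnings.append("extreme aspect ratio")
    # Very large: the pixel count may slow down the explanation.
    if height * width > max_pixels:
        warnings.append("very large image")
    return warnings


# A flat gray 4x40 strip and a high-contrast 8x8 checkerboard.
flat_panorama = [[0.5] * 40 for _ in range(4)]
checkerboard = [[float((r + c) % 2) for c in range(8)] for r in range(8)]

panorama_warnings = xrai_input_warnings(flat_panorama)
checkerboard_warnings = xrai_input_warnings(checkerboard)
```

An image that triggers the low-contrast warning may be a better fit for integrated gradients, which highlights important areas regardless of contrast.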