Improving explanations

When you work with custom-trained models, you can configure specific parameters to improve your explanations. This guide describes how to inspect the explanations that you get from Explainable AI for approximation error, and how to adjust your Explainable AI configuration to mitigate that error.

If you want to use Explainable AI with an AutoML tabular model, then you don't need to perform any configuration; AI Platform automatically configures the model for Explainable AI. Skip this document and read Getting explanations.

The Explainable AI feature attribution methods are all based on variants of Shapley values. Because exact Shapley values are very expensive to compute, Explainable AI provides approximations instead of the exact values.

You can reduce the approximation error and get closer to the exact values by making any of the following changes:

  • Increase the number of integral steps or the number of paths.
  • Change the input baselines that you select.
  • Add more input baselines. With the integrated gradients and XRAI methods, using additional baselines increases latency. Using additional baselines with the sampled Shapley method does not increase latency.

Inspecting explanations for error

After you have requested and received explanations from Explainable AI, you can check them for approximation error. If an explanation has high approximation error, it might not be reliable. This section describes several ways to check for error.

Checking the approximationError field

For each Attribution, Explainable AI returns the approximation error in the approximationError field. If the approximation error exceeds 0.05, consider adjusting your Explainable AI configuration.
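
For example, if you request online explanations with the google-cloud-aiplatform Python client, you can scan the response for attributions that exceed this threshold. The following is a minimal sketch; the endpoint name and test instance are placeholders, and it assumes the deployed Model already has an ExplanationSpec:

from google.cloud import aiplatform

# Placeholder endpoint; substitute your own project, region, and endpoint ID.
endpoint = aiplatform.Endpoint(
    "projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID"
)

# Placeholder instance; use whatever input format your model expects.
response = endpoint.explain(instances=[{"FEATURE_NAME": [0.5, 1.2, 3.4]}])

for explanation in response.explanations:
    for attribution in explanation.attributions:
        if attribution.approximation_error > 0.05:
            print(f"High approximation error: {attribution.approximation_error:.3f}")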

Checking the difference between predictions and baseline output

For each Attribution, Explainable AI returns an instanceOutputValue, which represents the part of the prediction output that the feature attributions are for, and a baselineOutputValue, which represents what that part of the prediction output would be if the prediction were performed on an input baseline rather than on the actual input instance.

If the difference between instanceOutputValue and baselineOutputValue is less than 0.05 for any attributions, then you might need to change your input baselines.
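
Continuing the sketch above, you can check this difference on the same response. In the Python client, instance_output_value and baseline_output_value are the snake_case forms of these fields:

for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Absolute difference, since the prediction can fall below the baseline output.
        diff = abs(attribution.instance_output_value - attribution.baseline_output_value)
        if diff < 0.05:
            print(f"Prediction barely differs from the baseline output ({diff:.3f}); "
                  "consider changing your input baselines.")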

Adjusting your configuration

The following sections describe ways to adjust your Explainable AI configuration to reduce error. To make any of the following changes, you must either configure a new Model resource with an updated ExplanationSpec, or override the ExplanationSpec of your existing Model by redeploying it to an Endpoint resource or by getting new batch predictions.

Increasing steps or paths

To reduce approximation error, you can increase:

  • the number of integral steps, if you use the integrated gradients or XRAI method.
  • the number of paths, if you use the sampled Shapley method.
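
If you use the google-cloud-aiplatform Python client, these counts correspond to the step_count and path_count fields on ExplanationParameters. The following is a minimal sketch; the specific values are illustrative starting points, not recommendations:

from google.cloud import aiplatform

# Integrated gradients (or XRAI): raise step_count to reduce approximation error.
ig_parameters = aiplatform.explain.ExplanationParameters(
    {"integrated_gradients_attribution": {"step_count": 100}}
)

# Sampled Shapley: raise path_count instead.
shapley_parameters = aiplatform.explain.ExplanationParameters(
    {"sampled_shapley_attribution": {"path_count": 25}}
)

Larger counts reduce approximation error at the cost of higher explanation latency, so increase them gradually until the error is within an acceptable range.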

Adjusting baselines

Input baselines represent a feature that provides no additional information. Baselines for tabular models can be the median, minimum, maximum, or random values in relation to your training data. Similarly, for image models, your baselines can be a black image, a white image, a gray image, or an image with random pixel values.

When you configure Explainable AI, you can optionally specify the input_baselines field. Otherwise, AI Platform chooses input baselines for you. If you are encountering the problems described in previous sections of this guide, then you might want to adjust the input_baselines for each input of your Model.

In general:

  1. Start with one baseline representing median values.
  2. Change this baseline to one representing random values.
  3. Try two baselines, representing the minimum and maximum values.
  4. Add another baseline representing random values (see the sketch after this list).
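
For tabular data, each of these steps maps onto the input_baselines field shown in the examples below. For instance, the combination from steps 3 and 4 (minimum, maximum, and random values) might look like the following sketch, where train_data is a hypothetical pandas DataFrame standing in for your training data:

import numpy as np
import pandas as pd

# Hypothetical training data; substitute your own DataFrame.
train_data = pd.DataFrame({"age": [22, 35, 58], "income": [40000, 85000, 120000]})

input_baselines = [
    train_data.min().values.tolist(),  # step 3: minimum values
    train_data.max().values.tolist(),  # step 3: maximum values
    np.random.uniform(                 # step 4: random values within the data range
        low=train_data.min().values,
        high=train_data.max().values,
    ).tolist(),
]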

Example for tabular data

The following Python code creates an ExplanationMetadata message for a hypothetical TensorFlow model trained on tabular data.

Notice that input_baselines is a list where you can specify multiple baselines. This example sets just one baseline: a list of median values for the training data (train_data in this example).

# train_data is assumed to be a pandas DataFrame that holds your training examples.
explanation_metadata = {
    "inputs": {
        "FEATURE_NAME": {
            "input_tensor_name": "FEATURE_TENSOR_NAME",
            # One baseline: the median of every feature in the training data.
            "input_baselines": [train_data.median().values.tolist()],
            "encoding": "bag_of_features",
            # Maps each index of the input tensor to a feature name.
            "index_feature_mapping": train_data.columns.tolist()
        }
    },
    "outputs": {
        "OUTPUT_NAME": {
            "output_tensor_name": "OUTPUT_TENSOR_NAME"
        }
    }
}

See Configuring explanations for custom-trained models for more context on how to use this ExplanationMetadata.

To set two baselines representing the minimum and maximum values, set input_baselines as follows:

[train_data.min().values.tolist(), train_data.max().values.tolist()]

Example for image data

The following Python code creates an ExplanationMetadata message for a hypothetical TensorFlow model trained on image data.

Notice that input_baselines is a list where you can specify multiple baselines. This example sets just one baseline: an image of random pixel values. Using random values for an image baseline is a good approach if the images in your training dataset contain a lot of black and white.

Otherwise, set input_baselines to [0, 1] to represent black and white images.

import numpy as np

# A random-valued baseline image; the shape is assumed to match the model's
# expected input (a 192x192 RGB image in this hypothetical example).
random_baseline = np.random.rand(192, 192, 3)

explanation_metadata = {
    "inputs": {
        "FEATURE_NAME": {
            "input_tensor_name": "FEATURE_TENSOR_NAME",
            # Identify this input as image data.
            "modality": "image",
            "input_baselines": [random_baseline.tolist()]
        }
    },
    "outputs": {
        "OUTPUT_NAME": {
            "output_tensor_name": "OUTPUT_TENSOR_NAME"
        }
    }
}
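
To put either ExplanationMetadata into effect, attach it when you create the Model resource. The following is a sketch using the google-cloud-aiplatform Python client; the display name, artifact URI, and serving image are placeholders, and integrated gradients is just one possible choice of attribution method:

from google.cloud import aiplatform

model = aiplatform.Model.upload(
    display_name="my-explainable-model",               # placeholder
    artifact_uri="gs://YOUR_BUCKET/model/",            # placeholder
    serving_container_image_uri="YOUR_SERVING_IMAGE",  # placeholder
    explanation_metadata=aiplatform.explain.ExplanationMetadata(explanation_metadata),
    explanation_parameters=aiplatform.explain.ExplanationParameters(
        {"integrated_gradients_attribution": {"step_count": 50}}
    ),
)

After the Model is deployed to an Endpoint, explanation requests use the new baselines, and you can repeat the checks described earlier to confirm that the error has decreased.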

What's next