AI Explanations feature attributions demonstrate how input features influence the output of a model. When requesting feature attributions with AI Platform Prediction, you must submit an explanation metadata file to identify the inputs and outputs of your TensorFlow model. This lets you select particular features for your explanations request, without requiring you to alter your model.

This guide focuses on various ways to identify the names of your input and output tensors, in order to help you prepare your explanation metadata file.

## Before you begin

- Learn how to export a TensorFlow Estimator to the SavedModel format.
- Learn how to inspect the signature definition of a TensorFlow model using the SavedModel Command Line Interface (CLI).

The example notebooks demonstrate how to do both of these steps.

## Inputs and outputs in explanation metadata

To prepare your explanation metadata, you must specify the inputs and outputs
for your model in a file named `explanation_metadata.json`:

```
{
  "inputs": {
    string <input feature key>: {
      "input_tensor_name": string
    }
  },
  "outputs": {
    string <output value key>: {
      "output_tensor_name": string
    }
  },
  "framework": "tensorflow"
}
```

Within the file's `inputs` and `outputs` objects, you must provide the names of
the input and output tensors for your explanations request.

- The keys for each input and output ("input feature key" and "output value key"
  in the preceding example) allow you to give meaningful names to each tensor. In
  the sample below, the input feature key is `degrees_celsius`, and the output
  value key is `probabilities`.
- For the values in each metadata `input` and `output`, you must provide the
  actual name of the tensor as the `input_tensor_name` or `output_tensor_name`.
  In the sample below, the `input_tensor_name` is `x:0` and the
  `output_tensor_name` is `dense/Softmax:0`.

```
{
  "inputs": {
    "degrees_celsius": {
      "input_tensor_name": "x:0"
    }
  },
  "outputs": {
    "probabilities": {
      "output_tensor_name": "dense/Softmax:0"
    }
  },
  "framework": "tensorflow"
}
```
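Hand-edited JSON is easy to break (a single trailing comma makes the file unparseable), so one option is to generate the file from a Python dictionary. The following is a minimal sketch using only the standard library. The helper name `write_explanation_metadata` is hypothetical, and the tensor names are the ones from the sample above.

```python
import json
from pathlib import Path


def write_explanation_metadata(metadata: dict, directory: str) -> Path:
    """Serialize `metadata` to explanation_metadata.json inside `directory`."""
    path = Path(directory) / "explanation_metadata.json"
    # json.dump always emits syntactically valid JSON (no trailing commas).
    with path.open("w") as f:
        json.dump(metadata, f, indent=2)
    return path


# Same content as the sample metadata file above.
sample_metadata = {
    "inputs": {"degrees_celsius": {"input_tensor_name": "x:0"}},
    "outputs": {"probabilities": {"output_tensor_name": "dense/Softmax:0"}},
    "framework": "tensorflow",
}
```

Calling `write_explanation_metadata(sample_metadata, some_directory)` writes the file into a directory of your choice and returns its path.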

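The actual tensor names are formatted as `name:index`, where `name` is the name
of the operation that produces the tensor and `index` is the position of the
tensor among that operation's outputs.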
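As a quick illustration of the `name:index` format, the hypothetical helper below splits a tensor name into its two parts and rejects strings that do not follow the format. It is only a sketch for sanity-checking names before they go into the metadata file, and it assumes nothing beyond the format described above.

```python
from typing import Tuple


def split_tensor_name(tensor_name: str) -> Tuple[str, int]:
    """Split 'name:index' into (name, index); raise ValueError otherwise."""
    # Split on the last colon so that names containing "/" stay intact.
    name, sep, index = tensor_name.rpartition(":")
    if not sep or not name or not index.isdecimal():
        raise ValueError(f"expected 'name:index', got {tensor_name!r}")
    return name, int(index)
```

For example, `split_tensor_name("dense/Softmax:0")` returns `("dense/Softmax", 0)`, while a string without an index, such as `"dense/Softmax"`, raises a `ValueError`.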

## Finding input and output tensors

After training a TensorFlow model, export it as a SavedModel. The TensorFlow
SavedModel contains your trained TensorFlow model, along with serialized
signatures, variables, and other assets needed to run the graph. Each
`SignatureDef` identifies a function in your graph that accepts tensor
inputs and produces tensor outputs. Similarly, your explanation metadata file
defines the inputs and outputs of your graph for your feature attribution
request to AI Explanations.

Often, the input and output tensors you specify in your explanation metadata file map exactly to the signatures you define when you save your model. If so, finding your input and output tensor names is relatively straightforward. However, in some cases, the inputs or the outputs you want to explain could be different than the ones that you define when you save your model.

Your inputs and outputs for explanations are the same as the ones you set in
your serving `SignatureDef` if:

- Your inputs are not in serialized form.
- Each input to the `SignatureDef` contains the value of the feature directly
  (either numeric values or strings).
- The outputs are numeric values, treated as numeric data. This excludes class
  IDs, which are considered categorical data.

For these cases, you can get the names of the input and output tensors while
building the model. Alternatively, you can inspect your SavedModel's
`SignatureDef` with the SavedModel CLI to find the names of your input and
output tensors.

For any case that does *not* fit the preceding criteria, there are
other approaches you can take to find
the right input and output tensors.

### Getting tensor names during training

It is easiest to access the input and output tensor names during training. You
can save these values to your explanation metadata file while your program
or environment still has access to the variables you set when building the
model. In this example, the Keras layer's `name` field produces the underlying
tensor name you need for your explanation metadata:

```
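import tensorflow as tf

# Build a small model and keep references to its input and output tensors.
bow_inputs = tf.keras.layers.Input(shape=(2000,))
merged_layer = tf.keras.layers.Dense(256, activation="relu")(bow_inputs)
predictions = tf.keras.layers.Dense(10, activation="sigmoid")(merged_layer)
model = tf.keras.Model(inputs=bow_inputs, outputs=predictions)

# The `name` of each tensor is the value to use in the explanation metadata.
print('input_tensor_name:', bow_inputs.name)
print('output_tensor_name:', predictions.name)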
```

```
input_tensor_name: input_1:0
output_tensor_name: dense_1/Sigmoid:0
```

For a full working example, refer to the example notebooks.
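One way to capture the names while the tensors are still in scope is to build the metadata dictionary directly from them. The sketch below is an illustration, not part of any SDK: `metadata_from_tensors` is a hypothetical helper that accepts any objects exposing a `name` attribute, such as `bow_inputs` and `predictions` in the example above. Stand-in objects are used here so that the snippet runs without TensorFlow, and the feature keys are arbitrary.

```python
import json
from types import SimpleNamespace


def metadata_from_tensors(inputs: dict, outputs: dict) -> dict:
    """Build explanation metadata from {key: tensor} mappings.

    Each value only needs a `name` attribute holding the tensor name.
    """
    return {
        "inputs": {
            key: {"input_tensor_name": tensor.name}
            for key, tensor in inputs.items()
        },
        "outputs": {
            key: {"output_tensor_name": tensor.name}
            for key, tensor in outputs.items()
        },
        "framework": "tensorflow",
    }


# Stand-ins for the Keras tensors; the names match the printed output above.
bow_inputs = SimpleNamespace(name="input_1:0")
predictions = SimpleNamespace(name="dense_1/Sigmoid:0")

metadata = metadata_from_tensors(
    {"bag_of_words": bow_inputs}, {"probabilities": predictions}
)
print(json.dumps(metadata, indent=2))
```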

### Getting tensor names from signature definitions

Given that `SignatureDef`s and explanation metadata both identify tensor inputs
and outputs, you can use the `SignatureDef` to prepare your explanation
metadata file, provided it meets the previously mentioned criteria.

Consider the following example `SignatureDef`:

```
The given SavedModel SignatureDef contains the following input(s):
  inputs['my_numpy_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: x:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: dense/Softmax:0
Method name is: tensorflow/serving/predict
```

The graph has an input tensor named `x:0` and an output tensor named
`dense/Softmax:0`. Both tensors also have meaningful names: `my_numpy_input`
and `probabilities` respectively. To request explanations for `probabilities`
with respect to `my_numpy_input`, you can create an explanation metadata file
as follows:

```
{
  "inputs": {
    "my_numpy_input": {
      "input_tensor_name": "x:0"
    }
  },
  "outputs": {
    "probabilities": {
      "output_tensor_name": "dense/Softmax:0"
    }
  },
  "framework": "tensorflow"
}
```
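The mapping from a printed `SignatureDef` to a metadata file is mechanical, so it can be scripted. The following sketch parses text in the format of the example `SignatureDef` above and builds the corresponding dictionary. The function name is hypothetical, and the regular expression assumes the layout shown in this guide; the output of your tooling may differ.

```python
import re

SIGNATURE_TEXT = """\
The given SavedModel SignatureDef contains the following input(s):
  inputs['my_numpy_input'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: x:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['probabilities'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1)
      name: dense/Softmax:0
Method name is: tensorflow/serving/predict
"""

# Captures (inputs|outputs), the signature key, and the tensor name.
_TENSOR_INFO = re.compile(
    r"(inputs|outputs)\['([^']+)'\] tensor_info:.*?name:\s*(\S+)", re.DOTALL
)


def metadata_from_signature_text(text: str) -> dict:
    """Build explanation metadata from printed SignatureDef text."""
    metadata = {"inputs": {}, "outputs": {}, "framework": "tensorflow"}
    for kind, key, tensor_name in _TENSOR_INFO.findall(text):
        field = "input_tensor_name" if kind == "inputs" else "output_tensor_name"
        metadata[kind][key] = {field: tensor_name}
    return metadata


metadata = metadata_from_signature_text(SIGNATURE_TEXT)
```

This shortcut only applies when the `SignatureDef` meets the criteria listed earlier; otherwise the serving tensors are not the ones to explain.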

To inspect the `SignatureDef` of your SavedModel, you can use the SavedModel
CLI. Learn more about how to use the SavedModel CLI.
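As a rough sketch, the CLI can also be invoked from Python. The wrapper functions below are hypothetical; the `show` subcommand and its `--dir`, `--tag_set`, and `--signature_def` flags belong to `saved_model_cli`. The default tag set (`serve`) and signature name (`serving_default`) are assumptions that match common exports and might not match yours.

```python
import subprocess


def show_signature_command(saved_model_dir: str,
                           tag_set: str = "serve",
                           signature_def: str = "serving_default") -> list:
    """Return the argv for `saved_model_cli show` on one SignatureDef."""
    return [
        "saved_model_cli", "show",
        "--dir", saved_model_dir,
        "--tag_set", tag_set,
        "--signature_def", signature_def,
    ]


def show_signature(saved_model_dir: str) -> str:
    """Run the CLI and return its printed output.

    Requires the `saved_model_cli` executable, which ships with TensorFlow.
    """
    result = subprocess.run(
        show_signature_command(saved_model_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```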

## Handling input and output discrepancies

There are a few common cases where the input and output tensors in your
explanation metadata should *not* be the same as those in your serving
`SignatureDef`:

- You have serialized inputs
- Your graph includes preprocessing operations
- Your serving outputs are not probabilities, logits or other types of floating point tensors

In these cases, you should use different approaches to find the right input and output tensors. The overall goal is to find the tensors pertaining to feature values you want to explain for inputs and tensors pertaining to logits (pre-activation), probabilities (post-activation), or any other representation for outputs.
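The third case can be partly detected from the dtypes printed in a `SignatureDef`. The helper below is a heuristic sketch with a hypothetical name: it flags outputs whose dtype is not a floating point type. A floating point dtype does not prove that an output is a probability or a logit, and the check says nothing about serialized inputs or preprocessing operations.

```python
# Floating point dtype names as they appear in a SignatureDef, e.g. "DT_FLOAT".
FLOAT_DTYPES = {"DT_HALF", "DT_BFLOAT16", "DT_FLOAT", "DT_DOUBLE"}


def non_float_outputs(output_dtypes: dict) -> list:
    """Return the output keys whose dtype is not a floating point type."""
    return sorted(
        key for key, dtype in output_dtypes.items() if dtype not in FLOAT_DTYPES
    )


# Hypothetical signature with a probabilities output and a class ID output.
flagged = non_float_outputs({"probabilities": "DT_FLOAT", "class_ids": "DT_INT64"})
```

Here `flagged` contains only `class_ids`, which suggests looking for a different output tensor for that key.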

### Input discrepancies

The inputs in your explanation metadata differ from those in your serving
`SignatureDef` if you use a serialized input to feed the model, or if your
graph includes preprocessing operations.

#### Serialized inputs

TensorFlow SavedModels can accept a variety of complex inputs, including:

- Serialized tf.Example messages
- JSON strings
- Encoded Base64 strings (to represent image data)

If your model accepts serialized inputs like these, using these tensors directly as input for your explanations will not work, or could produce nonsensical results. Instead, you want to locate subsequent input tensors that are feeding into feature columns within your model.

When you export your model, you can add a parsing operation to your TensorFlow graph by calling a parsing function in your serving input function. You can find parsing functions listed in the tf.io module. These parsing functions usually return tensors as a response, and these tensors are better selections for your explanation metadata.

For example, you could use `tf.io.parse_example()` when exporting your model. It takes serialized `tf.Example` messages and outputs a dictionary of tensors feeding to feature columns. You can use its output to fill in your explanation metadata. If some of these outputs are `tf.SparseTensor` objects, each of which consists of three tensors, get the names of the `indices`, `values`, and `dense_shape` tensors and fill in the corresponding fields in the metadata.
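For a sparse feature, the metadata entry then carries three tensor names instead of one. The sketch below shows the idea with a hypothetical helper. Note the assumption: this guide does not list the field names, so `indices_tensor_name` and `dense_shape_tensor_name` are best-effort guesses to verify against the explanation metadata reference, and the tensor names in the usage example are made up.

```python
def sparse_input_entry(indices_name: str,
                       values_name: str,
                       dense_shape_name: str) -> dict:
    """Build a metadata entry for a feature backed by a tf.SparseTensor.

    ASSUMPTION: `indices_tensor_name` and `dense_shape_tensor_name` are
    unverified field names; confirm them in the metadata reference.
    """
    return {
        # The values tensor plays the role of the regular input tensor.
        "input_tensor_name": values_name,
        "indices_tensor_name": indices_name,
        "dense_shape_tensor_name": dense_shape_name,
    }


# Hypothetical tensor names for one sparse feature.
entry = sparse_input_entry(
    indices_name="parse/indices:0",
    values_name="parse/values:0",
    dense_shape_name="parse/shape:0",
)
```

In practice, you would read the three names from the `indices`, `values`, and `dense_shape` attributes of the `tf.SparseTensor` returned by the parsing function.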

The following example shows how to get the name of the input tensor after a decoding operation:

```
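# `features` is assumed to be a tensor of encoded image strings and
# `color_depth` the number of color channels; both are defined elsewhere.
float_pixels = tf.map_fn(
    lambda img_string: tf.io.decode_image(
        img_string,
        channels=color_depth,
        dtype=tf.float32
    ),
    features,
    dtype=tf.float32,
    name='input_convert'
)
# Use the printed name as the `input_tensor_name` in the explanation metadata.
print(float_pixels.name)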
```

#### Preprocessing inputs

If your model graph contains preprocessing operations, you might want to
get explanations on the tensors *after* the preprocessing step. In this case,
you can get the names of those tensors by using the `name` property of
`tf.Tensor` and put them in the explanation metadata:

```
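# `item_indices` and `depth` are assumed to be defined earlier in the graph.
# Operation names cannot contain ":"; TensorFlow appends the output index.
item_one_hot = tf.one_hot(item_indices, depth,
    on_value=1.0, off_value=0.0,
    axis=-1, name="one_hot_items")
print(item_one_hot.name)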
```

The resulting tensor name is `one_hot_items:0`.

### Output discrepancies

In most cases, the outputs in your serving `SignatureDef` are either
probabilities or logits.

If your model is attributing probabilities but you want to explain the logit values instead, you have to find the appropriate output tensor names that correspond to the logits.

If your serving `SignatureDef` has outputs that are not probabilities or
logits, you should refer to the probabilities operation in the training graph.
This scenario is unlikely for Keras models. If this happens, you can use
TensorBoard (or other graph visualization tools) to help locate
the right output tensor names.

## Additional considerations for integrated gradients

AI Explanations provides two feature attribution methods: sampled Shapley and integrated gradients. Using the integrated gradients method requires you to make sure that your inputs are differentiable with respect to the output, so you must keep this in mind when you prepare your explanation metadata. You don't need to ensure that your inputs are differentiable if you use the sampled Shapley feature attribution method. Learn more about the feature attribution methods supported in AI Explanations.

The explanation metadata logically separates a model's features from its inputs. When using integrated gradients with an input tensor that is not differentiable with respect to the output tensor, you need to provide the encoded (and differentiable) version of that feature as well.

Use the following approach if you have non-differentiable input tensors, or if you have non-differentiable operations in your graph:

- Encode the non-differentiable inputs as differentiable inputs.
- Set `input_tensor_name` to the name of the original, non-differentiable input
  tensor, *and* set `encoded_tensor_name` to the name of its encoded,
  differentiable version.

### Explanation metadata file with encoding

For example, consider a model that has a categorical feature with an input
tensor named `zip_codes:0`. Because the input data includes zip codes as
strings, the input tensor `zip_codes:0` is non-differentiable. If the model also
preprocesses this data to get a one-hot encoding representation of the zip
codes, then the input tensor after preprocessing is differentiable. To
distinguish it from the original input tensor, you could name it
`zip_codes_embedding:0`.

To use the data from both input tensors in your explanations request, set the
metadata `inputs` as follows:

- Set the input feature key to a meaningful name, such as `zip_codes`.
- Set `input_tensor_name` to the name of the original tensor, `zip_codes:0`.
- Set `encoded_tensor_name` to the name of the tensor after one-hot encoding,
  `zip_codes_embedding:0`.
- Set `encoding` to `combined_embedding`.

```
{
  "inputs": {
    "zip_codes": {
      "input_tensor_name": "zip_codes:0",
      "encoded_tensor_name": "zip_codes_embedding:0",
      "encoding": "combined_embedding"
    }
  },
  "outputs": {
    "probabilities": {
      "output_tensor_name": "dense/Softmax:0"
    }
  },
  "framework": "tensorflow"
}
```
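The same entry can be assembled in code. The following sketch uses a hypothetical helper, `input_entry`, that adds the encoding fields only when they are provided. The check that `encoded_tensor_name` and `encoding` are set together mirrors the steps above; it is an assumption of this sketch, not a documented validation rule.

```python
from typing import Optional


def input_entry(input_tensor_name: str,
                encoded_tensor_name: Optional[str] = None,
                encoding: Optional[str] = None) -> dict:
    """Build one value of the metadata `inputs` object."""
    # Assumption of this sketch: both encoding fields are given, or neither.
    if (encoded_tensor_name is None) != (encoding is None):
        raise ValueError("set encoded_tensor_name and encoding together")
    entry = {"input_tensor_name": input_tensor_name}
    if encoded_tensor_name is not None:
        entry["encoded_tensor_name"] = encoded_tensor_name
        entry["encoding"] = encoding
    return entry


# Reproduces the `zip_codes` entry from the metadata file above.
zip_codes_entry = input_entry(
    "zip_codes:0",
    encoded_tensor_name="zip_codes_embedding:0",
    encoding="combined_embedding",
)
```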

Alternatively, you could set `input_tensor_name` to the name of
the encoded, differentiable input tensor and omit the original,
non-differentiable tensor. In this example, you would exclude the
original tensor (`zip_codes:0`) and set `input_tensor_name` to
`zip_codes_embedding:0`. This approach is *not* recommended, because the
resulting feature attributions would be difficult to reason about. The benefit
of providing both tensors is that attributions can be made to individual zip
code values rather than to their one-hot encoded representation.

### Encoding

To enable encoding in your explanations request, you specify encoding settings as shown in the preceding example.

The encoding feature helps reverse the process from encoded data to input data
for attributions, which eliminates the need to post-process the returned
attributions manually. Currently, AI Explanations supports `combined_embedding`,
where a variable length feature is combined into an embedding. An example
operation that matches this `combined_embedding` is
`tf.nn.embedding_lookup_sparse`.

For the `combined_embedding` encoding, the input tensor is encoded into a 1D array. For example:

- Input: `["This", "is", "a", "test"]`
- Encoded: `[0.1, 0.2, 0.3, 0.4]`
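To make the idea of combining a variable length feature into an embedding more concrete, here is a conceptual sketch in plain Python. It is not how AI Explanations or `tf.nn.embedding_lookup_sparse` is implemented, and it does not reproduce the values in the example above. The embedding table is made up, and the `mean` and `sum` combiners are modeled on the combiner options of `tf.nn.embedding_lookup_sparse`.

```python
def combine_embeddings(tokens, embedding_table, combiner="mean"):
    """Combine per-token embedding vectors into one fixed-size 1D vector."""
    vectors = [embedding_table[token] for token in tokens]
    if not vectors:
        raise ValueError("need at least one token")
    # Sum the vectors element by element.
    summed = [sum(column) for column in zip(*vectors)]
    if combiner == "sum":
        return summed
    if combiner == "mean":
        return [value / len(vectors) for value in summed]
    raise ValueError(f"unsupported combiner: {combiner!r}")


# Hypothetical 2-dimensional embeddings for a tiny vocabulary.
table = {
    "This": [1.0, 0.0],
    "is": [0.0, 1.0],
    "a": [1.0, 1.0],
    "test": [2.0, 2.0],
}

four_tokens = combine_embeddings(["This", "is", "a", "test"], table)
two_tokens = combine_embeddings(["a", "test"], table)
```

Both calls return a vector of the same length even though the inputs have different numbers of tokens, which is the property that makes the combined representation usable as a differentiable input.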

## What's next

- Try the example notebooks
- Learn how to deploy a model with explanations