Explaining predictions

This page describes how you can use feature importance to gain visibility into how the model makes its predictions.

For more information about AI Explanations, see Introduction to AI Explanations for AI Platform.

Introduction

When you use a machine learning model to make business decisions, it's important to understand how your training data contributed to the final model, and how the model arrived at individual predictions. This understanding helps you ensure that your model is fair and accurate.

AutoML Tables provides feature importance, sometimes called feature attributions, which enables you to see which features contributed the most to model training (model feature importance) and individual predictions (local feature importance).

AutoML Tables computes feature importance using the Sampled Shapley method. For more information about model explainability, see Introduction to AI Explanations.

Model feature importance

Model feature importance helps you ensure that the features that informed model training make sense for your data and business problem. All of the features with a high feature importance value should represent a valid signal for prediction, and be able to be consistently included in your prediction requests.

Model feature importance is provided as a percentage for each feature: the higher the percentage, the more strongly that feature impacted model training.

Getting model feature importance

Console

To see your model's feature importance values using the Google Cloud console:

  1. Go to the AutoML Tables page in the Google Cloud console.

    Go to the AutoML Tables page

  2. Select the Models tab in the left navigation pane, and select the model you want to get the evaluation metrics for.

  3. Open the Evaluate tab.

  4. Scroll down to see the Feature importance graph.

AutoML Tables evaluate page

REST

To get the feature importance values for a model, you use the model.get method.

Before using any of the request data, make the following replacements:

  • endpoint: automl.googleapis.com for the global location, and eu-automl.googleapis.com for the EU region.
  • project-id: your Google Cloud project ID.
  • location: the location for the resource: us-central1 for Global or eu for the European Union.
  • model-id: the ID of the model you want get the feature importance information for. For example, TBL543.

HTTP method and URL:

GET https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
"https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id" | Select-Object -Expand Content
The feature importance values for each column are returned in the TablesModelColumnInfo object.
{
  "name": "projects/292381/locations/us-central1/models/TBL543",
  "displayName": "Quickstart_Model",
  ...
  "tablesModelMetadata": {
    "targetColumnSpec": {
    ...
    },
    "inputFeatureColumnSpecs": [
    ...
    ],
    "optimizationObjective": "MAXIMIZE_AU_ROC",
    "tablesModelColumnInfo": [
      {
        "columnSpecName": "projects/292381/locations/us-central1/datasets/TBL543/tableSpecs/246/columnSpecs/331",
        "columnDisplayName": "Contact",
        "featureImportance": 0.093201876
      },
      {
        "columnSpecName": "projects/292381/locations/us-central1/datasets/TBL543/tableSpecs/246/columnSpecs/638",
        "columnDisplayName": "Month",
        "featureImportance": 0.215029223
      },
      ...
    ],
    "trainBudgetMilliNodeHours": "1000",
    "trainCostMilliNodeHours": "1000",
    "classificationType": "BINARY",
    "predictionSampleRows": [
    ...
    ],
    "splitPercentageConfig": {
    ...
    }
  },
  "creationState": "CREATED",
  "deployedModelSizeBytes": "1160941568"
}

Java

If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.


import com.google.cloud.automl.v1beta1.AutoMlClient;
import com.google.cloud.automl.v1beta1.Model;
import com.google.cloud.automl.v1beta1.ModelName;
import com.google.cloud.automl.v1beta1.TablesModelColumnInfo;
import io.grpc.StatusRuntimeException;
import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;

public class TablesGetModel {

  public static void main(String[] args) throws IOException, StatusRuntimeException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String region = "YOUR_REGION";
    String modelId = "YOUR_MODEL_ID";
    getModel(projectId, region, modelId);
  }

  // Demonstrates using the AutoML client to get model details.
  public static void getModel(String projectId, String computeRegion, String modelId)
      throws IOException, StatusRuntimeException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {

      // Get the full path of the model.
      ModelName modelFullId = ModelName.of(projectId, computeRegion, modelId);

      // Get complete detail of the model.
      Model model = client.getModel(modelFullId);

      // Display the model information.
      System.out.format("Model name: %s%n", model.getName());
      System.out.format(
          "Model Id: %s\n", model.getName().split("/")[model.getName().split("/").length - 1]);
      System.out.format("Model display name: %s%n", model.getDisplayName());
      System.out.format("Dataset Id: %s%n", model.getDatasetId());
      System.out.println("Tables Model Metadata: ");
      System.out.format(
          "\tTraining budget: %s%n", model.getTablesModelMetadata().getTrainBudgetMilliNodeHours());
      System.out.format(
          "\tTraining cost: %s%n", model.getTablesModelMetadata().getTrainBudgetMilliNodeHours());

      DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
      String createTime =
          dateFormat.format(new java.util.Date(model.getCreateTime().getSeconds() * 1000));
      System.out.format("Model create time: %s%n", createTime);

      System.out.format("Model deployment state: %s%n", model.getDeploymentState());

      // Get features of top importance
      for (TablesModelColumnInfo info :
          model.getTablesModelMetadata().getTablesModelColumnInfoList()) {
        System.out.format(
            "Column: %s - Importance: %.2f%n",
            info.getColumnDisplayName(), info.getFeatureImportance());
      }
    }
  }
}

Node.js

If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.

const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to get model details.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const modelId = '[MODEL_ID]' e.g., "TBL4704590352927948800";

// Get the full path of the model.
const modelFullId = client.modelPath(projectId, computeRegion, modelId);

// Get complete detail of the model.
client
  .getModel({name: modelFullId})
  .then(responses => {
    const model = responses[0];

    // Display the model information.
    console.log(`Model name: ${model.name}`);
    console.log(`Model Id: ${model.name.split('/').pop(-1)}`);
    console.log(`Model display name: ${model.displayName}`);
    console.log(`Dataset Id: ${model.datasetId}`);
    console.log('Tables model metadata: ');
    console.log(
      `\tTraining budget: ${model.tablesModelMetadata.trainBudgetMilliNodeHours}`
    );
    console.log(
      `\tTraining cost: ${model.tablesModelMetadata.trainCostMilliNodeHours}`
    );
    console.log(`Model deployment state: ${model.deploymentState}`);
  })
  .catch(err => {
    console.error(err);
  });

Python

The client library for AutoML Tables includes additional Python methods that simplify using the AutoML Tables API. These methods refer to datasets and models by name instead of id. Your dataset and model names must be unique. For more information, see the Client reference.

If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.

# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# model_display_name = 'MODEL_DISPLAY_NAME_HERE'

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

# Get complete detail of the model.
model = client.get_model(model_display_name=model_display_name)

# Retrieve deployment state.
if model.deployment_state == automl.Model.DeploymentState.DEPLOYED:
    deployment_state = "deployed"
else:
    deployment_state = "undeployed"

# get features of top importance
feat_list = [
    (column.feature_importance, column.column_display_name)
    for column in model.tables_model_metadata.tables_model_column_info
]
feat_list.sort(reverse=True)
if len(feat_list) < 10:
    feat_to_show = len(feat_list)
else:
    feat_to_show = 10

# Display the model information.
print(f"Model name: {model.name}")
print("Model id: {}".format(model.name.split("/")[-1]))
print(f"Model display name: {model.display_name}")
print("Features of top importance:")
for feat in feat_list[:feat_to_show]:
    print(feat)
print(f"Model create time: {model.create_time}")
print(f"Model deployment state: {deployment_state}")

Local feature importance

Local feature importance gives you visibility into how the individual features in a specific prediction request affected the resulting prediction.

To arrive at each local feature importance value, first the baseline prediction score is calculated. Baseline values are computed from the training data, using the median value for numeric features and the mode for categorical features. The prediction generated from the baseline values is the baseline prediction score.

For classification models, the local feature importance tells you how much each feature added to or subtracted from the probability assigned to the class with the highest score, as compared with the baseline prediction score. Score values are between 0.0 and 1.0, so local feature importance for classification models is always between -1.0 and 1.0 (inclusive).

For regression models, the local feature importance for a prediction tells you how much each feature added to or subtracted from the result as compared with the baseline prediction score.

Local feature importance is available for both online and batch predictions.

Getting local feature importance for online predictions

Console

To get local feature importance values for an online prediction using the Google Cloud console, follow the steps in Getting an online prediction, making sure you check the Generate feature importance checkbox.

AutoML Tables feature importance checkbox

REST

To get local feature importance for an online prediction request, you use the model.predict method, setting the feature_importance parameter to true.

Before using any of the request data, make the following replacements:

  • endpoint: automl.googleapis.com for the global location, and eu-automl.googleapis.com for the EU region.
  • project-id: your Google Cloud project ID.
  • location: the location for the resource: us-central1 for Global or eu for the European Union.
  • model-id: the ID of the model. For example, TBL543.
  • valueN: the values for each column, in the correct order.

HTTP method and URL:

POST https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:predict

Request JSON body:

{
  "payload": {
    "row": {
      "values": [
        value1, value2,...
      ]
    }
  }
  "params": {
    "feature_importance": "true"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:predict" | Select-Object -Expand Content
The feature importance results are returned in the `tablesModelColumnInfo` object.
"tablesModelColumnInfo": [
  {
     "columnSpecName": "projects/2381/locations/us-central1/datasets/TBL8440/tableSpecs/766336/columnSpecs/4704",
     "columnDisplayName": "Promo",
     "featureImportance": 1626.5464
  },
  {
     "columnSpecName": "projects/2381/locations/us-central1/datasets/TBL8440/tableSpecs/766336/columnSpecs/6800",
     "columnDisplayName": "Open",
     "featureImportance": -7496.5405
  },
  {
     "columnSpecName": "projects/2381/locations/us-central1/datasets/TBL8440/tableSpecs/766336/columnSpecs/9824",
     "columnDisplayName": "StateHoliday"
  }
],

When a column has a feature importance value of 0, feature importance is not displayed for that column.

Java

If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.

import com.google.cloud.automl.v1beta1.AnnotationPayload;
import com.google.cloud.automl.v1beta1.ExamplePayload;
import com.google.cloud.automl.v1beta1.ModelName;
import com.google.cloud.automl.v1beta1.PredictRequest;
import com.google.cloud.automl.v1beta1.PredictResponse;
import com.google.cloud.automl.v1beta1.PredictionServiceClient;
import com.google.cloud.automl.v1beta1.Row;
import com.google.cloud.automl.v1beta1.TablesAnnotation;
import com.google.protobuf.Value;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TablesPredict {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String modelId = "YOUR_MODEL_ID";
    // Values should match the input expected by your model.
    List<Value> values = new ArrayList<>();
    // values.add(Value.newBuilder().setBoolValue(true).build());
    // values.add(Value.newBuilder().setNumberValue(10).build());
    // values.add(Value.newBuilder().setStringValue("YOUR_STRING").build());
    predict(projectId, modelId, values);
  }

  static void predict(String projectId, String modelId, List<Value> values) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (PredictionServiceClient client = PredictionServiceClient.create()) {
      // Get the full path of the model.
      ModelName name = ModelName.of(projectId, "us-central1", modelId);
      Row row = Row.newBuilder().addAllValues(values).build();
      ExamplePayload payload = ExamplePayload.newBuilder().setRow(row).build();

      // Feature importance gives you visibility into how the features in a specific prediction
      // request informed the resulting prediction. For more info, see:
      // https://cloud.google.com/automl-tables/docs/features#local
      PredictRequest request =
          PredictRequest.newBuilder()
              .setName(name.toString())
              .setPayload(payload)
              .putParams("feature_importance", "true")
              .build();

      PredictResponse response = client.predict(request);

      System.out.println("Prediction results:");
      for (AnnotationPayload annotationPayload : response.getPayloadList()) {
        TablesAnnotation tablesAnnotation = annotationPayload.getTables();
        System.out.format(
            "Classification label: %s%n", tablesAnnotation.getValue().getStringValue());
        System.out.format("Classification score: %.3f%n", tablesAnnotation.getScore());
        // Get features of top importance
        tablesAnnotation
            .getTablesModelColumnInfoList()
            .forEach(
                info ->
                    System.out.format(
                        "\tColumn: %s - Importance: %.2f%n",
                        info.getColumnDisplayName(), info.getFeatureImportance()));
      }
    }
  }
}

Node.js

If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.


/**
 * Demonstrates using the AutoML client to request prediction from
 * automl tables using csv.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const modelId = '[MODEL_ID]' e.g., "TBL000000000000";
// const inputs = [{ numberValue: 1 }, { stringValue: 'value' }, { stringValue: 'value2' } ...]

const automl = require('@google-cloud/automl');

// Create client for prediction service.
const automlClient = new automl.v1beta1.PredictionServiceClient();

// Get the full path of the model.
const modelFullId = automlClient.modelPath(projectId, computeRegion, modelId);

async function predict() {
  // Set the payload by giving the row values.
  const payload = {
    row: {
      values: inputs,
    },
  };

  // Params is additional domain-specific parameters.
  // Currently there is no additional parameters supported.
  const [response] = await automlClient.predict({
    name: modelFullId,
    payload: payload,
    params: {feature_importance: true},
  });
  console.log('Prediction results:');

  for (const result of response.payload) {
    console.log(`Predicted class name: ${result.displayName}`);
    console.log(`Predicted class score: ${result.tables.score}`);

    // Get features of top importance
    const featureList = result.tables.tablesModelColumnInfo.map(
      columnInfo => {
        return {
          importance: columnInfo.featureImportance,
          displayName: columnInfo.columnDisplayName,
        };
      }
    );
    // Sort features by their importance, highest importance first
    featureList.sort((a, b) => {
      return b.importance - a.importance;
    });

    // Print top 10 important features
    console.log('Features of top importance');
    console.log(featureList.slice(0, 10));
  }
}
predict();

Python

The client library for AutoML Tables includes additional Python methods that simplify using the AutoML Tables API. These methods refer to datasets and models by name instead of id. Your dataset and model names must be unique. For more information, see the Client reference.

If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.

# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# model_display_name = 'MODEL_DISPLAY_NAME_HERE'
# inputs = {'value': 3, ...}

from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project=project_id, region=compute_region)

if feature_importance:
    response = client.predict(
        model_display_name=model_display_name,
        inputs=inputs,
        feature_importance=True,
    )
else:
    response = client.predict(model_display_name=model_display_name, inputs=inputs)

print("Prediction results:")
for result in response.payload:
    print(f"Predicted class name: {result.tables.value}")
    print(f"Predicted class score: {result.tables.score}")

    if feature_importance:
        # get features of top importance
        feat_list = [
            (column.feature_importance, column.column_display_name)
            for column in result.tables.tables_model_column_info
        ]
        feat_list.sort(reverse=True)
        if len(feat_list) < 10:
            feat_to_show = len(feat_list)
        else:
            feat_to_show = 10

        print("Features of top importance:")
        for feat in feat_list[:feat_to_show]:
            print(feat)

Getting local feature importance for batch predictions

Console

To get local feature importance values for a batch prediction using the Google Cloud console, follow the steps in Requesting a batch prediction, making sure you check the Generate feature importance checkbox.

AutoML Tables feature importance checkbox

Feature importance is returned by adding a new column for every feature, named feature_importance.<feature_name>.

REST

To get local feature importance for a batch prediction request, you use the model.batchPredict method, setting the feature_importance parameter to true.

The following example uses BigQuery for the request data and the results; you use the same additional parameter for requests using Cloud Storage.

Before using any of the request data, make the following replacements:

  • endpoint: automl.googleapis.com for the global location, and eu-automl.googleapis.com for the EU region.
  • project-id: your Google Cloud project ID.
  • location: the location for the resource: us-central1 for Global or eu for the European Union.
  • model-id: the ID of the model. For example, TBL543.
  • dataset-id: the ID of the BigQuery dataset where the prediction data is located.
  • table-id: the ID of the BigQuery table where the prediction data is located.

    AutoML Tables creates a subfolder for the prediction results named prediction-<model_name>-<timestamp> in project-id.dataset-id.table-id.

HTTP method and URL:

POST https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict

Request JSON body:

{
  "inputConfig": {
    "bigquerySource": {
      "inputUri": "bq://project-id.dataset-id.table-id"
    },
  },
  "outputConfig": {
    "bigqueryDestination": {
      "outputUri": "bq://project-id"
    },
  },
  "params": {"feature_importance": "true"}
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/models/model-id:batchPredict" | Select-Object -Expand Content
Batch prediction is a long-running operation. You can poll for the operation status or wait for the operation to return. Learn more.

Feature importance is returned by adding a new column for every feature, named feature_importance.<feature_name>.

Considerations for using local feature importance:

  • Local feature importance results are available only for models trained on or after 15 November, 2019.

  • Enabling local feature importance on a batch prediction request with more than 1,000,000 rows or 300 columns is not supported.

  • Each local feature importance value shows only how much the feature affected the prediction for that row. To understand the overall behavior of the model, use model feature importance.

  • Local feature importance values are always relative to the baseline value. Make sure you reference the baseline value when you are evaluating your local feature importance results. The baseline value is available only from the Google Cloud console.

  • The local feature importance values depend entirely on the model and data used to train the model. They can tell only the patterns the model found in the data, and can't detect any fundamental relationships in the data. So, the presence of a high feature importance for a certain feature does not demonstrate a relationship between that feature and the target; it merely shows that the model is using the feature in its predictions.

  • If a prediction includes data that falls completely outside of the range of the training data, local feature importance might not provide meaningful results.

  • Generating feature importance increases the time and compute resources required for your prediction. In addition, your request uses a different quota than for prediction requests without feature importance. Learn more.

  • Feature importance values alone do not tell you if your model is fair, unbiased, or of sound quality. You should carefully evaluate your training dataset, procedure, and evaluation metrics, in addition to feature importance.