Interpret prediction results from AutoML models

After requesting a prediction, Vertex AI returns results based on your model's objective. The following guide describes how to interpret results for each data type and objective.

For more information about getting predictions, see Get online predictions or Get batch predictions.

Image

Image data type objectives include classification and object detection.

Classification

AutoML image classification (single-label) predictions return a single label category and its corresponding confidence score. Multi-label classification predictions return multiple label categories and their corresponding confidence scores.

The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.

Score threshold slider

In the Cloud Console, Vertex AI provides a slider where you can adjust the confidence threshold for all classes or labels, or an individual class or label. The slider is available on a model's detail page in the Evaluate tab. The confidence threshold is the confidence level that the model must have for it to assign a class or label to a test item. As you adjust the threshold, you can see how your model's precision and recall changes. Higher thresholds typically increase precision but lower recall.
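
To make the precision and recall tradeoff concrete, the following minimal sketch computes both metrics at several thresholds. The scores and ground-truth labels are hypothetical, not Vertex AI output.

predictions = [  # (confidence score, ground-truth positive?) pairs; illustrative only
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, False), (0.55, True), (0.30, False), (0.20, False),
]

total_positives = sum(1 for _, positive in predictions if positive)
for threshold in (0.5, 0.7, 0.9):
    accepted = [positive for score, positive in predictions if score >= threshold]
    true_positives = sum(accepted)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / total_positives
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")

Raising the threshold from 0.5 to 0.9 in this sketch moves precision from 0.67 to 1.00 while recall falls from 1.00 to 0.50.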

Example batch prediction output

Batch AutoML image classification prediction outputs are stored as JSONL files in Cloud Storage buckets. Each line of the JSONL file contains all annotation (label) categories and their corresponding confidence scores for a single image file.

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "confidences": [0.7, 0.5]
  }
}
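
As a post-processing sketch, the following Python snippet reads a results file in the JSONL layout above (assuming it has been downloaded from the Cloud Storage bucket as predictions.jsonl) and keeps only the labels whose confidence meets your chosen threshold:

import json

THRESHOLD = 0.6  # an assumed value; choose one that matches your precision/recall needs

with open("predictions.jsonl") as results:
    for line in results:
        result = json.loads(line)
        prediction = result["prediction"]
        accepted = [
            (name, score)
            for name, score in zip(prediction["displayNames"], prediction["confidences"])
            if score >= THRESHOLD
        ]
        print(result["instance"]["content"], accepted)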

Object detection

AutoML image object detection prediction responses return all objects found in an image. Each detected object is annotated with a label and a normalized bounding box, along with a corresponding confidence score.

Example batch prediction output

Batch AutoML image object detection prediction responses are stored as JSONL files in Cloud Storage buckets. Each line of the JSONL file contains all objects found in a single image file. Each detected object is annotated with a label and a normalized bounding box, along with a corresponding confidence score.

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "bboxes":  [
      [0.1, 0.2, 0.3, 0.4],
      [0.2, 0.3, 0.4, 0.5]
    ],
    "confidences": [0.7, 0.5]
  }
}
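
Because the bounding boxes are normalized to the range [0, 1], you scale them by the image dimensions to get pixel coordinates. A minimal sketch, assuming the box is ordered [xMin, xMax, yMin, yMax] as in the sample above (verify the ordering against your own output):

def to_pixels(bbox, image_width, image_height):
    # Assumes the normalized box is ordered [xMin, xMax, yMin, yMax].
    x_min, x_max, y_min, y_max = bbox
    return (
        round(x_min * image_width), round(y_min * image_height),
        round(x_max * image_width), round(y_max * image_height),
    )

print(to_pixels([0.1, 0.2, 0.3, 0.4], image_width=640, image_height=480))
# (64, 144, 128, 192)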

Tabular

Tabular data type objectives include classification and regression.

Classification

The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.

Score threshold slider

In the Cloud Console, Vertex AI provides a slider where you can adjust the confidence threshold for all classes or labels, or an individual class or label. The slider is available on a model's detail page in the Evaluate tab. The confidence threshold is the confidence level that the model must have for it to assign a class or label to a test item. As you adjust the threshold, you can see how your model's precision and recall changes. Higher thresholds typically increase precision but lower recall.

Local feature importance

Local feature importance, also called feature attributions, enables you to determine how strongly each feature impacted a specific prediction. Local feature importance is part of Vertex Explainable AI.

To calculate local feature importance, Vertex AI first calculates the baseline prediction score. Baseline values are computed from the training data, using the median value for numeric features and the mode for categorical features. The prediction generated from the baseline values is the baseline prediction score. Baseline values are calculated once for a model and do not change.

For a specific prediction, the local feature importance for each feature tells you how much that feature added to or subtracted from the probability assigned to the class with the highest score for that prediction, as compared with the baseline prediction score. The sum of all of the feature importance values equals the difference between the baseline prediction score and the prediction score for the highest-probability class.

For classification models, the score is always between 0.0 and 1.0, inclusive. Therefore, local feature importance values for classification models are always between -1.0 and 1.0 (inclusive).

You can use feature importance to make sure the model is using the prediction data in a way that makes sense for your data and business problem. If you requested a prediction without feature importance and the result did not make sense, you can use the deployedModelId field from the prediction and request explanations for the same data and the same model. Learn more.

Example output

The return payload for an online prediction from a tabular classification model with feature importance looks similar to the following example.

The instanceOutputValue of 0.928652400970459 is the confidence score of the highest-scoring class, in this case class_a. The baselineOutputValue field contains the baseline prediction score, 0.808652400970459. The feature that contributed most strongly to this result was feature_3.

{
  "predictions": [
    {
      "scores": [
        0.928652400970459,
        0.071347599029541
      ],
      "classes": [
        "class_a",
        "class_b"
      ]
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 0.808652400970459,
          "instanceOutputValue": 0.928652400970459,
          "approximationError": 0.0058915703929231,
          "featureAttributions": {
            "feature_1": 0.012394922231235,
            "feature_2": 0.050212341234556,
            "feature_3": 0.057392736534209
          },
          "outputIndex": [
            0
          ],
          "outputName": "scores"
        }
      ]
    }
  ],
  "deployedModelId": "234567"
}
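
You can sanity-check the additivity property described above against this payload: the feature attributions should sum, up to approximationError, to the difference between instanceOutputValue and baselineOutputValue. A quick check in Python using the sample values:

feature_attributions = {
    "feature_1": 0.012394922231235,
    "feature_2": 0.050212341234556,
    "feature_3": 0.057392736534209,
}

total = sum(feature_attributions.values())
difference = 0.928652400970459 - 0.808652400970459  # instance minus baseline
print(total, difference)  # both are approximately 0.12; any gap is bounded by approximationError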

Forecasting

Forecasting models return a sequence of prediction values for each time series. The number of prediction values depends on your prediction input and the model's forecast horizon. For example, if your input included 14 null entries for the target column (such as sales for the next 14 days), the prediction request returns 14 values: the predicted sales for each day. If your prediction input requests more values than the model's forecast horizon allows, Vertex AI returns predictions only up to the forecast horizon.

Example output

The following example is batch prediction output for a quantile-loss optimized model. The output was sent to a BigQuery table. In this scenario, the forecasting model predicted sales for the next 14 days for each store.

The prediction values are in the predicted_Sales.quantile_predictions column, which is an array of sales values. These values map positionally to the quantiles in the predicted_Sales.quantile_values column; for example, the first value in each array corresponds to the same quantile.

In this example, the model predicted values at the 0.1, 0.5, and 0.9 quantiles, which were set when the model was trained. By using quantile predictions, you can compare values for under-forecasting (lower quantile) and over-forecasting (higher quantile).

[Image: sample batch prediction output for a forecasting model]
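
To read the quantile output, pair each entry of quantile_values with the entry at the same position in quantile_predictions. A small sketch with hypothetical sales values:

quantile_values = [0.1, 0.5, 0.9]             # from predicted_Sales.quantile_values
quantile_predictions = [124.3, 151.0, 179.8]  # hypothetical predicted_Sales.quantile_predictions

for quantile, sales in zip(quantile_values, quantile_predictions):
    print(f"quantile {quantile}: predicted sales {sales}")
# The 0.1 value bounds the under-forecasting case; the 0.9 value the over-forecasting case.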

Regression

Regression models return a prediction value and, for BigQuery destinations, a prediction interval. The prediction interval provides a range of values that the model is 95% confident contains the actual result.

Local feature importance

Local feature importance, sometimes also called feature attributions, enables you to determine how strongly each feature impacted a specific prediction. Local feature importance is part of Vertex Explainable AI.

To calculate local feature importance, Vertex AI first calculates the baseline prediction score. Baseline values are computed from the training data, using the median value for numeric features and the mode for categorical features. The prediction generated from the baseline values is the baseline prediction score. Baseline values are calculated once for a model and do not change.

For a specific prediction, the local feature importance for each feature tells you how much that feature added to or subtracted from the result as compared with the baseline prediction score. The sum of all of the feature importance values equals the difference between the baseline prediction score and the prediction result.

You can use feature importance to make sure the model is using the prediction data in a way that makes sense for your data and business problem. If you requested a prediction without feature importance and the result did not make sense, you can use the deployedModelId field from the prediction and request explanations for the same data and the same model. Learn more.
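
For online predictions, one way to request those explanations is the Vertex AI SDK for Python. The following is a hedged sketch; the endpoint resource name and the instance fields are placeholders for your own values:

from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(
    "projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID"  # placeholder
)
response = endpoint.explain(
    instances=[{"feature_1": "value", "feature_2": "value", "feature_3": "value"}],
    deployed_model_id="345678",  # the deployedModelId from the earlier prediction
)
print(response.explanations)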

Example output

The return payload for an online prediction with feature importance from a tabular regression model looks similar to the following example.

The instanceOutputValue of 1795.1246466281819 is the predicted value, with the lower_bound and upper_bound fields providing the 95% prediction interval. The baselineOutputValue field contains the baseline prediction score, 1788.7423095703125. The feature that contributed most strongly to this result was feature_3.

{
  "predictions": [
    {
      "value": 1795.1246466281819,
      "lower_bound": 246.32196807861328,
      "upper_bound": 8677.51904296875
    }
  ],
  "explanations": [
    {
      "attributions": [
        {
          "baselineOutputValue": 1788.7423095703125,
          "instanceOutputValue": 1795.1246466281819,
          "approximationError": 0.0038215703911553,
          "featureAttributions": {
            "feature_1": 0.123949222312359,
            "feature_2": 0.802123412345569,
            "feature_3": 5.456264423211472,
          },
          "outputIndex": [
            -1
          ]
        }
      ]
    }
  ],
  "deployedModelId": "345678"
}

Text

Text data type objectives include classification, entity extraction, and sentiment analysis.

Classification

Predictions from multi-label classification models return one or more labels for each document and a confidence score for each label. For single-label classification models, predictions return only one label and confidence score per document.

The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.

Score threshold slider

In the Cloud Console, Vertex AI provides a slider where you can adjust the confidence threshold for all classes or labels, or an individual class or label. The slider is available on a model's detail page in the Evaluate tab. The confidence threshold is the confidence level that the model must have for it to assign a class or label to a test item. As you adjust the threshold, you can see how your model's precision and recall changes. Higher thresholds typically increase precision but lower recall.

Example batch prediction output

The following sample is the predicted result for a multi-label classification model. The model applied the GreatService, Suggestion, and InfoRequest labels to the submitted document. The confidence values apply to each of the labels in order. In this example, the model predicted GreatService as the most relevant label.

{
  "instance": {"content": "gs://bucket/text.txt", "mimeType": "text/plain"},
  "predictions": [
    {
      "ids": [
        "1234567890123456789",
        "2234567890123456789",
        "3234567890123456789"
      ],
      "displayNames": [
        "GreatService",
        "Suggestion",
        "InfoRequest"
      ],
      "confidences": [
        0.8986392080783844,
        0.81984345316886902,
        0.7722353458404541
      ]
    }
  ]
}
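
Because displayNames and confidences stay aligned by position, you can zip them and rank the labels by confidence. A small sketch using (truncated) values from the sample above:

display_names = ["GreatService", "Suggestion", "InfoRequest"]
confidences = [0.8986, 0.8198, 0.7722]

for name, score in sorted(zip(display_names, confidences), key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.4f}")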

Entity extraction

Predictions from entity extraction models return annotations for each document, such as the location of detected entities, the assigned labels, and confidence scores.

The confidence score communicates how confident the model is that it accurately identified and labeled each entity. The higher the number, the higher the model's confidence in the correctness of the prediction.

Example batch prediction output

The following sample is the predicted result for an entity extraction model that was trained to detect diseases. The offsets (start and end character offsets) specify the location where the model detected an entity in the document, and the content field shows the detected entity.

The display names show the labels that the model associated with each entity, such as SpecificDisease or DiseaseClass. The labels map to the text segments in order.

{
  "key": 1,
  "predictions": {
    "ids": [
      "1234567890123456789",
      "2234567890123456789",
      "3234567890123456789"
    ],
    "displayNames": [
      "SpecificDisease",
      "DiseaseClass",
      "SpecificDisease"
    ],
    "textSegmentStartOffsets":  [13, 40, 57],
    "textSegmentEndOffsets": [29, 51, 75],
    "confidences": [
      0.99959725141525269,
      0.99912621492484128,
      0.99935531616210938
    ]
  }
}
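
To recover the entity text, slice the original document with the start and end offsets. A minimal sketch, assuming the offsets are half-open character positions into the submitted document (verify this against your own output):

# The input document downloaded from gs://bucket/text.txt
document = open("text.txt", encoding="utf-8").read()

labels = ["SpecificDisease", "DiseaseClass", "SpecificDisease"]
starts = [13, 40, 57]
ends = [29, 51, 75]

for label, start, end in zip(labels, starts, ends):
    print(f"{label}: {document[start:end]!r}")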

Sentiment analysis

Predictions from sentiment analysis models return the overall sentiment for a document. The sentiment is represented by an integer from 0 to the model's maximum sentiment score, which can be at most 10. The maximum sentiment value for a model is set during training. For example, if a model was trained on a dataset with a maximum sentiment score of 2, predicted sentiment scores can be 0 (negative), 1 (neutral), or 2 (positive).

Example batch prediction output

The following sample is the predicted result for a single document. Because the model's maximum sentiment score is 8, the predicted sentiment for this sample is strongly positive.

{
  "instance": {"content": "gs://bucket/text.txt", "mimeType": "text/plain"},
  "prediction": {"sentiment": 8}
}

Video

Video data type objectives include classification and object tracking.

Action recognition

Predictions from an action recognition model return the moments at which actions occur, according to your own defined labels. The model assigns a confidence score to each prediction, which communicates how confident the model is that it accurately identified an action. The higher the number, the higher the model's confidence in the correctness of the prediction.

Example batch prediction output

The following sample is the predicted result for a model that identifies the "swing" and "jump" actions in a video. Each result includes a label ("swing" or "jump") for the identified action, a time segment with the same start and end time that specifies the moment of the action, and a confidence score.

{
  "instance": {
   "content": "gs://bucket/video.mp4",
    "mimeType": "video/mp4",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s"
  }
  "prediction": [{
    "id": "1",
    "displayName": "swing",
    "timeSegmentStart": "1.2s",
    "timeSegmentEnd": "1.2s",
    "confidence": 0.7
  }, {
    "id": "2",
    "displayName": "jump",
    "timeSegmentStart": "3.4s",
    "timeSegmentEnd": "3.4s",
    "confidence": 0.5
  }]
}

Classification

Predictions from a classification model return shots and segments in your videos that have been classified according to your own defined labels. Each prediction is assigned a confidence score.

The confidence score communicates how strongly your model associates each class or label with a test item. The higher the number, the higher the model's confidence that the label should be applied to that item. You decide how high the confidence score must be for you to accept the model's results.

Score threshold slider

In the Cloud Console, Vertex AI provides a slider where you can adjust the confidence threshold for all classes or labels, or an individual class or label. The slider is available on a model's detail page in the Evaluate tab. The confidence threshold is the confidence level that the model must have for it to assign a class or label to a test item. As you adjust the threshold, you can see how your model's precision and recall changes. Higher thresholds typically increase precision but lower recall.

Example batch prediction output

The following sample is the predicted result for a model that identifies cats and dogs in a video. The result includes segment, shot, and one-second interval classifications.

{
  "instance": {
   "content": "gs://bucket/video.mp4",
    "mimeType": "video/mp4",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s"
  }
  "prediction": [{
    "id": "1",
    "displayName": "cat",
    "type": "segment-classification",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s",
    "confidence": 0.7
  }, {
    "id": "1",
    "displayName": "cat",
    "type": "shot-classification",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "4s",
    "confidence": 0.9
  }, {
    "id": "2",
    "displayName": "dog",
    "type": "shot-classification",
    "timeSegmentStart": "4s",
    "timeSegmentEnd": "5s",
    "confidence": 0.6
  }, {
    "id": "1",
    "displayName": "cat",
    "type": "one-sec-interval-classification",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "1s",
    "confidence": 0.95
  }, {
    "id": "1",
    "displayName": "cat",
    "type": "one-sec-interval-classification",
    "timeSegmentStart": "2s",
    "timeSegmentEnd": "2s",
    "confidence": 0.9
  }, {
    "id": "1",
    "displayName": "cat",
    "type": "one-sec-interval-classification",
    "timeSegmentStart": "3s",
    "timeSegmentEnd": "3s",
    "confidence": 0.85
  }, {
    "id": "2",
    "displayName": "dog",
    "type": "one-sec-interval-classification",
    "timeSegmentStart": "4s",
    "timeSegmentEnd": "4s",
    "confidence": 0.6
  }]
}
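
Because segment-, shot-, and interval-level results arrive in one flat list, it can help to group them by the type field before inspecting them. A sketch over a few of the entries above:

from collections import defaultdict

results = [
    {"displayName": "cat", "type": "segment-classification", "confidence": 0.7},
    {"displayName": "cat", "type": "shot-classification", "confidence": 0.9},
    {"displayName": "dog", "type": "shot-classification", "confidence": 0.6},
    {"displayName": "cat", "type": "one-sec-interval-classification", "confidence": 0.95},
]

by_type = defaultdict(list)
for result in results:
    by_type[result["type"]].append((result["displayName"], result["confidence"]))

for classification_type, labels in by_type.items():
    print(classification_type, labels)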

Object tracking

Predictions from an object tracking model return the times and locations of tracked objects, according to your own defined labels. The model assigns a confidence score to each prediction, which communicates how confident the model is that it accurately identified and tracked an object. The higher the number, the higher the model's confidence in the correctness of the prediction.

Example batch prediction output

The following sample is the predicted result for a model that tracks cats and dogs in a video. Each result includes a label (cat or dog) for the object being tracked, a time segment that specifies when and for how long the object is being tracked, and a bounding box that describes the location of the object.

{
  "instance": {
   "content": "gs://bucket/video.mp4",
    "mimeType": "video/mp4",
    "timeSegmentStart": "1s",
    "timeSegmentEnd": "5s"
  }
  "prediction": [{
    "id": "1",
    "displayName": "cat",
    "timeSegmentStart": "1.2s",
    "timeSegmentEnd": "3.4s",
    "frames": [{
      "timeOffset": "1.2s",
      "xMin": 0.1,
      "xMax": 0.2,
      "yMin": 0.3,
      "yMax": 0.4
    }, {
      "timeOffset": "3.4s",
      "xMin": 0.2,
      "xMax": 0.3,
      "yMin": 0.4,
      "yMax": 0.5,
    }],
    "confidence": 0.7
  }, {
    "id": "1",
    "displayName": "cat",
    "timeSegmentStart": "4.8s",
    "timeSegmentEnd": "4.8s",
    "frames": [{
      "timeOffset": "4.8s",
      "xMin": 0.2,
      "xMax": 0.3,
      "yMin": 0.4,
      "yMax": 0.5,
    }],
    "confidence": 0.6
  }, {
    "id": "2",
    "displayName": "dog",
    "timeSegmentStart": "1.2s",
    "timeSegmentEnd": "3.4s",
    "frames": [{
      "timeOffset": "1.2s",
      "xMin": 0.1,
      "xMax": 0.2,
      "yMin": 0.3,
      "yMax": 0.4
    }, {
      "timeOffset": "3.4s",
      "xMin": 0.2,
      "xMax": 0.3,
      "yMin": 0.4,
      "yMax": 0.5,
    }],
    "confidence": 0.5
  }]
}
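
Each entry above is one track: a label, the tracked time segment, and per-frame bounding boxes. The following sketch summarizes the sample's tracks by parsing the "Ns"-style offsets into seconds:

tracks = [
    {"displayName": "cat", "timeSegmentStart": "1.2s", "timeSegmentEnd": "3.4s",
     "frames": [{"timeOffset": "1.2s"}, {"timeOffset": "3.4s"}]},
    {"displayName": "cat", "timeSegmentStart": "4.8s", "timeSegmentEnd": "4.8s",
     "frames": [{"timeOffset": "4.8s"}]},
]

def seconds(offset):
    return float(offset.rstrip("s"))  # "3.4s" -> 3.4

for track in tracks:
    duration = seconds(track["timeSegmentEnd"]) - seconds(track["timeSegmentStart"])
    print(f'{track["displayName"]}: tracked for {duration:.1f}s across {len(track["frames"])} frame(s)')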