Evaluating models

After training a model, AutoML Vision Object Detection uses images from the TEST set to evaluate the quality and accuracy of the new model.

AutoML Vision Object Detection provides an aggregate set of evaluation metrics (evaluation process outputs) indicating how well the model performs overall, as well as evaluation metrics for each category label, indicating how well the model performs for that label.

Evaluation overview

Evaluation process inputs

  • IoU threshold: Intersection over Union, a value used in object detection to measure how much a predicted bounding box overlaps the actual (ground truth) bounding box for an object. The closer the predicted bounding box values are to the actual bounding box values, the greater the intersection and the greater the IoU value.

  • Score threshold: The output metrics (below) are computed assuming that the model never returns predictions with a score lower than this value.

Evaluation process output

  • AuPRC: Area under the precision/recall curve, also referred to as "average precision." Generally between 0.5 and 1.0; higher values indicate a more accurate model.

  • Confidence threshold curves: Show how different confidence thresholds would affect precision, recall, and the true and false positive rates. Read about the relationship of precision and recall.

  • F1 score: The harmonic mean of precision and recall. F1 is a useful metric if you want to find a balance between precision and recall, or if your training data has an uneven class distribution (see the sketch after this list).
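
As an illustration of how these quantities relate, here is a minimal Python sketch (not the service's internal computation) that derives precision, recall, and F1 from true positive, false positive, and false negative counts at a single score threshold:

def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Illustrative only: basic precision/recall/F1 from raw counts."""
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) else 0.0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts: 81 correct boxes, 24 spurious boxes, 15 missed boxes.
print(precision_recall_f1(81, 24, 15))  # ≈ (0.771, 0.844, 0.806)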

Use this data to evaluate your model's readiness. High confusion, low AuPRC scores, or low precision and recall scores can indicate that your model needs additional training data or has inconsistent labels. Conversely, a very high AuPRC score with perfect precision and recall can indicate that the data is too "easy" and may not generalize well: the model may have been trained on idealized data that would not represent future inferences well.

Managing model evaluations

List model evaluations

Once you have trained a model, you can list evaluation metrics for that model.

Web UI

  1. Open the AutoML Vision Object Detection UI and click the Models tab (with lightbulb icon) in the left navigation bar to display the available models.

    To view the models for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the row for the model you want to evaluate.

  3. If necessary, click the Evaluate tab just below the title bar.

    If training has been completed for the model, AutoML Vision Object Detection shows its evaluation metrics.

    Model evaluation page

  4. To view the metrics for a specific label, select the label name from the list of labels in the lower part of the page.

    Model evaluation page specific label

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: your GCP project ID.
  • MODEL_ID: the ID of your model, from the response returned when you created the model. The ID is the last element of your model's resource name (see the snippet after this list). For example:
    • model name: projects/PROJECT_ID/locations/LOCATION_ID/models/IOD4412217016962778756
    • model ID: IOD4412217016962778756
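
The ID is simply the final path segment of the model's resource name; here is a minimal, illustrative Python snippet for pulling it out (the name value is the example from the list above):

model_name = "projects/PROJECT_ID/locations/LOCATION_ID/models/IOD4412217016962778756"
model_id = model_name.split("/")[-1]  # last element of the resource name
print(model_id)  # IOD4412217016962778756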

HTTP method and URL:

GET https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations" | Select-Object -Expand Content

You should receive a JSON response similar to the following sample. Only a subset of the boundingBoxMetricsEntries entries is shown, for clarity; truncated content is marked with ellipses (...):

{
  "modelEvaluation": [
    {
      "name": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations/MODEL_EVAL_ID",
      "annotationSpecId": "6342510834593300480",
      "createTime": "2019-07-26T22:28:56.890727Z",
      "evaluatedExampleCount": 18,
      "imageObjectDetectionEvaluationMetrics": {
        "evaluatedBoundingBoxCount": 96,
        "boundingBoxMetricsEntries": [
          {
            "iouThreshold": 0.15,
            "meanAveragePrecision": 0.6317751,
            "confidenceMetricsEntries": [
              {
                "confidenceThreshold": 0.101631254,
                "recall": 0.84375,
                "precision": 0.2555205,
                "f1Score": 0.3922518
              },
              {
                "confidenceThreshold": 0.10180253,
                "recall": 0.8333333,
                "precision": 0.25316456,
                "f1Score": 0.3883495
              },
              ...
              {
                "confidenceThreshold": 0.8791167,
                "recall": 0.020833334,
                "precision": 1,
                "f1Score": 0.040816326
              },
              {
                "confidenceThreshold": 0.8804436,
                "recall": 0.010416667,
                "precision": 1,
                "f1Score": 0.020618558
              }
            ]
          },
          {
            "iouThreshold": 0.8,
            "meanAveragePrecision": 0.15461995,
            "confidenceMetricsEntries": [
              {
                "confidenceThreshold": 0.101631254,
                "recall": 0.22916667,
                "precision": 0.06940063,
                "f1Score": 0.10653753
              },
              ...
              {
                "confidenceThreshold": 0.8804436,
                "recall": 0.010416667,
                "precision": 1,
                "f1Score": 0.020618558
              }
            ]
          },
          {
            "iouThreshold": 0.4,
            "meanAveragePrecision": 0.56170964,
            "confidenceMetricsEntries": [
              {
                "confidenceThreshold": 0.101631254,
                "recall": 0.7604167,
                "precision": 0.23028392,
                "f1Score": 0.3535109
              },
              ...
              {
                "confidenceThreshold": 0.8804436,
                "recall": 0.010416667,
                "precision": 1,
                "f1Score": 0.020618558
              }
            ]
          },
          ...
        ],
        "boundingBoxMeanAveragePrecision": 0.4306387
      },
      "displayName": "Tomato"
    },
    {
      "name": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations/MODEL_EVAL_ID",
      "annotationSpecId": "1730824816165912576",
      "createTime": "2019-07-26T22:28:56.890727Z",
      "evaluatedExampleCount": 9,
      "imageObjectDetectionEvaluationMetrics": {
        "evaluatedBoundingBoxCount": 51,
        "boundingBoxMetricsEntries": [
          {
            ...
          }
        ],
        "boundingBoxMeanAveragePrecision": 0.29565892
      },
      "displayName": "Cheese"
    },
    {
      "name": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations/MODEL_EVAL_ID",
      "annotationSpecId": "7495432339200147456",
      "createTime": "2019-07-26T22:28:56.890727Z",
      "evaluatedExampleCount": 4,
      "imageObjectDetectionEvaluationMetrics": {
        "evaluatedBoundingBoxCount": 22,
        "boundingBoxMetricsEntries": [
          {
            "iouThreshold": 0.2,
            "meanAveragePrecision": 0.104004614,
            "confidenceMetricsEntries": [
              {
                "confidenceThreshold": 0.1008248,
                "recall": 0.36363637,
                "precision": 0.08888889,
                "f1Score": 0.14285715
              },
              ...
              {
                "confidenceThreshold": 0.47585258,
                "recall": 0.045454547,
                "precision": 1,
                "f1Score": 0.08695653
              }
            ]
          },
          ...
        ],
        "boundingBoxMeanAveragePrecision": 0.057070773
      },
      "displayName": "Seafood"
    }
  ]
}
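
As a rough illustration of how you might consume a response like the one above, the following Python sketch walks each per-label evaluation and, for every IoU threshold, reports the confidence threshold with the highest F1 score. It assumes the variable response already holds the parsed JSON body shown above; the field names are taken from that sample.

# Assumes: response = json.loads(raw_body), where raw_body is the JSON text above.
for evaluation in response.get("modelEvaluation", []):
    label = evaluation.get("displayName", "(overall)")
    metrics = evaluation.get("imageObjectDetectionEvaluationMetrics", {})
    print(f"{label}: mAP = {metrics.get('boundingBoxMeanAveragePrecision')}")
    for entry in metrics.get("boundingBoxMetricsEntries", []):
        best = max(entry.get("confidenceMetricsEntries", []),
                   key=lambda m: m.get("f1Score", 0.0),
                   default=None)
        if best:
            print(f"  IoU {entry['iouThreshold']}: best F1 {best['f1Score']:.3f} "
                  f"at confidence threshold {best['confidenceThreshold']:.3f}")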

Go

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
	"google.golang.org/api/iterator"
)

// listModelEvaluation lists existing model evaluations.
func listModelEvaluations(w io.Writer, projectID string, location string, modelID string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// modelID := "TRL123456789..."

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.ListModelEvaluationsRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s/models/%s", projectID, location, modelID),
	}

	it := client.ListModelEvaluations(ctx, req)

	// Iterate over all results
	for {
		evaluation, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return fmt.Errorf("ListModelEvaluations.Next: %w", err)
		}

		fmt.Fprintf(w, "Model evaluation name: %v\n", evaluation.GetName())
		fmt.Fprintf(w, "Model annotation spec id: %v\n", evaluation.GetAnnotationSpecId())
		fmt.Fprintf(w, "Create Time:\n")
		fmt.Fprintf(w, "\tseconds: %v\n", evaluation.GetCreateTime().GetSeconds())
		fmt.Fprintf(w, "\tnanos: %v\n", evaluation.GetCreateTime().GetNanos())
		fmt.Fprintf(w, "Evaluation example count: %v\n", evaluation.GetEvaluatedExampleCount())
		fmt.Fprintf(w, "Object detection model evaluation metrics: %v\n", evaluation.GetImageObjectDetectionEvaluationMetrics())
	}

	return nil
}

Java

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.


import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.ListModelEvaluationsRequest;
import com.google.cloud.automl.v1.ModelEvaluation;
import com.google.cloud.automl.v1.ModelName;
import java.io.IOException;

class ListModelEvaluations {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String modelId = "YOUR_MODEL_ID";
    listModelEvaluations(projectId, modelId);
  }

  // List model evaluations
  static void listModelEvaluations(String projectId, String modelId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the full path of the model.
      ModelName modelFullId = ModelName.of(projectId, "us-central1", modelId);
      ListModelEvaluationsRequest modelEvaluationsRequest =
          ListModelEvaluationsRequest.newBuilder().setParent(modelFullId.toString()).build();

      // List all the model evaluations in the model by applying filter.
      System.out.println("List of model evaluations:");
      for (ModelEvaluation modelEvaluation :
          client.listModelEvaluations(modelEvaluationsRequest).iterateAll()) {

        System.out.format("Model Evaluation Name: %s\n", modelEvaluation.getName());
        System.out.format("Model Annotation Spec Id: %s", modelEvaluation.getAnnotationSpecId());
        System.out.println("Create Time:");
        System.out.format("\tseconds: %s\n", modelEvaluation.getCreateTime().getSeconds());
        System.out.format("\tnanos: %s", modelEvaluation.getCreateTime().getNanos() / 1e9);
        System.out.format(
            "Evalution Example Count: %d\n", modelEvaluation.getEvaluatedExampleCount());
        System.out.format(
            "Object Detection Model Evaluation Metrics: %s\n",
            modelEvaluation.getImageObjectDetectionEvaluationMetrics());
      }
    }
  }
}

Node.js

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const modelId = 'YOUR_MODEL_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function listModelEvaluations() {
  // Construct request
  const request = {
    parent: client.modelPath(projectId, location, modelId),
    filter: '',
  };

  const [response] = await client.listModelEvaluations(request);

  console.log('List of model evaluations:');
  for (const evaluation of response) {
    console.log(`Model evaluation name: ${evaluation.name}`);
    console.log(`Model annotation spec id: ${evaluation.annotationSpecId}`);
    console.log(`Model display name: ${evaluation.displayName}`);
    console.log('Model create time');
    console.log(`\tseconds ${evaluation.createTime.seconds}`);
    console.log(`\tnanos ${evaluation.createTime.nanos}`);
    console.log(
      `Evaluation example count: ${evaluation.evaluatedExampleCount}`
    );
    console.log(
      `Object detection model evaluation metrics: ${evaluation.imageObjectDetectionEvaluationMetrics}`
    );
  }
}

listModelEvaluations();

Python

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# model_id = "YOUR_MODEL_ID"

client = automl.AutoMlClient()
# Get the full path of the model.
model_full_id = client.model_path(project_id, "us-central1", model_id)

print("List of model evaluations:")
for evaluation in client.list_model_evaluations(parent=model_full_id, filter=""):
    print(f"Model evaluation name: {evaluation.name}")
    print(f"Model annotation spec id: {evaluation.annotation_spec_id}")
    print(f"Create Time: {evaluation.create_time}")
    print(f"Evaluation example count: {evaluation.evaluated_example_count}")
    print(
        "Object detection model evaluation metrics: {}\n\n".format(
            evaluation.image_object_detection_evaluation_metrics
        )
    )

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the AutoML Vision Object Detection reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the AutoML Vision Object Detection reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the AutoML Vision Object Detection reference documentation for Ruby.

Get model evaluations

You can also get a specific model evaluation for a label (displayName) using an evaluation ID.

Web UI

In the AutoML Vision Object Detection UI, the equivalent operation is available from the Models page: select your model, go to the Evaluate tab, and select a label to view its label-specific evaluations.

Model evaluation page specific label

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: your GCP project ID.
  • MODEL_ID: the ID of your model, from the response returned when you created the model. The ID is the last element of your model's resource name. For example:
    • model name: projects/PROJECT_ID/locations/LOCATION_ID/models/IOD4412217016962778756
    • model ID: IOD4412217016962778756
  • MODEL_EVALUATION_ID: the ID value of the model evaluation. You can get model evaluation IDs from the list model evaluations operation.

HTTP method and URL:

GET https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations/MODEL_EVALUATION_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations/MODEL_EVALUATION_ID"

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/models/MODEL_ID/modelEvaluations/MODEL_EVALUATION_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following sample. Only a shortened set of boundingBoxMetricsEntries entries is shown, for clarity; truncated content is marked with ellipses (...):

{
  "name": "projects/PROJECT_ID/locations/us-
  central1/models/MODEL_ID/modelEvaluations/MODEL-EVALUATION-
  ID",
  "annotationSpecId": "6342510834593300480",
  "createTime": "2019-07-26T22:28:56.890727Z",
  "evaluatedExampleCount": 18,
  "imageObjectDetectionEvaluationMetrics": {
    "evaluatedBoundingBoxCount": 96,
    "boundingBoxMetricsEntries": [
      {
        "iouThreshold": 0.15,
        "meanAveragePrecision": 0.6317751,
        "confidenceMetricsEntries": [
          {
            "confidenceThreshold": 0.101631254,
            "recall": 0.84375,
            "precision": 0.2555205,
            "f1Score": 0.3922518
          },
          ...
          {
            "confidenceThreshold": 0.8804436,
            "recall": 0.010416667,
            "precision": 1,
            "f1Score": 0.020618558
          }
        ]
      },
      {
        "iouThreshold": 0.8,
        "meanAveragePrecision": 0.15461995,
        "confidenceMetricsEntries": [
          {
            "confidenceThreshold": 0.101631254,
            "recall": 0.22916667,
            "precision": 0.06940063,
            "f1Score": 0.10653753
          },
          ...
          {
            "confidenceThreshold": 0.8804436,
            "recall": 0.010416667,
            "precision": 1,
            "f1Score": 0.020618558
          }
        ]
      },
      {
        "iouThreshold": 0.4,
        "meanAveragePrecision": 0.56170964,
        "confidenceMetricsEntries": [
          {
            "confidenceThreshold": 0.101631254,
            "recall": 0.7604167,
            "precision": 0.23028392,
            "f1Score": 0.3535109
          },
          ...
          {
            "confidenceThreshold": 0.8804436,
            "recall": 0.010416667,
            "precision": 1,
            "f1Score": 0.020618558
          }
        ]
      },
      ...
    ],
    "boundingBoxMeanAveragePrecision": 0.4306387
  },
  "displayName": "Tomato"
}

Go

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// getModelEvaluation gets a model evaluation.
func getModelEvaluation(w io.Writer, projectID string, location string, modelID string, modelEvaluationID string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// modelID := "TRL123456789..."
	// modelEvaluationID := "123456789..."

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.GetModelEvaluationRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/models/%s/modelEvaluations/%s", projectID, location, modelID, modelEvaluationID),
	}

	evaluation, err := client.GetModelEvaluation(ctx, req)
	if err != nil {
		return fmt.Errorf("GetModelEvaluation: %w", err)
	}

	fmt.Fprintf(w, "Model evaluation name: %v\n", evaluation.GetName())
	fmt.Fprintf(w, "Model annotation spec id: %v\n", evaluation.GetAnnotationSpecId())
	fmt.Fprintf(w, "Create Time:\n")
	fmt.Fprintf(w, "\tseconds: %v\n", evaluation.GetCreateTime().GetSeconds())
	fmt.Fprintf(w, "\tnanos: %v\n", evaluation.GetCreateTime().GetNanos())
	fmt.Fprintf(w, "Evaluation example count: %v\n", evaluation.GetEvaluatedExampleCount())
	fmt.Fprintf(w, "Object detection model evaluation metrics: %v\n", evaluation.GetImageObjectDetectionEvaluationMetrics())

	return nil
}

Java

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.


import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.ModelEvaluation;
import com.google.cloud.automl.v1.ModelEvaluationName;
import java.io.IOException;

class GetModelEvaluation {

  static void getModelEvaluation() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String modelId = "YOUR_MODEL_ID";
    String modelEvaluationId = "YOUR_MODEL_EVALUATION_ID";
    getModelEvaluation(projectId, modelId, modelEvaluationId);
  }

  // Get a model evaluation
  static void getModelEvaluation(String projectId, String modelId, String modelEvaluationId)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the full path of the model evaluation.
      ModelEvaluationName modelEvaluationFullId =
          ModelEvaluationName.of(projectId, "us-central1", modelId, modelEvaluationId);

      // Get complete detail of the model evaluation.
      ModelEvaluation modelEvaluation = client.getModelEvaluation(modelEvaluationFullId);

      System.out.format("Model Evaluation Name: %s\n", modelEvaluation.getName());
      System.out.format("Model Annotation Spec Id: %s", modelEvaluation.getAnnotationSpecId());
      System.out.println("Create Time:");
      System.out.format("\tseconds: %s\n", modelEvaluation.getCreateTime().getSeconds());
      System.out.format("\tnanos: %s", modelEvaluation.getCreateTime().getNanos() / 1e9);
      System.out.format(
          "Evalution Example Count: %d\n", modelEvaluation.getEvaluatedExampleCount());
      System.out.format(
          "Object Detection Model Evaluation Metrics: %s\n",
          modelEvaluation.getImageObjectDetectionEvaluationMetrics());
    }
  }
}

Node.js

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const modelId = 'YOUR_MODEL_ID';
// const modelEvaluationId = 'YOUR_MODEL_EVALUATION_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function getModelEvaluation() {
  // Construct request
  const request = {
    name: client.modelEvaluationPath(
      projectId,
      location,
      modelId,
      modelEvaluationId
    ),
  };

  const [response] = await client.getModelEvaluation(request);

  console.log(`Model evaluation name: ${response.name}`);
  console.log(`Model annotation spec id: ${response.annotationSpecId}`);
  console.log(`Model display name: ${response.displayName}`);
  console.log('Model create time');
  console.log(`\tseconds ${response.createTime.seconds}`);
  console.log(`\tnanos ${response.createTime.nanos}`);
  console.log(`Evaluation example count: ${response.evaluatedExampleCount}`);
  console.log(
    `Object detection model evaluation metrics: ${response.imageObjectDetectionEvaluationMetrics}`
  );
}

getModelEvaluation();

Python

Before trying this sample, follow the setup instructions for this language on the Client Libraries page.

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# model_id = "YOUR_MODEL_ID"
# model_evaluation_id = "YOUR_MODEL_EVALUATION_ID"

client = automl.AutoMlClient()
# Get the full path of the model evaluation.
model_path = client.model_path(project_id, "us-central1", model_id)
model_evaluation_full_id = f"{model_path}/modelEvaluations/{model_evaluation_id}"

# Get complete detail of the model evaluation.
response = client.get_model_evaluation(name=model_evaluation_full_id)

print(f"Model evaluation name: {response.name}")
print(f"Model annotation spec id: {response.annotation_spec_id}")
print(f"Create Time: {response.create_time}")
print(f"Evaluation example count: {response.evaluated_example_count}")
print(
    "Object detection model evaluation metrics: {}".format(
        response.image_object_detection_evaluation_metrics
    )
)

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the AutoML Vision Object Detection reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the AutoML Vision Object Detection reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the AutoML Vision Object Detection reference documentation for Ruby.

True Positives, False Negatives, and False Positives (UI only)

In the user interface you can observe specific examples of model performance, namely true positive (TP), false negative (FN), and false positive (FP) instances from your TRAINING and VALIDATION sets.

Web UI

You can access the TP, FN, and FP view in the UI by selecting the Evaluate tab, and then selecting any specific label.

By viewing trends in these predictions, you can modify your training set to improve model performance.

True positive images are validation images in which the model correctly annotated the ground truth boxes:

true positives shown

False negative images are validation images in which the model failed to annotate an instance of a labeled object:

false negatives shown

Lastly, false positive images are validation images in which the model annotated instances of an object in areas that were not annotated in the ground truth:

false positives shown

The model surfaces interesting corner cases, which present an opportunity to refine your definitions and labels to help the model understand your label interpretations. For example, a stricter definition would help the model understand whether you consider a stuffed bell pepper a "salad" (or not). With repeated label, train, and evaluate loops, your model will surface other such ambiguities in your data.

You can also adjust the score threshold in this view in the user interface, and the TP, FN, and FP images displayed will reflect the threshold change:

true positives with updated threshold

Interpreting evaluation metrics

Object detection models output many bounding boxes for an input image; each box comes with 1) a label and 2) a score or confidence. The evaluation metrics will help you answer several key performance questions about your model:

  • Am I getting the right number of boxes?
  • Does the model tend to give marginal cases lower scores?
  • How closely do the predicted boxes match my ground truth boxes?

Note that these metrics, just like those in multi-label classification, will not point out any class confusion other than generically lower scores.

When examining the per-image model output, you need a way to compare a pair of boxes (the ground truth box and the predicted box) and determine how good a match they are. You must consider things like:

  • Do both boxes have the same label?
  • How well do the boxes overlap?
  • How confidently did the model predict the box?

To address the second of these considerations, we introduce a measurement called the intersection over union, or IoU.

IoU and the IoU Threshold

visual of boxes intersection over boxes union

The intersection over union determines how closely two boxes match. The IoU value ranges from 0 (no overlap) to 1 (the boxes are identical) and is calculated by dividing the area in common between the two boxes by the area included in at least one of the boxes. The AutoML service lets you examine your model's performance at several IoU thresholds.
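
As an illustration of this calculation (a minimal sketch, not the service's implementation), the following Python function computes the IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples:

def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union else 0.0

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap -> 1/7 ≈ 0.143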

Why might you want to change the IoU threshold?

Consider the use case of counting cars in a parking lot. You don't care whether the box coordinates are especially precise; you just care that you have the right total number of boxes. In this case a low IoU threshold is appropriate.

visual of low threshold boxes around cars
Image credit: Nacho, "Smart" (CC BY 2.0, bounding boxes and text added).

Alternatively, consider trying to measure the size of a fabric stain. In this case you need very precise coordinates, and thus a much higher IoU threshold is appropriate.

visual of high threshold box around fabric stain
Image credit: Housing Works Thrift Shops, "Modern Sofa" (CC BY-SA 2.0, bounding boxes and text added).

Note that you do not need to retrain your model if you change your mind as to the right threshold for your use case; you already have access to evaluation metrics at various IoU thresholds.

Score and Score Threshold

Similar to classification models, object detection model outputs (now boxes) come with scores. And, as with image classification, there is a score threshold you can specify after training to determine what is (or is not) a "good" match. Changing the score threshold lets you tune the false positive and true positive rates to your specific needs. A user who wants very high recall would typically use a lower score threshold when processing the model output.
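
As a small illustration (a hedged sketch, not the service's evaluation code): raising the score threshold drops low-confidence boxes, which typically raises precision and lowers recall, while lowering it does the opposite. The prediction fields below are assumptions made for the example, not an API schema.

def apply_score_threshold(predictions, score_threshold):
    """Keep only predicted boxes at or above the chosen score threshold."""
    return [p for p in predictions if p["score"] >= score_threshold]

predictions = [
    {"label": "Tomato", "score": 0.88, "box": (0.10, 0.10, 0.40, 0.50)},
    {"label": "Tomato", "score": 0.12, "box": (0.50, 0.20, 0.90, 0.60)},
]
print(len(apply_score_threshold(predictions, 0.5)))   # 1: low-confidence box dropped
print(len(apply_score_threshold(predictions, 0.05)))  # 2: keeps low-confidence boxes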

Iterate on your model

If you're not happy with the quality levels, you can go back to earlier steps to improve the quality:

  • Consider adding more images to any bounding box labels with low quality.
  • You may need to add different types of images (e.g. wider angle, higher or lower resolution, different points of view).
  • Consider removing bounding box labels altogether if you don't have enough training images.
  • Our training algorithms do not use your label names. If you have one label that says "door" and another that says "door_with_knob", the algorithm has no way of figuring out the nuance other than from the images you provide it.
  • Augment your data with more examples of true positives and negatives. Especially important examples are the ones that are close to the decision boundary (i.e. likely to produce confusion, but still correctly labeled).
  • Specify your own TRAIN, VALIDATION, and TEST split (see the sketch after this list). The tool randomly assigns images, but near-duplicates may end up in TRAIN and VALIDATION, which could lead to overfitting and then poor performance on the TEST set.
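
As an illustration of that last point (a sketch under assumptions, not a supported AutoML feature): one way to keep exact duplicates from landing in different sets is to derive the split from a hash of each image's content, so identical files always receive the same assignment. Catching near-duplicates would require a perceptual hash or manual review; the path and split ratios below are hypothetical.

import hashlib
from pathlib import Path

def assign_split(image_path):
    """Deterministically map identical image content to one split (80/10/10)."""
    digest = hashlib.sha256(Path(image_path).read_bytes()).hexdigest()
    bucket = int(digest, 16) % 10
    if bucket < 8:
        return "TRAIN"
    return "VALIDATION" if bucket == 8 else "TEST"

# split = assign_split("images/tomato_001.jpg")  # e.g. "TRAIN"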

Once you've made changes, train and evaluate a new model until you reach a high enough quality level.