Evaluating models

After training a model, AutoML Vision uses items from the TEST set to evaluate the quality and accuracy of the new model.

AutoML Vision provides an aggregate set of evaluation metrics indicating how well the model performs overall, as well as evaluation metrics for each category label, indicating how well the model performs for that label.

  • AuPRC: Area under the precision/recall curve, also referred to as "average precision." Generally between 0.5 and 1.0; higher values indicate a more accurate model.

  • Confidence threshold curves: Show how different confidence thresholds would affect precision, recall, and the true and false positive rates. Read about the relationship of precision and recall.

  • Confusion matrix: Only present for single-label-per-image models. Shows, for each label in the evaluated (TEST) set, the percentage of times the model predicted each label. Ideally, images with label one would be predicted only as label one, and so on, so a perfect matrix would look like:

    100  0   0   0
     0  100  0   0
     0   0  100  0
     0   0   0  100
    

    In the example above, if an image labeled one was predicted by the model as two, the first row would instead look like:

    99  1  0  0
    

    More information can be found by searching for 'confusion matrix machine learning'.
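
    As an illustration only (this is not an AutoML Vision API call), a row-normalized confusion matrix like the one above can be computed from lists of true and predicted labels. A minimal Python sketch with made-up labels:

    from collections import Counter

    # Hypothetical ground-truth and predicted labels for an evaluated image set.
    true_labels = ["one", "one", "two", "two", "three", "four"]
    predicted_labels = ["one", "two", "two", "two", "three", "four"]

    labels = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))

    # Each row corresponds to a true label; each value is the percentage of that
    # label's images that the model predicted as the column's label.
    for actual in labels:
        row_total = sum(counts[(actual, predicted)] for predicted in labels) or 1
        row = [100 * counts[(actual, predicted)] / row_total for predicted in labels]
        print(actual, ["%.0f" % value for value in row])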

Use this data to evaluate your model's readiness. High confusion, low AUC scores, or low precision and recall scores can indicate that your model needs additional training data or has inconsistent labels. A very high AUC score and perfect precision and recall can indicate that the data is too easy and may not generalize well.
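
To make the confidence threshold curves concrete, the following minimal Python sketch (again, not an AutoML Vision API call; the scores and labels are made up) computes precision and recall for a single label at a few thresholds:

# Hypothetical per-image confidence scores for one label, and the ground truth
# for whether that label actually applies to each image.
scores = [0.95, 0.90, 0.62, 0.40, 0.15]
is_positive = [True, True, False, True, False]

for threshold in (0.25, 0.5, 0.75):
    predicted = [score >= threshold for score in scores]
    tp = sum(p and t for p, t in zip(predicted, is_positive))
    fp = sum(p and not t for p, t in zip(predicted, is_positive))
    fn = sum(not p and t for p, t in zip(predicted, is_positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print("threshold=%.2f precision=%.2f recall=%.2f" % (threshold, precision, recall))

Raising the threshold generally trades recall for precision, which is the trade-off the curves in the UI visualize.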

Get model evaluation values

Web UI

  1. Open the AutoML Vision UI and click the lightbulb icon in the left navigation bar to display the available models.

    To view the models for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the row for the model you want to evaluate.

  3. If necessary, click the Evaluate tab just below the title bar.

    If training has been completed for the model, AutoML Vision shows its evaluation metrics.

    Evaluate page

  4. To view the metrics for a specific label, select the label name from the list of labels in the lower part of the page.

Command-line

  • Replace model-name with the full name of your model, from the response when you created the model. The full name has the format: projects/{project-id}/locations/us-central1/models/{model-id}
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/model-name/modelEvaluations

The response includes a ModelEvaluation resource for each label (identified by its annotationSpecId) as well as one for the overall model (identified by an empty annotationSpecId).

{
  "modelEvaluation": [
    {
      "name": "projects/434039606874/locations/us-central1/models/7537307368641647584/modelEvaluations/9009741181387603448",
      "annotationSpecId": "17040929661974749",
      "classificationMetrics": {
        "auPrc": 0.99772006,
        "baseAuPrc": 0.21706384,
        "evaluatedExamplesCount": 377,
        "confidenceMetricsEntry": [
          {
            "recall": 1,
            "precision": -1.3877788e-17,
            "f1Score": -2.7755576e-17,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
          {
            "confidenceThreshold": 0.05,
            "recall": 0.997,
            "precision": 0.867,
            "f1Score": 0.92746675,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
          {
            "confidenceThreshold": 0.1,
            "recall": 0.995,
            "precision": 0.905,
            "f1Score": 0.9478684,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
          {
            "confidenceThreshold": 0.15,
            "recall": 0.992,
            "precision": 0.932,
            "f1Score": 0.96106446,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
          {
            "confidenceThreshold": 0.2,
            "recall": 0.989,
            "precision": 0.951,
            "f1Score": 0.96962786,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
          {
            "confidenceThreshold": 0.25,
            "recall": 0.987,
            "precision": 0.957,
            "f1Score": 0.9717685,
            "recallAt1": 0.9761273,
            "precisionAt1": 0.9761273,
            "f1ScoreAt1": 0.9761273
          },
        ...
        ],
      },
      "createTime": "2018-04-30T23:06:14.746840Z"
    },
    {
      "name": "projects/434039606874/locations/us-central1/models/7537307368641647584/modelEvaluations/9009741181387603671",
      "annotationSpecId": "1258823357545045636",
      "classificationMetrics": {
        "auPrc": 0.9972302,
        "baseAuPrc": 0.1883289,
      ...
      },
      "createTime": "2018-04-30T23:06:14.649260Z"
    }
  ]
}

To get just the evaluation metrics for a specific label, add /{modelEvaluation-name} to the request above, using the full value of the "name" from the response.

Python

Before you can run this code example, you must install the Python Client Libraries.

  • The model_id parameter is the ID of your model. The ID is the last element of the name of your model. For example, if the name of your model is projects/434039606874/locations/us-central1/models/3745331181667467569, then the ID of your model is 3745331181667467569.
# TODO(developer): Uncomment and set the following variables
# project_id = 'PROJECT_ID_HERE'
# compute_region = 'COMPUTE_REGION_HERE'
# model_id = 'MODEL_ID_HERE'
# filter_ = 'filter expression here'

from google.cloud import automl_v1beta1 as automl

client = automl.AutoMlClient()

# Get the full path of the model.
model_full_id = client.model_path(project_id, compute_region, model_id)

# List all the model evaluations in the model by applying filter.
response = client.list_model_evaluations(model_full_id, filter_)

print("List of model evaluations:")
for element in response:
    print(element)
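
If you want just one of the returned evaluations, for example the overall model evaluation (the entry whose annotation_spec_id is empty), you can look it up in the list and fetch it by its full resource name. A minimal sketch, assuming the same v1beta1 client and variables as above; verify the method names against your installed client library version:

# Find the overall model evaluation (empty annotation_spec_id) and fetch it by name.
overall_evaluation_name = None
for evaluation in client.list_model_evaluations(model_full_id, filter_):
    if not evaluation.annotation_spec_id:
        overall_evaluation_name = evaluation.name
        break

if overall_evaluation_name:
    model_evaluation = client.get_model_evaluation(overall_evaluation_name)
    print("Overall model evaluation:")
    print(model_evaluation)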

Java

/**
 * Demonstrates using the AutoML client to display model evaluation.
 *
 * @param projectId the Id of the project.
 * @param computeRegion the Region name.
 * @param modelId the Id of the model.
 * @param filter the filter expression.
 * @throws IOException on Input/Output errors.
 */
public static void displayEvaluation(
    String projectId, String computeRegion, String modelId, String filter) throws IOException {
  AutoMlClient client = AutoMlClient.create();

  // Get the full path of the model.
  ModelName modelFullId = ModelName.of(projectId, computeRegion, modelId);

  // List all the model evaluations in the model by applying the filter.
  ListModelEvaluationsRequest modelEvaluationsRequest =
      ListModelEvaluationsRequest.newBuilder()
          .setParent(modelFullId.toString())
          .setFilter(filter)
          .build();

  // Iterate through the results to find the overall model evaluation,
  // which is identified by an empty annotation spec ID.
  String modelEvaluationId = "";
  for (ModelEvaluation element :
      client.listModelEvaluations(modelEvaluationsRequest).iterateAll()) {
    if (element.getAnnotationSpecId().isEmpty()) {
      modelEvaluationId = element.getName().split("/")[element.getName().split("/").length - 1];
    }
  }

  // Resource name for the model evaluation.
  ModelEvaluationName modelEvaluationFullId =
      ModelEvaluationName.of(projectId, computeRegion, modelId, modelEvaluationId);

  // Get a model evaluation.
  ModelEvaluation modelEvaluation = client.getModelEvaluation(modelEvaluationFullId);

  ClassificationEvaluationMetrics classMetrics =
      modelEvaluation.getClassificationEvaluationMetrics();
  List<ConfidenceMetricsEntry> confidenceMetricsEntries =
      classMetrics.getConfidenceMetricsEntryList();

  // Showing model score based on threshold of 0.5
  for (ConfidenceMetricsEntry confidenceMetricsEntry : confidenceMetricsEntries) {
    if (confidenceMetricsEntry.getConfidenceThreshold() == 0.5) {
      System.out.println("Precision and recall are based on a score threshold of 0.5");
      System.out.println(
          String.format("Model Precision: %.2f ", confidenceMetricsEntry.getPrecision() * 100)
              + '%');
      System.out.println(
          String.format("Model Recall: %.2f ", confidenceMetricsEntry.getRecall() * 100) + '%');
      System.out.println(
          String.format("Model F1 score: %.2f ", confidenceMetricsEntry.getF1Score() * 100)
              + '%');
      System.out.println(
          String.format(
                  "Model Precision@1: %.2f ", confidenceMetricsEntry.getPrecisionAt1() * 100)
              + '%');
      System.out.println(
          String.format("Model Recall@1: %.2f ", confidenceMetricsEntry.getRecallAt1() * 100)
              + '%');
      System.out.println(
          String.format("Model F1 score@1: %.2f ", confidenceMetricsEntry.getF1ScoreAt1() * 100)
              + '%');
    }
  }
}

Node.js

  const automl = require(`@google-cloud/automl`).v1beta1;
  const math = require(`mathjs`);

  const client = new automl.AutoMlClient();

  /**
   * TODO(developer): Uncomment the following line before running the sample.
   */
  // const projectId = `The GCLOUD_PROJECT string, e.g. "my-gcloud-project"`;
  // const computeRegion = `region-name, e.g. "us-central1"`;
  // const modelId = `id of the model, e.g. "ICN12345"`;
  // const filter = `filter expressions, must specify field, e.g. "imageClassificationModelMetadata:*"`;

  // Get the full path of the model.
  const modelFullId = client.modelPath(projectId, computeRegion, modelId);

  // List all the model evaluations in the model by applying filter.
  client
    .listModelEvaluations({parent: modelFullId, filter: filter})
    .then(respond => {
      const response = respond[0];
      response.forEach(element => {
        // There is an evaluation for each label in the model and one for the
        // overall model. Get only the overall model evaluation.
        if (!element.annotationSpecId) {
          const modelEvaluationId = element.name.split(`/`).pop();

          // Resource name for the model evaluation.
          const modelEvaluationFullId = client.modelEvaluationPath(
            projectId,
            computeRegion,
            modelId,
            modelEvaluationId
          );

          // Get a model evaluation.
          client
            .getModelEvaluation({name: modelEvaluationFullId})
            .then(responses => {
              const modelEvaluation = responses[0];

              const classMetrics =
                modelEvaluation.classificationEvaluationMetrics;

              const confidenceMetricsEntries =
                classMetrics.confidenceMetricsEntry;

              // Showing model score based on threshold of 0.5
              confidenceMetricsEntries.forEach(confidenceMetricsEntry => {
                if (confidenceMetricsEntry.confidenceThreshold === 0.5) {
                  console.log(
                    `Precision and recall are based on a score threshold of 0.5`
                  );
                  console.log(
                    `Model Precision: %`,
                    math.round(confidenceMetricsEntry.precision * 100, 2)
                  );
                  console.log(
                    `Model Recall: %`,
                    math.round(confidenceMetricsEntry.recall * 100, 2)
                  );
                  console.log(
                    `Model F1 score: %`,
                    math.round(confidenceMetricsEntry.f1Score * 100, 2)
                  );
                  console.log(
                    `Model Precision@1: %`,
                    math.round(confidenceMetricsEntry.precisionAt1 * 100, 2)
                  );
                  console.log(
                    `Model Recall@1: %`,
                    math.round(confidenceMetricsEntry.recallAt1 * 100, 2)
                  );
                  console.log(
                    `Model F1 score@1: %`,
                    math.round(confidenceMetricsEntry.f1ScoreAt1 * 100, 2)
                  );
                }
              });
            })
            .catch(err => {
              console.error(err);
            });
        }
      });
    })
    .catch(err => {
      console.error(err);
    });

Iterate on your model

If you're not happy with the quality levels, you can go back to earlier steps to improve the model:

  • AutoML Vision allows you to sort the images by how "confused" the model is, that is, by the true label and the predicted label. Look through these images and make sure they're labeled correctly.
  • Consider adding more images to any labels with low quality.
  • You may need to add different types of images (e.g. wider angle, higher or lower resolution, different points of view).
  • Consider removing labels altogether if you don't have enough training images.
  • Remember that machines can’t read your label name; it's just a random string of letters to them. If you have one label that says "door" and another that says "door_with_knob", the machine has no way of figuring out the nuance other than the images you provide it.
  • Augment your data with more examples of true positives and negatives. Especially important examples are the ones that are close to the decision boundary (i.e. likely to produce confusion, but still correctly labeled).
  • Specify your own TRAIN, VALIDATION, and TEST split (see the example rows after this list). The tool assigns images randomly, but near-duplicates may end up in TRAIN and VALIDATION, which can lead to overfitting and then poor performance on the TEST set.
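
For example, the CSV file you import can pin each image to a split by prefixing each row with TRAIN, VALIDATION, or TEST; the bucket paths and labels below are placeholders:

    TRAIN,gs://my-project-vcm/flowers/daisy_001.jpg,daisy
    VALIDATION,gs://my-project-vcm/flowers/daisy_002.jpg,daisy
    TEST,gs://my-project-vcm/flowers/rose_001.jpg,rose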

Once you've made changes, train and evaluate a new model until you reach a high enough quality level.
