Evaluating models

After training a model, AutoML Natural Language Entity Extraction uses items from the TEST set to evaluate the quality and accuracy of the new model.

Precision and recall measure how well the model is capturing information, and how much it’s leaving out. Precision indicates, from all the items identified as a particular entity, how many actually were supposed to be assigned to that entity. Recall indicates, from all the items that should have been identified as a particular entity, how many were actually assigned to that entity.
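For example, if the model applied a given label to 100 items, 90 of which actually had that label, and missed 30 other items that should have had it, precision is 90% and recall is 75%. Here is a minimal sketch of that arithmetic (illustrative only, not part of the AutoML API):

    # Illustrative only: precision and recall from raw counts.
    def precision_recall(true_positives, false_positives, false_negatives):
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        return precision, recall

    # 90 correct predictions, 10 spurious predictions, 30 missed items
    print(precision_recall(90, 10, 30))  # (0.9, 0.75)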

The confusion matrix shows, for each true label, the percentage of evaluation documents the model predicted as each label. Ideally, documents labeled one would be predicted only as one, and so on, so a perfect matrix would look like:

    100  0   0   0
     0  100  0   0
     0   0  100  0
     0   0   0  100

In the example above, if the model predicted two for 1% of the documents that were actually labeled one, the first row would instead look like:

    99  1  0  0

More information can be found by searching for 'confusion matrix machine learning'.

AutoML Natural Language Entity Extraction creates the confusion matrix for up to 10 labels. If you have more than 10 labels, the matrix includes the 10 labels with the most confusion (incorrect predictions).
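As a quick illustration of how such a matrix is tallied, here is a short sketch; the labels and counts are made up and are not produced by the AutoML API:

    # Illustrative sketch only: build a row-normalized confusion matrix.
    from collections import Counter

    true_labels      = ["one", "one", "two", "two", "three"]
    predicted_labels = ["one", "two", "two", "two", "three"]

    labels = ["one", "two", "three"]
    counts = Counter(zip(true_labels, predicted_labels))

    # Each row is a true label; each column is a predicted label,
    # expressed as a percentage of that row's documents.
    for t in labels:
        row_total = sum(counts[(t, p)] for p in labels)
        row = [100 * counts[(t, p)] / row_total for p in labels]
        print(t, row)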

Use these metrics to evaluate your model's readiness. Low precision and recall scores can indicate that your model needs additional training data or has inconsistent annotations. Perfect precision and recall can indicate that the data is too easy and may not generalize well.

If you're not happy with the quality levels, you can go back to earlier steps to improve your training data:

  • Consider adding more annotations for any labels with low quality.
  • Consider removing labels altogether if you don't have enough training documents.

Web UI

  1. Open the AutoML Natural Language Entity Extraction UI, select the Get started link in the AutoML Entity Extraction box, and click the lightbulb icon in the left navigation bar to display the available models.

    To view the models for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the row for the model you want to evaluate.

  3. If necessary, click the Evaluate tab just below the title bar.

    If training has been completed for the model, AutoML Natural Language Entity Extraction shows its evaluation metrics.

    [Screenshot of the Evaluate page]

  4. To view the metrics for a specific label, select the label name from the list of labels in the lower part of the page.

    When viewing the metrics for a single label, AutoML Natural Language Entity Extraction shows examples of true positives (where the model predicted the label correctly), false positives (where the model applied the label incorrectly), and false negatives (where the model should have applied the label but did not). These examples can help you determine what adjustments you need to make to the training data.

Command-line

In the command below, replace project-id and model-id with your IDs.

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/project-id/locations/us-central1/models/model-id/modelEvaluations

The response includes a ModelEvaluation resource for each label (identified by its displayName) as well as one for the overall model (identified by an empty annotationSpecId).
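If you save the JSON response to a file, a minimal sketch of picking out the overall evaluation might look like the following; the model_evaluations.json file name is an assumption, and the modelEvaluation field name follows the usual proto-to-JSON mapping, so verify it against your actual response:

    # Minimal sketch: find the overall evaluation in a saved JSON response.
    # Assumes the curl output was redirected to model_evaluations.json.
    import json

    with open("model_evaluations.json") as f:
        response = json.load(f)

    for evaluation in response.get("modelEvaluation", []):
        # Per-label evaluations carry an annotationSpecId; the overall one does not.
        if not evaluation.get("annotationSpecId"):
            print("Overall evaluation:", evaluation["name"])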

Java

/**
 * Demonstrates using the AutoML client to display model evaluation.
 *
 * @param projectId the Id of the project.
 * @param computeRegion the Region name. (e.g., "us-central1")
 * @param modelId the Id of the model.
 * @param filter the Filter expression.
 * @throws IOException
 */
public static void displayEvaluation(
    String projectId, String computeRegion, String modelId, String filter) throws IOException {
  // Instantiates a client
  AutoMlClient client = AutoMlClient.create();

  // Get the full path of the model.
  ModelName modelFullId = ModelName.of(projectId, computeRegion, modelId);

  // List all the model evaluations in the model by applying filter.
  ListModelEvaluationsRequest modelEvaluationsRequest =
      ListModelEvaluationsRequest.newBuilder()
          .setParent(modelFullId.toString())
          .setFilter(filter)
          .build();

  // Iterate through the results.
  String modelEvaluationId = "";
  for (ModelEvaluation element :
      client.listModelEvaluations(modelEvaluationsRequest).iterateAll()) {
    // There is an evaluation for each class in the model and one for the overall model.
    // Keep only the evaluation of the overall model.
    if (element.getAnnotationSpecId().isEmpty()) {
      modelEvaluationId = element.getName().split("/")[element.getName().split("/").length - 1];
    }
  }

  System.out.println("Model Evaluation ID:" + modelEvaluationId);

  // Resource name for the model evaluation.
  ModelEvaluationName modelEvaluationFullId =
      ModelEvaluationName.of(projectId, computeRegion, modelId, modelEvaluationId);

  // Get a model evaluation.
  ModelEvaluation modelEvaluation = client.getModelEvaluation(modelEvaluationFullId);

  TextExtractionEvaluationMetrics textExtractionMetrics =
      modelEvaluation.getTextExtractionEvaluationMetrics();
  List<ConfidenceMetricsEntry> confidenceMetricsEntries =
      textExtractionMetrics.getConfidenceMetricsEntriesList();

  // Showing model score based on threshold of 0.5
  for (ConfidenceMetricsEntry confidenceMetricsEntry : confidenceMetricsEntries) {
    if (confidenceMetricsEntry.getConfidenceThreshold() == 0.5) {
      System.out.println("Precision and recall are based on a score threshold of 0.5");
      System.out.println(
          String.format("Model precision: %.2f ", confidenceMetricsEntry.getPrecision() * 100)
              + '%');
      System.out.println(
          String.format("Model recall: %.2f ", confidenceMetricsEntry.getRecall() * 100) + '%');
      System.out.println(
          String.format("Model f1 score: %.2f ", confidenceMetricsEntry.getF1Score() * 100)
              + '%');
    }
  }
}

Node.js

const automl = require(`@google-cloud/automl`);
const math = require(`mathjs`);
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to display model evaluation.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const modelId = '[MODEL_ID]'  e.g., "TEN5200971474357190656";
// const filter_ = '[FILTER_EXPRESSIONS]'
// e.g., "textExtractionModelMetadata:*";

// Get the full path of the model.
const modelFullId = client.modelPath(projectId, computeRegion, modelId);

// List all the model evaluations in the model by applying filter.
client
  .listModelEvaluations({parent: modelFullId, filter: filter_})
  .then(respond => {
    const response = respond[0];
    // Iterate through the results.
    let modelEvaluationId = ``;
    for (const element of response) {
      // There is an evaluation for each class in the model and one for the overall model.
      // Keep only the evaluation of the overall model.
      if (!element.annotationSpecId) {
        modelEvaluationId = element.name.split(`/`).pop();
      }
    }
    console.log(`Model Evaluation ID: ${modelEvaluationId}`);

    // Resource name for the model evaluation.
    const modelEvaluationFullId = client.modelEvaluationPath(
      projectId,
      computeRegion,
      modelId,
      modelEvaluationId
    );

    // Get a model evaluation.
    client
      .getModelEvaluation({name: modelEvaluationFullId})
      .then(responses => {
        const modelEvaluation = responses[0];
        const extractMetrics =
          modelEvaluation.textExtractionEvaluationMetrics;
        const confidenceMetricsEntries =
          extractMetrics.confidenceMetricsEntries;

        // Showing model score based on threshold of 0.5
        for (const confidenceMetricsEntry of confidenceMetricsEntries) {
          if (confidenceMetricsEntry.confidenceThreshold === 0.5) {
            console.log(
              `Precision and recall are based ` +
                `on a score threshold of 0.5 `
            );
            console.log(
              `Model precision: ${math.round(
                confidenceMetricsEntry.precision * 100,
                2
              )} %`
            );
            console.log(
              `Model recall: ${math.round(
                confidenceMetricsEntry.recall * 100,
                2
              )} %`
            );
            console.log(
              `Model f1 score: ${math.round(
                confidenceMetricsEntry.f1Score * 100,
                2
              )} %`
            );
          }
        }
      })
      .catch(err => {
        console.error(err);
      });
  })
  .catch(err => {
    console.error(err);
  });

Python

    # TODO(developer): Uncomment and set the following variables
    # project_id = '[PROJECT_ID]'
    # compute_region = '[COMPUTE_REGION]'
    # model_id = '[MODEL_ID]'
    # filter_ = 'filter expression here'

    from google.cloud import automl_v1beta1 as automl

    client = automl.AutoMlClient()

    # Get the full path of the model.
    model_full_id = client.model_path(project_id, compute_region, model_id)

    # List all the model evaluations in the model by applying filter.
    response = client.list_model_evaluations(model_full_id, filter_)

    # Iterate through the results.
    for element in response:
        # There is an evaluation for each class in the model and one for the overall model.
        # Keep only the evaluation of the overall model.
        if not element.annotation_spec_id:
            model_evaluation_id = element.name.split("/")[-1]

    # Resource name for the model evaluation.
    model_evaluation_full_id = client.model_evaluation_path(
        project_id, compute_region, model_id, model_evaluation_id
    )

    # Get a model evaluation.
    model_evaluation = client.get_model_evaluation(model_evaluation_full_id)

    entity_metrics = model_evaluation.text_extraction_evaluation_metrics

    print(entity_metrics)
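
The Python sample prints the full metrics object. To mirror the Java and Node.js samples above, a short follow-up sketch (assuming the snake_case field names exposed by the v1beta1 client) could report precision and recall at the 0.5 score threshold:

    # Sketch: report precision/recall at a score threshold of 0.5,
    # mirroring the Java and Node.js samples above.
    for entry in entity_metrics.confidence_metrics_entries:
        if entry.confidence_threshold == 0.5:
            print("Precision and recall are based on a score threshold of 0.5")
            print("Model precision: {:.2f}%".format(entry.precision * 100))
            print("Model recall: {:.2f}%".format(entry.recall * 100))
            print("Model f1 score: {:.2f}%".format(entry.f1_score * 100))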
