Evaluating models

After training a model, AutoML Natural Language Sentiment Analysis uses items from the TEST set to evaluate the quality and accuracy of the new model.

AutoML Natural Language Sentiment Analysis provides an aggregate set of evaluation metrics indicating how well the model performs. The set includes metrics to evaluate both the classification and regression aspects of sentiment analysis.

  • Precision and recall measure how well the model is capturing information, and how much it’s leaving out. Precision is the fraction of items the model assigned a particular sentiment value that actually should have that value. Recall is the fraction of items that should have a particular sentiment value that the model actually assigned that value. The F1 score is the harmonic mean of precision and recall. (A small worked example of these metrics appears after this list.)

  • Mean absolute error (MAE) and mean squared error (MSE) measure the distance between the predicted sentiment value and the actual sentiment value. Lower values indicate more accurate models.

  • Linear-weighted kappa and quadratic-weighted kappa measure how closely the sentiment values assigned by the model agree with values assigned by human raters. Higher values indicate more accurate models.

  • Confusion matrix shows the percentage of times the model predicted each sentiment value for items with each true sentiment value in the test set during evaluation. Ideally, the model would predict a given value only for items that actually have that value, so a perfect matrix would look like:

    100  0   0   0
     0  100  0   0
     0   0  100  0
     0   0   0  100
    

    In the example above, if the model predicted the value two for one percent of the items whose true value is one, the first row would instead look like:

    99  1  0  0
    

    More information can be found by searching for 'confusion matrix machine learning'.
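
As a rough illustration, the sketch below (not part of AutoML itself) computes these metrics by hand for a handful of made-up ratings using plain Python. It shows what precision, recall, F1, MAE, MSE, and the confusion matrix summarize; the agreement (kappa) statistics are omitted for brevity.

    from collections import Counter

    # Made-up ratings for illustration only; sentiment values run from 1 to 4.
    actual = [1, 2, 2, 3, 4, 4, 1, 3, 2, 4]     # values assigned by human raters
    predicted = [1, 2, 3, 3, 4, 3, 1, 3, 2, 4]  # values assigned by the model
    values = sorted(set(actual))

    for v in values:
        true_positives = sum(a == v and p == v for a, p in zip(actual, predicted))
        predicted_as_v = sum(p == v for p in predicted)
        actually_v = sum(a == v for a in actual)
        precision = true_positives / predicted_as_v if predicted_as_v else 0.0
        recall = true_positives / actually_v if actually_v else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        print("value {}: precision={:.2f} recall={:.2f} f1={:.2f}".format(v, precision, recall, f1))

    # MAE and MSE treat the sentiment value as a number on the rating scale.
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    print("MAE={:.2f} MSE={:.2f}".format(mae, mse))

    # Confusion matrix: each row is a true value, each column a predicted value,
    # expressed as a percentage of the items with that true value.
    counts = Counter(zip(actual, predicted))
    for a in values:
        row_total = sum(counts[(a, p)] for p in values)
        print([round(100 * counts[(a, p)] / row_total) for p in values])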

Web UI

  1. Open the AutoML Natural Language Sentiment Analysis UI, select the Launch app link in the AutoML Sentiment Analysis box, and click the lightbulb icon in the left navigation bar to display the available models.

    To view the models for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the row for the model you want to evaluate.

  3. If necessary, click the Evaluate tab just below the title bar.

    If training has been completed for the model, AutoML Natural Language Sentiment Analysis shows its evaluation metrics.

    (Screenshot: the Evaluate page)

  4. To view the metrics for a specific sentiment value, select the value from the list in the lower part of the page.

Command-line

In the command below, replace project-id and model-id with your IDs.

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/project-id/locations/us-central1/models/model-id/modelEvaluations

The response includes a list of ModelEvaluation resources: one for the model as a whole and one for each sentiment value.
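
If you want to work with the JSON response programmatically, the sketch below is one possible approach (it is not part of the official samples). It uses the google-auth Python library with Application Default Credentials, and it assumes the list response uses a modelEvaluation key and that the overall evaluation is the entry without an annotationSpecId, mirroring the client-library samples later on this page.

    # Sketch only: assumes google-auth and requests are installed and that
    # Application Default Credentials are configured
    # (gcloud auth application-default login).
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    project_id = "project-id"  # replace with your project ID
    model_id = "model-id"      # replace with your model ID

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)

    url = (
        "https://automl.googleapis.com/v1beta1/projects/{}/locations/"
        "us-central1/models/{}/modelEvaluations".format(project_id, model_id)
    )
    response = session.get(url)
    response.raise_for_status()

    # Assumed field names: "modelEvaluation" for the list, and no
    # "annotationSpecId" on the evaluation that covers the model overall.
    for evaluation in response.json().get("modelEvaluation", []):
        if not evaluation.get("annotationSpecId"):
            print(evaluation.get("textSentimentEvaluationMetrics"))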

Java

import com.google.cloud.automl.v1beta1.AutoMlClient;
import com.google.cloud.automl.v1beta1.ClassificationProto.ClassificationEvaluationMetrics.ConfusionMatrix;
import com.google.cloud.automl.v1beta1.ClassificationProto.ClassificationEvaluationMetrics.ConfusionMatrix.Row;
import com.google.cloud.automl.v1beta1.ListModelEvaluationsRequest;
import com.google.cloud.automl.v1beta1.ModelEvaluation;
import com.google.cloud.automl.v1beta1.ModelEvaluationName;
import com.google.cloud.automl.v1beta1.ModelName;
import com.google.cloud.automl.v1beta1.TextSentimentProto.TextSentimentEvaluationMetrics;
import com.google.protobuf.ProtocolStringList;
import java.io.IOException;
import java.util.List;

public class DisplayEvaluation {

  // Display Model Evaluation
  public static void displayEvaluation(
      String projectId, String computeRegion, String modelId, String filter) throws IOException {
    // String projectId = "YOUR_PROJECT_ID";
    // String computeRegion = "us-central1";
    // String modelId = "YOUR_MODEL_ID";
    // String filter = "YOUR_FILTER_EXPRESSION";

    // Instantiates a client
    try (AutoMlClient client = AutoMlClient.create()) {

      // Get the full path of the model.
      ModelName modelName = ModelName.of(projectId, computeRegion, modelId);

      // List all the model evaluations in the model by applying the filter.
      ListModelEvaluationsRequest modelEvaluationsrequest =
          ListModelEvaluationsRequest.newBuilder()
              .setParent(modelName.toString())
              .setFilter(filter)
              .build();

      // Iterate through the results.
      String modelEvaluationId = "";
      for (ModelEvaluation element :
          client.listModelEvaluations(modelEvaluationsrequest).iterateAll()) {
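        // There is an evaluation for each sentiment value plus one for the
        // overall model; the overall evaluation has an empty annotation spec ID.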
        if (element.getAnnotationSpecId().isEmpty()) {
          modelEvaluationId = element.getName().split("/")[element.getName().split("/").length - 1];
        }
      }

      System.out.println("Model Evaluation ID:" + modelEvaluationId);

      // Resource name for the model evaluation.
      ModelEvaluationName modelEvaluationFullId =
          ModelEvaluationName.of(projectId, computeRegion, modelId, modelEvaluationId);

      // Get a model evaluation.
      ModelEvaluation modelEvaluation = client.getModelEvaluation(modelEvaluationFullId);

      TextSentimentEvaluationMetrics textSentimentMetrics =
          modelEvaluation.getTextSentimentEvaluationMetrics();

      // Showing text sentiment evaluation metrics
      System.out.println(
          String.format("Model precision: %.2f ", textSentimentMetrics.getPrecision() * 100) + '%');
      System.out.println(
          String.format("Model recall: %.2f ", textSentimentMetrics.getRecall() * 100) + '%');
      System.out.println(
          String.format("Model f1 score: %.2f ", textSentimentMetrics.getF1Score() * 100) + '%');
      System.out.println(
          String.format(
              "Model mean absolute error: %.2f ",
              textSentimentMetrics.getMeanAbsoluteError() * 100)
              + '%');
      System.out.println(
          String.format(
              "Model mean squared error: %.2f ", textSentimentMetrics.getMeanSquaredError() * 100)
              + '%');
      System.out.println(
          String.format("Model linear kappa: %.2f ", textSentimentMetrics.getLinearKappa() * 100)
              + '%');
      System.out.println(
          String.format(
              "Model quadratic kappa: %.2f ", textSentimentMetrics.getQuadraticKappa() * 100)
              + '%');

      ConfusionMatrix confusionMatrix = textSentimentMetrics.getConfusionMatrix();

      ProtocolStringList annotationSpecIdList = confusionMatrix.getAnnotationSpecIdList();
      System.out.println("Model confusion matrix:");
      for (String annotationSpecId : annotationSpecIdList) {
        System.out.println(String.format("\tAnnotation spec Id: " + annotationSpecId));
      }
      List<Row> rowList = confusionMatrix.getRowList();

      for (Row row : rowList) {
        System.out.println("\tRow:");
        List<Integer> exampleCountList = row.getExampleCountList();
        for (Integer exampleCount : exampleCountList) {
          System.out.println(String.format("\t\tExample count: " + exampleCount));
        }
      }
      annotationSpecIdList = textSentimentMetrics.getAnnotationSpecIdList();
      for (String annotationSpecId : annotationSpecIdList) {
        System.out.println(String.format("Annotation spec Id: " + annotationSpecId));
      }

    }
  }
}

Node.js

const automl = require(`@google-cloud/automl`);
const math = require(`mathjs`);
const util = require(`util`);
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to display model evaluation.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const modelId = '[MODEL_ID]'  e.g., "TEN5200971474357190656";
// const filter = '[FILTER_EXPRESSIONS]'
// e.g., "textSentimentModelMetadata:*";

// Get the full path of the model.
const modelFullId = client.modelPath(projectId, computeRegion, modelId);

// List all the model evaluations in the model by applying filter.
client
  .listModelEvaluations({parent: modelFullId, filter: filter})
  .then(respond => {
    const response = respond[0];
    // Iterate through the results.
    let modelEvaluationId = ``;
    for (const element of response) {
      // There is evaluation for each class in a model and for overall model.
      // Get only the evaluation of overall model.
      if (!element.annotationSpecId) {
        modelEvaluationId = element.name.split(`/`).pop();
      }
    }
    console.log(`Model Evaluation ID:`, modelEvaluationId);

    // Resource name for the model evaluation.
    const modelEvaluationFullId = client.modelEvaluationPath(
      projectId,
      computeRegion,
      modelId,
      modelEvaluationId
    );

    // Get a model evaluation.
    client
      .getModelEvaluation({name: modelEvaluationFullId})
      .then(responses => {
        const modelEvaluation = responses[0];

        const sentimentMetrics =
          modelEvaluation.textSentimentEvaluationMetrics;
        const confusionMatrix = sentimentMetrics.confusionMatrix;

        console.log(
          `Model precision: ${math.round(
            sentimentMetrics.precision * 100,
            2
          )} %`
        );
        console.log(
          `Model recall: ${math.round(sentimentMetrics.recall * 100, 2)} %`
        );
        console.log(
          `Model f1 score: ${math.round(sentimentMetrics.f1Score * 100, 2)} %`
        );
        console.log(
          `Model mean absolute error: ${math.round(
            sentimentMetrics.meanAbsoluteError * 100,
            2
          )} %`
        );
        console.log(
          `Model mean squared error: ${math.round(
            sentimentMetrics.meanSquaredError * 100,
            2
          )} %`
        );
        console.log(
          `Model linear kappa: ${math.round(
            sentimentMetrics.linearKappa * 100,
            2
          )} %`
        );
        console.log(
          `Model quadratic kappa: ${math.round(
            sentimentMetrics.quadraticKappa * 100,
            2
          )} %`
        );

        console.log(`Model confusion matrix:`);
        const annotationSpecIdList = confusionMatrix.annotationSpecId;

        for (const annotationSpecId of annotationSpecIdList) {
          console.log(`\tAnnotation spec Id: ${annotationSpecId}`);
        }
        const rowList = confusionMatrix.row;

        for (const row of rowList) {
          console.log(`\tRow:`);
          const exampleCountList = row.exampleCount;

          for (const exampleCount of exampleCountList) {
            console.log(
              `\t\tExample count: ${util.inspect(exampleCount, false, null)}`
            );
          }
        }
        console.log(
          `Annotation spec Id: ${sentimentMetrics.annotationSpecId}`
        );
      })
      .catch(err => {
        console.error(err);
      });
  })
  .catch(err => {
    console.error(err);
  });

Python

    # TODO(developer): Uncomment and set the following variables
    # project_id = '[PROJECT_ID]'
    # compute_region = '[COMPUTE_REGION]'
    # model_id = '[MODEL_ID]'
    # filter_ = '[FILTER_EXPRESSION]'

    from google.cloud import automl_v1beta1 as automl

    client = automl.AutoMlClient()

    # Get the full path of the model.
    model_full_id = client.model_path(project_id, compute_region, model_id)

    # List all the model evaluations in the model by applying filter.
    response = client.list_model_evaluations(model_full_id, filter_)

    # Iterate through the results.
    for element in response:
        # There is evaluation for each class in a model and for overall model.
        # Get only the evaluation of overall model.
        if not element.annotation_spec_id:
            model_evaluation_id = element.name.split("/")[-1]

    # Resource name for the model evaluation.
    model_evaluation_full_id = client.model_evaluation_path(
        project_id, compute_region, model_id, model_evaluation_id
    )

    # Get a model evaluation.
    model_evaluation = client.get_model_evaluation(model_evaluation_full_id)

    sentiment_metrics = model_evaluation.text_sentiment_evaluation_metrics

    print(
        "Model Precision: {}%".format(
            round(sentiment_metrics.precision * 100, 2)
        )
    )
    print(
        "Model Recall: {}%".format(
            round(sentiment_metrics.recall * 100, 2)
        )
    )
    print(
        "Model F1 score: {}%".format(
            round(sentiment_metrics.f1_score * 100, 2)
        )
    )
    print(
        "Model absolute error: {}%".format(
            round(sentiment_metrics.mean_absolute_error * 100, 2)
        )
    )
    print(
        "Model mean squared error: {}%".format(
            round(sentiment_metrics.mean_squared_error * 100, 2)
        )
    )
    print(
        "Model linear kappa: {}%".format(
            round(sentiment_metrics.linear_kappa * 100, 2)
        )
    )
    print(
        "Model quadratic kappa: {}%".format(
            round(sentiment_metrics.quadratic_kappa * 100, 2)
        )
    )
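
    # Sketch of how the confusion matrix could also be printed, mirroring the
    # Java and Node.js samples above; assumes the same confusion_matrix,
    # annotation_spec_id, row, and example_count fields on
    # TextSentimentEvaluationMetrics in automl_v1beta1.
    confusion_matrix = sentiment_metrics.confusion_matrix

    print("Model confusion matrix:")
    for annotation_spec_id in confusion_matrix.annotation_spec_id:
        print("\tAnnotation spec Id: {}".format(annotation_spec_id))
    for row in confusion_matrix.row:
        print("\tRow:")
        for example_count in row.example_count:
            print("\t\tExample count: {}".format(example_count))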
