Evaluating annotation stores

This page describes how to use the projects.locations.datasets.annotationStores.evaluate method to evaluate the quality of annotation records generated by a machine learning algorithm.

Overview

The evaluate method compares the annotation records in one annotation store (eval_store) to those in a manually annotated ground-truth annotation store (golden_store) that describes the same resource. The annotated resource is defined in each store's AnnotationSource.
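
The following Python sketch shows one way to call the evaluate method over REST. It is a minimal example under stated assumptions, not the definitive request format: it assumes the v1beta1 endpoint and request fields named goldenStore and bigqueryDestination.tableUri, and all resource names are placeholders. Confirm the exact request body in the EvaluateAnnotationStore reference.

    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    # Application Default Credentials with the cloud-platform scope.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    session = AuthorizedSession(credentials)

    # Placeholder resource names for illustration only.
    eval_store = (
        "projects/my-project/locations/us-central1/datasets/my-dataset/"
        "annotationStores/my-eval-store"
    )
    body = {
        # Ground-truth store to compare against (assumed field name).
        "goldenStore": (
            "projects/my-project/locations/us-central1/datasets/my-dataset/"
            "annotationStores/my-golden-store"
        ),
        # BigQuery table that receives the evaluation row (assumed field names).
        "bigqueryDestination": {
            "tableUri": "bq://my-project.my_dataset.eval_results"
        },
    }

    # evaluate is a custom method on the annotation store resource.
    response = session.post(
        f"https://healthcare.googleapis.com/v1beta1/{eval_store}:evaluate",
        json=body,
    )
    response.raise_for_status()

    # The call returns a long-running operation; its name is reported as
    # opName in the BigQuery output row.
    print(response.json()["name"])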

The annotation records in eval_store or golden_store can be created individually with projects.locations.datasets.annotationStores.annotations.create or generated by:

Evaluation requirements

To perform evaluation, the following conditions must be met:

Evaluation output

The evaluate method reports the evaluation metrics to BigQuery, writing one row to the BigQuery table that you specify. The output row has the following schema:

Field name | Type | Mode | Description
opTimestamp | TIMESTAMP | NULLABLE | Timestamp when the method was called
opName | STRING | NULLABLE | Name of the evaluate long-running operation (LRO)
evalStore | STRING | NULLABLE | Name of the eval_store
goldenStore | STRING | NULLABLE | Name of the golden_store
goldenCount | INTEGER | NULLABLE | Number of annotation records in the golden_store
matchedCount | INTEGER | NULLABLE | Number of annotation records in the eval_store that matched annotation records in the golden_store
averageResults | RECORD | NULLABLE | Average results across all infoTypes
averageResults.sensitiveTextMetrics | RECORD | NULLABLE | Average results for SensitiveTextAnnotation
averageResults.sensitiveTextMetrics.truePositives | INTEGER | NULLABLE | Number of correct predictions
averageResults.sensitiveTextMetrics.falsePositives | INTEGER | NULLABLE | Number of incorrect predictions
averageResults.sensitiveTextMetrics.falseNegatives | INTEGER | NULLABLE | Number of predictions that were missed
averageResults.sensitiveTextMetrics.precision | FLOAT | NULLABLE | truePositives / (truePositives + falsePositives). Ranges from 0 to 1, where 1.0 indicates that all predictions were correct
averageResults.sensitiveTextMetrics.recall | FLOAT | NULLABLE | truePositives / (truePositives + falseNegatives). Ranges from 0 to 1, where 1.0 indicates that no predictions were missed
averageResults.sensitiveTextMetrics.fScore | FLOAT | NULLABLE | 2 * precision * recall / (precision + recall). Harmonic mean of precision and recall; ranges from 0 to 1, where 1.0 indicates perfect predictions
infoResults | RECORD | REPEATED | Similar to averageResults, but broken down per infoType
infoResults.sensitiveTextMetrics | RECORD | NULLABLE | Per-infoType results for SensitiveTextAnnotation
infoResults.sensitiveTextMetrics.infoType | STRING | NULLABLE | infoType category
infoResults.sensitiveTextMetrics.truePositives | INTEGER | NULLABLE | Number of correct predictions
infoResults.sensitiveTextMetrics.falsePositives | INTEGER | NULLABLE | Number of incorrect predictions
infoResults.sensitiveTextMetrics.falseNegatives | INTEGER | NULLABLE | Number of predictions that were missed
infoResults.sensitiveTextMetrics.precision | FLOAT | NULLABLE | truePositives / (truePositives + falsePositives). Ranges from 0 to 1, where 1.0 indicates that all predictions were correct
infoResults.sensitiveTextMetrics.recall | FLOAT | NULLABLE | truePositives / (truePositives + falseNegatives). Ranges from 0 to 1, where 1.0 indicates that no predictions were missed
infoResults.sensitiveTextMetrics.fScore | FLOAT | NULLABLE | 2 * precision * recall / (precision + recall). Harmonic mean of precision and recall; ranges from 0 to 1, where 1.0 indicates perfect predictions
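
As a quick illustration of how the metric fields relate, the following sketch computes precision, recall, and fScore from hypothetical truePositives, falsePositives, and falseNegatives counts, using the same formulas as the schema above.

    # Hypothetical counts for illustration; they mirror the schema fields
    # truePositives, falsePositives, and falseNegatives.
    true_positives = 90
    false_positives = 10
    false_negatives = 20

    # precision = truePositives / (truePositives + falsePositives)
    precision = true_positives / (true_positives + false_positives)  # 0.9

    # recall = truePositives / (truePositives + falseNegatives)
    recall = true_positives / (true_positives + false_negatives)  # ~0.818

    # fScore = 2 * precision * recall / (precision + recall)
    f_score = 2 * precision * recall / (precision + recall)  # ~0.857

    print(precision, recall, f_score)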

For a detailed definition of the method, see EvaluateAnnotationStore.
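
To inspect the results programmatically, you can read the evaluation row back from BigQuery. The following sketch assumes the google-cloud-bigquery client library and a hypothetical output table named my-project.my_dataset.eval_results; substitute the table you specified when calling the evaluate method.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Hypothetical table name; use the table you passed to the evaluate method.
    query = """
        SELECT
          opName,
          goldenCount,
          matchedCount,
          averageResults.sensitiveTextMetrics.precision AS precision,
          averageResults.sensitiveTextMetrics.recall AS recall,
          averageResults.sensitiveTextMetrics.fScore AS fScore
        FROM `my-project.my_dataset.eval_results`
        ORDER BY opTimestamp DESC
        LIMIT 1
    """

    # Print the metrics from the most recent evaluation run.
    for row in client.query(query).result():
        print(f"{row.opName}: precision={row.precision}, "
              f"recall={row.recall}, fScore={row.fScore}")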

See also