This page describes how to use the projects.locations.datasets.annotationStores.evaluate
method to evaluate the quality of annotation records generated by a machine
learning algorithm.
Overview
The evaluate method compares the annotation records in one annotation store (eval_store) to a manually annotated ground truth annotation store (golden_store) that describes the same resource. The annotated resource is defined in each store's AnnotationSource.
The annotation records in eval_store or golden_store can be generated individually with projects.locations.datasets.annotationStores.annotations.create, or by:
- Calling datasets.deidentify with an AnnotationConfig object
- Calling projects.locations.datasets.annotationStores.import
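For orientation, here is a minimal Python sketch of starting an evaluation with the discovery-based Google API client. It is not a verified sample: the placeholder resource names and the request body fields (goldenStore, bigqueryDestination.tableUri) are assumptions inferred from this page, so check EvaluateAnnotationStore for the authoritative request schema.

```python
# Minimal sketch, assuming the v1beta1 surface of the Cloud Healthcare API.
# The request body field names below are assumptions; see
# EvaluateAnnotationStore for the exact schema.
from googleapiclient import discovery

client = discovery.build("healthcare", "v1beta1")

# Placeholder resource names; substitute your project, location, dataset,
# and annotation store IDs.
eval_store = (
    "projects/my-project/locations/us-central1/datasets/my-dataset/"
    "annotationStores/eval-store"
)
golden_store = (
    "projects/my-project/locations/us-central1/datasets/my-dataset/"
    "annotationStores/golden-store"
)

body = {
    "goldenStore": golden_store,  # assumed field name for the ground truth store
    # Assumed field names for the BigQuery table that receives the output row.
    "bigqueryDestination": {"tableUri": "bq://my-project.my_dataset.eval_results"},
}

# evaluate returns a long-running operation (LRO); its name also appears in
# the opName column of the output row described under "Evaluation output".
operation = (
    client.projects()
    .locations()
    .datasets()
    .annotationStores()
    .evaluate(name=eval_store, body=body)
    .execute()
)
print("Started evaluation LRO:", operation["name"])
```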
Evaluation requirements
To perform evaluation, the following conditions must be met:
- In the eval_store, each annotated resource defined in AnnotationSource can have only one annotation record for each annotation type.
- SensitiveTextAnnotation must store the quotes obtained from the annotated resource. If you generated annotation records using datasets.deidentify, set store_quote in AnnotationConfig to true.
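If the annotation records were produced by datasets.deidentify, the relevant part of the request is the AnnotationConfig. Below is a hedged sketch of that fragment as a Python dict: storeQuote is the JSON form of the store_quote field mentioned above, while destinationDataset and annotationStoreName are assumed field names, shown only to give the fragment context.

```python
# Sketch of a datasets.deidentify request body that writes annotation records
# with quotes enabled. "annotationStoreName" and "destinationDataset" are
# assumed field names; "storeQuote" corresponds to store_quote above.
deidentify_body = {
    "destinationDataset": (
        "projects/my-project/locations/us-central1/datasets/deid-dataset"
    ),
    "config": {
        "annotation": {
            "annotationStoreName": (
                "projects/my-project/locations/us-central1/datasets/my-dataset/"
                "annotationStores/eval-store"
            ),
            # Required so SensitiveTextAnnotation records keep the quotes that
            # evaluation compares against the golden_store.
            "storeQuote": True,
        },
    },
}
```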
Evaluation output
The evaluate method reports the evaluation metrics to BigQuery. The method outputs a row in a specified BigQuery table with the following schema:
Field name | Type | Mode | Description |
---|---|---|---|
opTimestamp | TIMESTAMP | NULLABLE | Timestamp when the method was called |
opName | STRING | NULLABLE | Name of the evaluate long-running operation (LRO) |
evalStore | STRING | NULLABLE | Name of the eval_store |
goldenStore | STRING | NULLABLE | Name of the golden_store |
goldenCount | INTEGER | NULLABLE | Number of annotation records in the golden_store |
matchedCount | INTEGER | NULLABLE | Number of annotation records in the eval_store matched to the annotation records in the golden_store |
averageResults | RECORD | NULLABLE | Average results across all infoTypes |
averageResults.sensitiveTextMetrics | RECORD | NULLABLE | Average results for SensitiveTextAnnotation |
averageResults.sensitiveTextMetrics.truePositives | INTEGER | NULLABLE | Number of correct predictions |
averageResults.sensitiveTextMetrics.falsePositives | INTEGER | NULLABLE | Number of incorrect predictions |
averageResults.sensitiveTextMetrics.falseNegatives | INTEGER | NULLABLE | Number of predictions that were missed |
averageResults.sensitiveTextMetrics.precision | FLOAT | NULLABLE | truePositives / (truePositives + falsePositives); ranges from [0..1], where 1.0 indicates all correct predictions |
averageResults.sensitiveTextMetrics.recall | FLOAT | NULLABLE | truePositives / (truePositives + falseNegatives); ranges from [0..1], where 1.0 indicates no missed predictions |
averageResults.sensitiveTextMetrics.fScore | FLOAT | NULLABLE | 2 * precision * recall / (precision + recall); harmonic mean of precision and recall, ranges from [0..1], where 1.0 indicates perfect predictions |
infoResults | RECORD | REPEATED | Similar to averageResults, but broken down per infoType |
infoResults.sensitiveTextMetrics | RECORD | NULLABLE | Per-infoType results for SensitiveTextAnnotation |
infoResults.sensitiveTextMetrics.infoType | STRING | NULLABLE | infoType category |
infoResults.sensitiveTextMetrics.truePositives | INTEGER | NULLABLE | Number of correct predictions |
infoResults.sensitiveTextMetrics.falsePositives | INTEGER | NULLABLE | Number of incorrect predictions |
infoResults.sensitiveTextMetrics.falseNegatives | INTEGER | NULLABLE | Number of predictions that were missed |
infoResults.sensitiveTextMetrics.precision | FLOAT | NULLABLE | truePositives / (truePositives + falsePositives); ranges from [0..1], where 1.0 indicates all correct predictions |
infoResults.sensitiveTextMetrics.recall | FLOAT | NULLABLE | truePositives / (truePositives + falseNegatives); ranges from [0..1], where 1.0 indicates no missed predictions |
infoResults.sensitiveTextMetrics.fScore | FLOAT | NULLABLE | 2 * precision * recall / (precision + recall); harmonic mean of precision and recall, ranges from [0..1], where 1.0 indicates perfect predictions |
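To show how the per-infoType rows can be consumed, here is a sketch that reads them back with the google-cloud-bigquery client. The table ID is a placeholder; the column names follow the schema above, and because infoResults is a REPEATED RECORD it is flattened with UNNEST.

```python
# Sketch: read per-infoType precision/recall/fScore from the evaluation
# output table. The table ID is a placeholder; column names follow the
# schema documented above.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  opName,
  r.sensitiveTextMetrics.infoType AS infoType,
  r.sensitiveTextMetrics.precision AS precision,
  r.sensitiveTextMetrics.recall AS recall,
  r.sensitiveTextMetrics.fScore AS fScore
FROM `my-project.my_dataset.eval_results`,
  UNNEST(infoResults) AS r
ORDER BY opTimestamp DESC
"""

for row in client.query(query).result():
    print(row.infoType, row.precision, row.recall, row.fScore)
```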
Refer to EvaluateAnnotationStore for a detailed definition of the method.