- Resource: ModelEvaluation
- ClassificationEvaluationMetrics
  - ConfidenceMetricsEntry
  - ConfusionMatrix
    - Row
- RegressionEvaluationMetrics
- TranslationEvaluationMetrics
- ImageObjectDetectionEvaluationMetrics
  - BoundingBoxMetricsEntry
    - ConfidenceMetricsEntry
- VideoObjectTrackingEvaluationMetrics
- TextSentimentEvaluationMetrics
- TextExtractionEvaluationMetrics
  - ConfidenceMetricsEntry
- Methods
Resource: ModelEvaluation
Evaluation results of a model.
JSON representation

```
{
  "name": string,
  "annotationSpecId": string,
  "displayName": string,
  "createTime": string,
  "evaluatedExampleCount": integer,

  // Union field metrics can be only one of the following:
  "classificationEvaluationMetrics": {
    object (ClassificationEvaluationMetrics)
  },
  "regressionEvaluationMetrics": {
    object (RegressionEvaluationMetrics)
  },
  "translationEvaluationMetrics": {
    object (TranslationEvaluationMetrics)
  },
  "imageObjectDetectionEvaluationMetrics": {
    object (ImageObjectDetectionEvaluationMetrics)
  },
  "videoObjectTrackingEvaluationMetrics": {
    object (VideoObjectTrackingEvaluationMetrics)
  },
  "textSentimentEvaluationMetrics": {
    object (TextSentimentEvaluationMetrics)
  },
  "textExtractionEvaluationMetrics": {
    object (TextExtractionEvaluationMetrics)
  }
  // End of list of possible types for union field metrics.
}
```
| Field | Description |
|---|---|
| `name` | `string`. Output only. Resource name of the model evaluation. Format: `projects/{project_id}/locations/{location_id}/models/{model_id}/modelEvaluations/{model_evaluation_id}` |
| `annotationSpecId` | `string`. Output only. The ID of the annotation spec that the model evaluation applies to. The ID is empty for the overall model evaluation. For Tables, annotation specs do not exist in the dataset and this ID is always not set; for CLASSIFICATION problems the `displayName` field is used instead. |
| `displayName` | `string`. Output only. The value of the annotation spec's display name at the moment the model was trained. Because this field returns a value at model training time, the values may differ for models trained from the same dataset, since display names could have been changed between the trainings. For Tables CLASSIFICATION problems, the distinct values of the target column at the moment of the model evaluation are populated here. The display name is empty for the overall model evaluation. |
| `createTime` | `string (Timestamp format)`. Output only. Timestamp when this model evaluation was created. A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: `"2014-10-02T15:01:23.045123456Z"`. |
| `evaluatedExampleCount` | `integer`. Output only. The number of examples used for model evaluation, i.e. for which ground truth from time of model creation is compared against the predicted annotations created by the model. For the overall ModelEvaluation (i.e. with `annotationSpecId` not set) this is the total number of all examples used for evaluation. Otherwise, this is the count of examples that according to the ground truth were annotated by the `annotationSpecId`. |
| Union field `metrics`. Output only. Problem type specific evaluation metrics. `metrics` can be only one of the following: | |
| `classificationEvaluationMetrics` | `object (ClassificationEvaluationMetrics)`. Model evaluation metrics for image, text, video and Tables classification. A Tables problem is considered classification when the target column has CATEGORY DataType. |
| `regressionEvaluationMetrics` | `object (RegressionEvaluationMetrics)`. Model evaluation metrics for Tables regression. A Tables problem is considered regression when the target column has FLOAT64 DataType. |
| `translationEvaluationMetrics` | `object (TranslationEvaluationMetrics)`. Model evaluation metrics for translation. |
| `imageObjectDetectionEvaluationMetrics` | `object (ImageObjectDetectionEvaluationMetrics)`. Model evaluation metrics for image object detection. |
| `videoObjectTrackingEvaluationMetrics` | `object (VideoObjectTrackingEvaluationMetrics)`. Model evaluation metrics for video object tracking. |
| `textSentimentEvaluationMetrics` | `object (TextSentimentEvaluationMetrics)`. Evaluation metrics for text sentiment models. |
| `textExtractionEvaluationMetrics` | `object (TextExtractionEvaluationMetrics)`. Evaluation metrics for text extraction models. |
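Because `metrics` is a union field, each ModelEvaluation carries exactly one of the problem-specific metrics objects listed above. Below is a minimal sketch of how client code might branch on whichever member is present in the parsed JSON; the helper name and return shape are illustrative, not part of the API.

```python
# Illustrative helper: pick out whichever problem-specific metrics object
# is present in a ModelEvaluation JSON response (the union field "metrics").
METRIC_KEYS = (
    "classificationEvaluationMetrics",
    "regressionEvaluationMetrics",
    "translationEvaluationMetrics",
    "imageObjectDetectionEvaluationMetrics",
    "videoObjectTrackingEvaluationMetrics",
    "textSentimentEvaluationMetrics",
    "textExtractionEvaluationMetrics",
)

def extract_metrics(model_evaluation: dict) -> tuple[str, dict]:
    """Return (metrics_kind, metrics_object) for a parsed ModelEvaluation."""
    for key in METRIC_KEYS:
        if key in model_evaluation:
            return key, model_evaluation[key]
    raise ValueError("ModelEvaluation contains no known metrics field")
```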
ClassificationEvaluationMetrics
Model evaluation metrics for classification problems. Note: For Video Classification these metrics describe only the quality of Video Classification predictions of "segment_classification" type.
JSON representation

```
{
  "auPrc": number,
  "baseAuPrc": number,
  "auRoc": number,
  "logLoss": number,
  "confidenceMetricsEntry": [
    {
      object (ConfidenceMetricsEntry)
    }
  ],
  "confusionMatrix": {
    object (ConfusionMatrix)
  },
  "annotationSpecId": [
    string
  ]
}
```
| Field | Description |
|---|---|
| `auPrc` | `number`. Output only. The Area Under Precision-Recall Curve metric. Micro-averaged for the overall evaluation. |
| `baseAuPrc` | `number`. Output only. The Area Under Precision-Recall Curve metric based on priors. Micro-averaged for the overall evaluation. Deprecated. |
| `auRoc` | `number`. Output only. The Area Under Receiver Operating Characteristic curve metric. Micro-averaged for the overall evaluation. |
| `logLoss` | `number`. Output only. The Log Loss metric. |
| `confidenceMetricsEntry[]` | `object (ConfidenceMetricsEntry)`. Output only. Metrics for each confidenceThreshold in 0.00,0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and positionThreshold = INT32_MAX_VALUE. ROC and precision-recall curves, and other aggregated metrics are derived from them. The confidence metrics entries may also be supplied for additional values of positionThreshold, but from these no aggregated metrics are computed. |
| `confusionMatrix` | `object (ConfusionMatrix)`. Output only. Confusion matrix of the evaluation. Only set for MULTICLASS classification problems where the number of labels is no more than 10. Only set for model level evaluation, not for evaluation per label. |
| `annotationSpecId[]` | `string`. Output only. The annotation spec IDs used for this evaluation. |
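As a sketch of how the per-threshold entries are typically consumed, the snippet below picks the confidence threshold with the highest f1Score among the standard entries (positionThreshold = INT32 max). The selection criterion is just an example; the field names follow the representation above.

```python
INT32_MAX = 2**31 - 1

def best_operating_threshold(classification_metrics: dict) -> dict:
    """Return the ConfidenceMetricsEntry with the highest f1Score.

    Only the standard entries (positionThreshold == INT32 max) are considered,
    since the aggregated curves are derived from those.
    """
    entries = [
        entry for entry in classification_metrics.get("confidenceMetricsEntry", [])
        if entry.get("positionThreshold", INT32_MAX) == INT32_MAX
    ]
    if not entries:
        raise ValueError("no standard confidence metrics entries found")
    return max(entries, key=lambda entry: entry.get("f1Score", 0.0))
```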
ConfidenceMetricsEntry
Metrics for a single confidence threshold.
JSON representation

```
{
  "confidenceThreshold": number,
  "positionThreshold": integer,
  "recall": number,
  "precision": number,
  "falsePositiveRate": number,
  "f1Score": number,
  "recallAt1": number,
  "precisionAt1": number,
  "falsePositiveRateAt1": number,
  "f1ScoreAt1": number,
  "truePositiveCount": string,
  "falsePositiveCount": string,
  "falseNegativeCount": string,
  "trueNegativeCount": string
}
```
| Field | Description |
|---|---|
| `confidenceThreshold` | `number`. Output only. Metrics are computed with the assumption that the model never returns predictions with a score lower than this value. |
| `positionThreshold` | `integer`. Output only. Metrics are computed with the assumption that the model always returns at most this many predictions (ordered by their score, descendingly), but they all still need to meet the confidenceThreshold. |
| `recall` | `number`. Output only. Recall (True Positive Rate) for the given confidence threshold. |
| `precision` | `number`. Output only. Precision for the given confidence threshold. |
| `falsePositiveRate` | `number`. Output only. False Positive Rate for the given confidence threshold. |
| `f1Score` | `number`. Output only. The harmonic mean of recall and precision. |
| `recallAt1` | `number`. Output only. The Recall (True Positive Rate) when only considering the label that has the highest prediction score and is not below the confidence threshold for each example. |
| `precisionAt1` | `number`. Output only. The precision when only considering the label that has the highest prediction score and is not below the confidence threshold for each example. |
| `falsePositiveRateAt1` | `number`. Output only. The False Positive Rate when only considering the label that has the highest prediction score and is not below the confidence threshold for each example. |
| `f1ScoreAt1` | `number`. Output only. The harmonic mean of recallAt1 and precisionAt1. |
| `truePositiveCount` | `string (int64 format)`. Output only. The number of model-created labels that match a ground truth label. |
| `falsePositiveCount` | `string (int64 format)`. Output only. The number of model-created labels that do not match a ground truth label. |
| `falseNegativeCount` | `string (int64 format)`. Output only. The number of ground truth labels that are not matched by a model-created label. |
| `trueNegativeCount` | `string (int64 format)`. Output only. The number of labels that were not created by the model, but would not have matched a ground truth label if they had been. |
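To make the relationship between the raw counts and the reported rates concrete, here is a small illustrative sketch. The int64 counts arrive as JSON strings, hence the int() conversions; the API already returns precision, recall and f1Score, so recomputing them is only for clarity.

```python
def rates_from_counts(entry: dict) -> dict:
    """Recompute precision/recall/F1 from the raw counts in a ConfidenceMetricsEntry."""
    tp = int(entry["truePositiveCount"])   # int64 values are JSON strings
    fp = int(entry["falsePositiveCount"])
    fn = int(entry["falseNegativeCount"])

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1Score": f1}
```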
ConfusionMatrix
Confusion matrix of the model running the classification.
JSON representation

```
{
  "annotationSpecId": [
    string
  ],
  "displayName": [
    string
  ],
  "row": [
    {
      object (Row)
    }
  ]
}
```
| Field | Description |
|---|---|
| `annotationSpecId[]` | `string`. Output only. IDs of the annotation specs used in the confusion matrix. For Tables CLASSIFICATION problems only the list of annotation spec display names is populated. |
| `displayName[]` | `string`. Output only. Display name of the annotation specs used in the confusion matrix, as they were at the moment of the evaluation. For Tables CLASSIFICATION problems, the distinct values of the target column at the moment of the model evaluation are populated here. |
| `row[]` | `object (Row)`. Output only. Rows in the confusion matrix. The number of rows is equal to the size of `annotationSpecId`. `row[i].exampleCount[j]` is the number of examples that have ground truth of `annotationSpecId[i]` and are predicted as `annotationSpecId[j]` by the model being evaluated. |
Row
Output only. A row in the confusion matrix.
JSON representation

```
{
  "exampleCount": [
    integer
  ]
}
```
| Field | Description |
|---|---|
| `exampleCount[]` | `integer`. Output only. Value of the specific cell in the confusion matrix. The number of values each row has (i.e. the length of the row) is equal to the length of the `annotationSpecId` field or, if that is not populated, the length of the `displayName` field. |
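A short illustrative sketch of folding the parallel `annotationSpecId`/`displayName`/`row` arrays into a more convenient lookup structure; the function is hypothetical, the field names come from the messages above.

```python
def confusion_matrix_as_dict(matrix: dict) -> dict[str, dict[str, int]]:
    """Convert a ConfusionMatrix message into {true_label: {predicted_label: count}}."""
    labels = matrix.get("displayName") or matrix.get("annotationSpecId", [])
    result = {}
    for true_label, row in zip(labels, matrix.get("row", [])):
        counts = row.get("exampleCount", [])
        result[true_label] = dict(zip(labels, counts))
    return result
```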
RegressionEvaluationMetrics
Metrics for regression problems.
JSON representation

```
{
  "rootMeanSquaredError": number,
  "meanAbsoluteError": number,
  "meanAbsolutePercentageError": number,
  "rSquared": number,
  "rootMeanSquaredLogError": number
}
```
| Field | Description |
|---|---|
| `rootMeanSquaredError` | `number`. Output only. Root Mean Squared Error (RMSE). |
| `meanAbsoluteError` | `number`. Output only. Mean Absolute Error (MAE). |
| `meanAbsolutePercentageError` | `number`. Output only. Mean Absolute Percentage Error (MAPE). Only set if all ground truth values are positive. |
| `rSquared` | `number`. Output only. R squared. |
| `rootMeanSquaredLogError` | `number`. Output only. Root Mean Squared Log Error (RMSLE). |
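For reference, these are the standard textbook definitions of the regression metrics above. The sketch is illustrative only and may differ in edge-case handling from the service's own computation.

```python
import math

def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict:
    """Standard definitions of RMSE, MAE and MAPE (illustrative only)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE (in percent) is only meaningful when every ground truth value is positive.
    mape = (100.0 * sum(abs(e) / t for t, e in zip(y_true, errors)) / n
            if all(t > 0 for t in y_true) else None)
    return {"rootMeanSquaredError": rmse, "meanAbsoluteError": mae,
            "meanAbsolutePercentageError": mape}
```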
TranslationEvaluationMetrics
Evaluation metrics for the dataset.
JSON representation

```
{
  "bleuScore": number,
  "baseBleuScore": number
}
```
| Field | Description |
|---|---|
| `bleuScore` | `number`. Output only. BLEU score. |
| `baseBleuScore` | `number`. Output only. BLEU score for the base model. |
ImageObjectDetectionEvaluationMetrics
Model evaluation metrics for image object detection problems. Evaluates prediction quality of labeled bounding boxes.
JSON representation

```
{
  "evaluatedBoundingBoxCount": integer,
  "boundingBoxMetricsEntries": [
    {
      object (BoundingBoxMetricsEntry)
    }
  ],
  "boundingBoxMeanAveragePrecision": number
}
```
| Field | Description |
|---|---|
| `evaluatedBoundingBoxCount` | `integer`. Output only. The total number of bounding boxes (i.e. summed over all images) the ground truth used to create this evaluation had. |
| `boundingBoxMetricsEntries[]` | `object (BoundingBoxMetricsEntry)`. Output only. The bounding boxes match metrics for each Intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and each label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 pair. |
| `boundingBoxMeanAveragePrecision` | `number`. Output only. The single metric for bounding boxes evaluation: the meanAveragePrecision averaged over all boundingBoxMetricsEntries. |
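As the description says, the summary metric is simply the meanAveragePrecision averaged across all boundingBoxMetricsEntries; an illustrative one-function sketch:

```python
def bounding_box_map(detection_metrics: dict) -> float:
    """Average meanAveragePrecision over all boundingBoxMetricsEntries (illustrative)."""
    entries = detection_metrics.get("boundingBoxMetricsEntries", [])
    if not entries:
        return 0.0
    return sum(e.get("meanAveragePrecision", 0.0) for e in entries) / len(entries)
```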
BoundingBoxMetricsEntry
Bounding box matching model metrics for a single intersection-over-union threshold and multiple label match confidence thresholds.
JSON representation

```
{
  "iouThreshold": number,
  "meanAveragePrecision": number,
  "confidenceMetricsEntries": [
    {
      object (ConfidenceMetricsEntry)
    }
  ]
}
```
| Field | Description |
|---|---|
| `iouThreshold` | `number`. Output only. The intersection-over-union threshold value used to compute this metrics entry. |
| `meanAveragePrecision` | `number`. Output only. The mean average precision, most often close to auPrc. |
| `confidenceMetricsEntries[]` | `object (ConfidenceMetricsEntry)`. Output only. Metrics for each label-match confidenceThreshold from 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99. The precision-recall curve is derived from them. |
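For reference, intersection-over-union for two axis-aligned boxes is the area of their overlap divided by the area of their union. A minimal sketch follows; the (x_min, y_min, x_max, y_max) box format is an assumption for illustration, not the API's bounding box encoding.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```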
ConfidenceMetricsEntry
Metrics for a single confidence threshold.
JSON representation

```
{
  "confidenceThreshold": number,
  "recall": number,
  "precision": number,
  "f1Score": number
}
```
| Field | Description |
|---|---|
| `confidenceThreshold` | `number`. Output only. The confidence threshold value used to compute the metrics. |
| `recall` | `number`. Output only. Recall under the given confidence threshold. |
| `precision` | `number`. Output only. Precision under the given confidence threshold. |
| `f1Score` | `number`. Output only. The harmonic mean of recall and precision. |
VideoObjectTrackingEvaluationMetrics
Model evaluation metrics for video object tracking problems. Evaluates prediction quality of both labeled bounding boxes and labeled tracks (i.e. series of bounding boxes sharing the same label and instance ID).
JSON representation

```
{
  "evaluatedFrameCount": integer,
  "evaluatedBoundingBoxCount": integer,
  "boundingBoxMetricsEntries": [
    {
      object (BoundingBoxMetricsEntry)
    }
  ],
  "boundingBoxMeanAveragePrecision": number
}
```
| Field | Description |
|---|---|
| `evaluatedFrameCount` | `integer`. Output only. The number of video frames used to create this evaluation. |
| `evaluatedBoundingBoxCount` | `integer`. Output only. The total number of bounding boxes (i.e. summed over all frames) the ground truth used to create this evaluation had. |
| `boundingBoxMetricsEntries[]` | `object (BoundingBoxMetricsEntry)`. Output only. The bounding boxes match metrics for each Intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and each label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 pair. |
| `boundingBoxMeanAveragePrecision` | `number`. Output only. The single metric for bounding boxes evaluation: the meanAveragePrecision averaged over all boundingBoxMetricsEntries. |
TextSentimentEvaluationMetrics
Model evaluation metrics for text sentiment problems.
JSON representation

```
{
  "precision": number,
  "recall": number,
  "f1Score": number,
  "meanAbsoluteError": number,
  "meanSquaredError": number,
  "linearKappa": number,
  "quadraticKappa": number,
  "confusionMatrix": {
    object (ConfusionMatrix)
  },
  "annotationSpecId": [
    string
  ]
}
```
| Field | Description |
|---|---|
| `precision` | `number`. Output only. Precision. |
| `recall` | `number`. Output only. Recall. |
| `f1Score` | `number`. Output only. The harmonic mean of recall and precision. |
| `meanAbsoluteError` | `number`. Output only. Mean absolute error. Only set for the overall model evaluation, not for evaluation of a single annotation spec. |
| `meanSquaredError` | `number`. Output only. Mean squared error. Only set for the overall model evaluation, not for evaluation of a single annotation spec. |
| `linearKappa` | `number`. Output only. Linear weighted kappa. Only set for the overall model evaluation, not for evaluation of a single annotation spec. |
| `quadraticKappa` | `number`. Output only. Quadratic weighted kappa. Only set for the overall model evaluation, not for evaluation of a single annotation spec. |
| `confusionMatrix` | `object (ConfusionMatrix)`. Output only. Confusion matrix of the evaluation. Only set for the overall model evaluation, not for evaluation of a single annotation spec. |
| `annotationSpecId[]` | `string`. Output only. The annotation spec IDs used for this evaluation. Deprecated. |
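Linear and quadratic weighted kappa are standard agreement statistics computed over the sentiment confusion matrix. The sketch below shows the textbook computation for clarity; the service reports these values for you.

```python
def weighted_kappa(matrix: list[list[int]], quadratic: bool = True) -> float:
    """Cohen's weighted kappa over an N x N confusion matrix of counts.

    Assumes at least two classes and a non-degenerate matrix.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_marginals = [sum(row) for row in matrix]
    col_marginals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    observed = 0.0   # weighted observed disagreement
    expected = 0.0   # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            distance = abs(i - j) / (n - 1)
            weight = distance ** 2 if quadratic else distance
            observed += weight * matrix[i][j] / total
            expected += weight * row_marginals[i] * col_marginals[j] / total ** 2
    return 1.0 - observed / expected
```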
TextExtractionEvaluationMetrics
Model evaluation metrics for text extraction problems.
JSON representation

```
{
  "auPrc": number,
  "confidenceMetricsEntries": [
    {
      object (ConfidenceMetricsEntry)
    }
  ]
}
```
| Field | Description |
|---|---|
| `auPrc` | `number`. Output only. The Area Under Precision-Recall Curve metric. |
| `confidenceMetricsEntries[]` | `object (ConfidenceMetricsEntry)`. Output only. Metrics that have confidence thresholds. The precision-recall curve can be derived from them. |
ConfidenceMetricsEntry
Metrics for a single confidence threshold.
JSON representation

```
{
  "confidenceThreshold": number,
  "recall": number,
  "precision": number,
  "f1Score": number
}
```
| Field | Description |
|---|---|
| `confidenceThreshold` | `number`. Output only. The confidence threshold value used to compute the metrics. Only annotations with a score of at least this threshold are considered to be ones the model would return. |
| `recall` | `number`. Output only. Recall under the given confidence threshold. |
| `precision` | `number`. Output only. Precision under the given confidence threshold. |
| `f1Score` | `number`. Output only. The harmonic mean of recall and precision. |
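The precision-recall curve mentioned above can be traced directly from these entries. As an illustration, here is a simple step-wise area estimate over the (recall, precision) points; this is not necessarily the exact interpolation the service uses for auPrc.

```python
def approximate_au_prc(entries: list[dict]) -> float:
    """Rough area under the precision-recall curve from ConfidenceMetricsEntry points."""
    # Sort points by recall so the curve is traversed left to right.
    points = sorted(
        (entry.get("recall", 0.0), entry.get("precision", 0.0)) for entry in entries
    )
    area = 0.0
    prev_recall, prev_precision = 0.0, 1.0
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area
```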
Methods

| Method | Description |
|---|---|
| `get` | Gets a model evaluation. |
| `list` | Lists model evaluations. |
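A minimal sketch of calling these two methods over REST with application-default credentials. The `https://automl.googleapis.com/v1` endpoint, the placeholder project/location/model IDs, and the `modelEvaluation` key in the list response are assumptions to adapt to your environment.

```python
import google.auth
from google.auth.transport.requests import AuthorizedSession

API_ROOT = "https://automl.googleapis.com/v1"  # assumed AutoML v1 REST endpoint
MODEL = "projects/my-project/locations/us-central1/models/my-model-id"  # placeholders

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

# list: all evaluations for the model (the overall one plus per-annotation-spec ones).
listing = session.get(f"{API_ROOT}/{MODEL}/modelEvaluations").json()

# get: a single evaluation by its full resource name.
name = listing["modelEvaluation"][0]["name"]  # response field name is an assumption
evaluation = session.get(f"{API_ROOT}/{name}").json()
print(evaluation.get("evaluatedExampleCount"))
```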