REST Resource: projects.locations.models.modelEvaluations

Resource: ModelEvaluation

Evaluation results of a model.

JSON representation
{
  "name": string,
  "annotationSpecId": string,
  "displayName": string,
  "createTime": string,
  "evaluatedExampleCount": integer,

  // Union field metrics can be only one of the following:
  "classificationEvaluationMetrics": {
    object (ClassificationEvaluationMetrics)
  },
  "regressionEvaluationMetrics": {
    object (RegressionEvaluationMetrics)
  },
  "translationEvaluationMetrics": {
    object (TranslationEvaluationMetrics)
  },
  "imageObjectDetectionEvaluationMetrics": {
    object (ImageObjectDetectionEvaluationMetrics)
  },
  "videoObjectTrackingEvaluationMetrics": {
    object (VideoObjectTrackingEvaluationMetrics)
  },
  "textSentimentEvaluationMetrics": {
    object (TextSentimentEvaluationMetrics)
  },
  "textExtractionEvaluationMetrics": {
    object (TextExtractionEvaluationMetrics)
  }
  // End of list of possible types for union field metrics.
}
Fields
name

string

Output only. Resource name of the model evaluation. Format:

projects/{project_id}/locations/{locationId}/models/{modelId}/modelEvaluations/{model_evaluation_id}

annotationSpecId

string

Output only. The ID of the annotation spec that the model evaluation applies to. The ID is empty for the overall model evaluation. For Tables, annotation specs do not exist in the dataset, so this ID is never set; for Tables CLASSIFICATION predictionType-s the displayName field is used instead.

displayName

string

Output only. The value of displayName at the moment when the model was trained. Because this field returns a value at model training time, the values may differ for different models trained from the same dataset, since display names could have been changed between the two models' trainings. For Tables CLASSIFICATION predictionType-s, distinct values of the target column at the moment of the model evaluation are populated here. The displayName is empty for the overall model evaluation.

createTime

string (Timestamp format)

Output only. Timestamp when this model evaluation was created.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

evaluatedExampleCount

integer

Output only. The number of examples used for model evaluation, i.e. for which ground truth from the time of model creation is compared against the predicted annotations created by the model. For the overall ModelEvaluation (i.e. with annotationSpecId not set) this is the total number of all examples used for evaluation. Otherwise, this is the count of examples that, according to the ground truth, were annotated with the annotationSpecId.

Union field metrics. Output only. Problem type specific evaluation metrics. metrics can be only one of the following:
classificationEvaluationMetrics

object (ClassificationEvaluationMetrics)

Model evaluation metrics for image, text, video, and Tables classification. A Tables problem is considered classification when the target column has CATEGORY DataType.

regressionEvaluationMetrics

object (RegressionEvaluationMetrics)

Model evaluation metrics for Tables regression. A Tables problem is considered regression when the target column has FLOAT64 DataType.

translationEvaluationMetrics

object (TranslationEvaluationMetrics)

Model evaluation metrics for translation.

imageObjectDetectionEvaluationMetrics

object (ImageObjectDetectionEvaluationMetrics)

Model evaluation metrics for image object detection.

videoObjectTrackingEvaluationMetrics

object (VideoObjectTrackingEvaluationMetrics)

Model evaluation metrics for video object tracking.

textSentimentEvaluationMetrics

object (TextSentimentEvaluationMetrics)

Evaluation metrics for text sentiment models.

textExtractionEvaluationMetrics

object (TextExtractionEvaluationMetrics)

Evaluation metrics for text extraction models.

ClassificationEvaluationMetrics

Model evaluation metrics for classification problems. Note: For video classification these metrics describe only the quality of predictions of the "segment_classification" type.

JSON representation
{
  "auPrc": number,
  "baseAuPrc": number,
  "auRoc": number,
  "logLoss": number,
  "confidenceMetricsEntry": [
    {
      object (ConfidenceMetricsEntry)
    }
  ],
  "confusionMatrix": {
    object (ConfusionMatrix)
  },
  "annotationSpecId": [
    string
  ]
}
Fields
auPrc

number

Output only. The Area Under Precision-Recall Curve metric. Micro-averaged for the overall evaluation.

baseAuPrc
(deprecated)

number

Output only. The Area Under Precision-Recall Curve metric based on priors. Micro-averaged for the overall evaluation. Deprecated.

auRoc

number

Output only. The Area Under Receiver Operating Characteristic curve metric. Micro-averaged for the overall evaluation.

logLoss

number

Output only. The Log Loss metric.

confidenceMetricsEntry[]

object (ConfidenceMetricsEntry)

Output only. Metrics for each confidenceThreshold in 0.00,0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and positionThreshold = INT32_MAX_VALUE. ROC and precision-recall curves, and other aggregated metrics, are derived from them. Confidence metrics entries may also be supplied for additional values of positionThreshold, but no aggregated metrics are computed from these.

confusionMatrix

object (ConfusionMatrix)

Output only. Confusion matrix of the evaluation. Only set for MULTICLASS classification problems where the number of labels is no more than 10. Only set for model-level evaluation, not for evaluation per label.

annotationSpecId[]

string

Output only. The annotation spec ids used for this evaluation.
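
For readers consuming these metrics programmatically, here is a minimal Python sketch that selects the aggregated ConfidenceMetricsEntry closest to a desired confidence threshold. It assumes `metrics` holds the decoded classificationEvaluationMetrics JSON as a dict; the helper name and the INT32_MAX constant are illustrative.

INT32_MAX = 2**31 - 1  # positionThreshold value used for the aggregated entries

def entry_at_threshold(metrics, threshold):
    # Keep only the aggregated entries; entries supplied for other
    # positionThreshold values carry no aggregated metrics.
    aggregated = [
        e for e in metrics.get("confidenceMetricsEntry", [])
        if e.get("positionThreshold", INT32_MAX) == INT32_MAX
    ]
    # Return the entry whose confidenceThreshold is closest to the request.
    return min(aggregated, key=lambda e: abs(e["confidenceThreshold"] - threshold))

# Example: precision and recall at roughly a 0.5 confidence cutoff.
# entry = entry_at_threshold(evaluation["classificationEvaluationMetrics"], 0.5)
# print(entry["precision"], entry["recall"])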

ConfidenceMetricsEntry

Metrics for a single confidence threshold.

JSON representation
{
  "confidenceThreshold": number,
  "positionThreshold": integer,
  "recall": number,
  "precision": number,
  "falsePositiveRate": number,
  "f1Score": number,
  "recallAt1": number,
  "precisionAt1": number,
  "falsePositiveRateAt1": number,
  "f1ScoreAt1": number,
  "truePositiveCount": string,
  "falsePositiveCount": string,
  "falseNegativeCount": string,
  "trueNegativeCount": string
}
Fields
confidenceThreshold

number

Output only. Metrics are computed with the assumption that the model never returns predictions with a score lower than this value.

positionThreshold

integer

Output only. Metrics are computed with the assumption that the model always returns at most this many predictions (ordered by their score, descending), but they all still need to meet the confidenceThreshold.

recall

number

Output only. Recall (True Positive Rate) for the given confidence threshold.

precision

number

Output only. Precision for the given confidence threshold.

falsePositiveRate

number

Output only. False Positive Rate for the given confidence threshold.

f1Score

number

Output only. The harmonic mean of recall and precision.

recallAt1

number

Output only. The Recall (True Positive Rate) when only considering the label that has the highest prediction score and not below the confidence threshold for each example.

precisionAt1

number

Output only. The precision when only considering the label that has the highest prediction score and not below the confidence threshold for each example.

falsePositiveRateAt1

number

Output only. The False Positive Rate when only considering the label that has the highest prediction score and not below the confidence threshold for each example.

f1ScoreAt1

number

Output only. The harmonic mean of recallAt1 and precisionAt1.

truePositiveCount

string (int64 format)

Output only. The number of model-created labels that match a ground truth label.

falsePositiveCount

string (int64 format)

Output only. The number of model-created labels that do not match a ground truth label.

falseNegativeCount

string (int64 format)

Output only. The number of ground truth labels that are not matched by a model-created label.

trueNegativeCount

string (int64 format)

Output only. The number of labels that the model did not create, but which, had they been created, would not have matched a ground truth label.
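
As a cross-check, the rate fields above can be re-derived from the four count fields. A minimal Python sketch, illustrative only (note that the int64 counts are serialized as JSON strings):

def derived_rates(entry):
    # int64 fields arrive as strings in the JSON representation.
    tp = int(entry["truePositiveCount"])
    fp = int(entry["falsePositiveCount"])
    fn = int(entry["falseNegativeCount"])
    tn = int(entry["trueNegativeCount"])
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0           # false positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)              # harmonic mean
    return {"precision": precision, "recall": recall,
            "falsePositiveRate": fpr, "f1Score": f1}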

ConfusionMatrix

Confusion matrix of the model running the classification.

JSON representation
{
  "annotationSpecId": [
    string
  ],
  "displayName": [
    string
  ],
  "row": [
    {
      object (Row)
    }
  ]
}
Fields
annotationSpecId[]

string

Output only. IDs of the annotation specs used in the confusion matrix. For Tables CLASSIFICATION predictionType only the displayName list is populated.

displayName[]

string

Output only. Display names of the annotation specs used in the confusion matrix, as they were at the moment of the evaluation. For Tables CLASSIFICATION predictionType-s, distinct values of the target column at the moment of the model evaluation are populated here.

row[]

object (Row)

Output only. Rows in the confusion matrix. The number of rows is equal to the size of annotationSpecId. row[i].exampleCount[j] is the number of examples that have ground truth of the annotationSpecId[i] and are predicted as annotationSpecId[j] by the model being evaluated.

Row

Output only. A row in the confusion matrix.

JSON representation
{
  "exampleCount": [
    integer
  ]
}
Fields
exampleCount[]

integer

Output only. Value of the specific cell in the confusion matrix. The number of values each row has (i.e. the length of the row) is equal to the length of the annotationSpecId field or, if that one is not populated, length of the displayName field.
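
Putting ConfusionMatrix and Row together, a minimal Python sketch that computes per-label recall from the matrix; it assumes `confusion_matrix` is the decoded ConfusionMatrix dict and the helper name is illustrative.

def per_label_recall(confusion_matrix):
    # Label order matches annotationSpecId, or displayName when only the
    # display names are populated (e.g. Tables CLASSIFICATION).
    labels = (confusion_matrix.get("annotationSpecId")
              or confusion_matrix["displayName"])
    recalls = {}
    for i, label in enumerate(labels):
        # row[i].exampleCount[j]: ground truth label i predicted as label j,
        # so the diagonal cell counts the correct predictions for label i.
        counts = confusion_matrix["row"][i]["exampleCount"]
        total = sum(counts)
        recalls[label] = counts[i] / total if total else 0.0
    return recalls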

RegressionEvaluationMetrics

Metrics for regression problems.

JSON representation
{
  "rootMeanSquaredError": number,
  "meanAbsoluteError": number,
  "meanAbsolutePercentageError": number,
  "rSquared": number,
  "rootMeanSquaredLogError": number
}
Fields
rootMeanSquaredError

number

Output only. Root Mean Squared Error (RMSE).

meanAbsoluteError

number

Output only. Mean Absolute Error (MAE).

meanAbsolutePercentageError

number

Output only. Mean absolute percentage error. Only set if all ground truth values are positive.

rSquared

number

Output only. R squared.

rootMeanSquaredLogError

number

Output only. Root mean squared log error.
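
These fields follow the conventional definitions of the regression error metrics. The Python sketch below illustrates those definitions only, not the service's exact implementation; the function name and inputs are assumptions.

import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is only meaningful when every ground truth value is positive.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    rmsle = math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                          for t, p in zip(y_true, y_pred)) / n)
    return rmse, mae, mape, r_squared, rmsle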

TranslationEvaluationMetrics

Model evaluation metrics for translation.

JSON representation
{
  "bleuScore": number,
  "baseBleuScore": number
}
Fields
bleuScore

number

Output only. BLEU score.

baseBleuScore

number

Output only. BLEU score for base model.

ImageObjectDetectionEvaluationMetrics

Model evaluation metrics for image object detection problems. Evaluates prediction quality of labeled bounding boxes.

JSON representation
{
  "evaluatedBoundingBoxCount": integer,
  "boundingBoxMetricsEntries": [
    {
      object (BoundingBoxMetricsEntry)
    }
  ],
  "boundingBoxMeanAveragePrecision": number
}
Fields
evaluatedBoundingBoxCount

integer

Output only. The total number of bounding boxes (i.e. summed over all images) in the ground truth used to create this evaluation.

boundingBoxMetricsEntries[]

object (BoundingBoxMetricsEntry)

Output only. The bounding box match metrics for each pair of intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99.

boundingBoxMeanAveragePrecision

number

Output only. The single metric for bounding boxes evaluation: the meanAveragePrecision averaged over all boundingBoxMetricsEntries.
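
As described above, boundingBoxMeanAveragePrecision is the meanAveragePrecision averaged over all boundingBoxMetricsEntries, so it can be reproduced from the entries. A minimal sketch, assuming `od_metrics` holds the decoded imageObjectDetectionEvaluationMetrics dict:

def mean_average_precision(od_metrics):
    # Average the per-IoU-threshold mean average precision values.
    entries = od_metrics["boundingBoxMetricsEntries"]
    return sum(e["meanAveragePrecision"] for e in entries) / len(entries)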

BoundingBoxMetricsEntry

Bounding box matching model metrics for a single intersection-over-union threshold and multiple label match confidence thresholds.

JSON representation
{
  "iouThreshold": number,
  "meanAveragePrecision": number,
  "confidenceMetricsEntries": [
    {
      object (ConfidenceMetricsEntry)
    }
  ]
}
Fields
iouThreshold

number

Output only. The intersection-over-union threshold value used to compute this metrics entry.

meanAveragePrecision

number

Output only. The mean average precision, most often close to auPrc.

confidenceMetricsEntries[]

object (ConfidenceMetricsEntry)

Output only. Metrics for each label-match confidenceThreshold from 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99. Precision-recall curve is derived from them.

ConfidenceMetricsEntry

Metrics for a single confidence threshold.

JSON representation
{
  "confidenceThreshold": number,
  "recall": number,
  "precision": number,
  "f1Score": number
}
Fields
confidenceThreshold

number

Output only. The confidence threshold value used to compute the metrics.

recall

number

Output only. Recall under the given confidence threshold.

precision

number

Output only. Precision under the given confidence threshold.

f1Score

number

Output only. The harmonic mean of recall and precision.

VideoObjectTrackingEvaluationMetrics

Model evaluation metrics for video object tracking problems. Evaluates prediction quality of both labeled bounding boxes and labeled tracks (i.e. series of bounding boxes sharing same label and instance ID).

JSON representation
{
  "evaluatedFrameCount": integer,
  "evaluatedBoundingBoxCount": integer,
  "boundingBoxMetricsEntries": [
    {
      object (BoundingBoxMetricsEntry)
    }
  ],
  "boundingBoxMeanAveragePrecision": number
}
Fields
evaluatedFrameCount

integer

Output only. The number of video frames used to create this evaluation.

evaluatedBoundingBoxCount

integer

Output only. The total number of bounding boxes (i.e. summed over all frames) in the ground truth used to create this evaluation.

boundingBoxMetricsEntries[]

object (BoundingBoxMetricsEntry)

Output only. The bounding box match metrics for each pair of intersection-over-union threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99 and label confidence threshold 0.05,0.10,...,0.95,0.96,0.97,0.98,0.99.

boundingBoxMeanAveragePrecision

number

Output only. The single metric for bounding boxes evaluation: the meanAveragePrecision averaged over all boundingBoxMetricsEntries.

TextSentimentEvaluationMetrics

Model evaluation metrics for text sentiment problems.

JSON representation
{
  "precision": number,
  "recall": number,
  "f1Score": number,
  "meanAbsoluteError": number,
  "meanSquaredError": number,
  "linearKappa": number,
  "quadraticKappa": number,
  "confusionMatrix": {
    object (ConfusionMatrix)
  },
  "annotationSpecId": [
    string
  ]
}
Fields
precision

number

Output only. Precision.

recall

number

Output only. Recall.

f1Score

number

Output only. The harmonic mean of recall and precision.

meanAbsoluteError

number

Output only. Mean absolute error. Only set for the overall model evaluation, not for evaluation of a single annotation spec.

meanSquaredError

number

Output only. Mean squared error. Only set for the overall model evaluation, not for evaluation of a single annotation spec.

linearKappa

number

Output only. Linear weighted kappa. Only set for the overall model evaluation, not for evaluation of a single annotation spec.

quadraticKappa

number

Output only. Quadratic weighted kappa. Only set for the overall model evaluation, not for evaluation of a single annotation spec.

confusionMatrix

object (ConfusionMatrix)

Output only. Confusion matrix of the evaluation. Only set for the overall model evaluation, not for evaluation of a single annotation spec.

annotationSpecId[]
(deprecated)

string

Output only. The annotation spec ids used for this evaluation. Deprecated.
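
linearKappa and quadraticKappa above are weighted Cohen's kappa statistics over the ordinal sentiment labels. The sketch below shows how the same statistics are conventionally computed with scikit-learn (an assumption; the service reports the values itself), given ground-truth and predicted sentiment labels as integers:

from sklearn.metrics import cohen_kappa_score

def weighted_kappas(y_true, y_pred):
    # Linear weights penalize misclassifications by label distance;
    # quadratic weights penalize by squared distance.
    linear = cohen_kappa_score(y_true, y_pred, weights="linear")
    quadratic = cohen_kappa_score(y_true, y_pred, weights="quadratic")
    return linear, quadratic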

TextExtractionEvaluationMetrics

Model evaluation metrics for text extraction problems.

JSON representation
{
  "auPrc": number,
  "confidenceMetricsEntries": [
    {
      object (ConfidenceMetricsEntry)
    }
  ]
}
Fields
auPrc

number

Output only. The Area Under Precision-Recall Curve metric.

confidenceMetricsEntries[]

object (ConfidenceMetricsEntry)

Output only. Metrics that have confidence thresholds. The precision-recall curve can be derived from them.
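
A precision-recall curve can be drawn directly from the confidenceMetricsEntries, and its area approximated by trapezoidal integration over recall. This is only a rough approximation of the reported auPrc, not the service's algorithm; the helper name is illustrative.

def approx_au_prc(entries):
    # Sort (recall, precision) points by recall, then integrate.
    points = sorted(
        ((e["recall"], e["precision"]) for e in entries),
        key=lambda rp: rp[0],
    )
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area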

ConfidenceMetricsEntry

Metrics for a single confidence threshold.

JSON representation
{
  "confidenceThreshold": number,
  "recall": number,
  "precision": number,
  "f1Score": number
}
Fields
confidenceThreshold

number

Output only. The confidence threshold value used to compute the metrics. Only annotations with score of at least this threshold are considered to be ones the model would return.

recall

number

Output only. Recall under the given confidence threshold.

precision

number

Output only. Precision under the given confidence threshold.

f1Score

number

Output only. The harmonic mean of recall and precision.

Methods

get

Gets a model evaluation.

list

Lists model evaluations.
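
A hedged Python sketch of calling the list method with the google-cloud-automl client library (the client library and the us-central1 location are assumptions; the REST methods above can equally be called directly over HTTP). Project and model IDs are placeholders.

from google.cloud import automl

client = automl.AutoMlClient()
# Resource name of the parent model:
# projects/{project_id}/locations/{locationId}/models/{modelId}
model_full_id = client.model_path("my-project", "us-central1", "my-model-id")

# Iterate over all evaluations for the model (overall and per annotation spec).
for evaluation in client.list_model_evaluations(parent=model_full_id, filter=""):
    print(evaluation.name)
    print(evaluation.annotation_spec_id)
    print(evaluation.evaluated_example_count)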