REST Resource: projects.evaluationJobs

Resource: EvaluationJob

Defines an evaluation job that runs periodically to generate Evaluations. Creating an evaluation job is the starting point for using continuous evaluation.

JSON representation
{
  "name": string,
  "description": string,
  "state": enum (State),
  "schedule": string,
  "modelVersion": string,
  "evaluationJobConfig": {
    object (EvaluationJobConfig)
  },
  "annotationSpecSet": string,
  "labelMissingGroundTruth": boolean,
  "attempts": [
    {
      object (Attempt)
    }
  ],
  "createTime": string
}
Fields
name

string

Output only. After you create a job, Data Labeling Service assigns a name to the job with the following format:

"projects/{project_id}/evaluationJobs/{evaluation_job_id}"

description

string

Required. Description of the job. The description can be up to 25,000 characters long.

state

enum (State)

Output only. Describes the current state of the job.

schedule

string

Required. Describes the interval at which the job runs. This interval must be at least 1 day, and it is rounded to the nearest day. For example, if you specify a 50-hour interval, the job runs every 2 days.

You can provide the schedule in crontab format or in an English-like format.

Regardless of what you specify, the job will run at 10:00 AM UTC. Only the interval from this schedule is used, not the specific time of day.
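For illustration, a crontab schedule of "0 10 * * *" yields a one-day interval, while an English-like phrasing such as "every 2 days" (hypothetical wording, not a value taken from this reference) yields a two-day interval. In either case the specific time of day is ignored and the job runs at 10:00 AM UTC.

  "schedule": "0 10 * * *"      // crontab format: a one-day interval
  "schedule": "every 2 days"    // English-like format: a two-day interval (wording is an assumption)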

modelVersion

string

Required. The AI Platform Prediction model version to be evaluated. Prediction input and output is sampled from this model version. When creating an evaluation job, specify the model version in the following format:

"projects/{project_id}/models/{model_name}/versions/{version_name}"

There can only be one evaluation job per model version.

evaluationJobConfig

object (EvaluationJobConfig)

Required. Configuration details for the evaluation job.

annotationSpecSet

string

Required. Name of the AnnotationSpecSet describing all the labels that your machine learning model outputs. You must create this resource before you create an evaluation job and provide its name in the following format:

"projects/{project_id}/annotationSpecSets/{annotation_spec_set_id}"

labelMissingGroundTruth

boolean

Required. Whether you want Data Labeling Service to provide ground truth labels for prediction input. If you want the service to assign human labelers to annotate your data, set this to true. If you want to provide your own ground truth labels in the evaluation job's BigQuery table, set this to false.

attempts[]

object (Attempt)

Output only. Every time the evaluation job runs and an error occurs, the failed attempt is appended to this array.

createTime

string (Timestamp format)

Output only. Timestamp of when this evaluation job was created.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".
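For orientation, here is a minimal sketch of a job body as a caller might supply it at creation time. The output-only fields (name, state, attempts, createTime) are omitted, and the project, model, and annotation spec set identifiers are placeholders.

{
  "description": "Nightly evaluation of the production image classifier",
  "schedule": "0 10 * * *",
  "modelVersion": "projects/my-project/models/my_model/versions/v1",
  "evaluationJobConfig": {
    // See EvaluationJobConfig below.
  },
  "annotationSpecSet": "projects/my-project/annotationSpecSets/my_spec_set",
  "labelMissingGroundTruth": false
}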

State

State of the job.

Enums
STATE_UNSPECIFIED
SCHEDULED

The job is scheduled to run at the configured interval. You can pause or delete the job.

When the job is in this state, it samples prediction input and output from your model version into your BigQuery table as predictions occur.

RUNNING

The job is currently running. When the job runs, Data Labeling Service does several things:

  1. If you have configured your job to use Data Labeling Service for ground truth labeling, the service creates a Dataset and a labeling task for all data sampled since the last time the job ran. Human labelers provide ground truth labels for your data. Human labeling may take hours, or even days, depending on how much data has been sampled. The job remains in the RUNNING state during this time, and it can even be running multiple times in parallel if it gets triggered again (for example 24 hours later) before the earlier run has completed. When human labelers have finished labeling the data, the next step occurs.

    If you have configured your job to provide your own ground truth labels, Data Labeling Service still creates a Dataset for newly sampled data, but it expects that you have already added ground truth labels to the BigQuery table by this time. The next step occurs immediately.

  2. Data Labeling Service creates an Evaluation by comparing your model version's predictions with the ground truth labels.

If the job remains in this state for a long time, it continues to sample prediction data into your BigQuery table and will run again at the next interval, even if it causes the job to run multiple times in parallel.

PAUSED

The job is not sampling prediction input and output into your BigQuery table, and it will not run according to its schedule. You can resume the job.

STOPPED

The job has this state right before it is deleted.

EvaluationJobConfig

Configures specific details of how a continuous evaluation job works. Provide this configuration when you create an EvaluationJob.

JSON representation
{
  "inputConfig": {
    object (InputConfig)
  },
  "evaluationConfig": {
    object (EvaluationConfig)
  },
  "humanAnnotationConfig": {
    object (HumanAnnotationConfig)
  },
  "bigqueryImportKeys": {
    string: string,
    ...
  },
  "exampleCount": number,
  "exampleSamplePercentage": number,
  "evaluationJobAlertConfig": {
    object (EvaluationJobAlertConfig)
  },

  // Union field human_annotation_request_config can be only one of the
  // following:
  "imageClassificationConfig": {
    object (ImageClassificationConfig)
  },
  "boundingPolyConfig": {
    object (BoundingPolyConfig)
  },
  "textClassificationConfig": {
    object (TextClassificationConfig)
  }
  // End of list of possible types for union field
  // human_annotation_request_config.
}
Fields
inputConfig

object (InputConfig)

Required. Details for the sampled prediction input. Within this configuration, there are requirements for several fields:

  • dataType must be one of IMAGE, TEXT, or GENERAL_DATA.
  • annotationType must be one of IMAGE_CLASSIFICATION_ANNOTATION, TEXT_CLASSIFICATION_ANNOTATION, GENERAL_CLASSIFICATION_ANNOTATION, or IMAGE_BOUNDING_BOX_ANNOTATION (image object detection).
  • If your machine learning model performs classification, you must specify classificationMetadata.isMultiLabel.
  • You must specify bigquerySource (not gcsSource).

evaluationConfig

object (EvaluationConfig)

Required. Details for calculating evaluation metrics and creating Evaluations. If your model version performs image object detection, you must specify the boundingBoxEvaluationOptions field within this configuration. Otherwise, provide an empty object for this configuration.
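For illustration, the two cases described above might look as follows. The iouThreshold field name inside boundingBoxEvaluationOptions is an assumption here; consult the EvaluationConfig reference for the exact shape.

  // Model version performs image object detection:
  "evaluationConfig": {
    "boundingBoxEvaluationOptions": {
      "iouThreshold": 0.5   // assumed field: minimum IoU for a detection to count as a match
    }
  }

  // Any other model type: provide an empty object.
  "evaluationConfig": {}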

humanAnnotationConfig

object (HumanAnnotationConfig)

Optional. Details for human annotation of your data. If you set labelMissingGroundTruth to true for this evaluation job, then you must specify this field. If you plan to provide your own ground truth labels, then omit this field.

Note that you must create an Instruction resource before you can specify this field. Provide the name of the instruction resource in the instruction field within this configuration.
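A minimal sketch, assuming only the instruction field mentioned above is set; the instruction ID is a placeholder, and other HumanAnnotationConfig fields may also apply, as described in that message's reference.

  "humanAnnotationConfig": {
    "instruction": "projects/my-project/instructions/my_instruction"
  }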

bigqueryImportKeys

map (key: string, value: string)

Required. Prediction keys that tell Data Labeling Service where to find the data for evaluation in your BigQuery table. When the service samples prediction input and output from your model version and saves it to BigQuery, the data gets stored as JSON strings in the BigQuery table. These keys tell Data Labeling Service how to parse the JSON.

You can provide the following entries in this field:

  • data_json_key: the data key for prediction input. You must provide either this key or reference_json_key.
  • reference_json_key: the data reference key for prediction input. You must provide either this key or data_json_key.
  • label_json_key: the label key for prediction output. Required.
  • label_score_json_key: the score key for prediction output. Required.
  • bounding_box_json_key: the bounding box key for prediction output. Required if your model version performs image object detection.

Learn how to configure prediction keys.
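As a sketch, for a classification model whose prediction input JSON stores the data under an "image_bytes" key and whose output JSON stores labels and scores under "label" and "score" (all of these key values are hypothetical and depend on your own prediction schema):

  "bigqueryImportKeys": {
    "data_json_key": "image_bytes",
    "label_json_key": "label",
    "label_score_json_key": "score"
  }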

exampleCount

number

Required. The maximum number of predictions to sample and save to BigQuery during each evaluation interval. This limit overrides exampleSamplePercentage: even if the service has not sampled enough predictions to fulfill exampleSamplePercentage during an interval, it stops sampling predictions when it meets this limit.

exampleSamplePercentage

number

Required. Fraction of predictions to sample and save to BigQuery during each evaluation interval. For example, 0.1 means 10% of predictions served by your model version get saved to BigQuery.
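For example, with the illustrative settings below, the service targets roughly 10% of served predictions but never stores more than 1,000 per interval, because exampleCount caps sampling even when the 10% target has not been reached:

  "exampleCount": 1000,
  "exampleSamplePercentage": 0.1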

evaluationJobAlertConfig

object (EvaluationJobAlertConfig)

Optional. Configuration details for evaluation job alerts. Specify this field if you want to receive email alerts if the evaluation job finds that your predictions have low mean average precision during a run.

Union field human_annotation_request_config. Required. Details for how you want human reviewers to provide ground truth labels. human_annotation_request_config can be only one of the following:
imageClassificationConfig

object (ImageClassificationConfig)

Specify this field if your model version performs image classification or general classification.

annotationSpecSet in this configuration must match EvaluationJob.annotationSpecSet. allowMultiLabel in this configuration must match classificationMetadata.isMultiLabel in inputConfig.

boundingPolyConfig

object (BoundingPolyConfig)

Specify this field if your model version performs image object detection (bounding box detection).

annotationSpecSet in this configuration must match EvaluationJob.annotationSpecSet.

textClassificationConfig

object (TextClassificationConfig)

Specify this field if your model version performs text classification.

annotationSpecSet in this configuration must match EvaluationJob.annotationSpecSet. allowMultiLabel in this configuration must match classificationMetadata.isMultiLabel in inputConfig.
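Putting these pieces together, here is a hedged sketch of a complete configuration for a single-label image classification model. Resource names, the BigQuery URI, and the JSON keys are placeholders; the inputUri field name inside bigquerySource is an assumption (see InputConfig for the exact shape); humanAnnotationConfig is omitted on the assumption that labelMissingGroundTruth is false.

{
  "inputConfig": {
    "dataType": "IMAGE",
    "annotationType": "IMAGE_CLASSIFICATION_ANNOTATION",
    "classificationMetadata": {
      "isMultiLabel": false
    },
    "bigquerySource": {
      "inputUri": "bq://my-project.my_dataset.my_table"
    }
  },
  "evaluationConfig": {},
  "bigqueryImportKeys": {
    "data_json_key": "image_bytes",
    "label_json_key": "label",
    "label_score_json_key": "score"
  },
  "exampleCount": 1000,
  "exampleSamplePercentage": 0.1,
  "imageClassificationConfig": {
    "annotationSpecSet": "projects/my-project/annotationSpecSets/my_spec_set",
    "allowMultiLabel": false
  }
}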

EvaluationJobAlertConfig

Provides details for how an evaluation job sends email alerts based on the results of a run.

JSON representation
{
  "email": string,
  "minAcceptableMeanAveragePrecision": number
}
Fields
email

string

Required. An email address to send alerts to.

minAcceptableMeanAveragePrecision

number

Required. A number between 0 and 1 that describes a minimum mean average precision threshold. When the evaluation job runs, if it calculates that your model version's predictions from the recent interval have meanAveragePrecision below this threshold, then it sends an alert to your specified email.
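For instance, an illustrative configuration that emails a hypothetical address whenever a run's mean average precision falls below 0.75:

{
  "email": "ml-alerts@example.com",
  "minAcceptableMeanAveragePrecision": 0.75
}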

Attempt

Records a failed evaluation job run.

JSON representation
{
  "attemptTime": string,
  "partialFailures": [
    {
      object (Status)
    }
  ]
}
Fields
attemptTime

string (Timestamp format)

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

partialFailures[]

object (Status)

Details of errors that occurred.

Methods

create

Creates an evaluation job.

delete

Stops and deletes an evaluation job.

get

Gets an evaluation job by resource name.

list

Lists all evaluation jobs within a project, with optional filters.

patch

Updates an evaluation job.

pause

Pauses an evaluation job.

resume

Resumes a paused evaluation job.