REST Resource: projects.locations.dataLabelingJobs

Resource: DataLabelingJob

A DataLabelingJob triggers a human labeling job on unlabeled data from the Dataset named in its datasets field.

JSON representation
{
  "name": string,
  "displayName": string,
  "datasets": [
    string
  ],
  "annotationLabels": {
    string: string,
    ...
  },
  "labelerCount": integer,
  "instructionUri": string,
  "inputsSchemaUri": string,
  "inputs": value,
  "state": enum (JobState),
  "labelingProgress": integer,
  "currentSpend": {
    object (Money)
  },
  "createTime": string,
  "updateTime": string,
  "error": {
    object (Status)
  },
  "labels": {
    string: string,
    ...
  },
  "specialistPools": [
    string
  ],
  "encryptionSpec": {
    object (EncryptionSpec)
  },
  "activeLearningConfig": {
    object (ActiveLearningConfig)
  }
}
Fields
name

string

Output only. Resource name of the DataLabelingJob.

displayName

string

Required. The user-defined display name of the DataLabelingJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.

datasets[]

string

Required. Dataset resource names. Currently, labeling from only a single Dataset is supported. Format: projects/{project}/locations/{location}/datasets/{dataset}
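
As a sketch, the documented name format and the single-Dataset restriction can be checked on the client before a request is sent. The helper names `dataset_name` and `datasets_field` are hypothetical, and treating each segment as any non-empty run of characters other than "/" is an assumption, not the service's exact ID grammar.

```python
import re

# Documented format: projects/{project}/locations/{location}/datasets/{dataset}
# Assumption: each segment is any non-empty run of characters except "/".
DATASET_NAME_RE = re.compile(
    r"^projects/[^/]+/locations/[^/]+/datasets/[^/]+$"
)


def dataset_name(project: str, location: str, dataset: str) -> str:
    """Build a Dataset resource name in the documented format."""
    for part in (project, location, dataset):
        if not part or "/" in part:
            raise ValueError(f"invalid resource name segment: {part!r}")
    return f"projects/{project}/locations/{location}/datasets/{dataset}"


def datasets_field(names: list[str]) -> list[str]:
    """Validate a value for the `datasets` field.

    Only a single Dataset is currently supported, so exactly one
    well-formed name is accepted.
    """
    if len(names) != 1:
        raise ValueError("exactly one Dataset resource name is supported")
    if not DATASET_NAME_RE.match(names[0]):
        raise ValueError(f"malformed Dataset resource name: {names[0]!r}")
    return list(names)
```

For example, `datasets_field([dataset_name("my-project", "us-central1", "123")])` returns a one-element list; the project, location, and ID values here are placeholders.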

annotationLabels

map (key: string, value: string)

Labels to assign to annotations generated by this DataLabelingJob.

Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.
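
The character rules above can be approximated with a small validator. This is a hedged sketch: the helper names are hypothetical, and reading "lowercase letters ... International characters are allowed" as "any letter that is not uppercase, so uncased scripts pass" is an interpretation, not the service's exact check. Python's `len` counts Unicode code points, which matches the documented length unit.

```python
MAX_LABEL_LENGTH = 64  # measured in Unicode code points


def _allowed_char(ch: str) -> bool:
    """Assumed reading of the rules: digits, "_", "-", and non-uppercase letters."""
    if ch in "_-" or ch.isdigit():
        return True
    # Uncased letters (for example CJK) count as allowed international characters.
    return ch.isalpha() and not ch.isupper()


def is_valid_label_part(text: str) -> bool:
    """Check a label key or value against the documented rules."""
    return len(text) <= MAX_LABEL_LENGTH and all(_allowed_char(c) for c in text)


def invalid_labels(labels: dict[str, str]) -> list[str]:
    """Return the keys of entries whose key or value breaks the rules."""
    return [
        key
        for key, value in labels.items()
        if not (is_valid_label_part(key) and is_valid_label_part(value))
    ]
```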

labelerCount

integer

Required. Number of labelers to work on each DataItem.

instructionUri

string

Required. The Google Cloud Storage location of the instruction PDF. This PDF is shared with labelers and provides a detailed description of how to label DataItems in Datasets.

inputsSchemaUri

string

Required. Points to a YAML file stored on Google Cloud Storage describing the config for a specific type of DataLabelingJob. The schema files that can be used here are found in the https://storage.googleapis.com/google-cloud-aiplatform bucket in the /schema/datalabelingjob/inputs/ folder.

inputs

value (Value format)

Required. Input config parameters for the DataLabelingJob.

state

enum (JobState)

Output only. The detailed state of the job.

labelingProgress

integer

Output only. Current progress of the labeling job as a percentage in the range [0, 100], indicating the percentage of DataItems that have been finished.

currentSpend

object (Money)

Output only. Estimated cost (in US dollars) that the DataLabelingJob has incurred to date.
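
The Money object in currentSpend follows the google.type.Money message: a `currencyCode`, whole `units` (int64, serialized as a JSON string), and `nanos` (billionths of a unit, with the same sign as `units`). A minimal sketch for turning it into an exact decimal amount; the helper name is hypothetical, and fields missing from the JSON are assumed to be zero, since proto3 JSON omits default values.

```python
from decimal import Decimal


def money_to_decimal(money: dict) -> tuple[str, Decimal]:
    """Convert a JSON Money object into (currency code, exact amount)."""
    units = int(money.get("units", "0"))  # int64, serialized as a string in JSON
    nanos = int(money.get("nanos", 0))    # billionths of one unit
    if not -999_999_999 <= nanos <= 999_999_999:
        raise ValueError("nanos out of range")
    # When both are non-zero, units and nanos must share a sign.
    if units and nanos and (units > 0) != (nanos > 0):
        raise ValueError("units and nanos must have the same sign")
    amount = Decimal(units) + Decimal(nanos).scaleb(-9)
    return money.get("currencyCode", ""), amount
```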

createTime

string (Timestamp format)

Output only. Timestamp when this DataLabelingJob was created.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
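
Because Python's datetime stores only microseconds, a timestamp with up to nine fractional digits cannot be held exactly in a single datetime. A minimal sketch, assuming only the "Zulu" form shown above (no numeric offsets), splits the value into a whole-second UTC datetime and an integer nanosecond count; `parse_zulu_timestamp` is a hypothetical helper. The same parsing applies to updateTime.

```python
import re
from datetime import datetime, timezone

# RFC 3339 UTC "Zulu" timestamp with 0 to 9 fractional digits.
_TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_zulu_timestamp(ts: str) -> tuple[datetime, int]:
    """Split a timestamp into a whole-second UTC datetime and nanoseconds."""
    match = _TS_RE.match(ts)
    if match is None:
        raise ValueError(f"not an RFC 3339 Zulu timestamp: {ts!r}")
    whole = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    whole = whole.replace(tzinfo=timezone.utc)
    fraction = match.group(2) or ""
    # Right-pad to nine digits so ".5" means 500000000 nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return whole, nanos
```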

updateTime

string (Timestamp format)

Output only. Timestamp when this DataLabelingJob was updated most recently.

A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".

error

object (Status)

Output only. DataLabelingJob errors. Populated only when the job's state is JOB_STATE_FAILED or JOB_STATE_CANCELLED.
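
A sketch of reading the error field defensively by checking the state first, as the description suggests. The helper name is hypothetical, the job is assumed to be the parsed JSON of a DataLabelingJob, and the Status object is assumed to carry the `code` and `message` fields of google.rpc.Status.

```python
from typing import Optional

# States in which `error` is documented to be populated.
STATES_WITH_ERROR = frozenset({"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"})


def job_error(job: dict) -> Optional[str]:
    """Return "code: message" for a failed or cancelled job, else None."""
    if job.get("state") not in STATES_WITH_ERROR:
        return None
    status = job.get("error") or {}
    # proto3 JSON omits fields at their default values, hence the fallbacks.
    return f"{status.get('code', 0)}: {status.get('message', '')}"
```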

labels

map (key: string, value: string)

The labels with user-defined metadata to organize your DataLabelingJobs.

Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed.

See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. The following system labels exist for each DataLabelingJob:

  • "aiplatform.googleapis.com/schema": output only; its value is the title of the inputs schema.

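
Since system reserved keys are immutable and the schema label is output only, a client that copies labels from an existing job into a new request may want to keep only the user-defined entries. A minimal sketch; `split_labels` is a hypothetical helper, and whether the service rejects or ignores reserved keys on create is not stated here.

```python
SYSTEM_LABEL_PREFIX = "aiplatform.googleapis.com/"


def split_labels(labels: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Separate user-defined labels from system reserved ones."""
    user = {k: v for k, v in labels.items() if not k.startswith(SYSTEM_LABEL_PREFIX)}
    system = {k: v for k, v in labels.items() if k.startswith(SYSTEM_LABEL_PREFIX)}
    return user, system
```
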
specialistPools[]

string

The SpecialistPools' resource names associated with this job.

encryptionSpec

object (EncryptionSpec)

Customer-managed encryption key spec for a DataLabelingJob. If set, this DataLabelingJob will be secured by this key.

Note: Annotations created in the DataLabelingJob are associated with the EncryptionSpec of the Dataset they are exported to.

activeLearningConfig

object (ActiveLearningConfig)

Parameters that configure the active learning pipeline. Active learning labels the data incrementally over several iterations. In each iteration, it selects a batch of data based on the sampling strategy.
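
Taken together, the field descriptions above imply some client-side checks for a create request: Required fields must be present, Output only fields should be left unset, displayName is capped at 128 characters, and datasets holds a single name. The sketch below applies those checks and serializes the body. The helper name is hypothetical, `inputs` is passed through unchanged because its shape depends on the chosen schema, and these checks are a convenience, not a substitute for server-side validation.

```python
import json

REQUIRED_FIELDS = ("displayName", "datasets", "labelerCount",
                   "instructionUri", "inputsSchemaUri", "inputs")
OUTPUT_ONLY_FIELDS = ("name", "state", "labelingProgress", "currentSpend",
                      "createTime", "updateTime", "error")


def build_create_body(**fields) -> str:
    """Serialize a DataLabelingJob request body after basic checks."""
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    output_only = [f for f in OUTPUT_ONLY_FIELDS if f in fields]
    if output_only:
        raise ValueError(f"output-only fields must not be set: {output_only}")
    if len(fields["displayName"]) > 128:
        raise ValueError("displayName exceeds 128 characters")
    if len(fields["datasets"]) != 1:
        raise ValueError("exactly one Dataset is currently supported")
    return json.dumps(fields, sort_keys=True)
```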

ActiveLearningConfig

Parameters that configure the active learning pipeline. Active learning labels the data incrementally over several iterations. In each iteration, it selects a batch of data based on the sampling strategy.

JSON representation
{
  "sampleConfig": {
    object (SampleConfig)
  },
  "trainingConfig": {
    object (TrainingConfig)
  },

  // Union field human_labeling_budget can be only one of the following:
  "maxDataItemCount": string,
  "maxDataItemPercentage": integer
  // End of list of possible types for union field human_labeling_budget.
}
Fields
sampleConfig

object (SampleConfig)

Active learning data sampling config. In each active learning labeling iteration, a batch of data is selected based on the sampling strategy.

trainingConfig

object (TrainingConfig)

CMLE (Cloud Machine Learning Engine) training config. In each active learning labeling iteration, the system trains a machine learning model on CMLE. The data sampling algorithm uses the trained model to select DataItems.

Union field human_labeling_budget. Required. The maximum number of DataItems to be labeled by humans; the remaining DataItems are labeled by machine. human_labeling_budget can be only one of the following:
maxDataItemCount

string (int64 format)

Maximum number of DataItems to be labeled by humans.

maxDataItemPercentage

integer

Maximum percentage of the total DataItems to be labeled by humans.
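
A sketch of building the human_labeling_budget union so that exactly one member is set. The helper is hypothetical; serializing maxDataItemCount as a string follows the int64 format shown above, and bounding the percentage to [0, 100] is an assumption.

```python
from typing import Optional


def human_labeling_budget(count: Optional[int] = None,
                          percentage: Optional[int] = None) -> dict:
    """Build the human_labeling_budget union; exactly one member may be set."""
    if (count is None) == (percentage is None):
        raise ValueError("set exactly one of count or percentage")
    if count is not None:
        if count < 0:
            raise ValueError("count must be non-negative")
        # int64 fields are serialized as JSON strings.
        return {"maxDataItemCount": str(count)}
    # Assumed valid range for a percentage.
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return {"maxDataItemPercentage": percentage}
```

The result can be merged into an ActiveLearningConfig dict alongside sampleConfig and trainingConfig.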

SampleConfig

Active learning data sampling config. In each active learning labeling iteration, a batch of data is selected based on the sampling strategy.

JSON representation
{
  "sampleStrategy": enum (SampleStrategy),
  "initialBatchSamplePercentage": integer,
  "followingBatchSamplePercentage": integer
}
Fields
sampleStrategy

enum (SampleStrategy)

The sampling strategy, which decides which data is selected for human labeling in each batch.

initialBatchSamplePercentage

integer

The percentage of data to be labeled in the first batch.

followingBatchSamplePercentage

integer

The percentage of data to be labeled in each subsequent batch (that is, every batch after the first).
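
The arithmetic behind the two percentages can be illustrated under a simple, hypothetical model: each percentage is taken of the total number of DataItems, and batches stop once the human labeling budget is used up. The service's actual batching is not described here, so treat `planned_batch_sizes` (a hypothetical helper) only as a way to reason about the parameters.

```python
def planned_batch_sizes(total_items: int, initial_pct: int,
                        following_pct: int, budget: int) -> list[int]:
    """Batch sizes under an assumed model, not the documented algorithm.

    Each percentage is applied to total_items; batches stop once
    `budget` human-labeled items (capped at total_items) are reached.
    """
    sizes = []
    remaining = min(budget, total_items)
    pct = initial_pct
    while remaining > 0:
        size = min(total_items * pct // 100, remaining)
        if size == 0:
            break  # a 0% batch would never make progress
        sizes.append(size)
        remaining -= size
        pct = following_pct
    return sizes
```

For example, with 1,000 DataItems, initialBatchSamplePercentage of 10, followingBatchSamplePercentage of 5, and a budget of 200 items, this model yields batches of 100, 50, and 50.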

SampleStrategy

Sample strategy decides which subset of DataItems should be selected for human labeling in every batch.

Enums
SAMPLE_STRATEGY_UNSPECIFIED: Default value; treated as UNCERTAINTY.
UNCERTAINTY: Sample the most uncertain data for labeling.
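
A sketch of resolving the strategy that will actually apply, given that an unspecified (or omitted) value is treated as UNCERTAINTY. The enum mirrors the two values listed above; `effective_strategy` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class SampleStrategy(str, Enum):
    SAMPLE_STRATEGY_UNSPECIFIED = "SAMPLE_STRATEGY_UNSPECIFIED"
    UNCERTAINTY = "UNCERTAINTY"


def effective_strategy(value: Optional[str]) -> SampleStrategy:
    """Map the JSON value (or its absence) to the strategy that applies."""
    # An omitted field carries the default, SAMPLE_STRATEGY_UNSPECIFIED.
    strategy = SampleStrategy(value or "SAMPLE_STRATEGY_UNSPECIFIED")
    if strategy is SampleStrategy.SAMPLE_STRATEGY_UNSPECIFIED:
        return SampleStrategy.UNCERTAINTY
    return strategy
```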

TrainingConfig

CMLE (Cloud Machine Learning Engine) training config. In each active learning labeling iteration, the system trains a machine learning model on CMLE. The data sampling algorithm uses the trained model to select DataItems.

JSON representation
{
  "timeoutTrainingMilliHours": string
}
Fields
timeoutTrainingMilliHours

string (int64 format)

The timeout for the CMLE training job, expressed in milli-hours; that is, a value of 1000 in this field means 1 hour.
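
Converting between hours and the milli-hour string is simple arithmetic (hours * 1000). The sketch uses Decimal to avoid floating-point drift and rounds half up to a whole milli-hour, which is a design choice rather than documented behavior; the function names are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal


def hours_to_milli_hours(hours) -> str:
    """Convert hours to the int64 milli-hour value, serialized as a string."""
    milli = (Decimal(str(hours)) * 1000).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return str(int(milli))


def milli_hours_to_hours(value: str) -> Decimal:
    """Convert a timeoutTrainingMilliHours string back to hours."""
    return Decimal(int(value)) / 1000
```

For example, a 2.5-hour timeout becomes `{"timeoutTrainingMilliHours": hours_to_milli_hours(2.5)}`, that is, "2500".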

Methods

cancel

Cancels a DataLabelingJob.

create

Creates a DataLabelingJob.

delete

Deletes a DataLabelingJob.

get

Gets a DataLabelingJob.

list

Lists DataLabelingJobs in a Location.
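
The methods above map onto REST calls. The sketch below assumes the Vertex AI v1 regional endpoint (`https://{location}-aiplatform.googleapis.com/v1`) and Google's standard method conventions, where custom methods such as cancel use a `:cancel` suffix; verify the exact paths against each method's reference page. It only builds the HTTP verb and URL and sends nothing; `method_request` is a hypothetical helper.

```python
from typing import Optional

# Assumed regional endpoint and API version.
BASE_URL = "https://{location}-aiplatform.googleapis.com/v1"

# Collection methods act on the parent; the others act on a single job.
# This mapping is assumed from standard Google API conventions.
_COLLECTION = {"create": "POST", "list": "GET"}
_SINGLE = {"get": ("GET", ""), "delete": ("DELETE", ""), "cancel": ("POST", ":cancel")}


def method_request(method: str, project: str, location: str,
                   job_id: Optional[str] = None) -> tuple[str, str]:
    """Return (HTTP verb, URL) for a DataLabelingJob method."""
    base = BASE_URL.format(location=location)
    parent = f"projects/{project}/locations/{location}"
    if method in _COLLECTION:
        return _COLLECTION[method], f"{base}/{parent}/dataLabelingJobs"
    if method not in _SINGLE:
        raise ValueError(f"unknown method: {method}")
    if job_id is None:
        raise ValueError(f"{method} requires a job ID")
    verb, suffix = _SINGLE[method]
    return verb, f"{base}/{parent}/dataLabelingJobs/{job_id}{suffix}"
```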