Resource: DataLabelingJob
DataLabelingJob is used to trigger a human labeling job on unlabeled data from the following Dataset:
JSON representation |
---|
{ "name": string, "displayName": string, "datasets": [ string ], "annotationLabels": { string: string, ... }, "labelerCount": integer, "instructionUri": string, "inputsSchemaUri": string, "inputs": value, "state": enum ( |
Fields | |
---|---|
name |
Output only. Resource name of the DataLabelingJob. |
displayName |
Required. The user-defined display name of the DataLabelingJob. The name can be up to 128 characters long and can consist of any UTF-8 characters. |
datasets[] |
Required. Dataset resource names. Currently, labeling is supported from a single Dataset only. Format: |
annotationLabels |
Labels to assign to annotations generated by this DataLabelingJob. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. |
labelerCount |
Required. Number of labelers to work on each DataItem. |
instructionUri |
Required. The Google Cloud Storage location of the instruction PDF. This PDF is shared with labelers and provides a detailed description of how to label DataItems in Datasets. |
inputsSchemaUri |
Required. Points to a YAML file stored on Google Cloud Storage describing the config for a specific type of DataLabelingJob. The schema files that can be used here are found in the https://storage.googleapis.com/google-cloud-aiplatform bucket in the /schema/datalabelingjob/inputs/ folder. |
inputs |
Required. Input config parameters for the DataLabelingJob. |
state |
Output only. The detailed state of the job. |
labelingProgress |
Output only. Current labeling job progress as a percentage in the interval [0, 100], indicating the percentage of DataItems that have been finished. |
currentSpend |
Output only. Estimated cost (in US dollars) that the DataLabelingJob has incurred to date. |
createTime |
Output only. Timestamp when this DataLabelingJob was created. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
updateTime |
Output only. Timestamp when this DataLabelingJob was most recently updated. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: |
error |
Output only. DataLabelingJob errors. It is only populated when the job's state is |
labels |
The labels with user-defined metadata to organize your DataLabelingJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. The following system labels exist for each DataLabelingJob:
|
specialistPools[] |
The SpecialistPools' resource names associated with this job. |
encryptionSpec |
Customer-managed encryption key spec for a DataLabelingJob. If set, this DataLabelingJob will be secured by this key. Note: Annotations created in the DataLabelingJob are associated with the EncryptionSpec of the Dataset they are exported to. |
activeLearningConfig |
Parameters that configure the active learning pipeline. Active learning will label the data incrementally via several iterations. For every iteration, it will select a batch of data based on the sampling strategy. |
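The writable fields above can be assembled into a request body along the following lines. This is a minimal sketch, not an official client: the helper names `check_labels` and `build_data_labeling_job` are hypothetical, the placeholder values (`DATASET_RESOURCE_NAME`, `INPUTS_SCHEMA_URI`, the `gs://example-bucket/...` path, the empty `inputs`) are illustrative only, and only the documented length limits are checked. The character-set rule for labels is not enforced here.

```python
MAX_DISPLAY_NAME_CHARS = 128  # limit stated for displayName
MAX_LABEL_CHARS = 64          # limit stated for label keys and values


def check_labels(labels):
    """Check only the documented length limit on label keys and values."""
    for key, value in labels.items():
        for text in (key, value):
            # len() on a Python str counts Unicode code points.
            if len(text) > MAX_LABEL_CHARS:
                raise ValueError(f"label text exceeds {MAX_LABEL_CHARS} characters: {text!r}")


def build_data_labeling_job(display_name, dataset, labeler_count,
                            instruction_uri, inputs_schema_uri, inputs,
                            annotation_labels=None):
    """Assemble the writable fields of a DataLabelingJob as a dict.

    Output-only fields (name, state, labelingProgress, ...) are left out
    because the service fills them in.
    """
    if not display_name or len(display_name) > MAX_DISPLAY_NAME_CHARS:
        raise ValueError("displayName must be 1 to 128 characters long")
    annotation_labels = annotation_labels or {}
    check_labels(annotation_labels)
    body = {
        "displayName": display_name,
        "datasets": [dataset],  # labeling from a single Dataset only
        "labelerCount": labeler_count,
        "instructionUri": instruction_uri,
        "inputsSchemaUri": inputs_schema_uri,
        "inputs": inputs,
    }
    if annotation_labels:
        body["annotationLabels"] = annotation_labels
    return body


# Hypothetical placeholder values, for illustration only.
job = build_data_labeling_job(
    display_name="demo-labeling-job",
    dataset="DATASET_RESOURCE_NAME",
    labeler_count=1,
    instruction_uri="gs://example-bucket/instructions.pdf",
    inputs_schema_uri="INPUTS_SCHEMA_URI",
    inputs={},
    annotation_labels={"team": "vision"},
)
```

The resulting dict can be serialized with `json.dumps` and sent as the body of a create request.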
ActiveLearningConfig
Parameters that configure the active learning pipeline. Active learning labels the data incrementally over several iterations. In every iteration, it selects a batch of data based on the sampling strategy.
JSON representation |
---|
{ "sampleConfig": { object ( |
Fields | |
---|---|
sampleConfig |
Active learning data sampling config. For every active learning labeling iteration, it will select a batch of data based on the sampling strategy. |
trainingConfig |
CMLE training config. For every active learning labeling iteration, the system trains a machine learning model on CMLE. The trained model is used by the data sampling algorithm to select DataItems. |
Union field human_labeling_budget. Required. The maximum number of DataItems to be labeled by humans. The remaining DataItems are labeled by machine. human_labeling_budget can be only one of the following: |
|
maxDataItemCount |
Max number of human labeled DataItems. |
maxDataItemPercentage |
Max percent of total DataItems for human labeling. |
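Because `human_labeling_budget` is a union field, exactly one of `maxDataItemCount` and `maxDataItemPercentage` should be set. A minimal sketch of enforcing that on the client side follows. The helper name `build_active_learning_config` is hypothetical, and passing both values as plain integers is an assumption; check the JSON representation for the exact wire types.

```python
def build_active_learning_config(max_data_item_count=None,
                                 max_data_item_percentage=None,
                                 sample_config=None,
                                 training_config=None):
    """Build an ActiveLearningConfig dict with exactly one budget field."""
    budgets_set = sum(
        value is not None
        for value in (max_data_item_count, max_data_item_percentage)
    )
    if budgets_set != 1:
        raise ValueError(
            "set exactly one of maxDataItemCount or maxDataItemPercentage"
        )

    config = {}
    if max_data_item_count is not None:
        config["maxDataItemCount"] = max_data_item_count
    else:
        config["maxDataItemPercentage"] = max_data_item_percentage

    # sampleConfig and trainingConfig are included only when provided.
    if sample_config is not None:
        config["sampleConfig"] = sample_config
    if training_config is not None:
        config["trainingConfig"] = training_config
    return config


# Let humans label at most 20 percent of the DataItems.
config = build_active_learning_config(max_data_item_percentage=20)
```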
SampleConfig
Active learning data sampling config. For every active learning labeling iteration, it will select a batch of data based on the sampling strategy.
JSON representation |
---|
{ "sampleStrategy": enum ( |
Fields | |
---|---|
sampleStrategy |
Field to choose sampling strategy. Sampling strategy will decide which data should be selected for human labeling in every batch. |
Union field initial_batch_sample_size. Decides the sample size for the initial batch. initial_batch_sample_percentage is used by default. initial_batch_sample_size can be only one of the following: |
|
initialBatchSamplePercentage |
The percentage of data to be labeled in the first batch. |
Union field following_batch_sample_size. Decides the sample size for the following batches. following_batch_sample_percentage is used by default. following_batch_sample_size can be only one of the following: |
|
followingBatchSamplePercentage |
The percentage of data to be labeled in each batch after the first. |
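A SampleConfig built from the fields above might look like the following sketch. The helper name `build_sample_config` is hypothetical, and the 0 to 100 range check is a client-side sanity check assumed here, not a limit stated in this reference. Percentages left as `None` are omitted so that the service applies its own defaults.

```python
def build_sample_config(sample_strategy="UNCERTAINTY",
                        initial_batch_sample_percentage=None,
                        following_batch_sample_percentage=None):
    """Build a SampleConfig dict, omitting unset percentage fields."""
    config = {"sampleStrategy": sample_strategy}
    percentages = {
        "initialBatchSamplePercentage": initial_batch_sample_percentage,
        "followingBatchSamplePercentage": following_batch_sample_percentage,
    }
    for field, value in percentages.items():
        if value is None:
            continue  # leave the field out; the service default applies
        if not 0 <= value <= 100:
            # Assumed sanity check: a percentage outside [0, 100] is rejected.
            raise ValueError(f"{field} must be between 0 and 100")
        config[field] = value
    return config


# Label 10 percent in the first batch and 5 percent in each later batch.
sample_config = build_sample_config(
    initial_batch_sample_percentage=10,
    following_batch_sample_percentage=5,
)
```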
SampleStrategy
Sample strategy decides which subset of DataItems should be selected for human labeling in every batch.
Enums | |
---|---|
SAMPLE_STRATEGY_UNSPECIFIED |
Default value. It is treated as UNCERTAINTY. |
UNCERTAINTY |
Sample the most uncertain data to label. |
TrainingConfig
CMLE training config. For every active learning labeling iteration, the system trains a machine learning model on CMLE. The trained model is used by the data sampling algorithm to select DataItems.
JSON representation |
---|
{ "timeoutTrainingMilliHours": string } |
Fields | |
---|---|
timeoutTrainingMilliHours |
The timeout for the CMLE training job, expressed in milli hours. For example, a value of 1,000 in this field means 1 hour. |
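Since the field is a string holding a count of milli hours, a small conversion helper can avoid unit mistakes. This is a sketch only: the function name `hours_to_milli_hours` is hypothetical, and rounding to the nearest whole milli hour is an assumption.

```python
def hours_to_milli_hours(hours):
    """Convert hours to the string form used by timeoutTrainingMilliHours."""
    if hours < 0:
        raise ValueError("timeout must not be negative")
    # 1 hour == 1,000 milli hours; the JSON field is typed as a string.
    return str(round(hours * 1000))


# A training timeout of two and a half hours.
training_config = {"timeoutTrainingMilliHours": hours_to_milli_hours(2.5)}
```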
Methods |
|
---|---|
|
Cancels a DataLabelingJob. |
|
Creates a DataLabelingJob. |
|
Deletes a DataLabelingJob. |
|
Gets a DataLabelingJob. |
|
Lists DataLabelingJobs in a Location. |
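The five methods map onto HTTP requests roughly as sketched below. Everything in this block is an assumption drawn from common Google Cloud REST conventions rather than from this page: the method keys, the `v1` version prefix, the `dataLabelingJobs` collection name, and the `:cancel` custom-verb suffix. Confirm each against the individual method pages before relying on it. The helper name `data_labeling_job_requests` and the placeholder identifiers are hypothetical.

```python
def data_labeling_job_requests(parent, job_id, version="v1"):
    """Return (HTTP verb, path) pairs for each method, under assumed conventions.

    parent is a Location resource name such as
    "projects/PROJECT/locations/LOCATION" (placeholders, not real values).
    """
    collection = f"/{version}/{parent}/dataLabelingJobs"
    name = f"{collection}/{job_id}"
    return {
        "cancel": ("POST", f"{name}:cancel"),
        "create": ("POST", collection),
        "delete": ("DELETE", name),
        "get": ("GET", name),
        "list": ("GET", collection),
    }


requests_by_method = data_labeling_job_requests(
    "projects/PROJECT/locations/LOCATION", "JOB_ID"
)
```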