Resource: DataLabelingJob
DataLabelingJob is used to trigger a human labeling job on unlabeled data from a Dataset. It has the following fields:
name
string
Output only. Resource name of the DataLabelingJob.
displayName
string
Required. The user-defined display name of the DataLabelingJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.
datasets[]
string
Required. Dataset resource names. Currently, labeling is supported from only a single Dataset. Format: projects/{project}/locations/{location}/datasets/{dataset}
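Because `datasets[]` accepts a single entry in the documented format, a client might build and check the list before sending it. This is a minimal sketch; the helper names `dataset_name` and `single_dataset_list` are hypothetical and not part of any Google client library.

```python
import re
from typing import List

# Mirrors the documented format: projects/{project}/locations/{location}/datasets/{dataset}
_DATASET_RE = re.compile(r"^projects/([^/]+)/locations/([^/]+)/datasets/([^/]+)$")


def dataset_name(project: str, location: str, dataset: str) -> str:
    """Build a Dataset resource name (hypothetical helper)."""
    return f"projects/{project}/locations/{location}/datasets/{dataset}"


def single_dataset_list(names: List[str]) -> List[str]:
    """Return `names` unchanged if it is a valid value for `datasets[]`.

    Raises ValueError unless there is exactly one well-formed name, since
    labeling currently supports a single Dataset.
    """
    if len(names) != 1:
        raise ValueError(f"expected exactly one Dataset, got {len(names)}")
    if _DATASET_RE.match(names[0]) is None:
        raise ValueError(f"malformed Dataset resource name: {names[0]!r}")
    return names
```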
annotationLabels
map (key: string, value: string)
Labels to assign to annotations generated by this DataLabelingJob.
Label keys and values can be no longer than 64 characters (Unicode code points) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable.
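The label rules above can be checked client-side before a request is sent. This is a minimal sketch with a hypothetical `validate_labels` helper; it assumes that "lowercase letters" combined with "international characters are allowed" means any letter that is unchanged by lowercasing, which is an interpretation rather than the service's exact rule.

```python
from typing import Dict, List

RESERVED_PREFIX = "aiplatform.googleapis.com/"  # system keys; immutable
MAX_LEN = 64  # limit in Unicode code points


def _valid_label_text(text: str) -> bool:
    """True if `text` fits the documented length and character rules."""
    if len(text) > MAX_LEN:  # len() counts code points in Python 3
        return False
    for ch in text:
        if ch in "_-" or ch.isdigit():
            continue
        # Assumption: "lowercase letters" plus "international characters"
        # means any letter unchanged by lower(); this admits uncased scripts
        # such as CJK and rejects uppercase Latin letters.
        if ch.isalpha() and ch == ch.lower():
            continue
        return False
    return True


def validate_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    for key, value in labels.items():
        if key.startswith(RESERVED_PREFIX):
            problems.append(f"key {key!r} uses the reserved system prefix")
        elif not _valid_label_text(key):
            problems.append(f"key {key!r} is invalid")
        if not _valid_label_text(value):
            problems.append(f"value {value!r} for key {key!r} is invalid")
    return problems
```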
labelerCount
integer
Required. Number of labelers to work on each DataItem.
instructionUri
string
Required. The Google Cloud Storage location of the instruction PDF. This PDF is shared with labelers and provides a detailed description of how to label DataItems in Datasets.
inputsSchemaUri
string
Required. Points to a YAML file stored on Google Cloud Storage describing the config for a specific type of DataLabelingJob. The schema files that can be used here are found in the https://storage.googleapis.com/google-cloud-aiplatform bucket in the /schema/datalabelingjob/inputs/ folder.
inputs
value (Value format)
Required. Input config parameters for the DataLabelingJob.
state
enum (JobState)
Output only. The detailed state of the job.
labelingProgress
integer
Output only. Current progress of the labeling job as a percentage in the interval [0, 100], indicating the percentage of DataItems that have been finished.
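A client that creates a job typically polls it until the job reaches a final state, reporting `labelingProgress` along the way. This sketch assumes JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, and JOB_STATE_CANCELLED are the terminal states to watch for, and takes a hypothetical `fetch` callable that returns the job's JSON as a dict, so no particular HTTP client is assumed.

```python
import time
from typing import Any, Callable, Dict

# Assumption: treat these JobState values as final for polling purposes.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_job(fetch: Callable[[], Dict[str, Any]],
                 poll_seconds: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a DataLabelingJob until it reaches a terminal state.

    `fetch` is a hypothetical callable that performs the get method and
    returns the parsed JSON resource.
    """
    while True:
        job = fetch()
        # labelingProgress is an integer percentage in [0, 100].
        progress = job.get("labelingProgress", 0)
        print(f"{job.get('state')}: {progress}% of DataItems finished")
        if job.get("state") in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
```

Injecting `sleep` keeps the loop testable without real delays.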
currentSpend
object (Money)
Output only. Estimated cost (in US dollars) that the DataLabelingJob has incurred to date.
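`currentSpend` is a `google.type.Money` object, which splits an amount into whole `units` (an int64, serialized as a JSON string) and `nanos` (billionths of a unit). A minimal sketch for turning it into an exact decimal amount; the helper name `money_to_decimal` is hypothetical.

```python
from decimal import Decimal
from typing import Any, Dict


def money_to_decimal(money: Dict[str, Any]) -> Decimal:
    """Convert a google.type.Money JSON object to an exact Decimal amount."""
    # Fields omitted from proto3 JSON default to zero.
    units = int(money.get("units", "0"))  # int64, serialized as a JSON string
    nanos = int(money.get("nanos", 0))    # billionths of one unit
    return Decimal(units) + Decimal(nanos) / Decimal(10**9)
```

Using `Decimal` rather than `float` avoids rounding surprises when summing spend across jobs.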
createTime
string (Timestamp format)
Output only. Timestamp when this DataLabelingJob was created.
A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
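Python's `datetime` stores only microseconds, so a nine-digit fraction cannot be held in it directly. This minimal sketch (the `parse_zulu` helper is hypothetical) parses the "Zulu" format above into a whole-second UTC `datetime` plus a separate nanosecond count, so no precision is lost.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Seconds-resolution part, then an optional fraction of 1 to 9 digits, then "Z".
_ZULU = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def parse_zulu(ts: str) -> Tuple[datetime, int]:
    """Parse an RFC3339 UTC "Zulu" timestamp such as createTime.

    Returns the timestamp truncated to whole seconds plus the nanoseconds
    within that second, which keeps all nine fractional digits.
    """
    m = _ZULU.match(ts)
    if m is None:
        raise ValueError(f"not an RFC3339 Zulu timestamp: {ts!r}")
    whole = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    frac = m.group(2) or ""
    # Right-pad the fraction to nine digits so "5" means 500,000,000 ns.
    nanos = int(frac.ljust(9, "0")) if frac else 0
    return whole.replace(tzinfo=timezone.utc), nanos
```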
updateTime
string (Timestamp format)
Output only. Timestamp when this DataLabelingJob was most recently updated.
A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: "2014-10-02T15:01:23Z" and "2014-10-02T15:01:23.045123456Z".
error
object (Status)
Output only. DataLabelingJob errors. This field is only populated when the job's state is JOB_STATE_FAILED or JOB_STATE_CANCELLED.
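Since `error` is only populated in those two states, a client can check `state` first before reading it. A minimal sketch over the job's parsed JSON; the helper `job_error_message` is hypothetical, and it reads `message` from the standard `google.rpc.Status` shape (`code`, `message`, `details`).

```python
from typing import Any, Dict, Optional

# States in which the `error` field is populated, per the field description.
ERROR_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def job_error_message(job: Dict[str, Any]) -> Optional[str]:
    """Return the error message of a DataLabelingJob JSON resource.

    Returns None when the job is in any other state, because `error` is
    not populated there.
    """
    if job.get("state") not in ERROR_STATES:
        return None
    error = job.get("error") or {}
    return error.get("message", "")
```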
labels
map (key: string, value: string)
The labels with user-defined metadata to organize your DataLabelingJobs.
Label keys and values can be no longer than 64 characters (Unicode code points) and can only contain lowercase letters, numeric characters, underscores, and dashes. International characters are allowed.
See https://goo.gl/xmQnxf for more information and examples of labels. System reserved label keys are prefixed with "aiplatform.googleapis.com/" and are immutable. Following system labels exist for each DataLabelingJob:
- "aiplatform.googleapis.com/schema": output only; its value is the inputs_schema's title.
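When reading a job back, the `labels` map mixes user-defined entries with system-reserved ones such as the schema title. A minimal sketch that separates the two by the documented prefix; the helper names `split_labels` and `schema_title` are hypothetical.

```python
from typing import Dict, Optional, Tuple

SYSTEM_PREFIX = "aiplatform.googleapis.com/"
SCHEMA_LABEL = SYSTEM_PREFIX + "schema"


def split_labels(labels: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate user-defined labels from system-reserved ones."""
    user = {k: v for k, v in labels.items() if not k.startswith(SYSTEM_PREFIX)}
    system = {k: v for k, v in labels.items() if k.startswith(SYSTEM_PREFIX)}
    return user, system


def schema_title(labels: Dict[str, str]) -> Optional[str]:
    """Return the output-only inputs_schema title label, if present."""
    return labels.get(SCHEMA_LABEL)
```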
specialistPools[]
string
The SpecialistPools' resource names associated with this job.
encryptionSpec
object (EncryptionSpec)
Customer-managed encryption key spec for a DataLabelingJob. If set, this DataLabelingJob will be secured by this key.
Note: Annotations created in the DataLabelingJob are associated with the EncryptionSpec of the Dataset they are exported to.
activeLearningConfig
object (ActiveLearningConfig)
Parameters that configure the active learning pipeline. Active learning labels the data incrementally over several iterations. In every iteration, it selects a batch of data based on the sampling strategy.
JSON representation
{ "name": string, "displayName": string, "datasets": [ string ], "annotationLabels": { string: string, ... }, "labelerCount": integer, "instructionUri": string, "inputsSchemaUri": string, "inputs": value, "state": enum (
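For the create method, a client supplies the required fields and leaves the output-only ones for the service to fill. This is a minimal sketch of assembling such a body; `build_create_body` is a hypothetical helper, every value in the usage block is a placeholder, and `SCHEMA_FILE` stands for an actual file name from the schema folder described under `inputsSchemaUri`. The `encryptionSpec` uses its `kmsKeyName` field.

```python
import json
from typing import Any, Dict, List, Optional


def build_create_body(display_name: str, dataset: str, instruction_uri: str,
                      inputs_schema_uri: str, inputs: Dict[str, Any],
                      labeler_count: int = 1,
                      specialist_pools: Optional[List[str]] = None,
                      kms_key_name: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a DataLabelingJob JSON body for the create method (sketch)."""
    if not 0 < len(display_name) <= 128:
        raise ValueError("displayName must be 1 to 128 characters")
    if labeler_count < 1:  # assumption: at least one labeler per DataItem
        raise ValueError("labelerCount must be at least 1")
    body: Dict[str, Any] = {
        # Output-only fields (name, state, labelingProgress, currentSpend,
        # createTime, updateTime, error) are left for the service to fill.
        "displayName": display_name,
        "datasets": [dataset],  # only a single Dataset is supported
        "labelerCount": labeler_count,
        "instructionUri": instruction_uri,
        "inputsSchemaUri": inputs_schema_uri,
        "inputs": inputs,  # shape is defined by the YAML at inputsSchemaUri
    }
    if specialist_pools:
        body["specialistPools"] = list(specialist_pools)
    if kms_key_name:
        body["encryptionSpec"] = {"kmsKeyName": kms_key_name}
    return body


if __name__ == "__main__":
    # Every value below is a hypothetical placeholder.
    body = build_create_body(
        display_name="my-labeling-job",
        dataset="projects/my-project/locations/us-central1/datasets/1234",
        instruction_uri="gs://my-bucket/instructions.pdf",
        inputs_schema_uri="gs://google-cloud-aiplatform/schema/datalabelingjob/inputs/SCHEMA_FILE.yaml",
        inputs={},
        labeler_count=3,
    )
    print(json.dumps(body, indent=2))
```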
ActiveLearningConfig
Parameters that configure the active learning pipeline. Active learning will label the data incrementally by several iterations. For every iteration, it will select a batch of data based on the sampling strategy.
sampleConfig
object (SampleConfig)
Active learning data sampling config. For every active learning labeling iteration, it will select a batch of data based on the sampling strategy.
trainingConfig
object (TrainingConfig)
CMLE training config. For every active learning labeling iteration, the system will train a machine learning model on CMLE. The trained model will be used by the data sampling algorithm to select DataItems.
Union field human_labeling_budget. Required. Maximum number of DataItems to be labeled by humans; the remaining DataItems will be labeled by machine. human_labeling_budget can be only one of the following:
maxDataItemCount
string (int64 format)
Max number of human labeled DataItems.
maxDataItemPercentage
integer
Max percent of total DataItems for human labeling.
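Because `human_labeling_budget` is a union, exactly one of the two fields may be set. This minimal sketch (the helper `human_labeling_budget` is hypothetical) enforces that and splits a Dataset into human-labeled and machine-labeled counts. How the service rounds a percentage is not stated here, so rounding down is an assumption.

```python
from typing import Any, Dict, Tuple


def human_labeling_budget(total_items: int, config: Dict[str, Any]) -> Tuple[int, int]:
    """Return (human-labeled, machine-labeled) DataItem counts for a config.

    `config` is an ActiveLearningConfig JSON object; exactly one member of
    the human_labeling_budget union must be set.
    """
    has_count = "maxDataItemCount" in config
    has_pct = "maxDataItemPercentage" in config
    if has_count == has_pct:
        raise ValueError("set exactly one of maxDataItemCount or maxDataItemPercentage")
    if has_count:
        # int64 fields are serialized as JSON strings.
        human = min(int(config["maxDataItemCount"]), total_items)
    else:
        # Assumption: the percentage applies to all DataItems, rounded down.
        human = total_items * int(config["maxDataItemPercentage"]) // 100
    return human, total_items - human
```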
JSON representation
{ "sampleConfig": { object (
SampleConfig
Active learning data sampling config. For every active learning labeling iteration, it will select a batch of data based on the sampling strategy.
sampleStrategy
enum (SampleStrategy)
Field to choose sampling strategy. The sampling strategy decides which data should be selected for human labeling in every batch.
Union field initial_batch_sample_size. Decides the sample size for the initial batch. initial_batch_sample_percentage is used by default. initial_batch_sample_size can be only one of the following:
initialBatchSamplePercentage
integer
The percentage of data needed to be labeled in the first batch.
Union field following_batch_sample_size. Decides the sample size for the following batches. following_batch_sample_percentage is used by default. following_batch_sample_size can be only one of the following:
followingBatchSamplePercentage
integer
The percentage of data needed to be labeled in each following batch (except the first batch).
JSON representation
{ "sampleStrategy": enum (
SampleStrategy
Sample strategy decides which subset of DataItems should be selected for human labeling in every batch.
Enum | Description
---|---
SAMPLE_STRATEGY_UNSPECIFIED | Default value; treated as UNCERTAINTY.
UNCERTAINTY | Sample the most uncertain data to label.
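One common way to "sample the most uncertain data" is least-confidence sampling: score each item by one minus the model's highest class probability and pick the top k. The sketch below uses that technique for illustration only; the page does not say which uncertainty measure the service uses, and `select_uncertain` and its inputs are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence

# Per the enum table, the unspecified default is treated as UNCERTAINTY.
_UNCERTAINTY = {"SAMPLE_STRATEGY_UNSPECIFIED", "UNCERTAINTY"}


def select_uncertain(probs: Dict[str, Sequence[float]], k: int,
                     strategy: str = "SAMPLE_STRATEGY_UNSPECIFIED") -> List[str]:
    """Pick the k DataItem IDs whose predictions are least confident.

    `probs` maps a DataItem ID to the trained model's class probabilities.
    Least-confidence score = 1 - max probability; higher means less certain.
    """
    if strategy not in _UNCERTAINTY:
        raise ValueError(f"unsupported strategy: {strategy}")
    scores = {item: 1.0 - max(p) for item, p in probs.items()}
    return heapq.nlargest(k, scores, key=scores.__getitem__)
```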
TrainingConfig
CMLE training config. For every active learning labeling iteration, the system will train a machine learning model on CMLE. The trained model will be used by the data sampling algorithm to select DataItems.
timeoutTrainingMilliHours
string (int64 format)
The timeout for the CMLE training job, expressed in milli-hours; for example, a value of 1,000 in this field means 1 hour.
JSON representation
{ "timeoutTrainingMilliHours": string }
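Since 1 hour is 1,000 milli-hours, one milli-hour is 3.6 seconds, and the value travels as a JSON string because it is an int64. A minimal conversion sketch; the helpers are hypothetical, and rounding to the nearest whole milli-hour is a design choice, not a documented rule.

```python
from datetime import timedelta

MILLI_HOURS_PER_HOUR = 1000
SECONDS_PER_HOUR = 3600


def to_milli_hours(timeout: timedelta) -> str:
    """Encode a timeout for timeoutTrainingMilliHours (int64 as a JSON string)."""
    milli = timeout.total_seconds() * MILLI_HOURS_PER_HOUR / SECONDS_PER_HOUR
    return str(round(milli))  # rounds to the nearest whole milli-hour (3.6 s)


def from_milli_hours(value: str) -> timedelta:
    """Decode timeoutTrainingMilliHours back into a timedelta."""
    return timedelta(hours=int(value) / MILLI_HOURS_PER_HOUR)
```

For example, `{"timeoutTrainingMilliHours": to_milli_hours(timedelta(hours=8))}` yields `{"timeoutTrainingMilliHours": "8000"}`.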
Methods
Method | Description
---|---
cancel | Cancels a DataLabelingJob.
create | Creates a DataLabelingJob.
delete | Deletes a DataLabelingJob.
get | Gets a DataLabelingJob.
list | Lists DataLabelingJobs in a Location.
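The five methods map onto resource-oriented REST calls: create and list act on the parent Location, while get, delete, and cancel act on a job's resource name, with cancel as a custom `:cancel` method. This sketch builds the verb and URL for each; it assumes the regional `{location}-aiplatform.googleapis.com` endpoint and the `v1` surface, and `method_request` is a hypothetical helper, so check the individual method pages before relying on it.

```python
from typing import Optional, Tuple

# Assumptions: the regional endpoint and the v1 API surface.
API_VERSION = "v1"


def method_request(method: str, project: str, location: str,
                   job_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP verb, URL) for a DataLabelingJob method (sketch)."""
    host = f"https://{location}-aiplatform.googleapis.com/{API_VERSION}"
    parent = f"projects/{project}/locations/{location}"
    if method == "create":
        return "POST", f"{host}/{parent}/dataLabelingJobs"
    if method == "list":
        return "GET", f"{host}/{parent}/dataLabelingJobs"
    if job_id is None:
        raise ValueError(f"{method} requires a job_id")
    name = f"{parent}/dataLabelingJobs/{job_id}"
    if method == "get":
        return "GET", f"{host}/{name}"
    if method == "delete":
        return "DELETE", f"{host}/{name}"
    if method == "cancel":
        return "POST", f"{host}/{name}:cancel"  # custom method suffix
    raise ValueError(f"unknown method: {method}")
```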