This legacy version of AI Platform Data Labeling is deprecated and will no longer be available on Google Cloud after January 23, 2024. All the functionality of legacy AI Platform Data Labeling and new features are available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.
Who will label my data?
We have two primary vendors that are officially onboarded Subprocessors under
our Cloud Data Processing Addendum (CDPA): GlobalLogic Technologies Ltd and
Teleperformance Global Services. They are subject to all of the applicable
standard Subprocessor security and compliance obligations set forth in the CDPA.
Can you provide any information about the security and protection of my
data?
All data used in AI Platform Data Labeling Service and stored in Google Cloud is
encrypted by default. Human labelers can only view your data during
labeling. We will not disclose or use your data for any other purposes
beyond your requested data labeling without your permission. If you delete
the datasets labeled by the data labeling service, we begin deleting
all copies of your data from our system within 24 hours.
We implement security measures intended to prevent data loss, unauthorized
access, or spam on your data.
Can I label healthcare data?
Yes, AI Platform Data Labeling Service is HIPAA compliant and can be used to
label healthcare data.
What quality control methods can I use to ensure the labeling quality?
You can request multiple human labelers to annotate each piece of your
data.
In cases where there is disagreement on labeling, we will get additional
opinions from the other labelers until there is consensus
or we have reached the maximum number of labelers that you have set.
For example, if you request 3 labelers:
For image classification tasks, we will have all 3 labelers classify each
image and use the majority vote to decide the final answer.
For image bounding box tasks, we will have the first labeler draw the
boxes and the second labeler verify them. If the second labeler disagrees
and makes any edits, we will continue to the third one to get a majority
opinion.
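The majority vote described for image classification can be illustrated with a short sketch. This is only an illustration of the voting idea, not the service's implementation; the function name `majority_label` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of labelers, or None.

    `votes` holds one label per labeler for a single image.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # A strict majority means more than half of the votes agree.
    return label if count > len(votes) / 2 else None


# Three labelers, two of whom agree: the majority answer wins.
print(majority_label(["cat", "cat", "dog"]))
# Three labelers who all disagree: there is no majority.
print(majority_label(["cat", "dog", "bird"]))
```

With three labelers, any two matching answers are enough to settle the label for an image.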
Instructions matter a lot to the labeling result because they teach our
labelers how to label your dataset. We therefore encourage you
to review the tips
about how to create good instructions. We may notify you if the instructions
are unclear.
In addition, we encourage you to ramp up your data labeling jobs
incrementally. Start your first labeling job with a small amount of data,
and then see whether the results are what you expect.
Revise your instructions according to the feedback and the
results you have received, and then create subsequent jobs to iterate until
you feel comfortable with sending larger quantities of data. This will help
you get high quality results and make the best use of your budget.
What is the difference between a "task" and an "operation"?
A task is an action you perform using Data Labeling Service, such
as importing data, exporting data, or requesting labeling. An operation is the
Google long-running job that completes the task you request using an API call.
How do I know when an (import, export, or labeling) operation is done?
When you use the Data Labeling Service API to request import, export, or
labeling, the response includes the name of the operation that will complete
the requested task. You can use the operation name to check the status of the
request.
While the operation is running, you see a progressPercent field
indicating the progress (if it's not shown, the progress is 0%). When the operation is
complete, the response includes the value "done": true.
You also receive an email whenever an operation completes.
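The status check can be sketched as a small polling loop. This is a minimal sketch under assumptions: `fetch_operation` is a hypothetical callable that you supply and that returns the operation as a dictionary, and the code assumes that `progressPercent` appears under a `metadata` key, which you should confirm against the actual response. Only the `done` and `progressPercent` fields come from the text above.

```python
import time


def operation_status(operation):
    """Return (done, progress_percent) for an operation dictionary."""
    done = bool(operation.get("done", False))
    # Assumption: progressPercent lives under "metadata".
    metadata = operation.get("metadata") or {}
    # A missing progressPercent field means the progress is 0%.
    percent = metadata.get("progressPercent", 0)
    return done, percent


def wait_until_done(fetch_operation, name, poll_seconds=30.0,
                    max_polls=120, sleep=time.sleep):
    """Poll a caller-supplied fetch function until the operation is done."""
    for _ in range(max_polls):
        operation = fetch_operation(name)
        done, percent = operation_status(operation)
        if done:
            return operation
        print(f"{name}: {percent}% complete")
        sleep(poll_seconds)
    raise TimeoutError(f"{name} did not finish after {max_polls} polls")


# Demonstration with canned responses instead of real API calls.
canned = iter([
    {"name": "example-operation", "metadata": {}},
    {"name": "example-operation", "metadata": {"progressPercent": 40}},
    {"name": "example-operation", "done": True},
])
result = wait_until_done(lambda name: next(canned), "example-operation",
                         sleep=lambda seconds: None)
print(result["done"])
```

In real use, `fetch_operation` would wrap whichever client call you use to look up an operation by name, and the default `sleep` would pause between polls.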
How do I get the ID of the annotated dataset after requesting labeling?
ListAnnotatedDatasets
returns the names of your annotated datasets. The format of the name is
projects/sample_project_id/datasets/test_dataset_id/annotatedDatasets/sample_id;
the ID is the value that appears after annotatedDatasets/.
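Extracting the ID from a returned name is a simple string operation. The helper below is a hypothetical sketch that assumes the name follows the format shown above.

```python
def annotated_dataset_id(name):
    """Return the ID that follows 'annotatedDatasets/' in a resource name."""
    marker = "/annotatedDatasets/"
    _, separator, dataset_id = name.rpartition(marker)
    # Reject names that lack the marker or carry extra path segments.
    if not separator or not dataset_id or "/" in dataset_id:
        raise ValueError(f"unexpected annotated dataset name: {name!r}")
    return dataset_id


name = ("projects/sample_project_id/datasets/test_dataset_id/"
        "annotatedDatasets/sample_id")
print(annotated_dataset_id(name))
```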
What does it mean when I get an HttpError 404 with the message "The requested
resource accesses are not available. This request is rejected because of resource conflict."?
It means that another running operation is using the resource. For example,
you might get this error if you request labeling before the import data operation
is complete.
Why can't I delete my dataset/instruction/labeling task?
There is probably a resource conflict because a running operation is using
the resource.
Do I have to manually type in all my labels one at a time to create a label set?
Yes, if you are using the AI Platform Data Labeling Service UI. If you are using the API,
you can pass as many labels as you want programmatically.
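As a rough illustration of the API route, a label set can be assembled from a list of label names instead of being typed one at a time. The field names below (`annotationSpecSet`, `displayName`, `annotationSpecs`) are assumptions about the request body and should be checked against the API reference; the helper name and sample labels are hypothetical.

```python
def build_label_set_body(display_name, labels):
    """Build a request body for a label set from a list of label names."""
    seen = set()
    specs = []
    for label in labels:
        label = label.strip()
        # Skip blank entries and duplicates so each label appears once.
        if not label or label in seen:
            continue
        seen.add(label)
        specs.append({"displayName": label})
    # Assumption: the API expects this nesting and these field names.
    return {
        "annotationSpecSet": {
            "displayName": display_name,
            "annotationSpecs": specs,
        }
    }


body = build_label_set_body("animals", ["cat", "dog", "dog", " bird "])
print(len(body["annotationSpecSet"]["annotationSpecs"]))
```

The labels could just as easily be read from a file, one per line, before being passed to the helper.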
Why does my image bounding box data labeling request return within a few
minutes with no annotations?
Most likely your image format is not supported.
Why is the progress percentage still at zero a while after I submitted my
labeling task?
Two possible reasons (you can reach out to
cloudml-data-customer@google.com for more information):
Your task hasn't been picked up yet, due to a high volume of requests.
The task is queued and will be started as soon as possible.
You requested multiple labelers per item, and no data item has been labeled
by all of them yet. For example, if you requested three labelers, a
data item is marked complete only after all three labelers have finished
labeling it. Even if all data items have been labeled by one or two
labelers, the progress percentage remains at zero.
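The second reason can be made concrete with a little arithmetic. The sketch below assumes that progress is the share of data items marked complete, which is an inference from the description above rather than a documented formula; the function name is hypothetical.

```python
def progress_percent(labels_done_per_item, labelers_requested):
    """Estimate progress as the share of items labeled by every labeler.

    `labels_done_per_item` lists, for each data item, how many labelers
    have finished labeling it so far.
    """
    if not labels_done_per_item:
        return 0
    complete = sum(1 for done in labels_done_per_item
                   if done >= labelers_requested)
    return 100 * complete // len(labels_done_per_item)


# Every item has been labeled once or twice, but none three times.
print(progress_percent([2, 2, 1, 2], labelers_requested=3))
# Two of four items have been labeled by all three labelers.
print(progress_percent([3, 2, 3, 1], labelers_requested=3))
```

Under this assumption, a good deal of labeling work can already be finished while the reported progress is still zero.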
Last updated 2025-01-17 UTC.