This legacy version of AI Platform Data Labeling is deprecated and will no longer be available on Google Cloud after January 23, 2024. All the functionality of legacy AI Platform Data Labeling and new features are available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.

Frequently asked questions (FAQ)

Who will label my data?

We have two primary vendors that are officially onboarded Subprocessors under our Cloud Data Processing Addendum (CDPA): GlobalLogic Technologies Ltd and Teleperformance Global Services. They are subject to all of the applicable standard Subprocessor security and compliance obligations set forth in the CDPA.

Can you provide any information about the security and protection of my data?

All data used in AI Platform Data Labeling Service and stored in Google Cloud is encrypted by default. Human labelers can only view your data during labeling. We will not disclose or use your data for any purpose other than your requested data labeling without your permission. If you delete the datasets labeled by the data labeling service, we will start deleting all copies of your data from our system within 24 hours. We implement security measures intended to prevent data loss, unauthorized access, or spam on your data.

Can I label healthcare data?

Yes, AI Platform Data Labeling Service is HIPAA compliant and can be used to label healthcare data.

What quality control methods can I use to ensure the labeling quality?

You can request multiple human labelers to annotate each piece of your data. In cases where there is disagreement on labeling, we will get additional opinions from the other labelers until there is consensus or we have reached the maximum number of labelers that you have set.

For example, if you request 3 labelers:

For image classification tasks, we will have all 3 labelers classify each image and use the majority vote to decide the final answer.
For image bounding box tasks, we will have the first labeler draw the boxes and the second labeler verify them. If the second labeler disagrees and makes any edits, we will continue to the third one to get a majority opinion.
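The majority vote described for image classification can be illustrated with a short sketch. This is only an illustration of the voting rule, not the service's implementation; the function name `majority_label` and the choice to return `None` on a tie are assumptions made for this example.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by the most labelers, or None on a tie.

    Hypothetical helper: `votes` is a list with one label per labeler.
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    # A tie for first place means there is no majority opinion yet.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Three labelers classify the same image.
print(majority_label(["cat", "dog", "cat"]))
```

With three labelers and two agreeing votes, the agreed label wins. If all three disagree, the sketch reports no winner.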

Instructions strongly affect the labeling result because they teach our labelers how to label your dataset. We therefore encourage you to read the tips on how to create good instructions. We may notify you if the instructions are unclear.
In addition, we encourage you to ramp up your data labeling jobs incrementally. Start your first labeling job with a small amount of data, and then see whether the results are what you expect. Revise your instructions according to the feedback and the results you have received, and then create subsequent jobs to iterate until you feel comfortable with sending larger quantities of data. This will help you get high quality results and make the best use of your budget.

What is the difference between a "task" and an "operation"?

A task is an action you perform using Data Labeling Service, such as importing data, exporting data, or requesting labeling. An operation is the Google long-running job that completes the task you request using an API call.

How do I know when an (import, export, or labeling) operation is done?

When you use the Data Labeling Service API to request import, export, or labeling, the response includes the name of the operation that will be completing the requested task. You can use the operation name to check the status of the request.
While the operation is running, you see a progressPercent field indicating the progress (if it's not shown, the progress is 0%). When the operation is complete, the response includes the value "done": true.
You also receive an email whenever an operation completes.
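The status check described above can be sketched as a small helper that interprets an operation response already parsed from JSON into a dictionary. The helper name `operation_status` is hypothetical, and where `progressPercent` sits in the response (inside `metadata` or at the top level) is an assumption, so the sketch checks both. A missing field is treated as 0%, as stated above.

```python
def operation_status(operation):
    """Return (done, progress_percent) for an operation response dict.

    Assumption: progressPercent appears either inside "metadata" or at
    the top level of the response; when absent, progress is 0.
    """
    metadata = operation.get("metadata", {})
    progress = metadata.get(
        "progressPercent", operation.get("progressPercent", 0)
    )
    done = bool(operation.get("done", False))
    return done, progress


# Example responses (hand-written for illustration, not real API output).
running = {"name": "example-operation", "metadata": {"progressPercent": 40}}
finished = {"name": "example-operation", "done": True}
print(operation_status(running))
print(operation_status(finished))
```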

How do I get the ID of the annotated dataset after requesting labeling?

ListAnnotatedDatasets returns the names of your annotated datasets. The format of the name is projects/sample_project_id/datasets/test_dataset_id/annotatedDatasets/sample_id; the ID is the value that appears after annotatedDatasets/.
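As a minimal sketch, the ID can be extracted from a returned name with plain string handling. The function name `annotated_dataset_id` is hypothetical; the sample name is the one used in this answer.

```python
def annotated_dataset_id(name):
    """Return the part of the resource name after "annotatedDatasets/"."""
    marker = "/annotatedDatasets/"
    _, found, dataset_id = name.rpartition(marker)
    if not found or not dataset_id:
        raise ValueError(f"not an annotated dataset name: {name!r}")
    return dataset_id


name = (
    "projects/sample_project_id/datasets/test_dataset_id"
    "/annotatedDatasets/sample_id"
)
print(annotated_dataset_id(name))
```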

What does it mean when I get an HttpError 409 with the message "The requested resource accesses are not available. This request is rejected because of resource conflict."?

It means that another running operation is using the resource. For example, you might get this error if you request labeling before the import data operation is complete.

Why can't I delete my dataset/instruction/labeling task?

There is probably a resource conflict because a running operation is using the resource.

Do I have to manually type in all my labels one at a time to create a label set?

Yes, if you are using the AI Platform Data Labeling Service UI. If you are using the API, you can supply as many labels as you want programmatically in a single request.

Why does my image bounding box data labeling request return within a few minutes with no annotations?

Most likely your image format is not supported.

Why is the progress percentage still at zero a while after I submitted my labeling task?

Two possible reasons (you can reach out to cloudml-data-customer@google.com for more information):

Your task hasn't been picked up yet, due to a high volume of requests. The task is queued and will be started as soon as possible.
You requested multiple labelers per item and no data item has yet been labeled by all of them. For example, if you requested three labelers, a data item is marked complete only after all three labelers have finished labeling it. Even if every data item has been labeled by one or two labelers, the progress percentage remains at zero.
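The counting rule in the second reason can be illustrated with a short sketch. It assumes that progress is the share of data items completed by every requested labeler; the function name `progress_percent` and the rounding are assumptions for illustration, not the service's actual calculation.

```python
def progress_percent(labels_done_per_item, labelers_requested):
    """Percentage of items labeled by all requested labelers.

    Hypothetical helper: labels_done_per_item holds, for each data item,
    how many labelers have finished labeling it so far.
    """
    if not labels_done_per_item:
        return 0
    complete = sum(
        1 for done in labels_done_per_item if done >= labelers_requested
    )
    return 100 * complete // len(labels_done_per_item)


# Every item has one or two labels, but none has all three.
print(progress_percent([1, 2, 2, 1], labelers_requested=3))
# Two of the four items have been labeled by all three labelers.
print(progress_percent([3, 2, 3, 1], labelers_requested=3))
```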