Creating instructions for human labelers

Instructions give the human labelers information about how to apply labels to your data. The instructions can include sample labeled data and other explicit directions.

The AI Platform Data Labeling Service supports two types of instructions:

  • CSV instructions (image classification only). You provide a CSV file with labeled images as examples of how you want the labels to be applied. The CSV instructions should include examples for every annotation specification (label).

  • PDF instructions. PDF instructions can provide more sophisticated directions, such as positive and negative examples or descriptions for each case. A PDF is also easier to create for complicated tasks such as image bounding boxes or video object tracking.

No instructions are needed for audio transcription. Audio transcribers are trained using a Google-owned convention to identify and label audio data.

A project can have multiple sets of instructions, each used for a different Data Labeling Service request. You can get a list of the available instructions and delete instructions you no longer need; see the instructions resource page for more information.
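
As a minimal sketch of listing and deleting instructions, assuming the Python client library used later on this page: the helper below takes an already constructed DataLabelingServiceClient and calls its list_instructions and delete_instruction methods. The helper name and the idea of selecting by display name are illustrative assumptions, not something the service prescribes.

```python
def delete_instructions_by_display_name(client, project_path, display_name):
    """Delete every instruction in a project whose display name matches.

    client is assumed to be a DataLabelingServiceClient, and project_path
    has the form "projects/{project_id}". Returns the deleted resource names.
    """
    # Collect the matching resource names first, then delete, so the
    # listing is not modified while it is being iterated.
    matches = [
        instruction.name
        for instruction in client.list_instructions(parent=project_path)
        if instruction.display_name == display_name
    ]
    for name in matches:
        client.delete_instruction(name=name)
    return matches
```

Deletion cannot be undone, so it may be worth printing the matching names and reviewing them before calling a helper like this.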

Designing good instructions

Good instructions are the most important factor in getting good human labeling results. Since you know your use case best, you need to let the human labelers know what you want them to do. Here are some guidelines for creating good instructions:

  • Remember that the human labelers don't have your domain knowledge. The distinctions you ask labelers to make should be easy to understand for someone unfamiliar with your use case.

  • Don't make the instructions too long. It's best if a labeler can review and understand them within 20 minutes.

  • Instructions should describe the concept of the task as well as details about how to label the data. For example, for a bounding box task, describe how you want labelers to draw the bounding box. Should it be a tight box or a loose box? If there are multiple instances of the object, should they draw one big bounding box or multiple smaller ones?

  • If your instructions have a corresponding annotation specification set, they should cover all labels in that set. The label name in the instructions should match the name in the annotation specification set.

  • It often takes several iterations to create good instructions. We recommend having a small dataset labeled first, then adjusting your instructions based on the results you get back.
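
The guideline about matching the annotation specification set can be checked mechanically before you upload anything. The following is a small sketch in plain Python; the function name and the sample labels are hypothetical, and the comparison is case-sensitive on the assumption that label names must match exactly.

```python
def check_label_coverage(spec_labels, instruction_labels):
    """Compare labels used in the instructions with the annotation spec set.

    Returns (missing, unknown): labels in the spec set that the instructions
    never mention, and labels in the instructions that the spec set lacks.
    """
    spec = set(spec_labels)
    used = set(instruction_labels)
    missing = sorted(spec - used)
    unknown = sorted(used - spec)
    return missing, unknown


# "Dog" does not match "dog", so it is reported on both sides.
missing, unknown = check_label_coverage(["cat", "dog", "bird"], ["cat", "Dog"])
print(missing)  # ['bird', 'dog']
print(unknown)  # ['Dog']
```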

A good instruction should include the following sections:

  • Label list and description: list all the labels you would like to use and describe the meaning of each label.
  • Examples: For each label, give at least 3 positive examples and 1 negative example. These examples should cover different cases.
  • Edge cases: Clarify as many edge cases as you can. This reduces uncertainty by removing the need for the labeler to interpret the label. For example, if you need to draw a bounding box for a person, it is better to clarify:
    • Do you need a box for each person if there are multiple people?
    • Do you need a box if a person is occluded?
    • Do you need a box for a person who is partially shown in the image?
    • Do you need a box for a person in a picture or painting?
  • How should annotations be added? For example:
    • For a bounding box, do you need a tight box or loose box?
    • For text entity extraction, where should the entity of interest start and end?
  • Clarification on labels. If two labels are similar or easy to mix up, give examples to clarify the difference.

Create instructions

PDF instructions

The examples below show what PDF instructions may include. Labelers review the instructions before they start the task.

[Image: PDF instructions, example 1]

[Image: PDF instructions, example 2]

You can create the PDF instruction by creating a Google Slides presentation and then exporting it as a PDF file.

CSV instructions (optional)

CSV instructions are for straightforward image classification tasks. Each row of the .csv file provides a labeled example image in this format: gs://[BUCKET_NAME]/[IMAGE_NAME1], label_1, label_2, ...

The first column is the location of the image file in your Google Cloud Storage bucket. The remaining columns are the labels (from your annotation specification set) that apply to the image. These examples are shown to labelers before they start the task.
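
The row format above can be produced with Python's standard csv module. This is a sketch under stated assumptions: the bucket and image names are placeholders, the helper name is made up for illustration, and the output uses plain comma separators without padding spaces.

```python
import csv
import io


def build_csv_instructions(examples):
    """Render (gcs_uri, labels) pairs as CSV instruction rows.

    Each row starts with the image URI, followed by one column per label.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for gcs_uri, labels in examples:
        if not gcs_uri.startswith("gs://"):
            raise ValueError("Expected a Cloud Storage URI: {}".format(gcs_uri))
        if not labels:
            raise ValueError("No labels given for {}".format(gcs_uri))
        writer.writerow([gcs_uri, *labels])
    return buffer.getvalue()


# Placeholder bucket and image names.
csv_text = build_csv_instructions([
    ("gs://my-bucket/cat1.jpg", ["cat"]),
    ("gs://my-bucket/pets.jpg", ["cat", "dog"]),
])
print(csv_text)
```

Write the returned text to a .csv file and upload it to the same bucket as your dataset.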

Add instructions to a project

Web UI

  1. Open the Data Labeling Service UI.

    The Instructions page shows the status of previously created instructions for the current project.

    To add instructions for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the Create button in the title bar.

  3. On the New Instruction page, enter a name and description for the instructions file.

  4. From the Type of data drop-down, choose the type of data items that the labelers will be applying labels to: images, video, or text.

  5. In the Instructions location section, enter the full path to the instruction file(s).

    You must specify a PDF instructions file in the first box; a CSV instructions file is optional. The files must be in the same Google Cloud Storage bucket as the dataset and label set.

  6. Click Create instruction.

    You're returned to the Instructions list page; your instructions will show an in progress status while they are being imported.

Command-line

The following example creates an instruction with the display name curl_testing_instruction. The environment variables PROJECT_ID and GCS_PDF_FILE_PATH must be set to your Google Cloud project ID and the Cloud Storage URI of the PDF you want to use, respectively.
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json" \
     "https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/instructions" \
     -d '{
         "instruction": {
             "displayName": "curl_testing_instruction",
             "description": "instruction for curl commands testing",
             "dataType": "IMAGE",
             "pdfInstruction": {
                 "gcsFileUri": "'"${GCS_PDF_FILE_PATH}"'"
             }
         }
     }'

You should see output similar to the following:

{
  "name": "projects/data-labeling-codelab/instructions/5c73dbc1_0000_23e0_a25b_94eb2c119c4c"
}

Python

Before you can run this code example, you must install the Python Client Libraries.

Provide at least one of CsvInstruction or PdfInstruction; you can also provide both. To create instructions for another data type, replace the "IMAGE" value of dataType with the corresponding data type.

def create_instruction(project_id, data_type, instruction_gcs_uri):
    """ Creates a data labeling PDF instruction for the given Google Cloud
    project. The PDF file should be uploaded to the project in
    Google Cloud Storage.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling
    client = datalabeling.DataLabelingServiceClient()

    project_path = client.project_path(project_id)

    pdf_instruction = datalabeling.types.PdfInstruction(
        gcs_file_uri=instruction_gcs_uri)

    instruction = datalabeling.types.Instruction(
        display_name='YOUR_INSTRUCTION_DISPLAY_NAME',
        description='YOUR_DESCRIPTION',
        data_type=data_type,
        pdf_instruction=pdf_instruction
    )

    operation = client.create_instruction(project_path, instruction)

    result = operation.result()

    # The format of the resource name:
    # projects/{project_id}/instructions/{instruction_id}
    print('The instruction resource name: {}\n'.format(result.name))
    print('Display name: {}'.format(result.display_name))
    print('Description: {}'.format(result.description))
    print('Create time:')
    print('\tseconds: {}'.format(result.create_time.seconds))
    print('\tnanos: {}'.format(result.create_time.nanos))
    print('Data type: {}'.format(
        datalabeling.enums.DataType(result.data_type).name))
    print('Pdf instruction:')
    print('\tGcs file uri: {}'.format(
        result.pdf_instruction.gcs_file_uri))

    return result
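
The rule that at least one of CsvInstruction or PdfInstruction must be provided can be sketched as a helper that builds the JSON request body used in the command-line example. The helper name is hypothetical, and the csvInstruction field name is an assumption made by analogy with pdfInstruction; check it against the API reference before relying on it.

```python
import json


def build_instruction_body(display_name, description, data_type,
                           pdf_uri=None, csv_uri=None):
    """Build a create-instruction request body as a plain dict.

    Raises ValueError unless a PDF URI, a CSV URI, or both are given.
    """
    if not pdf_uri and not csv_uri:
        raise ValueError("Provide a PDF URI, a CSV URI, or both.")

    instruction = {
        "displayName": display_name,
        "description": description,
        "dataType": data_type,
    }
    if pdf_uri:
        instruction["pdfInstruction"] = {"gcsFileUri": pdf_uri}
    if csv_uri:
        # Assumed field name, mirroring pdfInstruction.
        instruction["csvInstruction"] = {"gcsFileUri": csv_uri}
    return {"instruction": instruction}


# Placeholder URI; the result can be passed to curl with -d.
body = build_instruction_body(
    "YOUR_INSTRUCTION_DISPLAY_NAME", "YOUR_DESCRIPTION", "IMAGE",
    pdf_uri="gs://my-bucket/instructions.pdf")
print(json.dumps(body, indent=2))
```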

Java

Before you can run this code example, you must install the Java Client Libraries.

Provide at least one of CsvInstruction or PdfInstruction; you can also provide both. To create instructions for another data type, replace the "IMAGE" value of dataType with the corresponding data type.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.CreateInstructionMetadata;
import com.google.cloud.datalabeling.v1beta1.CreateInstructionRequest;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.DataType;
import com.google.cloud.datalabeling.v1beta1.Instruction;
import com.google.cloud.datalabeling.v1beta1.PdfInstruction;
import com.google.cloud.datalabeling.v1beta1.ProjectName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class CreateInstruction {

  // Create an instruction for a dataset.
  static void createInstruction(String projectId, String pdfUri) {
    // String projectId = "YOUR_PROJECT_ID";
    // String pdfUri = "gs://YOUR_BUCKET_ID/path_to_pdf_or_csv";

    try (DataLabelingServiceClient dataLabelingServiceClient = DataLabelingServiceClient.create()) {

      ProjectName projectName = ProjectName.of(projectId);

      // There are two types of instructions: CSV (CsvInstruction) or PDF (PdfInstruction)
      PdfInstruction pdfInstruction = PdfInstruction.newBuilder()
          .setGcsFileUri(pdfUri)
          .build();

      Instruction instruction = Instruction.newBuilder()
          .setDisplayName("YOUR_INSTRUCTION_DISPLAY_NAME")
          .setDescription("YOUR_DESCRIPTION")
          .setDataType(DataType.IMAGE) // DataTypes: AUDIO, IMAGE, VIDEO, TEXT
          .setPdfInstruction(pdfInstruction)  // .setCsvInstruction() or .setPdfInstruction()
          .build();

      CreateInstructionRequest createInstructionRequest = CreateInstructionRequest.newBuilder()
          .setInstruction(instruction)
          .setParent(projectName.toString())
          .build();

      OperationFuture<Instruction, CreateInstructionMetadata> operation =
          dataLabelingServiceClient.createInstructionAsync(createInstructionRequest);

      Instruction result = operation.get();

      System.out.format("Name: %s\n", result.getName());
      System.out.format("DisplayName: %s\n", result.getDisplayName());
      System.out.format("Description: %s\n", result.getDescription());
      System.out.format("GCS SOURCE URI: %s\n", result.getPdfInstruction().getGcsFileUri());
    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}

Update instructions in a project

To update instructions, update the instructions file and then re-upload it as described in Add instructions to a project.

When you submit a data labeling task, the service takes a snapshot of the instructions file and uses that to direct the data labeling done by that task. This prevents the service from returning inconsistent results in the case where you update the instructions while a data labeling task is in progress. If you update the instructions, submit a new data labeling task in order to use the new instructions.
