Creating instructions for human labelers

Instructions give the human labelers information about how to apply labels to your data. The instructions should contain sample labeled data and other explicit directions.

AI Platform Data Labeling Service supports instructions in PDF format:

  • PDF instructions. PDF instructions can provide sophisticated directions, such as positive and negative examples or descriptions for each case. PDF is also a convenient format for writing instructions for complicated tasks, such as image bounding boxes or video object tracking.

A project can have multiple sets of instructions, each used for a different Data Labeling Service request. You can get a list of the available instructions and delete instructions you no longer need. For more information, see the instructions resource page.
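As a rough illustration of listing and deleting instructions, the sketch below removes every instruction in a project that has a given display name. It assumes the client object behaves like DataLabelingServiceClient from the Python client library, with list_instructions and delete_instruction methods that accept a request dictionary. The helper name delete_instructions_by_display_name is hypothetical and not part of the service.

```python
def delete_instructions_by_display_name(client, project_id, display_name):
    """Deletes every instruction in the project with a matching display name.

    `client` is assumed to behave like
    google.cloud.datalabeling_v1beta1.DataLabelingServiceClient.
    Returns the resource names that were deleted.
    """
    parent = f"projects/{project_id}"

    # Collect the matching names first so that deletions do not interfere
    # with iterating over the listing.
    matches = [
        instruction.name
        for instruction in client.list_instructions(request={"parent": parent})
        if instruction.display_name == display_name
    ]

    for name in matches:
        client.delete_instruction(request={"name": name})

    return matches
```

With the client library installed, this could be called with a DataLabelingServiceClient instance, your project ID, and the display name of the instructions you no longer need.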

Design good instructions

Good instructions are the most important factor in getting good human labeling results. Since you know your use case best, you need to let the human labelers know what you want them to do. Here are some guidelines for creating good instructions:

  • The human labelers do not have your domain knowledge. The distinctions you ask labelers to make should be easy to understand for someone unfamiliar with your use case.

  • Avoid making the instructions too long. It is best if a labeler can review and understand them within 20 minutes.

  • Instructions should describe the concept of the task as well as details about how to label the data. For example, for a bounding box task, describe how you want labelers to draw the bounding box. Should it be a tight box or a loose box? If there are multiple instances of the object, should they draw one big bounding box or multiple smaller boxes?

  • If your instructions have a corresponding label set, they should cover all labels in that set. The label name in the instructions should match the name in the label set.

  • It often takes several iterations to create good instructions. We recommend having a small dataset labeled first, then adjusting your instructions based on what you see in the results you get back.

A good instructions file should include the following sections:

  • Label list and description: List all the labels you would like to use and describe the meaning of each label.
  • Examples: For each label, give at least 3 positive examples and 1 negative example. These examples should cover different cases.
  • Cover edge cases. Clarify as many edge cases as you can. This reduces the need for the labeler to interpret the label. For example, if you need to draw a bounding box for a person, it is better to clarify:
    • Do you need a box for each person if there are multiple people?
    • Do you need a box if a person is occluded?
    • Do you need a box for a person who is partially shown in the image?
    • Do you need a box for a person in a picture or painting?
  • Describe how to add annotations. For example:
    • For a bounding box, do you need a tight box or loose box?
    • For text entity extraction, where should the entity of interest start and end?
  • Clarification on labels. If two labels are similar or easy to mix up, give examples to clarify the difference.
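Before exporting the instructions, it can help to check a draft against the label set. The sketch below is one possible way to do that: it verifies that every label is covered, that label names match the label set exactly, and that each label has at least 3 positive examples and 1 negative example. The LabelGuidance record and check_instruction_outline function are hypothetical helpers for illustration, not part of Data Labeling Service.

```python
from dataclasses import dataclass, field


@dataclass
class LabelGuidance:
    """Hypothetical record of what the instructions say about one label."""

    description: str = ""
    positive_examples: list = field(default_factory=list)
    negative_examples: list = field(default_factory=list)


def check_instruction_outline(label_set, guidance, min_positive=3, min_negative=1):
    """Returns a list of problems; an empty list means the outline passes."""
    problems = []

    for label in label_set:
        entry = guidance.get(label)
        if entry is None:
            problems.append(f"{label}: not covered in the instructions")
            continue
        if not entry.description.strip():
            problems.append(f"{label}: missing description")
        if len(entry.positive_examples) < min_positive:
            problems.append(f"{label}: needs at least {min_positive} positive examples")
        if len(entry.negative_examples) < min_negative:
            problems.append(f"{label}: needs at least {min_negative} negative example(s)")

    # A name that appears in the instructions but not in the label set
    # usually indicates a spelling or capitalization mismatch.
    for name in guidance:
        if name not in label_set:
            problems.append(f"{name}: not in the label set")

    return problems
```

For example, a draft that documents "Dog" while the label set contains "dog" would be reported twice: once because "dog" is not covered, and once because "Dog" is not in the label set.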

Create instructions

PDF instructions

The examples below show what the PDF instructions may include. Labelers will review the instructions before they start the task.

[Image: PDF instructions, example 1]

[Image: PDF instructions, example 2]

You can create the instructions as a Google Slides presentation and then export the slides as a PDF file.

Add instructions to a project

Web UI

  1. Open the Data Labeling Service UI.

    The Instructions page shows the status of previously created instructions for the current project.

    To add instructions for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the Create button in the title bar.

  3. On the New Instruction page, enter a name and description for the instructions file.

  4. From the Type of data drop-down, choose the type of data items that the labelers will be applying labels to: images, video, or text.

  5. In the Instructions location section, enter the full path to the instructions file.

    You must specify a PDF instructions file. The file must be in the same Google Cloud Storage bucket as the dataset and label set.

  6. Click Create instruction.

    You are returned to the Instructions list page. Your instructions show an in-progress status while they are being imported.
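The location requirements in step 5 can be checked before you submit the form. The following is a minimal sketch, assuming the instructions file and the dataset are both referenced by gs:// URIs. The helper names bucket_of and check_instruction_location are hypothetical, and the check only inspects the URIs; it does not verify that the files exist.

```python
from urllib.parse import urlparse


def bucket_of(gcs_uri):
    """Returns the bucket name of a gs:// URI, or raises ValueError."""
    parsed = urlparse(gcs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a Cloud Storage URI: {gcs_uri!r}")
    return parsed.netloc


def check_instruction_location(instruction_uri, dataset_uri):
    """Raises ValueError unless the URI names a PDF in the dataset's bucket."""
    if not urlparse(instruction_uri).path.lower().endswith(".pdf"):
        raise ValueError("the instructions file must be a PDF")
    if bucket_of(instruction_uri) != bucket_of(dataset_uri):
        raise ValueError(
            "the instructions file must be in the same bucket as the dataset"
        )
```

The function returns nothing on success, so it can be called as a guard before creating the instruction.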

Command-line

The following example creates an instruction named curl_testing_instruction. You must set the environment variables PROJECT_ID and GCS_PDF_FILE_PATH to your Google Cloud project ID and the Cloud Storage URI of the PDF you want to use, respectively. Because the request body is single-quoted, the example closes the single quotes around ${GCS_PDF_FILE_PATH} so that the shell expands the variable.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json" \
     "https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/instructions" \
     -d '{
         "instruction": {
             "displayName": "curl_testing_instruction",
             "description": "instruction for curl commands testing",
             "dataType": "IMAGE",
             "pdfInstruction": {
                 "gcsFileUri": "'"${GCS_PDF_FILE_PATH}"'"
             }
         }
     }'

You should see output similar to the following:

{
  "name": "projects/data-labeling-codelab/instructions/5c73dbc1_0000_23e0_a25b_94eb2c119c4c"
}
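If you need the project ID or instruction ID from the returned resource name, for example to store it for a later labeling request, you can split the name. The sketch below assumes the name follows the projects/{project_id}/instructions/{instruction_id} pattern shown in the sample output. The function parse_instruction_name is a hypothetical helper, not part of the client libraries.

```python
import re

# Pattern inferred from the sample output above.
_INSTRUCTION_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/instructions/(?P<instruction>[^/]+)"
)


def parse_instruction_name(name):
    """Splits an instruction resource name into (project_id, instruction_id)."""
    match = _INSTRUCTION_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected instruction resource name: {name!r}")
    return match.group("project"), match.group("instruction")
```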

Python

Before you can run this code example, you must install the Python Client Libraries.

You must provide a PdfInstruction. To create instructions for a different data type, change the data type from "IMAGE" to the corresponding value.

def create_instruction(project_id, data_type, instruction_gcs_uri):
    """Creates a data labeling PDF instruction for the given Google Cloud
    project. The PDF file should be uploaded to the project in
    Google Cloud Storage.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling

    client = datalabeling.DataLabelingServiceClient()

    project_path = f"projects/{project_id}"

    pdf_instruction = datalabeling.PdfInstruction(gcs_file_uri=instruction_gcs_uri)

    instruction = datalabeling.Instruction(
        display_name="YOUR_INSTRUCTION_DISPLAY_NAME",
        description="YOUR_DESCRIPTION",
        data_type=data_type,
        pdf_instruction=pdf_instruction,
    )

    operation = client.create_instruction(
        request={"parent": project_path, "instruction": instruction}
    )

    result = operation.result()

    # The format of the resource name:
    # projects/{project_id}/instructions/{instruction_id}
    print(f"The instruction resource name: {result.name}")
    print(f"Display name: {result.display_name}")
    print(f"Description: {result.description}")
    print("Create time:")
    print(f"\tseconds: {result.create_time.timestamp_pb().seconds}")
    print(f"\tnanos: {result.create_time.timestamp_pb().nanos}")
    print(f"Data type: {datalabeling.DataType(result.data_type).name}")
    print("Pdf instruction:")
    print(f"\tGcs file uri: {result.pdf_instruction.gcs_file_uri}\n")

    return result

Java

Before you can run this code example, you must install the Java Client Libraries.

You must provide a PdfInstruction. To create instructions for a different data type, change the data type from "IMAGE" to the corresponding value.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.CreateInstructionMetadata;
import com.google.cloud.datalabeling.v1beta1.CreateInstructionRequest;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceSettings;
import com.google.cloud.datalabeling.v1beta1.DataType;
import com.google.cloud.datalabeling.v1beta1.Instruction;
import com.google.cloud.datalabeling.v1beta1.PdfInstruction;
import com.google.cloud.datalabeling.v1beta1.ProjectName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class CreateInstruction {

  // Create an instruction for a dataset.
  static void createInstruction(String projectId, String pdfUri) throws IOException {
    // String projectId = "YOUR_PROJECT_ID";
    // String pdfUri = "gs://YOUR_BUCKET_ID/path_to_pdf_or_csv";


    DataLabelingServiceSettings settings =
        DataLabelingServiceSettings.newBuilder()
            .build();
    try (DataLabelingServiceClient dataLabelingServiceClient =
        DataLabelingServiceClient.create(settings)) {
      ProjectName projectName = ProjectName.of(projectId);

      // Data Labeling Service requires a PDF instructions file (PdfInstruction).
      PdfInstruction pdfInstruction = PdfInstruction.newBuilder().setGcsFileUri(pdfUri).build();

      Instruction instruction =
          Instruction.newBuilder()
              .setDisplayName("YOUR_INSTRUCTION_DISPLAY_NAME")
              .setDescription("YOUR_DESCRIPTION")
              .setDataType(DataType.IMAGE) // DataTypes: AUDIO, IMAGE, VIDEO, TEXT
              .setPdfInstruction(pdfInstruction) // .setCsvInstruction() or .setPdfInstruction()
              .build();

      CreateInstructionRequest createInstructionRequest =
          CreateInstructionRequest.newBuilder()
              .setInstruction(instruction)
              .setParent(projectName.toString())
              .build();

      OperationFuture<Instruction, CreateInstructionMetadata> operation =
          dataLabelingServiceClient.createInstructionAsync(createInstructionRequest);

      Instruction result = operation.get();

      System.out.format("Name: %s\n", result.getName());
      System.out.format("DisplayName: %s\n", result.getDisplayName());
      System.out.format("Description: %s\n", result.getDescription());
      System.out.format("GCS SOURCE URI: %s\n", result.getPdfInstruction().getGcsFileUri());
    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}

Update instructions in a project

To update instructions, update the instructions file and then re-upload it as described in Add instructions to a project.

When you submit a data labeling task, the service takes a snapshot of the instructions file and uses that to direct the data labeling done by that task. This prevents the service from returning inconsistent results in the case where you update the instructions while a data labeling task is in progress. If you update the instructions, submit a new data labeling task in order to use the new instructions.