Creating label sets

A label set is the set of labels you want the human labelers to use to label your images. For example, if you want to classify images based on whether they contain a dog or a cat, you create a label set with two labels: "Dog" and "Cat". (Actually, as noted below, you might also want labels for "Neither" and "Both".) Your label set can include up to 100 labels.

A project can have multiple label sets, each used for a different Data Labeling Service request. You can get a list of the available label sets and delete sets you no longer need; see the annotation specification set resource page for more information.

Design a good label set

Here are some guidelines for creating a high-quality label set.

  • Make each label's display name a meaningful word, such as "dog", "cat", or "building". Do not use abstract names like "label1" and "label2" or unfamiliar acronyms. The more meaningful the label names, the easier it is for human labelers to apply them accurately and consistently.
  • Make sure the labels are easily distinguishable from one another. For classification tasks where a single label is applied to each data item, try not to use labels whose meanings overlap.
  • For classification tasks, it is usually a good idea to include a label named "other" or "none", to use for data that don't match the other labels. If the only available labels are "dog" and "cat", for example, labelers will have to label every image with one of those labels. Your custom model is typically more robust if you include images other than dogs or cats in its training data.
  • Keep in mind that labelers are most efficient and accurate when you have at most 20 labels defined in the label set.

Create a label set resource

Web UI

  1. Open the Data Labeling Service UI.

    The Label sets page shows the status of previously created label sets for the current project.

    To add a label set for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the Create button in the title bar.

  3. On the Create a label set page, enter a name and description for the set.

  4. In the Labels section, enter names and descriptions for each label you want the human labelers to apply.

    After entering the name and description for a label, click Add label to add a row for an additional label. You can add up to 100 labels.

  5. Click Create to create the annotation specification set.

    You're returned to the Label sets list page.

Command-line

To create the label set resource, you list all labels in JSON format, then pass it to the Data Labeling Service.

The following example creates a label set named code_sample_label_set that has two labels.

Save the "name" of the new label set (from the response) for use with other operations, such as sending the labeling request.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json" \
     https://datalabeling.googleapis.com/v1beta1/projects/"${PROJECT_ID}"/annotationSpecSets \
     -d '{
       "annotationSpecSet": {
           "displayName": "code_sample_label_set",
           "description": "code sample general label set",
           "annotationSpecs": [
               {
                   "displayName": "dog",
                   "description": "label dog",
               },
               {
                   "displayName": "cat",
                   "description": "label cat",
               }
           ],
       },
    }'

You should see output similar to the following:

{
  "name": "projects/data-labeling-codelab/annotationSpecSets/5c73db2d_0000_2f46_983d_001a114a5d7c",
  "displayName": "code_sample_label_set",
  "description": "code sample general label set",
  "annotationSpecs": [
    {
      "displayName": "dog",
      "description": "label dog"
    },
    {
      "displayName": "cat",
      "description": "label cat"
    }
  ]
}

Python

Before you can run this code example, you must install the Python Client Libraries.

def create_annotation_spec_set(project_id):
    """Creates a data labeling annotation spec set for the given
    Google Cloud project.
    """
    from google.cloud import datalabeling_v1beta1 as datalabeling

    client = datalabeling.DataLabelingServiceClient()

    project_path = f"projects/{project_id}"

    annotation_spec_1 = datalabeling.AnnotationSpec(
        display_name="label_1", description="label_description_1"
    )

    annotation_spec_2 = datalabeling.AnnotationSpec(
        display_name="label_2", description="label_description_2"
    )

    annotation_spec_set = datalabeling.AnnotationSpecSet(
        display_name="YOUR_ANNOTATION_SPEC_SET_DISPLAY_NAME",
        description="YOUR_DESCRIPTION",
        annotation_specs=[annotation_spec_1, annotation_spec_2],
    )

    response = client.create_annotation_spec_set(
        request={"parent": project_path, "annotation_spec_set": annotation_spec_set}
    )

    # The format of the resource name:
    # project_id/{project_id}/annotationSpecSets/{annotationSpecSets_id}
    print(f"The annotation_spec_set resource name: {response.name}")
    print(f"Display name: {response.display_name}")
    print(f"Description: {response.description}")
    print("Annotation specs:")
    for annotation_spec in response.annotation_specs:
        print(f"\tDisplay name: {annotation_spec.display_name}")
        print(f"\tDescription: {annotation_spec.description}\n")

    return response

Java

Before you can run this code example, you must install the Java Client Libraries.
import com.google.cloud.datalabeling.v1beta1.AnnotationSpec;
import com.google.cloud.datalabeling.v1beta1.AnnotationSpecSet;
import com.google.cloud.datalabeling.v1beta1.CreateAnnotationSpecSetRequest;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceSettings;
import com.google.cloud.datalabeling.v1beta1.ProjectName;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

class CreateAnnotationSpecSet {

  // Create an annotation spec set.
  static void createAnnotationSpecSet(String projectId) throws IOException {
    // String projectId = "YOUR_PROJECT_ID";

    Map<String, String> annotationLabels = new HashMap<>();
    annotationLabels.put("label_1", "label_1_description");
    annotationLabels.put("label_2", "label_2_description");


    DataLabelingServiceSettings settings =
        DataLabelingServiceSettings.newBuilder()
            .build();
    try (DataLabelingServiceClient dataLabelingServiceClient =
        DataLabelingServiceClient.create(settings)) {
      ProjectName projectName = ProjectName.of(projectId);

      List<AnnotationSpec> annotationSpecs = new ArrayList<>();
      for (Entry<String, String> entry : annotationLabels.entrySet()) {
        AnnotationSpec annotationSpec =
            AnnotationSpec.newBuilder()
                .setDisplayName(entry.getKey())
                .setDescription(entry.getValue())
                .build();
        annotationSpecs.add(annotationSpec);
      }

      AnnotationSpecSet annotationSpecSet =
          AnnotationSpecSet.newBuilder()
              .setDisplayName("YOUR_ANNOTATION_SPEC_SET_DISPLAY_NAME")
              .setDescription("YOUR_DESCRIPTION")
              .addAllAnnotationSpecs(annotationSpecs)
              .build();

      CreateAnnotationSpecSetRequest request =
          CreateAnnotationSpecSetRequest.newBuilder()
              .setAnnotationSpecSet(annotationSpecSet)
              .setParent(projectName.toString())
              .build();

      AnnotationSpecSet result = dataLabelingServiceClient.createAnnotationSpecSet(request);

      System.out.format("Name: %s\n", result.getName());
      System.out.format("DisplayName: %s\n", result.getDisplayName());
      System.out.format("Description: %s\n", result.getDescription());
      System.out.format("Annotation Count: %d\n", result.getAnnotationSpecsCount());

      for (AnnotationSpec annotationSpec : result.getAnnotationSpecsList()) {
        System.out.format("\tDisplayName: %s\n", annotationSpec.getDisplayName());
        System.out.format("\tDescription: %s\n\n", annotationSpec.getDescription());
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

For continuous evaluation

When you create an evaluation job, you must specify a CSV file that defines your annotation specification set:

  • The file must have one row for every possible label your model outputs during prediction.
  • Each row should be a comma-separated pair containing the label and a description of the label: LABEL_NAME,DESCRIPTION
  • When you create an evaluation job, Data Labeling Service uses the filename of the CSV file as the name of an annotation specification set that it creates in the background.

For example, if your model predicts what animal is in an image, you could write the following specification to a file named animals.csv:

bird,any animal in the class Aves - see https://en.wikipedia.org/wiki/Bird
cat,any animal in the species Felis catus (domestic cats, not wild cats) - see https://en.wikipedia.org/wiki/Cat
dog,any animal in the genus Canis (domestic dogs and close relatives) - see https://en.wikipedia.org/wiki/Canis
multiple,image contains more than one of the above
none,image contains none of the above

Then, upload this file to a Cloud Storage bucket in the same project as your continuous evaluation job.