Submitting text labeling requests

The AI Platform Data Labeling Service supports three types of text labeling tasks:

  • Classification tasks, where labelers assign one or more labels to each text segment. You can specify how many labelers label each text segment (five or fewer); the Data Labeling Service uses a majority vote to determine the final labels.
  • Classification tasks with sentiment, where the input is the same as for text classification tasks, but in addition to assigning one or more labels, labelers can assign a sentiment to each label in the text segment, for example "POSITIVE" or "NEGATIVE". The Data Labeling Service collects the sentiment along with the labels from the labelers.
  • Entity extraction tasks, where the labeler is given a list of labels and a text segment (up to 100,000 characters) and, for each label, selects the start and end positions of the text that refers to it. Labelers can also select "not included". The Data Labeling Service collects the indices of the selected text for each label.
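The majority vote used for classification tasks can be illustrated with a short Python sketch. This page does not describe the exact voting rule the service applies, so the function below (`majority_labels`, a hypothetical name) assumes that a label is kept only when more than half of the labelers chose it.

```python
from collections import Counter


def majority_labels(labeler_answers):
    """Return the labels that more than half of the labelers assigned.

    labeler_answers holds one entry per labeler: the collection of labels
    that labeler assigned to the same text segment.
    """
    num_labelers = len(labeler_answers)
    if num_labelers == 0:
        return set()
    # Count each label at most once per labeler.
    votes = Counter(label for answer in labeler_answers for label in set(answer))
    # Assumed rule: keep a label only if it has a strict majority of votes.
    return {label for label, count in votes.items() if count * 2 > num_labelers}


if __name__ == "__main__":
    # Three labelers label one text segment.
    answers = [{"sports"}, {"sports", "politics"}, {"finance"}]
    print(sorted(majority_labels(answers)))  # ['sports']
```

With an odd number of labelers (one, three, or five), a strict majority can never end in a tie for a single label, which is one reason odd replica counts are convenient.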

The labeling request is a long-running operation. The response includes the operation ID, which you can use to check the status of the request. When the labeling is complete, the operation status includes the value "done": true.
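Checking the status can be repeated until the operation reports "done": true. The following Python sketch assumes a caller-supplied `get_operation` function (hypothetical) that returns the operation resource as a dict, for example the parsed JSON from the status request described in Getting the status of an operation. The function name `wait_for_operation`, the polling interval, and the timeout are illustrative choices, not part of the service.

```python
import time


def wait_for_operation(get_operation, name, poll_seconds=30.0,
                       timeout_seconds=None, sleep=time.sleep):
    """Poll a long-running operation until it reports "done": true.

    get_operation is a caller-supplied (hypothetical) function that takes
    the operation name and returns the operation resource as a dict.
    """
    waited = 0.0
    while True:
        operation = get_operation(name)
        # Unfinished operations may omit "done", so treat a missing field as False.
        if operation.get("done", False):
            return operation
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"{name} not done after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Canned responses stand in for real status checks in this demo.
    responses = iter([{"name": "op-1"}, {"name": "op-1", "done": True}])
    result = wait_for_operation(lambda _: next(responses), "op-1",
                                sleep=lambda _: None)
    print(result["done"])  # True
```

A finished operation can still have failed, so check the returned dict for an "error" field before using the result. The `sleep` parameter is injectable only so the loop can be exercised without real delays.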

Text classification tasks

Web UI

  1. Open the Data Labeling Service UI.

  2. Select Datasets from the left navigation.

    The Datasets page shows the status of previously created datasets for the current project.

  3. Click the name of the dataset you want to submit for labeling.

    Datasets with status Import complete are available to submit. The Type of data column shows whether the dataset includes images, videos, text, or audio.

  4. On the Dataset detail page, click the Create labeling task button in the title bar.

  5. On the New labeling task page, enter a name and description for the annotated dataset.

    The annotated dataset is the version of this dataset after human labelers have labeled it.

  6. From the Objective drop-down, select the type of labeling task you want performed on this dataset.

    The drop-down list includes only the objectives available for the type of data in this dataset. If you don't see the objective you want, it probably means you've selected a dataset with a different type of data in it. Close the New labeling task page and select a different dataset.

  7. From the Label set drop-down, choose the label set you want the labelers to apply to data items in this set.

    The drop-down list includes all label sets associated with this project. You must choose a set.

  8. From the Instruction drop-down, choose the instructions you want to provide to the labelers working with this dataset.

    The drop-down list includes all instructions associated with this project. You must include instructions in the labeling request.

  9. From the Labelers per data item drop-down, specify how many labelers should review each item in the dataset.

    The default is one, but you can request to have three or five labelers label each item.

  10. Click the check box to confirm that you understand how you will be charged for the labeling.

  11. Click Create.

Command-line

Set the following environment variables:
  1. PROJECT_ID variable to your Google Cloud project ID.
  2. DATASET_ID variable to the ID of your dataset, from the response when you created the dataset. The ID appears at the end of the full dataset name:

    projects/project-id/locations/us-central1/datasets/dataset-id
  3. INSTRUCTION_RESOURCE_NAME to the name of your instruction resource.
  4. ANNOTATION_SPEC_SET_RESOURCE_NAME to the name of your annotation spec set resource.
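If you only have the full dataset name, the dataset ID is its last path segment. The following Python sketch (the function name `dataset_id_from_name` is hypothetical) extracts it and prints a line you can paste into your shell; it checks only that the name ends in `datasets/ID`.

```python
def dataset_id_from_name(dataset_name):
    """Return the dataset ID, the last segment of a full dataset name."""
    parts = dataset_name.strip("/").split("/")
    # A dataset name ends in ".../datasets/DATASET_ID".
    if len(parts) < 2 or parts[-2] != "datasets" or not parts[-1]:
        raise ValueError(f"not a dataset resource name: {dataset_name!r}")
    return parts[-1]


if __name__ == "__main__":
    name = "projects/project-id/locations/us-central1/datasets/dataset-id"
    # Prints a shell line to run before the curl command.
    print(f"export DATASET_ID={dataset_id_from_name(name)}")
    # export DATASET_ID=dataset-id
```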
# The '"${VAR}"' pattern closes the single-quoted JSON so that the shell expands each variable.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}/text:label \
  -d '{
  "basicConfig": {
    "instruction": "'"${INSTRUCTION_RESOURCE_NAME}"'",
    "annotatedDatasetDisplayName": "curl_testing_annotated_dataset",
    "labelGroup": "test_label_group",
    "replicaCount": 1
  },
  "feature": "TEXT_CLASSIFICATION",
  "textClassificationConfig": {
    "annotationSpecSet": "'"${ANNOTATION_SPEC_SET_RESOURCE_NAME}"'"
  }
}'

You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.

{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.LabelTextClassificationOperationMetadata",
    "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"
  }
}
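The operation ID is the last segment of the "name" field in this response. As a minimal sketch, assuming the name always has the form `projects/PROJECT_ID/operations/OPERATION_ID` shown above, the following Python function (hypothetical name `operation_id`) extracts it:

```python
import json


def operation_id(response_text):
    """Return the operation ID from the JSON returned by a text:label request."""
    name = json.loads(response_text)["name"]
    # The name has the form projects/PROJECT_ID/operations/OPERATION_ID.
    _, sep, op_id = name.rpartition("/operations/")
    if not sep or not op_id or "/" in op_id:
        raise ValueError(f"unexpected operation name: {name!r}")
    return op_id


if __name__ == "__main__":
    sample = """{
      "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
      "metadata": {"dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"}
    }"""
    print(operation_id(sample))  # 5c73dd6b_0000_2b34_a920_883d24fa2064
```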

Java

Before you can run this code example, you must install the Java Client Libraries.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.AnnotatedDataset;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.HumanAnnotationConfig;
import com.google.cloud.datalabeling.v1beta1.LabelOperationMetadata;
import com.google.cloud.datalabeling.v1beta1.LabelTextRequest;
import com.google.cloud.datalabeling.v1beta1.LabelTextRequest.Feature;
import com.google.cloud.datalabeling.v1beta1.SentimentConfig;
import com.google.cloud.datalabeling.v1beta1.TextClassificationConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LabelText {

  // Start a Text Labeling Task
  static void labelText(String formattedInstructionName, String formattedAnnotationSpecSetName,
      String formattedDatasetName) {
    // String formattedInstructionName = DataLabelingServiceClient.formatInstructionName(
    //      "YOUR_PROJECT_ID", "YOUR_INSTRUCTION_UUID");
    // String formattedAnnotationSpecSetName =
    //     DataLabelingServiceClient.formatAnnotationSpecSetName(
    //         "YOUR_PROJECT_ID", "YOUR_ANNOTATION_SPEC_SET_UUID");
    // String formattedDatasetName = DataLabelingServiceClient.formatDatasetName(
    //      "YOUR_PROJECT_ID", "YOUR_DATASET_UUID");

    try (DataLabelingServiceClient dataLabelingServiceClient = DataLabelingServiceClient.create()) {

      HumanAnnotationConfig humanAnnotationConfig = HumanAnnotationConfig.newBuilder()
          .setAnnotatedDatasetDisplayName("annotated_displayname")
          .setAnnotatedDatasetDescription("annotated_description")
          .setLanguageCode("en-us")
          .setInstruction(formattedInstructionName)
          .build();

      SentimentConfig sentimentConfig = SentimentConfig.newBuilder()
          .setEnableLabelSentimentSelection(false)
          .build();

      TextClassificationConfig textClassificationConfig = TextClassificationConfig.newBuilder()
          .setAnnotationSpecSet(formattedAnnotationSpecSetName)
          .setSentimentConfig(sentimentConfig)
          .build();

      LabelTextRequest labelTextRequest = LabelTextRequest.newBuilder()
          .setParent(formattedDatasetName)
          .setBasicConfig(humanAnnotationConfig)
          .setTextClassificationConfig(textClassificationConfig)
          .setFeature(Feature.TEXT_CLASSIFICATION)
          .build();

      OperationFuture<AnnotatedDataset, LabelOperationMetadata> operation =
          dataLabelingServiceClient.labelTextAsync(labelTextRequest);

      // You'll want to save this for later to retrieve your completed operation.
      System.out.format("Operation Name: %s\n", operation.getName());

    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}

Entity extraction tasks

Web UI

  1. Open the Data Labeling Service UI.

  2. Select Datasets from the left navigation.

    The Datasets page shows the status of previously created datasets for the current project.

  3. Click the name of the dataset you want to submit for labeling.

    Datasets with status Import complete are available to submit. The Type of data column shows whether the dataset includes images, videos, text, or audio.

  4. On the Dataset detail page, click the Create labeling task button in the title bar.

  5. On the New labeling task page, enter a name and description for the annotated dataset.

    The annotated dataset is the version of this dataset after human labelers have labeled it.

  6. From the Objective drop-down, select the type of labeling task you want performed on this dataset.

    The drop-down list includes only the objectives available for the type of data in this dataset. If you don't see the objective you want, it probably means you've selected a dataset with a different type of data in it. Close the New labeling task page and select a different dataset.

  7. From the Label set drop-down, choose the label set you want the labelers to apply to data items in this set.

    The drop-down list includes all label sets associated with this project. You must choose a set.

  8. From the Instruction drop-down, choose the instructions you want to provide to the labelers working with this dataset.

    The drop-down list includes all instructions associated with this project. You must include instructions in the labeling request.

  9. From the Labelers per data item drop-down, specify how many labelers should review each item in the dataset.

    The default is one, but you can request to have three or five labelers label each item.

  10. Click the check box to confirm that you understand how you will be charged for the labeling.

  11. Click Create.

Command-line

Set the following environment variables:
  1. PROJECT_ID variable to your Google Cloud project ID.
  2. DATASET_ID variable to the ID of your dataset, from the response when you created the dataset. The ID appears at the end of the full dataset name:

    projects/project-id/locations/us-central1/datasets/dataset-id
  3. INSTRUCTION_RESOURCE_NAME to the name of your instruction resource.
  4. ANNOTATION_SPEC_SET_RESOURCE_NAME to the name of your annotation spec set resource.
# The '"${VAR}"' pattern closes the single-quoted JSON so that the shell expands each variable.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}/text:label \
  -d '{
  "basicConfig": {
    "instruction": "'"${INSTRUCTION_RESOURCE_NAME}"'",
    "annotatedDatasetDisplayName": "curl_testing_annotated_dataset",
    "labelGroup": "test_label_group",
    "replicaCount": 1
  },
  "feature": "TEXT_ENTITY_EXTRACTION",
  "textEntityExtractionConfig": {
    "annotationSpecSet": "'"${ANNOTATION_SPEC_SET_RESOURCE_NAME}"'"
  }
}'
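Hand-writing JSON inside a single-quoted shell string makes it easy to leave a trailing comma or an unexpanded variable. As an alternative, the following Python sketch (the function name `label_text_body` is hypothetical) builds the request body with `json.dumps`. The field names are taken from the curl samples on this page; `replicaCount` is assumed to be the JSON spelling of `replica_count`, and restricting it to 1, 3, or 5 mirrors the web UI choices rather than a documented API rule.

```python
import json
import os
import sys

# Labelers-per-item choices offered in the web UI (1, 3, or 5).
ALLOWED_REPLICA_COUNTS = (1, 3, 5)
# Feature-specific config field names, as used in the curl samples.
CONFIG_FIELDS = {
    "TEXT_CLASSIFICATION": "textClassificationConfig",
    "TEXT_ENTITY_EXTRACTION": "textEntityExtractionConfig",
}


def label_text_body(feature, instruction, annotation_spec_set, display_name,
                    label_group=None, replica_count=1):
    """Return the JSON request body for a text:label call as a string."""
    if feature not in CONFIG_FIELDS:
        raise ValueError(f"unsupported feature: {feature}")
    if replica_count not in ALLOWED_REPLICA_COUNTS:
        raise ValueError(f"replica_count must be one of {ALLOWED_REPLICA_COUNTS}")
    basic_config = {
        "instruction": instruction,
        "annotatedDatasetDisplayName": display_name,
        "replicaCount": replica_count,
    }
    if label_group is not None:
        basic_config["labelGroup"] = label_group
    body = {
        "basicConfig": basic_config,
        "feature": feature,
        CONFIG_FIELDS[feature]: {"annotationSpecSet": annotation_spec_set},
    }
    # json.dumps escapes the strings and never emits trailing commas.
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    instruction = os.environ.get("INSTRUCTION_RESOURCE_NAME")
    spec_set = os.environ.get("ANNOTATION_SPEC_SET_RESOURCE_NAME")
    if instruction and spec_set:
        print(label_text_body("TEXT_ENTITY_EXTRACTION", instruction, spec_set,
                              "curl_testing_annotated_dataset",
                              label_group="test_label_group"))
    else:
        print("Set INSTRUCTION_RESOURCE_NAME and "
              "ANNOTATION_SPEC_SET_RESOURCE_NAME first.", file=sys.stderr)
```

For example, save the script under a name of your choice, redirect its output to body.json, and pass -d @body.json to curl in place of the inline JSON.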

You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.

{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.LabelTextEntityExtractionOperationMetadata",
    "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"
  }
}

Python

Before you can run this code example, you must install the Python Client Libraries.

def label_text(dataset_resource_name, instruction_resource_name,
               annotation_spec_set_resource_name):
  """Labels a text dataset."""
  from google.cloud import datalabeling_v1beta1 as datalabeling
  client = datalabeling.DataLabelingServiceClient()

  basic_config = datalabeling.types.HumanAnnotationConfig(
      instruction=instruction_resource_name,
      annotated_dataset_display_name='YOUR_ANNOTATED_DATASET_DISPLAY_NAME',
      label_group='YOUR_LABEL_GROUP',
      replica_count=1)

  feature = datalabeling.enums.LabelTextRequest.Feature.TEXT_ENTITY_EXTRACTION

  text_entity_extraction_config = datalabeling.types.TextEntityExtractionConfig(
      annotation_spec_set=annotation_spec_set_resource_name)

  response = client.label_text(
      dataset_resource_name,
      basic_config,
      feature,
      text_entity_extraction_config=text_entity_extraction_config)

  print('Label_text operation name: {}'.format(response.operation.name))
  return response
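This page does not show the shape of the finished entity annotations. Assuming each selection comes back as a label with character offsets into the text segment (start inclusive, end exclusive, both assumptions), the following Python sketch (hypothetical function `entity_spans`, using a simplified tuple format rather than the client library's types) recovers the text each labeler selected:

```python
def entity_spans(text, annotations):
    """Return (label, selected_text) pairs for entity extraction results.

    annotations is a list of (label, start, end) tuples, a simplified,
    hypothetical shape; offsets are assumed to be character positions with
    start inclusive and end exclusive.
    """
    spans = []
    for label, start, end in annotations:
        # Reject offsets that fall outside the segment or run backwards.
        if not 0 <= start <= end <= len(text):
            raise ValueError(f"bad offsets {start}..{end} for label {label!r}")
        spans.append((label, text[start:end]))
    return spans


if __name__ == "__main__":
    segment = "Alphabet is based in Mountain View."
    print(entity_spans(segment, [("ORG", 0, 8), ("LOC", 21, 34)]))
    # [('ORG', 'Alphabet'), ('LOC', 'Mountain View')]
```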
