Exporting labeled data

When the labeling operation is complete, you can export the annotated dataset to your Google Cloud Storage bucket by calling ExportData.

ExportData produces a .csv file containing one row for each image (for image classification tasks) or one row for each annotation, such as a bounding box, or a JSON file for image segmentation and text entity extraction tasks.

  • For image classification tasks, the format is: ,image_url,label_1,label_2,...

  • For image bounding box tasks, the format is: ,image_url,label,0.1,0.1,,,0.3,0.3,,

    The four points are top-left, top-right, bottom-right, bottom-left. The second and fourth points are optional. Each point is represented by x,y.

    Each line contains one bounding box. An image with multiple boxes appears on multiple lines, one line per box.

  • For image bounding polygon and oriented bounding box tasks, the format is: ,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,...

    Each point in the closed polygon is represented by an x,y pair, and consecutive points are separated by two empty CSV columns. The last point connects back to the first point to close the polygon. Each line contains one polygon.

  • For image polyline tasks, the format is: ,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,...

    Similar to the polygon format above, but the last point doesn't connect back to the first point, so the points form a polyline. Each line contains one polyline.

  • For image segmentation tasks, the output is a JSON file with the following format: { Annotation_colors { "0x000000": {display_name: "dog"} "0x111111": {display_name: "cat"} } image_bytes: "jpeg_encoded_colormap_bytes" }

  • For video classification tasks, the format is: ,video_url,label,segment_start_time,segment_end_time

  • For video object detection tasks, the format is: ,video_url,label,timestamp,0.1,0.1,,,0.3,0.3,,

    The four points are top-left, top-right, bottom-right, bottom-left. The second and fourth points are optional. Each point is represented by x,y. Each line will contain one bounding box.

  • For video object tracking tasks, the format is: ,video_url,label,instance_id,timestamp,0.1,0.1,,,0.3,0.3,,

    The four points are top-left, top-right, bottom-right, bottom-left. The second and fourth points are optional. Each point is represented by x,y. Each line will contain one bounding box.

  • For video event tasks, the format is: ,video_url,label,segment_start_time,segment_end_time

  • For text classification tasks, the format is: ,text_url,label_l

  • For text classification tasks with sentiment enabled, the format is: ,text_url,label_l,sentiment

  • For text entity extraction tasks, returns a .json file. The format is the same as AutoML NL input: { "annotations": [ { "text_extraction": { "text_segment": { "end_offset": number, "start_offset": number } }, "display_name": string }, ... ], "text_snippet": {"content": string} }
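As a rough illustration of how the point columns in the image bounding box, polygon, and polyline rows above could be read back, here is a minimal Python sketch. It assumes that every point occupies two columns and that omitted points or separators appear as empty column pairs, as in the sample rows. The function names parse_points and parse_image_row and the gs:// URL are hypothetical and not part of the service.

```python
import csv
import io


def parse_points(columns):
    """Group columns into (x, y) pairs, skipping pairs that are left empty."""
    points = []
    for i in range(0, len(columns) - 1, 2):
        x, y = columns[i], columns[i + 1]
        if x != "" and y != "":
            points.append((float(x), float(y)))
    return points


def parse_image_row(line):
    """Parse one exported row of the form ,image_url,label,x,y,,,x,y,,"""
    row = next(csv.reader(io.StringIO(line)))
    # row[0] is the empty leading column shown in the formats above.
    return {
        "image_url": row[1],
        "label": row[2],
        "points": parse_points(row[3:]),
    }


# Hypothetical sample row in the bounding box format.
box = parse_image_row(",gs://my-bucket/img.jpg,dog,0.1,0.1,,,0.3,0.3,,")
print(box["label"], box["points"])
```

For a bounding box row with the optional points omitted, the two remaining points are the top-left and bottom-right corners. For polygon and polyline rows, the same function returns the points in order.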

ExportData is a long-running operation. The API returns an operation ID, which you can later pass to GetOperation to check the status of the export.
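The polling pattern could look roughly like the following Python sketch. It assumes the returned operation follows the standard long-running Operation shape, with a boolean done field and an error field on failure. get_operation is a hypothetical callable that you supply; it wraps the GetOperation call and returns the operation as a dict.

```python
import time


def wait_for_operation(get_operation, operation_name,
                       poll_seconds=5.0, max_polls=60):
    """Poll until the operation reports done, then return its final state.

    get_operation is a hypothetical callable (an assumption of this sketch)
    that takes an operation name and returns the operation as a dict.
    """
    for _ in range(max_polls):
        operation = get_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(
                    "Export failed: {}".format(operation["error"]))
            return operation
        # Not finished yet; wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(
        "Operation {} did not finish in time".format(operation_name))
```

Passing the lookup function in as a parameter keeps the loop independent of how you call the API (REST or a client library) and makes it easy to exercise with a stub.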

Web UI

Follow these steps to export the labeled data by using the Data Labeling Service UI.

  1. Open the Data Labeling Service UI in the Google Cloud Platform Console.

    The Datasets page shows the status of previously created datasets for the current project.

  2. Click the dataset name of the dataset you want to export. This takes you to the Dataset detail page.

  3. In the Labeled datasets section, click EXPORT in the Export status column.

  4. In the Export labeled dataset dialog, enter the Cloud Storage path to use for the output file, and select the file format that you want.

  5. Click EXPORT.

    The Dataset detail page shows an in-progress status while your data is being exported. Once it is completed, you can find the export file at the Cloud Storage path that you specified.

Command-line

Set the following environment variables:

  1. PROJECT_ID variable to your Google Cloud project ID.
  2. DATASET_ID variable to the ID of your dataset, from the response when you created the dataset. The ID appears at the end of the full dataset name:

    projects/project-id/locations/us-central1/datasets/dataset-id
  3. ANNOTATED_DATASET_ID variable to the resource name of your annotated dataset. The resource name is in the following format:

    projects/project-id/locations/us-central1/datasets/dataset-id/annotatedDatasets/annotated-dataset-id
  4. STORAGE_URI variable to the URI of the Cloud Storage bucket where you want the results stored.
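Because the dataset ID is the last path segment of the full dataset name, it could be extracted as in this small Python sketch. resource_id is a hypothetical helper name, and the sample name is the placeholder shown above.

```python
def resource_id(resource_name):
    """Return the last path segment of a resource name."""
    return resource_name.rstrip("/").rsplit("/", 1)[-1]


dataset_id = resource_id(
    "projects/project-id/locations/us-central1/datasets/dataset-id")
print(dataset_id)
```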

For all annotation requests except image segmentation, the curl request looks similar to the following:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d "{
     \"annotatedDataset\": \"${ANNOTATED_DATASET_ID}\",
     \"outputConfig\": {
       \"gcsDestination\": {
           \"outputUri\": \"${STORAGE_URI}\",
           \"mimeType\": \"text/csv\"
       }
     }
   }"

To export image segmentation data, the curl request looks like the following:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d "{
     \"annotatedDataset\": \"${ANNOTATED_DATASET_ID}\",
     \"outputConfig\": {
       \"gcsFolderDestination\": {
         \"outputFolderUri\": \"${STORAGE_URI}\"
       }
     }
   }"

You should see output similar to the following:

{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.ExportDataOperationResponse",
    "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"
  }
}
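The name field in this response is the operation name to pass to GetOperation. As a small sketch, the Python snippet below pulls it out of the JSON response and builds a status URL. The URL layout (the v1beta1 base path followed by the operation name) is an assumption modeled on the exportData URL above, and operation_status_url is a hypothetical helper.

```python
import json

# Same base path as the exportData curl request above.
API_ROOT = "https://datalabeling.googleapis.com/v1beta1"


def operation_status_url(response_text):
    """Build a GetOperation URL from the JSON returned by exportData."""
    operation = json.loads(response_text)
    return "{}/{}".format(API_ROOT, operation["name"])


sample_response = """
{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064"
}
"""
status_url = operation_status_url(sample_response)
print(status_url)
```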

Python

Before you can run this code example, you must install the Python Client Libraries.

def export_data(dataset_resource_name, annotated_dataset_resource_name,
                export_gcs_uri):
    """Exports a dataset from the given Google Cloud project."""
    from google.cloud import datalabeling_v1beta1 as datalabeling
    client = datalabeling.DataLabelingServiceClient()

    gcs_destination = datalabeling.types.GcsDestination(
        output_uri=export_gcs_uri, mime_type='text/csv')

    output_config = datalabeling.types.OutputConfig(
        gcs_destination=gcs_destination)

    # export_data starts a long-running operation.
    response = client.export_data(
        dataset_resource_name,
        annotated_dataset_resource_name,
        output_config
    )

    # result() blocks until the export finishes.
    result = response.result()

    print('Dataset ID: {}\n'.format(result.dataset))
    print('Output config:')
    print('\tGcs destination:')
    print('\t\tOutput URI: {}\n'.format(
        result.output_config.gcs_destination.output_uri))

Java

Before you can run this code example, you must install the Java Client Libraries.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationMetadata;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationResponse;
import com.google.cloud.datalabeling.v1beta1.ExportDataRequest;
import com.google.cloud.datalabeling.v1beta1.GcsDestination;
import com.google.cloud.datalabeling.v1beta1.LabelStats;
import com.google.cloud.datalabeling.v1beta1.OutputConfig;
import java.io.IOException;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.ExecutionException;

class ExportData {

  // Export data from an annotated dataset.
  static void exportData(String datasetName, String annotatedDatasetName, String gcsOutputUri) {
    // String datasetName = DataLabelingServiceClient.formatDatasetName(
    //     "YOUR_PROJECT_ID", "YOUR_DATASET_UUID");
    // String annotatedDatasetName = DataLabelingServiceClient.formatAnnotatedDatasetName(
    //     "YOUR_PROJECT_ID",
    //     "YOUR_DATASET_UUID",
    //     "YOUR_ANNOTATED_DATASET_UUID");
    // String gcsOutputUri = "gs://YOUR_BUCKET_ID/export_path";

    try (DataLabelingServiceClient dataLabelingServiceClient = DataLabelingServiceClient.create()) {
      GcsDestination gcsDestination = GcsDestination.newBuilder()
          .setOutputUri(gcsOutputUri)
          .setMimeType("text/csv")
          .build();

      OutputConfig outputConfig = OutputConfig.newBuilder()
          .setGcsDestination(gcsDestination)
          .build();

      ExportDataRequest exportDataRequest = ExportDataRequest.newBuilder()
          .setName(datasetName)
          .setOutputConfig(outputConfig)
          .setAnnotatedDataset(annotatedDatasetName)
          .build();

      OperationFuture<ExportDataOperationResponse, ExportDataOperationMetadata> operation =
          dataLabelingServiceClient.exportDataAsync(exportDataRequest);

      ExportDataOperationResponse response = operation.get();

      System.out.format("Exported item count: %d\n", response.getExportCount());
      LabelStats labelStats = response.getLabelStats();
      Set<Entry<String, Long>> entries = labelStats.getExampleCountMap().entrySet();
      for (Entry<String, Long> entry : entries) {
        System.out.format("\tLabel: %s\n", entry.getKey());
        System.out.format("\tCount: %d\n\n", entry.getValue());
      }
    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}
