When the labeling operation is complete, you can export the annotated dataset
to your Google Cloud Storage bucket by calling ExportData
.
ExportData
supports returning a .csv file containing one row for each annotation or data item. The first field indicates the ml usage category of this line, which defaults to UNASSIGNED. ExportData
also supports a jsonl file where each line represents an example which includes a data item and all the annotations. Below are examples for each type.
Image classification
csv line:
UNASSIGNED,image_url,label_1,label_2,...
json line:
{ "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "imagePayload":{ "mimeType":"IMAGE_PNG", "imageUri":"gs://sample_bucket/image.png" }, "annotations":[ { "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationValue":{ "imageClassificationAnnotation":{ "annotationSpec":{ "displayName":"tulip", } } } } ] }
Image bounding box
csv line: Each line contains information about one bounding box, using x,y coordinates to represent each box corner. Multiple boxes for a single image are on separate lines. Line format is
UNASSIGNED, image_url, label, topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y
. The topright_x, topright_y, bottomleft_x, and bottomleft_y coordinates might be empty strings, because they provide redundant information.UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,
json line: if a coordinate in normalizedVertices is not set, that field is 0 by default. This also applies to any coordinate based annotations.
{ "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "imagePayload":{ "mimeType":"IMAGE_PNG", "imageUri":"gs://sample_bucket/image.png" }, "annotations":[ { "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationValue":{ "image_bounding_poly_annotation": { "annotationSpec": { "displayName": "tulip" }, "normalizedBoundingPoly": { "normalizedVertices": [ { "x": 0.1, "y": 0.2 }, { "x": 0.9, "y": 0.9 } ] } } } } ] }
Image bounding polygon, oriented bounding box and polyline
csv line: Each point in the closed polygon/polyline is represented by the x,y point, separated by two empty csv columns. The last pair connects back to the first pair for polygon while there is no closed cycle for polyline. Each line represents one polygon/polyline.
UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,...
json line:
{ "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "imagePayload":{ "mimeType":"IMAGE_PNG", "imageUri":"gs://sample_bucket/image.png" }, "annotations":[ { "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationValue":{ "image_bounding_poly_annotation": { "annotationSpec": { "displayName": "tulip" }, "normalizedBoundingPoly": { "normalizedVertices": [ { "x": 0.1, "y": 0.1 }, { "x": 0.1, "y": 0.2 }, { "x": 0.2, "y": 0.3 } ] } } } } ] }
Image segmentation
For image segmentation, only jsonl output is provided.
- json line: The imageBytes field in imageSegmentationAnnotation represents the
segmentation mask for that image. The color for each label (that is, each dog
and cat) is shown in the annotationColors field.
{ "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "imagePayload":{ "mimeType":"IMAGE_PNG", "imageUri":"gs://sample_bucket/image.png" }, "annotations":[ { "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationValue":{ "imageSegmentationAnnotation": { "annotationColors": [ { "key": "rgb(0,0,255)", "value": { "display_name": "dog" } }, { "key": "rgb(0,255,0)", "value": { "display_name": "cat" } } ], "mimeType": "IMAGE_JPEG", "imageBytes": "/9j/4AAQSkZJRgABAQAAAQABAAD/2" } } } ] }
Video classification
csv line:
UNASSIGNED,video_url,label,segment_start_time,segment_end_time
json line:
{ "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "videoPayload": { "mimeType": "VIDEO_MP4", "resolution": { width: 720, height: 360 } "frameRate": 24 }, "annotations": [ { "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationSource": 3, "annotationValue": { "videoClassificationAnnotation": { "timeSegment": { "startTimeOffset": { "seconds": 10 }, "endTimeOffset": { "seconds": 20 } }, "annotationSpec": { "displayName": "dog" } } } } ] }
Video object detection
csv line:The four points are top-left, top-right, bottom-right, bottom-left. The second and fourth points are optional. Each point is represented by x,y. Each line will contain one bounding box.
UNASSIGNED,video_url,label,timestamp,0.1,0.1,,,0.3,0.3,,
json line:
{ "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "videoPayload": { "mimeType": "VIDEO_MP4", "resolution": { width: 720, height: 360 } "frameRate": 24 }, "annotations": [ { "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationSource": 3, "annotationValue": { "videoObjectTrackingAnnotation": { "annotationSpec": { "displayName": "tulip" }, "timeSegment": { "startTimeOffset": { "seconds": 10 }, "endTimeOffset": { "seconds": 10 } }, "objectTrackingFrames": [ { "normalizedBoundingPoly": { "normalizedVertices": [ { "x": 0.2, "y": 0.3 }, { "x": 0.9, "y": 0.5 } ] }, }, { "normalizedBoundingPoly": { "normalizedVertices": [ { "x": 0.3, "y": 0.3 }, { "x": 0.5, "y": 0.7 } ] }, } ] } } }]}
Video object tracking
csv line:The four points are top-left, top-right, bottom-right, bottom-left. The second and fourth points are optional. Each point is represented by x,y. Each line will contain one bounding box. Each object in the video is represented by a unique instance_id.
UNASSIGNED,video_url,label,instance_id,timestamp,0.1,0.1,,,0.3,0.3,,
json line:
{ "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "videoPayload": { "mimeType": "VIDEO_MP4", "resolution": { width: 720, height: 360 } "frameRate": 24 }, "annotations": [ { "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationSource": 3, "annotationValue": { "videoObjectTrackingAnnotation": { "annotationSpec": { "displayName": "tulip" }, "timeSegment": { "startTimeOffset": { "seconds": 10 }, "endTimeOffset": { "seconds": 20 } }, "objectTrackingFrames": [ { "normalizedBoundingPoly": { "normalizedVertices": [ { "x": 0.2, "y": 0.3 }, { "x": 0.9, "y": 0.5 } ] }, "timeOffset": { "nanos": 1000000 } }, { "normalizedBoundingPoly": { "normalizedVertices": [ { "x": 0.3, "y": 0.3 }, { "x": 0.5, "y": 0.7 } ] }, "timeOffset": { "nanos": 84000000 } } ] } } }]}
Video event
csv line:The four points are top-left, top-right, bottom-right, bottom-left. The second and fourth points are optional. Each point is represented by x,y. Each line will contain one bounding box. Each object in the video is represented by a unique instance_id.
UNASSIGNED,video_url,label,segment_start_time,segment_end_time
json line:
{ "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "videoPayload": { "mimeType": "VIDEO_MP4", "resolution": { width: 720, height: 360 } "frameRate": 24 }, "annotations": [ { "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id", "annotationValue": { "videoEventAnnotation": { "annotationSpec": { "displayName": "Callie" }, "timeSegment": { "startTimeOffset": { "seconds": 123 }, "endTimeOffset": { "seconds": 150 } } } } } ] } } }]}
Text classification
csv line:
UNASSIGNED,text_url,label_l
json line:
{ "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "textPayload": { "textContent": "dummy_text_content", "textUri": "gs://test_bucket/file.txt", "wordCount": 1 } "annotations": [ { "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id", "annotationValue": { "textClassificationAnnotation": { "annotationSpec": { "displayName": "news" } } } } ], }
Text entity extraction
For text entity extraction, only jsonl output is provided.
- json line:
{ "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id", "textPayload": { "textContent": "dummy_text_content", "textUri": "gs://test_bucket/file.txt", "wordCount": 1 } "annotations": [ { "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id", "annotationValue": { "textEntityExtractionAnnotation": { "annotationSpec": { "displayName": "equations" }, "textSegment": { "startOffset": 10, "endOffset": 20 } } } } ], }
ExportData is a long running operation. The API will return an operation id. You can use the operation id to call GetOperation to get the status for it later.
Web UI
Follow these steps to export the labeled data by using the Data Labeling Service UI.
Open the Data Labeling Service UI in the Google Cloud console.
The Datasets page shows the status of previously created datasets for the current project.
Click the dataset name of the dataset you want to export. This takes you to the Dataset detail page.
In the Labeled datasets section, click EXPORT in the Export status column.
In the Export labeled dataset dialog, enter the Cloud Storage path to use for the output file, and select the file format that you want.
Click EXPORT.
The Dataset detail page shows an in-progress status while your data is being exported. Once it is completed, you can find the export file at the Cloud Storage path that you specified.
Command-line
Set the following environment variables:PROJECT_ID
variable to your Google Cloud project ID.-
DATASET_ID
variable to the ID of your dataset, from the response when you created the dataset. The ID appears at the end of the full dataset name:projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID
-
ANNOTATED_DATASET_ID
variable to the ID of your annotated dataset resource name. The resource name is in the following format:projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID/annotatedDatasets/ANNOTATED_DATASET_ID
STORAGE_URI
variable to the URI of the Cloud Storage bucket where you want the results stored.
For all annotation requests except image
segmentation, the curl
request looks similar to the following:
curl -X POST \ -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ -H "Content-Type: application/json" \ https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \ -d '{ "annotatedDataset": "${ANNOTATED_DATASET_ID}", "outputConfig": { "gcsDestination": { "output_uri": "${STORAGE_URI}", "mimeType": "text/csv" } } }'
To export image segmentation data, the curl
request looks like the following:
curl -X POST \ -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ -H "Content-Type: application/json" \ https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \ -d '{ "annotatedDataset": "${ANNOTATED_DATASET_ID}", "outputConfig": { "gcsFolderDestination": { "output_folder_uri": "${STORAGE_URI}" } } }'
You should see output similar to the following:
{ "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064", "metadata": { "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.ExportDataOperationResponse", "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c" } }