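This legacy version of AI Platform Data Labeling is deprecated and will no longer be available on Google Cloud after January 23, 2024. All of the functionality of legacy AI Platform Data Labeling, along with new features, is available on the Vertex AI platform. See Migrating to Vertex AI to learn how to migrate your resources.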

Exporting labeled data

After a labeling operation completes, you can call ExportData to export the annotated dataset to a Google Cloud Storage bucket.

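ExportData can return a .csv file in which each row corresponds to one annotation or one data item. The first field of a row is its ML use category, which defaults to UNASSIGNED. ExportData can also return a jsonl file in which each line represents one example, made up of a data item and all of its annotations. Examples of each type follow.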

Image classification

csv row:

UNASSIGNED,image_url,label_1,label_2,...

json line:

{
"name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"imagePayload":{
"mimeType":"IMAGE_PNG",
"imageUri":"gs://sample_bucket/image.png"
},
"annotations":[
{
   "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
   "annotationValue":{
      "imageClassificationAnnotation":{
         "annotationSpec":{
            "displayName":"tulip"
         }
      }
   }
}
]
}
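
As a rough illustration of how a consumer might read this format, the sketch below parses one exported line and collects the label display names. The helper name `labels_from_example` is hypothetical and not part of any client library; the field names are taken from the sample above.

```python
import json


def labels_from_example(json_line):
    """Return the displayName of every image classification annotation."""
    example = json.loads(json_line)
    labels = []
    for annotation in example.get("annotations", []):
        value = annotation.get("annotationValue", {})
        classification = value.get("imageClassificationAnnotation", {})
        spec = classification.get("annotationSpec", {})
        # Skip annotations of other types or without a label name.
        if "displayName" in spec:
            labels.append(spec["displayName"])
    return labels
```

Because every line of the jsonl file is a complete JSON object, a file can be processed one line at a time without loading it whole.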

Image bounding box

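csv row: each row describes one bounding box, with every corner of the box given as x,y coordinates. Multiple boxes for a single image appear on separate rows. The row format is UNASSIGNED, image_url, label, topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y. The topright_x, topright_y, bottomleft_x, and bottomleft_y coordinates may be empty strings because they carry redundant information.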

UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,
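
One possible way to read such a row is sketched below. It relies only on the column order described above and ignores the two corners that may be empty. `BoxRow` and `parse_box_row` are hypothetical names chosen for this example.

```python
import csv
from typing import NamedTuple


class BoxRow(NamedTuple):
    ml_use: str
    image_url: str
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_box_row(line):
    """Parse one bounding-box csv row into a BoxRow."""
    fields = next(csv.reader([line]))
    ml_use, image_url, label = (f.strip() for f in fields[:3])
    coords = [f.strip() for f in fields[3:11]]
    # Top-left is at positions 0-1 and bottom-right at positions 4-5;
    # the top-right and bottom-left columns may be empty and are not read.
    return BoxRow(ml_use, image_url, label,
                  float(coords[0]), float(coords[1]),
                  float(coords[4]), float(coords[5]))
```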

json line: if a coordinate in normalizedVertices is not set, it defaults to 0. The same applies to every coordinate-based annotation.

{
 "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
 "imagePayload":{
    "mimeType":"IMAGE_PNG",
    "imageUri":"gs://sample_bucket/image.png"
 },
 "annotations":[
    {
         "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
       "annotationValue":{
         "image_bounding_poly_annotation": {
          "annotationSpec": {
            "displayName": "tulip"
          },
          "normalizedBoundingPoly": {
          "normalizedVertices": [ {
              "x": 0.1,
              "y": 0.2
            }, {
              "x": 0.9,
              "y": 0.9
            } ]
          }
       }
    }
  }
 ]
}
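
Since an unset coordinate means 0, a reader of the jsonl output should not assume that both `x` and `y` are present on every vertex. A minimal sketch of that defaulting follows; `vertex_xy` and `poly_points` are hypothetical helper names.

```python
def vertex_xy(vertex):
    """Return (x, y) for a normalized vertex, treating an unset coordinate as 0."""
    return float(vertex.get("x", 0)), float(vertex.get("y", 0))


def poly_points(annotation_value):
    """List the (x, y) points of a bounding-poly annotation value."""
    # The sample spells this key in snake_case; the camelCase spelling is
    # accepted too as a precaution (an assumption, not shown in the sample).
    poly = (annotation_value.get("image_bounding_poly_annotation")
            or annotation_value.get("imageBoundingPolyAnnotation")
            or {})
    vertices = poly.get("normalizedBoundingPoly", {}).get("normalizedVertices", [])
    return [vertex_xy(v) for v in vertices]
```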

Image bounding polygon, oriented bounding box, and polyline

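csv row: each point of the polygon or polyline is given as an x,y pair, and consecutive points are separated by two empty csv columns. If a polygon is not explicitly closed, its last x,y pair connects back to its first x,y pair. Each row represents one polygon or one polyline.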

UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,...
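
The layout above, an x,y pair followed by two empty columns, can be read by stepping through the coordinate columns four at a time. The sketch below assumes exactly that layout; `parse_polygon_row` is a hypothetical name.

```python
import csv


def parse_polygon_row(line):
    """Return (label, points) for one polygon or polyline csv row."""
    fields = next(csv.reader([line]))
    label = fields[2].strip()
    coords = fields[3:]
    points = []
    # Each point occupies four columns: x, y, and two empty separators.
    for i in range(0, len(coords) - 1, 4):
        x, y = coords[i].strip(), coords[i + 1].strip()
        if x and y:
            points.append((float(x), float(y)))
    return label, points
```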

json line:

{
"name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"imagePayload":{
"mimeType":"IMAGE_PNG",
"imageUri":"gs://sample_bucket/image.png"
},
"annotations":[
{
     "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
   "annotationValue":{
     "image_bounding_poly_annotation": {
      "annotationSpec": {
        "displayName": "tulip"
      },
      "normalizedBoundingPoly": {
        "normalizedVertices": [ {
          "x": 0.1,
          "y": 0.1
        }, {
          "x": 0.1,
          "y": 0.2
        }, {
          "x": 0.2,
          "y": 0.3
        }  ]
      }
   }
}
}
]
}

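Image segmentation

For image segmentation, only jsonl output is available.

json line: the imageBytes field in imageSegmentationAnnotation holds the segmentation mask for the image. The color used for each label (dog and cat in this example) is listed in the annotationColors field.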

{
"name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"imagePayload":{
"mimeType":"IMAGE_PNG",
"imageUri":"gs://sample_bucket/image.png"
},
"annotations":[
{
     "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
   "annotationValue":{
     "imageSegmentationAnnotation": {
        "annotationColors": [ {
          "key": "rgb(0,0,255)",
          "value": {
            "display_name": "dog"
          }
        }, {
          "key": "rgb(0,255,0)",
          "value": {
            "display_name": "cat"
          }
        } ],
        "mimeType": "IMAGE_JPEG",
        "imageBytes": "/9j/4AAQSkZJRgABAQAAAQABAAD/2"
   }
}
}
]
}
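
To make the relationship between the mask and the labels concrete, the sketch below builds a color-to-label lookup from annotationColors and decodes the mask. It assumes that imageBytes is base64-encoded, as is usual for bytes fields in JSON, and that color keys always look like `rgb(r,g,b)`. Both function names are hypothetical. Note that the imageBytes value in the sample above is truncated and would not decode as is.

```python
import base64
import re

RGB_KEY = re.compile(r"rgb\((\d+),\s*(\d+),\s*(\d+)\)")


def color_to_label(segmentation):
    """Map each (r, g, b) mask color to its label name."""
    mapping = {}
    for entry in segmentation.get("annotationColors", []):
        match = RGB_KEY.fullmatch(entry.get("key", ""))
        if match is None:
            continue  # Unexpected key format; skip rather than guess.
        rgb = tuple(int(channel) for channel in match.groups())
        value = entry.get("value", {})
        # The sample uses display_name; displayName is accepted as a fallback.
        mapping[rgb] = value.get("display_name") or value.get("displayName", "")
    return mapping


def mask_bytes(segmentation):
    """Decode the encoded mask image (assumed to be base64) into raw bytes."""
    return base64.b64decode(segmentation["imageBytes"])
```

The decoded bytes are an encoded image in the format named by mimeType; turning them into pixels requires an image library, which is left out here.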

Video classification

csv row:

UNASSIGNED,video_url,label,segment_start_time,segment_end_time

json line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationSource": 3,
  "annotationValue": {
    "videoClassificationAnnotation": {
      "timeSegment": {
        "startTimeOffset": {
          "seconds": 10
        },
        "endTimeOffset": {
          "seconds": 20
        }
      },
      "annotationSpec": {
        "displayName": "dog"
      }
    }
  }
} ]
}
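
The offsets in timeSegment are objects rather than plain numbers. A small sketch of turning them into seconds follows. It assumes an offset may carry `seconds`, `nanos`, or both, as the samples on this page suggest; the helper names are hypothetical.

```python
def offset_seconds(offset):
    """Convert an offset object with optional seconds and nanos into seconds."""
    return float(offset.get("seconds", 0)) + float(offset.get("nanos", 0)) / 1e9


def segment_bounds(time_segment):
    """Return (start, end) of a timeSegment in seconds."""
    start = offset_seconds(time_segment.get("startTimeOffset", {}))
    end = offset_seconds(time_segment.get("endTimeOffset", {}))
    return start, end
```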

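Video object detection

csv row: the four points are the top-left, top-right, bottom-right, and bottom-left corners, in that order. The second and fourth points are optional. Each point is given as x,y. Each row contains one bounding box.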

UNASSIGNED,video_url,label,timestamp,0.1,0.1,,,0.3,0.3,,

json line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationSource": 3,
  "annotationValue": {
    "videoObjectTrackingAnnotation": {
  "annotationSpec": {
    "displayName": "tulip"
  },
  "timeSegment": {
    "startTimeOffset": {
      "seconds": 10
    },
    "endTimeOffset": {
      "seconds": 10
    }
  },
  "objectTrackingFrames": [ {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.2,
        "y": 0.3
      }, {
        "x": 0.9,
        "y": 0.5
      } ]
    }
  }, {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.3,
        "y": 0.3
      }, {
        "x": 0.5,
        "y": 0.7
      } ]
    }
  } ]
}
}
}]}

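Video object tracking

csv row: the four points are the top-left, top-right, bottom-right, and bottom-left corners, in that order. The second and fourth points are optional. Each point is given as x,y. Each row contains one bounding box. Every object in the video is identified by a unique instance_id.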

UNASSIGNED,video_url,label,instance_id,timestamp,0.1,0.1,,,0.3,0.3,,
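
Because each row holds a single box, the track of an object has to be reassembled by grouping rows on instance_id. The sketch below shows one way to do that. The timestamp is kept as an opaque string because its format is not specified here, and `group_by_instance` is a hypothetical name.

```python
import csv
from collections import defaultdict


def group_by_instance(lines):
    """Group object-tracking csv rows into per-instance lists of boxes."""
    tracks = defaultdict(list)
    for fields in csv.reader(lines):
        if len(fields) < 11:
            continue  # Not a complete tracking row.
        instance_id, timestamp = fields[3].strip(), fields[4].strip()
        # Columns 5-6 are the top-left corner, columns 9-10 the bottom-right.
        box = (float(fields[5]), float(fields[6]),
               float(fields[9]), float(fields[10]))
        tracks[instance_id].append((timestamp, box))
    return dict(tracks)
```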

json line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationSource": 3,
  "annotationValue": {
    "videoObjectTrackingAnnotation": {
  "annotationSpec": {
    "displayName": "tulip"
  },
  "timeSegment": {
    "startTimeOffset": {
      "seconds": 10
    },
    "endTimeOffset": {
      "seconds": 20
    }
  },
  "objectTrackingFrames": [ {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.2,
        "y": 0.3
      }, {
        "x": 0.9,
        "y": 0.5
      } ]
    },
    "timeOffset": {
      "nanos": 1000000
    }
  }, {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.3,
        "y": 0.3
      }, {
        "x": 0.5,
        "y": 0.7
      } ]
    },
    "timeOffset": {
      "nanos": 84000000
    }
  } ]
}
}
}]}

Video event

csv row: each row describes one labeled event, given by its label and the start and end times of the video segment.

UNASSIGNED,video_url,label,segment_start_time,segment_end_time

json line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationValue": {
    "videoEventAnnotation": {
      "annotationSpec": {
        "displayName": "Callie"
      },
      "timeSegment": {
        "startTimeOffset": {
          "seconds": 123
        },
        "endTimeOffset": {
          "seconds": 150
        }
      }
    }
  }
 } ]
}

Text classification

csv row:

UNASSIGNED,text_url,label_1

json line:

{
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
  "textPayload": {
    "textContent": "dummy_text_content",
    "textUri": "gs://test_bucket/file.txt",
    "wordCount": 1
  },
  "annotations": [ {
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id",
    "annotationValue": {
      "textClassificationAnnotation": {
        "annotationSpec": {
          "displayName": "news"
        }
      }
    }
  } ]
}

Text entity extraction

For text entity extraction, only jsonl output is available.

json line:

{
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "textPayload": {
      "textContent": "dummy_text_content",
      "textUri": "gs://test_bucket/file.txt",
      "wordCount": 1
    },
    "annotations": [ {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id",
      "annotationValue": {
        "textEntityExtractionAnnotation": {
          "annotationSpec": {
            "displayName": "equations"
          },
          "textSegment": {
            "startOffset": 10,
            "endOffset": 20
          }
        }
      }
    } ]
  }
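
The textSegment offsets can be used to recover the annotated text from textContent. The sketch below assumes that startOffset and endOffset are character offsets into textContent with an exclusive end, which the sample does not confirm. `entity_spans` is a hypothetical name.

```python
def entity_spans(example):
    """Return (label, text) pairs for every entity annotation in an example."""
    text = example.get("textPayload", {}).get("textContent", "")
    spans = []
    for annotation in example.get("annotations", []):
        value = annotation.get("annotationValue", {})
        entity = value.get("textEntityExtractionAnnotation")
        if not entity:
            continue  # Some other annotation type.
        segment = entity.get("textSegment", {})
        # Assumption: character offsets, end exclusive; unset offsets count as 0.
        start = int(segment.get("startOffset", 0))
        end = int(segment.get("endOffset", 0))
        label = entity.get("annotationSpec", {}).get("displayName", "")
        spans.append((label, text[start:end]))
    return spans
```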

ExportData is a long-running operation. The API returns an operation ID, which you can later pass to GetOperation to check the operation's status.
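
The polling pattern this implies can be sketched independently of any client library. In the example below, `get_operation` is a caller-supplied function standing in for a GetOperation call, so it is an assumption of this sketch. The `done` and `error` fields follow the standard long-running operation resource.

```python
import time


def wait_for_operation(get_operation, operation_name, poll_seconds=5.0, max_polls=60):
    """Poll an operation until it reports done, then return it.

    get_operation is a hypothetical callable that takes an operation name
    and returns the operation as a dict.
    """
    for _ in range(max_polls):
        operation = get_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Export failed: {operation['error']}")
            return operation
        time.sleep(poll_seconds)
    raise TimeoutError(f"{operation_name} did not finish after {max_polls} polls")
```

Passing the lookup function in as a parameter keeps the loop testable and leaves the choice of transport (REST call or client library) to the caller.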

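Web UI

To export labeled data using the Data Labeling Service UI, follow these steps.

Open the Data Labeling Service UI in the Google Cloud console.

The Datasets page shows the status of the datasets previously created for the current project.
Click the name of the dataset you want to export. This takes you to the Dataset detail page.
In the Labeled datasets section, click Export in the Export status column.
In the Export labeled dataset dialog, enter the Cloud Storage path to use for the output files and select the file format you want.
Click Export.

The Dataset detail page shows an "In progress" status while the data is being exported. When the export completes, you can find the exported files at the Cloud Storage path you specified.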

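Command-line

Set the following environment variables:

Set the PROJECT_ID variable to your Google Cloud project ID.
Set the DATASET_ID variable to the ID of your dataset, taken from the response you received when you created the dataset. The ID appears at the end of the full dataset name: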
```
projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID
```
Set the ANNOTATED_DATASET_ID variable to the ID at the end of the annotated dataset's resource name. The resource name has the following format:
```
projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID/annotatedDatasets/ANNOTATED_DATASET_ID
```
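Set the STORAGE_URI variable to the URI of the Cloud Storage bucket where you want to store the results.

For all annotation requests except image segmentation, the curl request looks similar to the following: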

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d @- <<EOF
{
  "annotatedDataset": "${ANNOTATED_DATASET_ID}",
  "outputConfig": {
    "gcsDestination": {
      "output_uri": "${STORAGE_URI}",
      "mimeType": "text/csv"
    }
  }
}
EOF

To export image segmentation data, the curl request should look similar to the following:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d @- <<EOF
{
  "annotatedDataset": "${ANNOTATED_DATASET_ID}",
  "outputConfig": {
    "gcsFolderDestination": {
      "output_folder_uri": "${STORAGE_URI}"
    }
  }
}
EOF

You should see output similar to the following:

{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.ExportDataOperationResponse",
    "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"
  }
}

Python

You must install the Python client library before you can run this code sample.

def export_data(dataset_resource_name, annotated_dataset_resource_name, export_gcs_uri):
    """Exports a dataset from the given Google Cloud project."""
    from google.cloud import datalabeling_v1beta1 as datalabeling

    client = datalabeling.DataLabelingServiceClient()

    gcs_destination = datalabeling.GcsDestination(
        output_uri=export_gcs_uri, mime_type="text/csv"
    )

    output_config = datalabeling.OutputConfig(gcs_destination=gcs_destination)

    # export_data starts a long-running operation; result() blocks until it completes.
    operation = client.export_data(
        request={
            "name": dataset_resource_name,
            "annotated_dataset": annotated_dataset_resource_name,
            "output_config": output_config,
        }
    )
    result = operation.result()

    print(f"Dataset ID: {result.dataset}\n")
    print("Output config:")
    print("\tGcs destination:")
    print(f"\t\tOutput URI: {result.output_config.gcs_destination.output_uri}\n")

Java

You must install the Java client library before you can run this code sample.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceSettings;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationMetadata;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationResponse;
import com.google.cloud.datalabeling.v1beta1.ExportDataRequest;
import com.google.cloud.datalabeling.v1beta1.GcsDestination;
import com.google.cloud.datalabeling.v1beta1.LabelStats;
import com.google.cloud.datalabeling.v1beta1.OutputConfig;
import java.io.IOException;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.ExecutionException;

class ExportData {

  // Export data from an annotated dataset.
  static void exportData(String datasetName, String annotatedDatasetName, String gcsOutputUri)
      throws IOException {
    // String datasetName = DataLabelingServiceClient.formatDatasetName(
    //     "YOUR_PROJECT_ID", "YOUR_DATASETS_UUID");
    // String annotatedDatasetName = DataLabelingServiceClient.formatAnnotatedDatasetName(
    //     "YOUR_PROJECT_ID",
    //     "YOUR_DATASET_UUID",
    //     "YOUR_ANNOTATED_DATASET_UUID");
    // String gcsOutputUri = "gs://YOUR_BUCKET_ID/export_path";


    DataLabelingServiceSettings settings =
        DataLabelingServiceSettings.newBuilder()
            .build();
    try (DataLabelingServiceClient dataLabelingServiceClient =
        DataLabelingServiceClient.create(settings)) {
      GcsDestination gcsDestination =
          GcsDestination.newBuilder().setOutputUri(gcsOutputUri).setMimeType("text/csv").build();

      OutputConfig outputConfig =
          OutputConfig.newBuilder().setGcsDestination(gcsDestination).build();

      ExportDataRequest exportDataRequest =
          ExportDataRequest.newBuilder()
              .setName(datasetName)
              .setOutputConfig(outputConfig)
              .setAnnotatedDataset(annotatedDatasetName)
              .build();

      OperationFuture<ExportDataOperationResponse, ExportDataOperationMetadata> operation =
          dataLabelingServiceClient.exportDataAsync(exportDataRequest);

      ExportDataOperationResponse response = operation.get();

      System.out.format("Exported item count: %d\n", response.getExportCount());
      LabelStats labelStats = response.getLabelStats();
      Set<Entry<String, Long>> entries = labelStats.getExampleCountMap().entrySet();
      for (Entry<String, Long> entry : entries) {
        System.out.format("\tLabel: %s\n", entry.getKey());
        System.out.format("\tCount: %d\n\n", entry.getValue());
      }
    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}