Mengekspor data berlabel

Setelah operasi pemberian label selesai, Anda dapat mengekspor set data yang dianotasi ke bucket Google Cloud Storage dengan memanggil ExportData.

ExportData mendukung pengembalian file .csv yang berisi satu baris untuk setiap anotasi atau item data. Kolom pertama menunjukkan kategori penggunaan ml baris ini, yang secara default ditetapkan ke UNASSIGNED. ExportData juga mendukung file jsonl dengan setiap baris mewakili contoh yang menyertakan item data dan semua anotasi. Berikut adalah contoh untuk setiap jenis.

Klasifikasi gambar

  • baris csv:

    UNASSIGNED,image_url,label_1,label_2,...

  • baris json:

    {
    "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "imagePayload":{
    "mimeType":"IMAGE_PNG",
    "imageUri":"gs://sample_bucket/image.png"
    },
    "annotations":[
    {
         "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
       "annotationValue":{
          "imageClassificationAnnotation":{
           "annotationSpec":{
                "displayName":"tulip",
             }
          }
       }
    }
    ]
    }

Kotak pembatas image

  • baris csv: Setiap baris berisi informasi tentang satu kotak pembatas, menggunakan koordinat x,y untuk mewakili setiap sudut kotak. Beberapa kotak untuk satu gambar berada di baris terpisah. Format baris adalah UNASSIGNED, image_url, label, topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y. Koordinat topright_x, topright_y, bottomleft_x, dan bottomleft_y mungkin berupa string kosong, karena memberikan informasi yang redundan.

    UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,

  • baris json: jika koordinat di normalizedVertices tidak ditetapkan, kolom tersebut akan bernilai 0 secara default. Hal ini juga berlaku untuk anotasi berbasis koordinat.

    {
     "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
     "imagePayload":{
        "mimeType":"IMAGE_PNG",
        "imageUri":"gs://sample_bucket/image.png"
     },
     "annotations":[
        {
             "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
           "annotationValue":{
             "image_bounding_poly_annotation": {
              "annotationSpec": {
                "displayName": "tulip"
              },
              "normalizedBoundingPoly": {
              "normalizedVertices": [ {
                  "x": 0.1,
                  "y": 0.2
                }, {
                  "x": 0.9,
                  "y": 0.9
                } ]
              }
           }
        }
      }
     ]
    }

Poligon pembatas gambar, kotak pembatas berorientasi, dan polyline

  • Baris csv: Setiap titik dalam poligon/polilini tertutup diwakili oleh titik x,y, yang dipisahkan oleh dua kolom csv kosong. Pasangan terakhir terhubung kembali ke pasangan pertama untuk poligon, sedangkan tidak ada siklus tertutup untuk polyline. Setiap garis mewakili satu poligon/poligaris.

    UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,...

  • baris json:

    {
    "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "imagePayload":{
    "mimeType":"IMAGE_PNG",
    "imageUri":"gs://sample_bucket/image.png"
    },
    "annotations":[
    {
         "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
       "annotationValue":{
         "image_bounding_poly_annotation": {
          "annotationSpec": {
            "displayName": "tulip"
          },
          "normalizedBoundingPoly": {
            "normalizedVertices": [ {
              "x": 0.1,
              "y": 0.1
            }, {
              "x": 0.1,
              "y": 0.2
            }, {
              "x": 0.2,
              "y": 0.3
            }  ]
          }
       }
    }
    }
    ]
    }

Segmentasi gambar

Untuk segmentasi gambar, hanya output jsonl yang disediakan.

  • baris json: Kolom imageBytes di imageSegmentationAnnotation mewakili mask segmentasi untuk gambar tersebut. Warna untuk setiap label (yaitu, setiap dan kucing) ditampilkan di kolom annotationColors.
    {
    "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "imagePayload":{
    "mimeType":"IMAGE_PNG",
    "imageUri":"gs://sample_bucket/image.png"
    },
    "annotations":[
    {
         "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
       "annotationValue":{
         "imageSegmentationAnnotation": {
            "annotationColors": [ {
              "key": "rgb(0,0,255)",
              "value": {
                "display_name": "dog"
              }
            }, {
              "key": "rgb(0,255,0)",
              "value": {
                "display_name": "cat"
              }
            } ],
            "mimeType": "IMAGE_JPEG",
            "imageBytes": "/9j/4AAQSkZJRgABAQAAAQABAAD/2"
       }
    }
    }
    ]
    }

Klasifikasi video

  • baris csv:

    UNASSIGNED,video_url,label,segment_start_time,segment_end_time

  • baris json:

    {
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "videoPayload": {
      "mimeType": "VIDEO_MP4",
      "resolution": {
        width: 720,
        height: 360
      }
      "frameRate": 24
    },
    "annotations": [ {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
      "annotationSource": 3,
      "annotationValue": {
        "videoClassificationAnnotation": {
          "timeSegment": {
            "startTimeOffset": {
              "seconds": 10
            },
            "endTimeOffset": {
              "seconds": 20
            }
          },
          "annotationSpec": {
            "displayName": "dog"
          }
        }
      }
    } ]
    }

Deteksi objek video

  • baris csv:Empat titik adalah kiri atas, kanan atas, kanan bawah, kiri bawah. Titik kedua dan keempat bersifat opsional. Setiap titik diwakili oleh x,y. Setiap baris akan berisi satu kotak pembatas.

    UNASSIGNED,video_url,label,timestamp,0.1,0.1,,,0.3,0.3,,

  • baris json:

    {
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "videoPayload": {
      "mimeType": "VIDEO_MP4",
      "resolution": {
        width: 720,
        height: 360
      }
      "frameRate": 24
    },
    "annotations": [ {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
      "annotationSource": 3,
      "annotationValue": {
        "videoObjectTrackingAnnotation": {
      "annotationSpec": {
        "displayName": "tulip"
      },
      "timeSegment": {
        "startTimeOffset": {
          "seconds": 10
        },
        "endTimeOffset": {
          "seconds": 10
        }
      },
      "objectTrackingFrames": [ {
        "normalizedBoundingPoly": {
          "normalizedVertices": [ {
            "x": 0.2,
            "y": 0.3
          }, {
            "x": 0.9,
            "y": 0.5
          } ]
        },
      }, {
        "normalizedBoundingPoly": {
          "normalizedVertices": [ {
            "x": 0.3,
            "y": 0.3
          }, {
            "x": 0.5,
            "y": 0.7
          } ]
        },
      } ]
    }
    }
    }]}

Pelacakan objek video

  • baris csv:Empat titik adalah kiri atas, kanan atas, kanan bawah, kiri bawah. Titik kedua dan keempat bersifat opsional. Setiap titik diwakili oleh x,y. Setiap baris akan berisi satu kotak pembatas. Setiap objek dalam video diwakili oleh instance_id unik.

    UNASSIGNED,video_url,label,instance_id,timestamp,0.1,0.1,,,0.3,0.3,,

  • baris json:

    {
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "videoPayload": {
      "mimeType": "VIDEO_MP4",
      "resolution": {
        width: 720,
        height: 360
      }
      "frameRate": 24
    },
    "annotations": [ {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
      "annotationSource": 3,
      "annotationValue": {
        "videoObjectTrackingAnnotation": {
      "annotationSpec": {
        "displayName": "tulip"
      },
      "timeSegment": {
        "startTimeOffset": {
          "seconds": 10
        },
        "endTimeOffset": {
          "seconds": 20
        }
      },
      "objectTrackingFrames": [ {
        "normalizedBoundingPoly": {
          "normalizedVertices": [ {
            "x": 0.2,
            "y": 0.3
          }, {
            "x": 0.9,
            "y": 0.5
          } ]
        },
        "timeOffset": {
          "nanos": 1000000
        }
      }, {
        "normalizedBoundingPoly": {
          "normalizedVertices": [ {
            "x": 0.3,
            "y": 0.3
          }, {
            "x": 0.5,
            "y": 0.7
          } ]
        },
        "timeOffset": {
          "nanos": 84000000
        }
      } ]
    }
    }
    }]}

Peristiwa video

  • baris csv:Empat titik adalah kiri atas, kanan atas, kanan bawah, kiri bawah. Titik kedua dan keempat bersifat opsional. Setiap titik diwakili oleh x,y. Setiap baris akan berisi satu kotak pembatas. Setiap objek dalam video diwakili oleh instance_id unik.

    UNASSIGNED,video_url,label,segment_start_time,segment_end_time

  • baris json:

    {
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "videoPayload": {
      "mimeType": "VIDEO_MP4",
      "resolution": {
        width: 720,
        height: 360
      }
      "frameRate": 24
    },
    "annotations": [ {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
      "annotationValue": {
        "videoEventAnnotation": {
          "annotationSpec": {
            "displayName": "Callie"
          },
          "timeSegment": {
            "startTimeOffset": {
              "seconds": 123
            },
            "endTimeOffset": {
              "seconds": 150
            }
          }
        }
      }
     } ]
    }
    }
    }]}

Klasifikasi teks

  • baris csv:

    UNASSIGNED,text_url,label_l

  • baris json:

    {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
      "textPayload": {
        "textContent": "dummy_text_content",
        "textUri": "gs://test_bucket/file.txt",
        "wordCount": 1
      }
      "annotations": [ {
        "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id",
        "annotationValue": {
          "textClassificationAnnotation": {
            "annotationSpec": {
              "displayName": "news"
            }
          }
        }
      } ],
    }

Ekstraksi entity teks

Untuk ekstraksi entity teks, hanya output jsonl yang disediakan.

  • baris json:
    {
        "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
        "textPayload": {
          "textContent": "dummy_text_content",
          "textUri": "gs://test_bucket/file.txt",
          "wordCount": 1
        }
        "annotations": [ {
          "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id",
          "annotationValue": {
            "textEntityExtractionAnnotation": {
              "annotationSpec": {
                "displayName": "equations"
              },
              "textSegment": {
                "startOffset": 10,
                "endOffset": 20
              }
            }
          }
        } ],
      }

ExportData adalah operasi yang berjalan lama. API akan menampilkan ID operasi. Anda dapat menggunakan ID operasi untuk memanggil GetOperation guna mendapatkan statusnya nanti.

UI Web

Ikuti langkah-langkah berikut untuk mengekspor data berlabel menggunakan UI Layanan Pemberian Label Data.

  1. Buka UI Layanan Pemberian Label Data di konsol Google Cloud.

    Halaman Set Data menampilkan status set data yang dibuat sebelumnya untuk project saat ini.

  2. Klik nama set data yang ingin Anda ekspor. Tindakan ini akan mengarahkan Anda ke halaman Detail set data.

  3. Di bagian Set data berlabel, klik EXPORT di kolom Export status.

  4. Di dialog Export labeled dataset, masukkan jalur Cloud Storage yang akan digunakan untuk file output, lalu pilih format file yang Anda inginkan.

  5. Klik EKSPOR.

    Halaman Detail set data menampilkan status sedang berlangsung saat data Anda diekspor. Setelah selesai, Anda dapat menemukan file ekspor di jalur Cloud Storage yang Anda tentukan.

Command-line

Tetapkan variabel lingkungan berikut:

  1. Variabel PROJECT_ID ke project ID Google Cloud Anda.
  2. variabel DATASET_ID ke ID set data Anda, dari respons saat Anda membuat set data. ID muncul di akhir nama set data lengkap:

    projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID
  3. Variabel ANNOTATED_DATASET_ID ke ID nama resource set data yang dianotasi. Nama resource dalam format berikut:

    projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID/annotatedDatasets/ANNOTATED_DATASET_ID
  4. Variabel STORAGE_URI ke URI bucket Cloud Storage tempat Anda ingin hasil disimpan.

Untuk semua permintaan anotasi kecuali segmentasi gambar, permintaan curl terlihat mirip dengan berikut ini:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d '{
     "annotatedDataset": "${ANNOTATED_DATASET_ID}",
     "outputConfig": {
       "gcsDestination": {
           "output_uri": "${STORAGE_URI}",
           "mimeType": "text/csv"
       }
     }
   }'

Untuk mengekspor data segmentasi gambar, permintaan curl akan terlihat seperti berikut:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d '{
     "annotatedDataset": "${ANNOTATED_DATASET_ID}",
     "outputConfig": {
       "gcsFolderDestination": {
         "output_folder_uri": "${STORAGE_URI}"
       }
     }
   }'

Anda akan melihat output yang mirip dengan berikut ini:

{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.ExportDataOperationResponse",
    "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"
  }
}

Python

Sebelum dapat menjalankan contoh kode ini, Anda harus menginstal Library Klien Python.

def export_data(dataset_resource_name, annotated_dataset_resource_name, export_gcs_uri):
    """Exports a dataset from the given Google Cloud project."""
    from google.cloud import datalabeling_v1beta1 as datalabeling

    client = datalabeling.DataLabelingServiceClient()

    gcs_destination = datalabeling.GcsDestination(
        output_uri=export_gcs_uri, mime_type="text/csv"
    )

    output_config = datalabeling.OutputConfig(gcs_destination=gcs_destination)

    response = client.export_data(
        request={
            "name": dataset_resource_name,
            "annotated_dataset": annotated_dataset_resource_name,
            "output_config": output_config,
        }
    )

    print(f"Dataset ID: {response.result().dataset}\n")
    print("Output config:")
    print("\tGcs destination:")
    print(
        "\t\tOutput URI: {}\n".format(
            response.result().output_config.gcs_destination.output_uri
        )
    )

Java

Sebelum dapat menjalankan contoh kode ini, Anda harus menginstal Library Klien Java.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceSettings;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationMetadata;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationResponse;
import com.google.cloud.datalabeling.v1beta1.ExportDataRequest;
import com.google.cloud.datalabeling.v1beta1.GcsDestination;
import com.google.cloud.datalabeling.v1beta1.LabelStats;
import com.google.cloud.datalabeling.v1beta1.OutputConfig;
import java.io.IOException;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.ExecutionException;

class ExportData {

  // Export data from an annotated dataset.
  static void exportData(String datasetName, String annotatedDatasetName, String gcsOutputUri)
      throws IOException {
    // String datasetName = DataLabelingServiceClient.formatDatasetName(
    //     "YOUR_PROJECT_ID", "YOUR_DATASETS_UUID");
    // String annotatedDatasetName = DataLabelingServiceClient.formatAnnotatedDatasetName(
    //     "YOUR_PROJECT_ID",
    //     "YOUR_DATASET_UUID",
    //     "YOUR_ANNOTATED_DATASET_UUID");
    // String gcsOutputUri = "gs://YOUR_BUCKET_ID/export_path";


    DataLabelingServiceSettings settings =
        DataLabelingServiceSettings.newBuilder()
            .build();
    try (DataLabelingServiceClient dataLabelingServiceClient =
        DataLabelingServiceClient.create(settings)) {
      GcsDestination gcsDestination =
          GcsDestination.newBuilder().setOutputUri(gcsOutputUri).setMimeType("text/csv").build();

      OutputConfig outputConfig =
          OutputConfig.newBuilder().setGcsDestination(gcsDestination).build();

      ExportDataRequest exportDataRequest =
          ExportDataRequest.newBuilder()
              .setName(datasetName)
              .setOutputConfig(outputConfig)
              .setAnnotatedDataset(annotatedDatasetName)
              .build();

      OperationFuture<ExportDataOperationResponse, ExportDataOperationMetadata> operation =
          dataLabelingServiceClient.exportDataAsync(exportDataRequest);

      ExportDataOperationResponse response = operation.get();

      System.out.format("Exported item count: %d\n", response.getExportCount());
      LabelStats labelStats = response.getLabelStats();
      Set<Entry<String, Long>> entries = labelStats.getExampleCountMap().entrySet();
      for (Entry<String, Long> entry : entries) {
        System.out.format("\tLabel: %s\n", entry.getKey());
        System.out.format("\tCount: %d\n\n", entry.getValue());
      }
    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}