This legacy version of AI Platform Data Labeling is deprecated and will no longer be available on Google Cloud after January 23, 2024. All the functionality of legacy AI Platform Data Labeling, along with new features, is available on the Vertex AI platform. See Migrate to Vertex AI to learn how to migrate your resources.

Export labeled data

When the labeling operation completes, you can export the annotated dataset to your Google Cloud Storage bucket by calling ExportData.

ExportData can produce a .csv file that contains one row for each data item or annotation. The first field gives the ML use category of the row, which defaults to UNASSIGNED. ExportData can also produce a JSONL file, in which each line represents one example: a data item together with all of its annotations. Examples of each format follow.

Image classification

CSV line:

UNASSIGNED,image_url,label_1,label_2,...
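
A row in this format can be split with Python's standard csv module. The following is a minimal sketch, not part of the Data Labeling client library: the function name parse_classification_row is hypothetical, and it assumes that every non-empty column after the image URL is a label.

```python
import csv
import io


def parse_classification_row(line):
    """Split one exported CSV line into (ml_use, image_url, labels)."""
    # csv.reader handles quoted fields, which a plain str.split(",") would not.
    row = next(csv.reader(io.StringIO(line)))
    ml_use, image_url = row[0], row[1]
    # Assumption: every remaining non-empty column is a label.
    labels = [field for field in row[2:] if field]
    return ml_use, image_url, labels


print(parse_classification_row("UNASSIGNED,gs://sample_bucket/image.png,tulip,rose"))
```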

JSON line:

{
"name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"imagePayload":{
"mimeType":"IMAGE_PNG",
"imageUri":"gs://sample_bucket/image.png"
},
"annotations":[
{
     "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
   "annotationValue":{
      "imageClassificationAnnotation":{
       "annotationSpec":{
            "displayName":"tulip"
         }
      }
   }
}
]
}
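
To read the labels back from a JSONL export, each line can be parsed on its own with the json module. This sketch assumes only the field names shown in the example above; the helper name labels_from_example is hypothetical.

```python
import json


def labels_from_example(json_line):
    """Return the display names of all image classification labels in one JSONL line."""
    example = json.loads(json_line)
    labels = []
    for annotation in example.get("annotations", []):
        value = annotation.get("annotationValue", {})
        spec = value.get("imageClassificationAnnotation", {}).get("annotationSpec", {})
        if "displayName" in spec:
            labels.append(spec["displayName"])
    return labels


# A shortened example with the same nesting as the exported line.
sample = json.dumps({
    "imagePayload": {"mimeType": "IMAGE_PNG", "imageUri": "gs://sample_bucket/image.png"},
    "annotations": [
        {"annotationValue": {"imageClassificationAnnotation": {
            "annotationSpec": {"displayName": "tulip"}}}}
    ],
})
print(labels_from_example(sample))
```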

Image bounding box

CSV line: Each line describes one bounding box, with x,y coordinates for each corner of the box. Multiple boxes for the same image appear on separate lines. The line format is UNASSIGNED, image_url, label, topleft_x, topleft_y, topright_x, topright_y, bottomright_x, bottomright_y, bottomleft_x, bottomleft_y. The topright_x, topright_y, bottomleft_x, and bottomleft_y coordinates can be empty strings because they carry redundant information.

UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,
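
Because the top-right and bottom-left corners may be empty, a reader of this CSV has to fill them in. The sketch below shows one way to do that; the function name is hypothetical, and it assumes the box is axis-aligned so that the missing corners follow from the top-left and bottom-right ones.

```python
import csv
import io


def parse_bounding_box_row(line):
    """Return (ml_use, image_url, label, corners) for one bounding-box CSV line.

    corners lists four (x, y) tuples in the order top-left, top-right,
    bottom-right, bottom-left.
    """
    row = next(csv.reader(io.StringIO(line)))
    ml_use, image_url, label = (field.strip() for field in row[:3])
    coords = [field.strip() for field in row[3:11]]
    coords += [""] * (8 - len(coords))  # tolerate missing trailing columns
    tl_x, tl_y, tr_x, tr_y, br_x, br_y, bl_x, bl_y = coords
    top_left = (float(tl_x), float(tl_y))
    bottom_right = (float(br_x), float(br_y))
    # Assumption: the box is axis-aligned, so an empty corner can be derived
    # from the top-left and bottom-right corners.
    top_right = (
        float(tr_x) if tr_x else bottom_right[0],
        float(tr_y) if tr_y else top_left[1],
    )
    bottom_left = (
        float(bl_x) if bl_x else top_left[0],
        float(bl_y) if bl_y else bottom_right[1],
    )
    return ml_use, image_url, label, [top_left, top_right, bottom_right, bottom_left]


print(parse_bounding_box_row("UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,"))
```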

JSON line: If a coordinate in normalizedVertices is not set, the value of that field defaults to 0. The same applies to the other coordinate-based annotations.

{
 "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
 "imagePayload":{
    "mimeType":"IMAGE_PNG",
    "imageUri":"gs://sample_bucket/image.png"
 },
 "annotations":[
    {
         "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
       "annotationValue":{
         "image_bounding_poly_annotation": {
          "annotationSpec": {
            "displayName": "tulip"
          },
          "normalizedBoundingPoly": {
          "normalizedVertices": [ {
              "x": 0.1,
              "y": 0.2
            }, {
              "x": 0.9,
              "y": 0.9
            } ]
          }
       }
    }
  }
 ]
}
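
The defaulting rule for unset coordinates matters when reading the JSON: a vertex on the left or top edge may arrive without an x or y key. A minimal sketch of a reader that applies the rule (the function name is hypothetical):

```python
def vertices_with_defaults(normalized_bounding_poly):
    """Return [(x, y), ...], reading any coordinate missing from the JSON as 0.0."""
    return [
        (float(vertex.get("x", 0.0)), float(vertex.get("y", 0.0)))
        for vertex in normalized_bounding_poly.get("normalizedVertices", [])
    ]


# The first vertex has no "x" key, so its x coordinate is read as 0.0.
poly = {"normalizedVertices": [{"y": 0.2}, {"x": 0.9, "y": 0.9}]}
print(vertices_with_defaults(poly))
```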

Image bounding polygon, oriented bounding box, and polyline

CSV line: Each point of the polygon or polyline is represented by an x,y pair, and consecutive points are separated by two empty CSV columns. In a polygon, the last point connects back to the first; a polyline does not form a closed loop. Each line represents one polygon or polyline.

UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,...
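
Since the empty columns only act as separators, the points can be recovered by dropping empty fields and pairing what remains. This is a sketch under that assumption (both coordinates of every point are present); parse_polygon_row is a hypothetical name.

```python
import csv
import io


def parse_polygon_row(line):
    """Return (ml_use, image_url, label, points) for one polygon or polyline CSV line."""
    row = next(csv.reader(io.StringIO(line)))
    ml_use, image_url, label = row[:3]
    # Drop the empty separator columns, then pair the remaining values as (x, y).
    values = [float(field) for field in row[3:] if field.strip()]
    if len(values) % 2:
        raise ValueError("expected an even number of coordinate values")
    points = list(zip(values[0::2], values[1::2]))
    return ml_use, image_url, label, points


print(parse_polygon_row("UNASSIGNED,image_url,label,0.1,0.1,,,0.3,0.3,,,0.6,0.6,,"))
```

Whether the shape is closed is not encoded in the row itself, so the caller needs to know whether the export came from a polygon or a polyline task.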

JSON line:

{
"name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"imagePayload":{
"mimeType":"IMAGE_PNG",
"imageUri":"gs://sample_bucket/image.png"
},
"annotations":[
{
     "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
   "annotationValue":{
     "image_bounding_poly_annotation": {
      "annotationSpec": {
        "displayName": "tulip"
      },
      "normalizedBoundingPoly": {
        "normalizedVertices": [ {
          "x": 0.1,
          "y": 0.1
        }, {
          "x": 0.1,
          "y": 0.2
        }, {
          "x": 0.2,
          "y": 0.3
        }  ]
      }
   }
}
}
]
}

Image segmentation

For image segmentation, only JSONL output is provided.

JSON line: The imageBytes field in imageSegmentationAnnotation holds the segmentation mask for the image. The color assigned to each label (for example, dog and cat) is given in the annotationColors field.

{
"name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"imagePayload":{
"mimeType":"IMAGE_PNG",
"imageUri":"gs://sample_bucket/image.png"
},
"annotations":[
{
     "name":"projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
   "annotationValue":{
     "imageSegmentationAnnotation": {
        "annotationColors": [ {
          "key": "rgb(0,0,255)",
          "value": {
            "display_name": "dog"
          }
        }, {
          "key": "rgb(0,255,0)",
          "value": {
            "display_name": "cat"
          }
        } ],
        "mimeType": "IMAGE_JPEG",
        "imageBytes": "/9j/4AAQSkZJRgABAQAAAQABAAD/2"
   }
}
}
]
}
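
To use the mask, a consumer needs the color-to-label mapping and the decoded image bytes. The sketch below assumes that imageBytes is base64-encoded (the usual JSON encoding for byte fields) and that color keys always look like rgb(r,g,b) as in the example; both function names are hypothetical.

```python
import base64
import re


def color_to_label(segmentation_annotation):
    """Map (r, g, b) tuples to label names using the annotationColors field."""
    mapping = {}
    for entry in segmentation_annotation.get("annotationColors", []):
        match = re.fullmatch(r"rgb\((\d+),\s*(\d+),\s*(\d+)\)", entry["key"])
        if match is None:
            raise ValueError(f"unexpected color key: {entry['key']!r}")
        value = entry["value"]
        # The example uses display_name; displayName is accepted as a precaution.
        name = value.get("display_name", value.get("displayName"))
        mapping[tuple(int(part) for part in match.groups())] = name
    return mapping


def mask_bytes(segmentation_annotation):
    """Decode imageBytes (assumed base64) into the encoded mask image bytes."""
    return base64.b64decode(segmentation_annotation["imageBytes"])


annotation = {
    "annotationColors": [
        {"key": "rgb(0,0,255)", "value": {"display_name": "dog"}},
        {"key": "rgb(0,255,0)", "value": {"display_name": "cat"}},
    ],
    "mimeType": "IMAGE_JPEG",
    # Placeholder payload for illustration, not a real mask.
    "imageBytes": base64.b64encode(b"\xff\xd8\xff").decode("ascii"),
}
print(color_to_label(annotation))
```

The decoded bytes are still an encoded image (JPEG in the example), so an image library is needed to turn them into pixels.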

Video classification

CSV line:

UNASSIGNED,video_url,label,segment_start_time,segment_end_time

JSON line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationSource": 3,
  "annotationValue": {
    "videoClassificationAnnotation": {
      "timeSegment": {
        "startTimeOffset": {
          "seconds": 10
        },
        "endTimeOffset": {
          "seconds": 20
        }
      },
      "annotationSpec": {
        "displayName": "dog"
      }
    }
  }
} ]
}
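
The time offsets in the video examples are objects with a seconds field and, in some cases, a nanos field. A small helper can turn them into plain floating-point seconds. This is a sketch that assumes a missing field means 0; the function names are hypothetical.

```python
def offset_seconds(offset):
    """Convert a {"seconds": ..., "nanos": ...} offset to float seconds."""
    # Either field may be absent; int() also accepts numbers encoded as strings.
    return int(offset.get("seconds", 0)) + int(offset.get("nanos", 0)) / 1e9


def segment_bounds(time_segment):
    """Return (start, end) in seconds for a timeSegment object."""
    return (
        offset_seconds(time_segment.get("startTimeOffset", {})),
        offset_seconds(time_segment.get("endTimeOffset", {})),
    )


segment = {"startTimeOffset": {"seconds": 10}, "endTimeOffset": {"seconds": 20}}
print(segment_bounds(segment))
```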

Video object detection

CSV line: The four points are the top-left, top-right, bottom-right, and bottom-left corners. The second and fourth points are optional. Each point is represented as x,y. Each line contains one bounding box.

UNASSIGNED,video_url,label,timestamp,0.1,0.1,,,0.3,0.3,,

JSON line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationSource": 3,
  "annotationValue": {
    "videoObjectTrackingAnnotation": {
  "annotationSpec": {
    "displayName": "tulip"
  },
  "timeSegment": {
    "startTimeOffset": {
      "seconds": 10
    },
    "endTimeOffset": {
      "seconds": 10
    }
  },
  "objectTrackingFrames": [ {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.2,
        "y": 0.3
      }, {
        "x": 0.9,
        "y": 0.5
      } ]
    }
  }, {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.3,
        "y": 0.3
      }, {
        "x": 0.5,
        "y": 0.7
      } ]
    }
  } ]
}
}
}]}

Video object tracking

CSV line: The four points are the top-left, top-right, bottom-right, and bottom-left corners. The second and fourth points are optional. Each point is represented as x,y. Each line contains one bounding box. Each object in the video is identified by a unique instance_id.

UNASSIGNED,video_url,label,instance_id,timestamp,0.1,0.1,,,0.3,0.3,,
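
Because each row holds a single box, the track of one object is spread over several rows that share an instance_id. The sketch below groups rows into tracks. It keeps the timestamp and coordinate columns as strings, since their exact format is not specified here, and assumes that an instance_id is unique only within one video; group_tracks is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def group_tracks(csv_text):
    """Group bounding-box rows by (video_url, instance_id)."""
    tracks = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        _ml_use, video_url, label, instance_id, timestamp = row[:5]
        # The remaining columns are the corner coordinates of the box.
        tracks[(video_url, instance_id)].append((timestamp, label, row[5:]))
    return dict(tracks)


# Placeholder timestamps (t0, t1) stand in for real values.
rows = (
    "UNASSIGNED,gs://b/v.mp4,tulip,1,t0,0.1,0.1,,,0.3,0.3,,\n"
    "UNASSIGNED,gs://b/v.mp4,tulip,1,t1,0.2,0.2,,,0.4,0.4,,\n"
    "UNASSIGNED,gs://b/v.mp4,dog,2,t0,0.5,0.5,,,0.6,0.6,,\n"
)
for key, boxes in group_tracks(rows).items():
    print(key, len(boxes))
```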

JSON line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationSource": 3,
  "annotationValue": {
    "videoObjectTrackingAnnotation": {
  "annotationSpec": {
    "displayName": "tulip"
  },
  "timeSegment": {
    "startTimeOffset": {
      "seconds": 10
    },
    "endTimeOffset": {
      "seconds": 20
    }
  },
  "objectTrackingFrames": [ {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.2,
        "y": 0.3
      }, {
        "x": 0.9,
        "y": 0.5
      } ]
    },
    "timeOffset": {
      "nanos": 1000000
    }
  }, {
    "normalizedBoundingPoly": {
      "normalizedVertices": [ {
        "x": 0.3,
        "y": 0.3
      }, {
        "x": 0.5,
        "y": 0.7
      } ]
    },
    "timeOffset": {
      "nanos": 84000000
    }
  } ]
}
}
}]}

Video event

CSV line: Each line describes one event, giving the video, the label, and the start and end times of the segment in which the event occurs.

UNASSIGNED,video_url,label,segment_start_time,segment_end_time

JSON line:

{
"name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
"videoPayload": {
  "mimeType": "VIDEO_MP4",
  "resolution": {
    "width": 720,
    "height": 360
  },
  "frameRate": 24
},
"annotations": [ {
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/annotation_id",
  "annotationValue": {
    "videoEventAnnotation": {
      "annotationSpec": {
        "displayName": "Callie"
      },
      "timeSegment": {
        "startTimeOffset": {
          "seconds": 123
        },
        "endTimeOffset": {
          "seconds": 150
        }
      }
    }
  }
 } ]
}

Text classification

CSV line:

UNASSIGNED,text_url,label_l

JSON line:

{
  "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
  "textPayload": {
    "textContent": "dummy_text_content",
    "textUri": "gs://test_bucket/file.txt",
    "wordCount": 1
  },
  "annotations": [ {
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id",
    "annotationValue": {
      "textClassificationAnnotation": {
        "annotationSpec": {
          "displayName": "news"
        }
      }
    }
  } ]
}

Text entity extraction

For text entity extraction, only JSONL output is provided.

JSON line:

{
    "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id",
    "textPayload": {
      "textContent": "dummy_text_content",
      "textUri": "gs://test_bucket/file.txt",
      "wordCount": 1
    },
    "annotations": [ {
      "name": "projects/project_id/datasets/dataset_id/annotatedDatasets/annotated_dataset_id/examples/example_id/annotations/fake_annotation_id",
      "annotationValue": {
        "textEntityExtractionAnnotation": {
          "annotationSpec": {
            "displayName": "equations"
          },
          "textSegment": {
            "startOffset": 10,
            "endOffset": 20
          }
        }
      }
    } ]
  }
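
The textSegment offsets can be used to recover the labeled text from textContent. The sketch below assumes that the offsets are character indexes and that endOffset is exclusive; neither is stated here, so verify both against your own export before relying on it. The function name entity_spans is hypothetical.

```python
def entity_spans(example):
    """Return (label, text) pairs for each entity annotation in a parsed example."""
    text = example["textPayload"]["textContent"]
    spans = []
    for annotation in example.get("annotations", []):
        value = annotation.get("annotationValue", {})
        entity = value.get("textEntityExtractionAnnotation")
        if not entity:
            continue
        segment = entity.get("textSegment", {})
        # Assumption: character offsets, with endOffset exclusive.
        start = int(segment.get("startOffset", 0))
        end = int(segment.get("endOffset", 0))
        spans.append((entity["annotationSpec"]["displayName"], text[start:end]))
    return spans


example = {
    "textPayload": {"textContent": "abc x+y=2 def"},
    "annotations": [{"annotationValue": {"textEntityExtractionAnnotation": {
        "annotationSpec": {"displayName": "equations"},
        "textSegment": {"startOffset": 4, "endOffset": 9},
    }}}],
}
print(entity_spans(example))
```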

ExportData is a long-running operation. The API returns an operation ID, which you can later pass to GetOperation to check the status of the export.
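
Checking the status usually means polling until the operation reports that it is done. The sketch below shows that loop in isolation. It assumes the operation is available as a dictionary with the standard long-running operation fields (done, error, response); get_operation is a hypothetical callable that you would implement with GetOperation, and the stub at the end exists only to make the example runnable.

```python
import time


def wait_for_operation(get_operation, operation_name, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_operation until the operation is done, then return its response."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = get_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"export failed: {operation['error']}")
            return operation.get("response", {})
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{operation_name} did not finish in {timeout_seconds} s")
        time.sleep(poll_seconds)


# Stub standing in for a real GetOperation call: reports done on the third poll.
calls = []


def fake_get_operation(name):
    calls.append(name)
    return {"name": name, "done": len(calls) >= 3, "response": {"exportCount": 2}}


result = wait_for_operation(fake_get_operation, "projects/p/operations/op", poll_seconds=0)
print(result, len(calls))
```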

Web UI

Follow these steps to export your labeled data using the Data Labeling Service UI.

Open the Data Labeling Service UI in the Google Cloud console.

The Datasets page shows the status of the datasets previously created for the current project.
Click the name of the dataset you want to export. This takes you to the Dataset detail page.
In the Labeled datasets section, click EXPORT in the Export Status column.
In the Export labeled dataset dialog, enter the Cloud Storage path to use for the output file and select the file format you want.
Click EXPORT.

The Dataset detail page shows the progress of the export while it runs. When the export finishes, you can find the export file at the Cloud Storage path you specified.

Command line

Set the following environment variables:

The PROJECT_ID variable for your Google Cloud project ID
The DATASET_ID variable for the ID of your dataset, taken from the response you received when you created the dataset. The ID appears at the end of the full dataset name:
```
projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID
```
The ANNOTATED_DATASET_ID variable for the ID from the resource name of your annotated dataset. The resource name has the following format:
```
projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID/annotatedDatasets/ANNOTATED_DATASET_ID
```
The STORAGE_URI variable for the URI of the Cloud Storage bucket where you want to store the results

For all annotation types except image segmentation, the curl request looks like the following:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d @- <<EOF
{
  "annotatedDataset": "${ANNOTATED_DATASET_ID}",
  "outputConfig": {
    "gcsDestination": {
      "output_uri": "${STORAGE_URI}",
      "mimeType": "text/csv"
    }
  }
}
EOF

To export image segmentation data, the curl request looks like the following:

curl -X POST \
   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
   -H "Content-Type: application/json" \
   https://datalabeling.googleapis.com/v1beta1/projects/${PROJECT_ID}/datasets/${DATASET_ID}:exportData \
   -d @- <<EOF
{
  "annotatedDataset": "${ANNOTATED_DATASET_ID}",
  "outputConfig": {
    "gcsFolderDestination": {
      "output_folder_uri": "${STORAGE_URI}"
    }
  }
}
EOF

You should see output similar to the following:

{
  "name": "projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.data-labeling.v1beta1.ExportDataOperationResponse",
    "dataset": "projects/data-labeling-codelab/datasets/5c73db3d_0000_23e0_a25b_94eb2c119c4c"
  }
}
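
The two curl requests above differ only in their outputConfig. When scripting exports, it can help to build the request body in one place and to pull the operation ID out of the response. The following sketch copies the field names verbatim from the curl examples; the function names and the sample values are hypothetical.

```python
import json


def build_export_body(annotated_dataset, storage_uri, image_segmentation=False):
    """Return the exportData request body as a dictionary."""
    if image_segmentation:
        # Image segmentation exports write to a folder.
        output_config = {"gcsFolderDestination": {"output_folder_uri": storage_uri}}
    else:
        output_config = {
            "gcsDestination": {"output_uri": storage_uri, "mimeType": "text/csv"}
        }
    return {"annotatedDataset": annotated_dataset, "outputConfig": output_config}


def operation_id(operation_name):
    """Return the last path segment of an operation name."""
    return operation_name.rsplit("/", 1)[-1]


# Placeholder values for illustration only.
body = build_export_body("ANNOTATED_DATASET_ID", "gs://sample_bucket/export.csv")
print(json.dumps(body, indent=2))
print(operation_id("projects/data-labeling-codelab/operations/5c73dd6b_0000_2b34_a920_883d24fa2064"))
```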

Python

Before you can run this code example, you must install the Python client libraries.

def export_data(dataset_resource_name, annotated_dataset_resource_name, export_gcs_uri):
    """Exports a dataset from the given Google Cloud project."""
    from google.cloud import datalabeling_v1beta1 as datalabeling

    client = datalabeling.DataLabelingServiceClient()

    gcs_destination = datalabeling.GcsDestination(
        output_uri=export_gcs_uri, mime_type="text/csv"
    )

    output_config = datalabeling.OutputConfig(gcs_destination=gcs_destination)

    response = client.export_data(
        request={
            "name": dataset_resource_name,
            "annotated_dataset": annotated_dataset_resource_name,
            "output_config": output_config,
        }
    )

    print(f"Dataset ID: {response.result().dataset}\n")
    print("Output config:")
    print("\tGcs destination:")
    print(
        "\t\tOutput URI: {}\n".format(
            response.result().output_config.gcs_destination.output_uri
        )
    )

Java

Before you can run this code example, you must install the Java client libraries.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceClient;
import com.google.cloud.datalabeling.v1beta1.DataLabelingServiceSettings;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationMetadata;
import com.google.cloud.datalabeling.v1beta1.ExportDataOperationResponse;
import com.google.cloud.datalabeling.v1beta1.ExportDataRequest;
import com.google.cloud.datalabeling.v1beta1.GcsDestination;
import com.google.cloud.datalabeling.v1beta1.LabelStats;
import com.google.cloud.datalabeling.v1beta1.OutputConfig;
import java.io.IOException;
import java.util.Map.Entry;
import java.util.Set;
import java.util.concurrent.ExecutionException;

class ExportData {

  // Export data from an annotated dataset.
  static void exportData(String datasetName, String annotatedDatasetName, String gcsOutputUri)
      throws IOException {
    // String datasetName = DataLabelingServiceClient.formatDatasetName(
    //     "YOUR_PROJECT_ID", "YOUR_DATASETS_UUID");
    // String annotatedDatasetName = DataLabelingServiceClient.formatAnnotatedDatasetName(
    //     "YOUR_PROJECT_ID",
    //     "YOUR_DATASET_UUID",
    //     "YOUR_ANNOTATED_DATASET_UUID");
    // String gcsOutputUri = "gs://YOUR_BUCKET_ID/export_path";


    DataLabelingServiceSettings settings =
        DataLabelingServiceSettings.newBuilder()
            .build();
    try (DataLabelingServiceClient dataLabelingServiceClient =
        DataLabelingServiceClient.create(settings)) {
      GcsDestination gcsDestination =
          GcsDestination.newBuilder().setOutputUri(gcsOutputUri).setMimeType("text/csv").build();

      OutputConfig outputConfig =
          OutputConfig.newBuilder().setGcsDestination(gcsDestination).build();

      ExportDataRequest exportDataRequest =
          ExportDataRequest.newBuilder()
              .setName(datasetName)
              .setOutputConfig(outputConfig)
              .setAnnotatedDataset(annotatedDatasetName)
              .build();

      OperationFuture<ExportDataOperationResponse, ExportDataOperationMetadata> operation =
          dataLabelingServiceClient.exportDataAsync(exportDataRequest);

      ExportDataOperationResponse response = operation.get();

      System.out.format("Exported item count: %d\n", response.getExportCount());
      LabelStats labelStats = response.getLabelStats();
      Set<Entry<String, Long>> entries = labelStats.getExampleCountMap().entrySet();
      for (Entry<String, Long> entry : entries) {
        System.out.format("\tLabel: %s\n", entry.getKey());
        System.out.format("\tCount: %d\n\n", entry.getValue());
      }
    } catch (IOException | InterruptedException | ExecutionException e) {
      e.printStackTrace();
    }
  }
}