Managing datasets

A dataset contains representative samples of the type of content you want to label, with the bounding box labels you want your model to use. The dataset serves as the input for training a model.

The main steps for building a dataset are:

  1. Create a dataset and specify whether to allow multiple labels on each item.
  2. Import data items into the dataset.

Be sure to prepare your data before you train a model.

A project can have multiple datasets, each used to train a separate model. You can get a list of the available datasets and can delete datasets you no longer need.

Creating a dataset

The first step in creating a model is to create an empty dataset that will eventually hold the training data for the model.

Web UI

The AutoML Video Object Tracking UI enables you to create a new dataset and import items into it from the same page.

  1. Open the AutoML Video Object Tracking UI. The Datasets page shows the status of previously created datasets for the current project. To add a dataset for a different project, select the project from the drop-down list in the upper right of the title bar.
  2. On the Datasets page, click Create Dataset.
  3. In the Create new dataset dialog, do the following:
    • Specify a name for this dataset.
    • Select Video Object Tracking.
    • Click Create Dataset.
  4. On the page for your dataset, provide the Cloud Storage URI of the CSV file that contains the URIs of your training data, without the gs:// prefix at the beginning.
  5. Also on the page for your dataset, click Continue to begin importing.
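The UI field expects the CSV location without the gs:// prefix, so a URI copied from elsewhere may need trimming first. The following Python sketch shows one way to do that. The function name `ui_csv_path` and the bucket path are hypothetical.

```python
def ui_csv_path(uri: str) -> str:
    """Strip a leading gs:// prefix so the path can be pasted into the UI field."""
    prefix = "gs://"
    # Leave input that already lacks the prefix unchanged.
    return uri[len(prefix):] if uri.startswith(prefix) else uri


print(ui_csv_path("gs://my-bucket/training/videos.csv"))  # my-bucket/training/videos.csv
```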


The following example creates a dataset named my_dataset_01 that supports object tracking use cases. The newly created dataset doesn't contain any data until you import items into it.

Save the "name" of the new dataset (from the response) for use with other operations, such as importing items into your dataset and training a model.
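Other operations on this page, such as importing and deleting, take the dataset ID, which is the last element of the dataset name. The following Python sketch extracts it. It assumes the name has the form projects/{project}/locations/{location}/datasets/{dataset_id} that is used throughout this page. The helper name `dataset_id_from_name` is hypothetical.

```python
def dataset_id_from_name(name: str) -> str:
    """Return the dataset ID, the last element of a dataset resource name."""
    parts = name.split("/")
    # Expected form: projects/{project}/locations/{location}/datasets/{dataset_id}
    if (
        len(parts) != 6
        or parts[0] != "projects"
        or parts[2] != "locations"
        or parts[4] != "datasets"
        or not parts[5]
    ):
        raise ValueError(f"not a dataset resource name: {name!r}")
    return parts[5]


name = "projects/project-number/locations/location-id/datasets/3104518874390609379"
print(dataset_id_from_name(name))  # 3104518874390609379
```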

Before using any of the request data, make the following replacements:

  • dataset-name: the name of your target dataset. For example, my_dataset_01
  • project-number: number of your project
  • location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.

HTTP method and URL:


Request JSON body:

{
  "displayName": "dataset-name",
  "videoObjectTrackingDatasetMetadata": { }
}

To send your request, choose one of these options:


Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \


Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "" | Select-Object -Expand Content
If the response is successful, the AutoML Video Intelligence Object Tracking API returns the name for your operation. In that name, project-number is the number of your project and operation-id is the ID of the long-running operation created for the request (for example, VOT12345...).


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.automl.v1beta1.AutoMlClient;
import com.google.cloud.automl.v1beta1.Dataset;
import com.google.cloud.automl.v1beta1.LocationName;
import com.google.cloud.automl.v1beta1.VideoObjectTrackingDatasetMetadata;
import java.io.IOException;

class VideoObjectTrackingCreateDataset {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // An empty metadata message selects the video object tracking dataset type.
      VideoObjectTrackingDatasetMetadata metadata =
          VideoObjectTrackingDatasetMetadata.newBuilder().build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setVideoObjectTrackingDatasetMetadata(metadata)
              .build();

      Dataset createdDataset = client.createDataset(projectLocation, dataset);

      // Display the dataset information.
      System.out.format("Dataset name: %s%n", createdDataset.getName());
      // To get the dataset ID, parse it out of the `name` field; other methods require the ID.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s%n", datasetId);
    }
  }
}


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1beta1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      videoObjectTrackingDatasetMetadata: {},
    },
  };

  // Create dataset
  const [response] = await client.createDataset(request);

  console.log(`Dataset name: ${response.name}`);
  // The dataset ID is the last element of the resource name.
  const nameParts = response.name.split('/');
  console.log(`Dataset id: ${nameParts[nameParts.length - 1]}`);
}

createDataset();



To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import automl_v1beta1 as automl


def create_dataset(
    project_id="YOUR_PROJECT_ID", display_name="your_datasets_display_name"
):
    """Create an AutoML video object tracking dataset."""
    client = automl.AutoMlClient()

    # A resource that represents Google Cloud Platform location.
    project_location = f"projects/{project_id}/locations/us-central1"
    metadata = automl.VideoObjectTrackingDatasetMetadata()
    dataset = automl.Dataset(
        display_name=display_name,
        video_object_tracking_dataset_metadata=metadata,
    )

    # Create a dataset with the dataset metadata in the region.
    created_dataset = client.create_dataset(parent=project_location, dataset=dataset)
    # Display the dataset information
    print(f"Dataset name: {created_dataset.name}")
    print(f"Dataset id: {created_dataset.name.split('/')[-1]}")

Importing items into a dataset

After you have created a dataset, you can import labeled data from CSV files stored in a Cloud Storage bucket. For details on preparing your data and creating CSV files for import, see Preparing your training data.

You can import items into an empty dataset or import additional items into an existing dataset.

Web UI

Typically, you import your data when you create your dataset.

However, if you need to import your data after creating your dataset, do the following:

  1. Open the AutoML Video Object Tracking UI. The Datasets page shows the status of previously created datasets for the current project.
  2. From the list, click the dataset that you want to import data into.
  3. On the Import tab, provide the Cloud Storage URI of the CSV file that contains the URIs of your training data, without the gs:// prefix at the beginning.
  4. Also on the Import tab for your dataset, click Continue to begin importing.


To import your training data, use the importData method. This method requires two parameters: the name of the dataset to import into, and an input configuration that points to the CSV files in Cloud Storage.

Before using any of the request data, make the following replacements:

  • dataset-id: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
    • dataset name: projects/project-number/locations/location-id/datasets/3104518874390609379
    • dataset id: 3104518874390609379
  • bucket-name: replace with the name of the Cloud Storage bucket where you have stored your model training file list CSV file.
  • csv-file-name: replace with the name of your model training file list CSV file.
  • project-number: number of your project
  • location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
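The importData request body nests the CSV URIs under inputConfig.gcsSource.inputUris. The client samples on this page accept several CSV files as one comma-separated string, which they split into that list. The following Python sketch builds the same body using only the standard library. `build_import_request` is a hypothetical helper, and the gs:// check is an added safeguard rather than documented API behavior.

```python
import json
from typing import Dict, List


def build_import_request(paths: str) -> Dict[str, dict]:
    """Build an importData request body from comma-separated gs:// CSV URIs."""
    # Split on commas as the client samples do; unlike them, also drop spaces and empty entries.
    uris: List[str] = [p.strip() for p in paths.split(",") if p.strip()]
    if not uris:
        raise ValueError("at least one CSV URI is required")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {"inputConfig": {"gcsSource": {"inputUris": uris}}}


# Write the body to request.json for use with the curl or PowerShell commands.
body = build_import_request("gs://bucket-name/csv-file-name.csv")
with open("request.json", "w", encoding="utf-8") as f:
    json.dump(body, f, indent=2)
```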

HTTP method and URL:


Request JSON body:

{
  "inputConfig": {
    "gcsSource": {
      "inputUris": ["gs://bucket-name/csv-file-name.csv"]
    }
  }
}

To send your request, choose one of these options:


Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \


Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "" | Select-Object -Expand Content
You should receive an operation ID for your import data operation, for example VOT7506374678919774208.
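Import runs as a long-running operation. The client samples on this page wait for it to finish, and the Java sample notes that a timeout only stops the wait; the import keeps running. The following Python sketch shows that idea in a generic form: it polls a status check until the check reports completion or a deadline passes. `wait_for_operation` and the `is_done` callable are hypothetical. In practice, `is_done` would look up the operation by its name and read its completion status. That is an assumption about how you query status, not a call shown on this page.

```python
import time
from typing import Callable


def wait_for_operation(
    is_done: Callable[[], bool], timeout_s: float, interval_s: float = 30.0
) -> bool:
    """Poll is_done() until it returns True or timeout_s seconds pass.

    Returns True if the operation reported completion, False on timeout.
    Timing out only stops this loop; it does not cancel the operation.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))


# Stand-in status check that reports completion on the third call.
statuses = iter([False, False, True])
print(wait_for_operation(lambda: next(statuses), timeout_s=5, interval_s=0.01))  # True
```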


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.automl.v1beta1.AutoMlClient;
import com.google.cloud.automl.v1beta1.AutoMlSettings;
import com.google.cloud.automl.v1beta1.DatasetName;
import com.google.cloud.automl.v1beta1.GcsSource;
import com.google.cloud.automl.v1beta1.InputConfig;
import com.google.cloud.automl.v1beta1.OperationMetadata;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.threeten.bp.Duration;

class ImportDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String path = "gs://BUCKET_ID/path_to_training_data.csv";
    importDataset(projectId, datasetId, path);
  }

  // Import a dataset
  static void importDataset(String projectId, String datasetId, String path)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Give the importData request a 45-minute total timeout.
    Duration totalTimeout = Duration.ofMinutes(45);
    RetrySettings retrySettings = RetrySettings.newBuilder().setTotalTimeout(totalTimeout).build();
    AutoMlSettings.Builder builder = AutoMlSettings.newBuilder();
    builder.importDataSettings().setRetrySettings(retrySettings);
    AutoMlSettings settings = builder.build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create(settings)) {
      // Get the complete path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);

      // Get multiple Google Cloud Storage URIs to import data from
      GcsSource gcsSource =
          GcsSource.newBuilder().addAllInputUris(Arrays.asList(path.split(","))).build();

      // Import data from the input URI
      InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
      System.out.println("Processing import...");

      // Start the import job
      OperationFuture<Empty, OperationMetadata> operation =
          client.importDataAsync(datasetFullId, inputConfig);

      System.out.format("Operation name: %s%n", operation.getName());

      // If you want to wait for the operation to finish, adjust the timeout appropriately. The
      // operation will still run if you choose not to wait for it to complete. You can check the
      // status of your operation using the operation's name.
      Empty response = operation.get(45, TimeUnit.MINUTES);
      System.out.format("Dataset imported. %s%n", response);
    } catch (TimeoutException e) {
      System.out.println("The operation's polling period was not long enough.");
      System.out.println("You can use the Operation's name to get the current status.");
      System.out.println("The import job is still running and will complete as expected.");
      throw e;
    }
  }
}


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';
// const path = 'gs://BUCKET_ID/path_to_training_data.csv';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1beta1;

// Instantiates a client
const client = new AutoMlClient();

async function importDataset() {
  // Construct request; multiple CSV files can be passed as a comma-separated string.
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
    inputConfig: {
      gcsSource: {
        inputUris: path.split(','),
      },
    },
  };

  // Import dataset
  console.log('Processing import');
  const [operation] = await client.importData(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset imported: ${response}`);
}

importDataset();



To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import automl_v1beta1 as automl


def import_dataset(
    path="gs://BUCKET_ID/path_to_training_data.csv",
    project_id="YOUR_PROJECT_ID",
    dataset_id="YOUR_DATASET_ID",
):
    """Import a dataset."""
    client = automl.AutoMlClient()
    # Get the full path of the dataset.
    dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
    # Get the multiple Google Cloud Storage URIs
    input_uris = path.split(",")
    gcs_source = automl.GcsSource(input_uris=input_uris)
    input_config = automl.InputConfig(gcs_source=gcs_source)
    # Import data from the input URI
    response = client.import_data(name=dataset_full_id, input_config=input_config)

    print("Processing import...")
    # result() blocks until the long-running import operation finishes.
    print(f"Data imported. {response.result()}")

Labeling training items

To be useful for training a model, each item in a dataset must contain at least one bounding box and one category label assigned to it. You can provide labels and bounding boxes for your training items in two ways:

  • Include labels and bounding boxes in your CSV file
  • Apply labels and bounding boxes to your items in the AutoML Video Object Tracking UI.
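Before training, it can help to check which items still lack labels. The following sketch models items in memory using hypothetical `VideoItem` and `LabeledBox` classes. It does not read the CSV import format described in Preparing your training data, and the box tuple layout is an assumption for illustration only. The sketch applies the rule stated above: an item is usable only if it has at least one bounding box with a label.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LabeledBox:
    """Hypothetical record of one bounding box and its category label."""
    label: str
    box: Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


@dataclass
class VideoItem:
    """Hypothetical training item: a video URI plus its labeled boxes."""
    uri: str
    boxes: List[LabeledBox] = field(default_factory=list)


def is_trainable(item: VideoItem) -> bool:
    """True if the item has at least one bounding box with a non-empty label."""
    return any(b.label for b in item.boxes)


def count_labeled(items: List[VideoItem]) -> Tuple[int, int]:
    """Return (labeled, unlabeled) counts, similar to the UI's summary."""
    labeled = sum(1 for item in items if is_trainable(item))
    return labeled, len(items) - labeled


items = [
    VideoItem("gs://my-bucket/cow.mp4", [LabeledBox("cow", (0.1, 0.2, 0.4, 0.6))]),
    VideoItem("gs://my-bucket/road.mp4"),
]
print(count_labeled(items))  # (1, 1)
```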

For details about labeling items in your CSV file, see Preparing your training data.

To label items in the AutoML Video Object Tracking UI, select the dataset from the dataset listing page to see its details. The display name of the selected dataset appears in the title bar, and the page lists the individual items in the dataset along with their labels. The navigation bar along the left summarizes the number of labeled and unlabeled items. It also enables you to filter the item list by label.

Videos in a dataset

To assign labels and bounding boxes to unlabeled videos or to change video labels and bounding boxes, do the following:

  1. On the page for the dataset, click the video that you want to add labels for.
  2. On the page for the video, do the following:

    1. Run the video until you see the item that you want to label.
    2. Drag the cursor to draw a bounding box around the item.
    3. After drawing the bounding box, select the label that you want to use.
    4. Click Save.

Drawing bounding box around cow in a video

If you need to add a new label for the dataset, on the page for the dataset, above the list of existing labels, click the three dots next to Filter labels and then click Add new label.

Changing labels in data

You can also change the labels applied to videos in a dataset. In the AutoML Video Object Tracking UI, do the following:

  1. On the page for the dataset, click the video that you want to change labels for.
  2. On the page for the video, do the following:

    1. In the list of labels on the left, select the label that you want to change.
    2. On the preview of the video, right-click the bounding box on the video and select the label that you want.
    3. Click Save.

Changing label applied to sedan in a video

Listing datasets

A project can include numerous datasets. This section describes how to retrieve a list of the available datasets for a project.

Web UI

To see a list of the available datasets using the AutoML Video Object Tracking UI, navigate to the Datasets page.

List of datasets in the project

To see the datasets for a different project, select the project from the drop-down list in the upper right of the title bar.


Use the following curl or PowerShell commands to get a list of your datasets and the number of sample videos that were imported into the dataset.

Before using any of the request data, make the following replacements:

  • project-number: the number of your project
  • location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.

HTTP method and URL:


To send your request, choose one of these options:


Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
" "


Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri " " | Select-Object -Expand Content
In the response, a value such as VOT3940649673949184000 is the operation ID of the long-running operation that was created for the request and returned when you started the operation.
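To turn a list response into dataset IDs and sample-video counts, you can parse the JSON body. The following Python sketch assumes the response follows the v1beta1 ListDatasetsResponse JSON shape: a `datasets` array whose entries carry `name`, `displayName`, and `exampleCount`. The helper name and the sample response are hypothetical.

```python
import json
from typing import List, Tuple


def summarize_datasets(response_json: str) -> List[Tuple[str, str, int]]:
    """Return (dataset_id, display_name, example_count) for each listed dataset."""
    response = json.loads(response_json)
    rows = []
    for ds in response.get("datasets", []):
        # The dataset ID is the last element of the resource name.
        dataset_id = ds["name"].split("/")[-1]
        # Proto3 JSON omits zero-valued fields, so a missing exampleCount means 0.
        count = int(ds.get("exampleCount", 0))
        rows.append((dataset_id, ds.get("displayName", ""), count))
    return rows


# Hypothetical response body, for illustration only.
sample = json.dumps({
    "datasets": [
        {"name": "projects/123/locations/us-central1/datasets/VOT111",
         "displayName": "cows", "exampleCount": 42},
        {"name": "projects/123/locations/us-central1/datasets/VOT222",
         "displayName": "empty"},
    ]
})
for row in summarize_datasets(sample):
    print(row)
```

A real response may also include a nextPageToken when more datasets remain. This sketch reads only the page it is given.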


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.automl.v1beta1.AutoMlClient;
import com.google.cloud.automl.v1beta1.Dataset;
import com.google.cloud.automl.v1beta1.ListDatasetsRequest;
import com.google.cloud.automl.v1beta1.LocationName;
import java.io.IOException;

class ListDatasets {

  static void listDatasets() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    listDatasets(projectId);
  }

  // List the datasets
  static void listDatasets(String projectId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // The filter restricts the results to video object tracking datasets.
      ListDatasetsRequest request =
          ListDatasetsRequest.newBuilder()
              .setParent(projectLocation.toString())
              .setFilter("video_object_tracking_dataset_metadata:*")
              .build();

      // List all the matching datasets available in the region.
      System.out.println("List of datasets:");
      for (Dataset dataset : client.listDatasets(request).iterateAll()) {
        // Display the dataset information
        System.out.format("%nDataset name: %s%n", dataset.getName());
        // To get the dataset ID, parse it out of the `name` field; other methods require the ID.
        // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
        String[] names = dataset.getName().split("/");
        String retrievedDatasetId = names[names.length - 1];
        System.out.format("Dataset id: %s%n", retrievedDatasetId);
        System.out.format("Dataset display name: %s%n", dataset.getDisplayName());
        System.out.println("Dataset create time:");
        System.out.format("\tseconds: %s%n", dataset.getCreateTime().getSeconds());
        System.out.format("\tnanos: %s%n", dataset.getCreateTime().getNanos());
        System.out.format(
            "Video object tracking dataset metadata: %s%n",
            dataset.getVideoObjectTrackingDatasetMetadata());
      }
    }
  }
}


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1beta1;

// Instantiates a client
const client = new AutoMlClient();

async function listDatasets() {
  // Construct request; the filter returns only video object tracking datasets.
  const request = {
    parent: client.locationPath(projectId, location),
    filter: 'video_object_tracking_dataset_metadata:*',
  };

  const [response] = await client.listDatasets(request);

  console.log('List of datasets:');
  for (const dataset of response) {
    console.log(`Dataset name: ${dataset.name}`);
    // The dataset ID is the last element of the resource name.
    const nameParts = dataset.name.split('/');
    console.log(`Dataset id: ${nameParts[nameParts.length - 1]}`);
    console.log(`Dataset display name: ${dataset.displayName}`);
    console.log('Dataset create time');
    console.log(`\tseconds ${dataset.createTime.seconds}`);
    console.log(`\tnanos ${dataset.createTime.nanos}`);
    console.log(
      `Video object tracking dataset metadata: ${JSON.stringify(
        dataset.videoObjectTrackingDatasetMetadata
      )}`
    );
  }
}

listDatasets();



To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import automl_v1beta1 as automl


def list_datasets(project_id="YOUR_PROJECT_ID"):
    """List datasets."""
    client = automl.AutoMlClient()
    # A resource that represents Google Cloud Platform location.
    project_location = f"projects/{project_id}/locations/us-central1"

    # List all the datasets available in the region.
    request = automl.ListDatasetsRequest(parent=project_location, filter="")
    response = client.list_datasets(request=request)

    print("List of datasets:")
    for dataset in response:
        print(f"Dataset name: {dataset.name}")
        print(f"Dataset id: {dataset.name.split('/')[-1]}")
        print(f"Dataset display name: {dataset.display_name}")
        print(f"Dataset create time: {dataset.create_time}")
        print(
            "Video object tracking dataset metadata: {}".format(
                dataset.video_object_tracking_dataset_metadata
            )
        )

Deleting a dataset

The following steps and code samples show how to delete a dataset.

Web UI

  1. Navigate to the Datasets page in the AutoML Video Object Tracking UI.
  2. Click the three-dot menu at the far right of the row that you want to delete and select Delete dataset.
  3. Click Confirm in the confirmation dialog box.


Before using any of the request data, make the following replacements:

  • project-number: the number of your project
  • location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
  • dataset-id: the ID of your dataset.
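The dataset to delete is identified by a resource name of the form projects/{project}/locations/{location}/datasets/{dataset_id}, built from project-number, location-id, and dataset-id. This is the same form the client samples build with their dataset path helpers. The following Python sketch builds the name from the three values and checks that no segment is empty or contains a slash. `dataset_name` is a hypothetical helper.

```python
def dataset_name(project: str, location: str, dataset_id: str) -> str:
    """Build projects/{project}/locations/{location}/datasets/{dataset_id}."""
    for label, value in (("project", project), ("location", location), ("dataset_id", dataset_id)):
        # Each segment must be non-empty and must not contain a path separator.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"projects/{project}/locations/{location}/datasets/{dataset_id}"


print(dataset_name("project-number", "us-central1", "3104518874390609379"))
```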

HTTP method and URL:


To send your request, expand one of these options:

You should receive a successful status code (2xx) and an empty response.


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.automl.v1beta1.AutoMlClient;
import com.google.cloud.automl.v1beta1.DatasetName;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class DeleteDataset {

  static void deleteDataset() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    deleteDataset(projectId, datasetId);
  }

  // Delete a dataset
  static void deleteDataset(String projectId, String datasetId)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the full path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);
      // Deletion is a long-running operation; get() blocks until it completes.
      Empty response = client.deleteDatasetAsync(datasetFullId).get();
      System.out.format("Dataset deleted. %s%n", response);
    }
  }
}


To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1beta1;

// Instantiates a client
const client = new AutoMlClient();

async function deleteDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
  };

  const [operation] = await client.deleteDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset deleted: ${response}`);
}

deleteDataset();



To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import automl_v1beta1 as automl


def delete_dataset(project_id="YOUR_PROJECT_ID", dataset_id="YOUR_DATASET_ID"):
    """Delete a dataset."""
    client = automl.AutoMlClient()
    # Get the full path of the dataset
    dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
    response = client.delete_dataset(name=dataset_full_id)

    print(f"Dataset deleted. {response.result()}")