Managing datasets
A dataset contains representative samples of the type of content you want to label, with the bounding box labels you want your model to use. The dataset serves as the input for training a model.
The main steps for building a dataset are:
- Create a dataset and specify whether to allow multiple labels on each item.
- Import data items into the dataset.
Be sure to prepare your data before training a model.
A project can have multiple datasets, each used to train a separate model. You can get a list of the available datasets and can delete datasets you no longer need.
Creating a dataset
The first step in creating a model is to create an empty dataset that will eventually hold the training data for the model.
Web UI
The AutoML Video Object Tracking UI enables you to create a new dataset and import items into it from the same page.
- Open the AutoML Video Object Tracking UI. The Datasets page shows the status of previously created datasets for the current project. To add a dataset for a different project, select the project from the drop-down list in the upper right of the title bar.
- On the Datasets page, click Create Dataset.
- In the Create new dataset dialog, do the following:
- Specify a name for this dataset.
- Select Video Object Tracking.
- Click Create Dataset.
- On the page for your dataset, provide the Cloud Storage URI of the CSV file that contains the URIs of your training data, without the gs:// prefix at the beginning.
- Also on the page for your dataset, click Continue to begin importing.
REST
The following example creates a dataset named my_dataset01 that supports object tracking use cases. The newly created dataset doesn't contain any data until you import items into it.
Save the "name" of the new dataset (from the response) for use with other operations, such as importing items into your dataset and training a model.
Before using any of the request data, make the following replacements:
- dataset-name: the name of your target dataset. For example, my_dataset_01
- project-number: the number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
HTTP method and URL:
POST https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets
Request JSON body:
{ "displayName": "dataset-name", "videoObjectTrackingDatasetMetadata": { } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets" | Select-Object -Expand Content
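The REST request above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_create_dataset_request` and the placeholder values are hypothetical, and the sketch only builds the request rather than sending it.

```python
import json
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_create_dataset_request(project_number, location_id, display_name, access_token):
    """Build (but do not send) the POST request that creates an empty dataset."""
    url = f"{API_ROOT}/projects/{project_number}/locations/{location_id}/datasets"
    body = {
        "displayName": display_name,
        # Same body as the REST example: an empty object tracking metadata block.
        "videoObjectTrackingDatasetMetadata": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": str(project_number),
            "Content-Type": "application/json; charset=utf-8",
        },
    )


if __name__ == "__main__":
    # Placeholder values (assumptions); replace them with your own.
    request = build_create_dataset_request(
        "project-number", "us-east1", "my_dataset_01", "ACCESS_TOKEN"
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send the request, pass it to urllib.request.urlopen(request).
```

The access token would typically come from `gcloud auth print-access-token`, as in the curl command.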
Java
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Importing items into a dataset
After you have created a dataset, you can import labeled data from CSV files stored in a Cloud Storage bucket. For details on preparing your data and creating a CSV file for import, see Preparing your training data.
You can import items into an empty dataset or import additional items into an existing dataset.
Web UI
Typically, you import your data when you create your dataset. However, if you need to import your data after creating your dataset, do the following:
- Open the AutoML Video Object Tracking UI. The Datasets page shows the status of previously created datasets for the current project.
- From the list, click the dataset that you want to import data into.
- On the Import tab, provide the Cloud Storage URI of the CSV file that contains the URIs of your training data, without the gs:// prefix at the beginning.
- Also on the Import tab for your dataset, click Continue to begin importing.
REST
To import your training data, use the importData method. This method requires that you provide two parameters.
Before using any of the request data, make the following replacements:
- dataset-id: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
  - dataset name: projects/project-number/locations/location-id/datasets/3104518874390609379
  - dataset id: 3104518874390609379
- bucket-name: the name of the Cloud Storage bucket where you have stored your model training file list CSV file.
- csv-file-name: the name of your model training file list CSV file.
- project-number: the number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
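Because the dataset ID is the last element of the dataset name, it can be extracted with a small helper. The following is a sketch only; the function name `dataset_id_from_name` is hypothetical, and the check for a `/datasets` path segment is an added safeguard rather than something the API requires.

```python
def dataset_id_from_name(name):
    """Return the last path element of a dataset resource name."""
    prefix, _, dataset_id = name.rpartition("/")
    # Reject strings that do not look like ".../datasets/<id>".
    if not prefix.endswith("/datasets") or not dataset_id:
        raise ValueError(f"not a dataset resource name: {name!r}")
    return dataset_id


if __name__ == "__main__":
    name = "projects/project-number/locations/location-id/datasets/3104518874390609379"
    print(dataset_id_from_name(name))
```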
HTTP method and URL:
POST https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id:importData
Request JSON body:
{ "inputConfig": { "gcsSource": { "inputUris": ["gs://bucket-name/csv-file-name.csv"] } } }
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id:importData"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id:importData" | Select-Object -Expand Content
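As an illustration of how the importData URL and request body fit together, here is a minimal Python sketch. The function name `build_import_data_request` and the sample argument values are hypothetical; the URL pattern and body layout follow the REST example in this section.

```python
import json

API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_import_data_request(project_number, location_id, dataset_id,
                              bucket_name, csv_file_name):
    """Return the importData URL and JSON body for one CSV file in Cloud Storage."""
    url = (
        f"{API_ROOT}/projects/{project_number}/locations/{location_id}"
        f"/datasets/{dataset_id}:importData"
    )
    body = {
        "inputConfig": {
            "gcsSource": {
                # Unlike the web UI field, the REST body keeps the gs:// prefix.
                "inputUris": [f"gs://{bucket_name}/{csv_file_name}.csv"]
            }
        }
    }
    return url, body


if __name__ == "__main__":
    # Placeholder values (assumptions); replace them with your own.
    url, body = build_import_data_request(
        "project-number", "us-east1", "3104518874390609379",
        "bucket-name", "csv-file-name",
    )
    print("POST", url)
    # This output could be saved as request.json for the curl command.
    print(json.dumps(body, indent=2))
```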
Java
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Labeling training items
To be useful for training a model, each item in a dataset must contain at least one bounding box and one category label assigned to it. You can provide labels and bounding boxes for your training items in two ways:
- Include labels and bounding boxes in your CSV file.
- Apply labels and bounding boxes to your items in the AutoML Video Object Tracking UI.
For details about labeling items in your CSV file, see Preparing your training data.
To label items in the AutoML Video Object Tracking UI, select the dataset from the dataset listing page to see its details. The display name of the selected dataset appears in the title bar, and the page lists the individual items in the dataset along with their labels. The navigation bar along the left summarizes the number of labeled and unlabeled items. It also enables you to filter the item list by label.
To assign labels and bounding boxes to unlabeled videos or to change video labels and bounding boxes, do the following:
- On the page for the dataset, click the video that you want to add labels for.
- On the page for the video, do the following:
  - Run the video until you see the item that you want to label.
  - Drag the cursor to draw a bounding box around the item.
  - After drawing the bounding box, select the label that you want to use.
  - Click Save.
If you need to add a new label for the dataset, on the page for the dataset, above the list of existing labels, click the three dots next to Filter labels and then click Add new label.
Changing labels in data
You can also change the labels applied to videos in a dataset. In the AutoML Video Object Tracking UI, do the following:
- On the page for the dataset, click the video that you want to change labels for.
- On the page for the video, do the following:
  - In the list of labels on the left, select the label that you want to change.
  - On the preview of the video, right-click the bounding box on the video and select the label that you want.
  - Click Save.
Listing datasets
A project can include numerous datasets. This section describes how to retrieve a list of the available datasets for a project.
Web UI
To see a list of the available datasets using the AutoML Video Object Tracking UI, navigate to the Datasets page.
To see the datasets for a different project, select the project from the drop-down list in the upper right of the title bar.
REST
Use the following curl or PowerShell commands to get a list of your datasets and the number of sample videos that were imported into each dataset.
Before using any of the request data, make the following replacements:
- project-number: the number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
HTTP method and URL:
GET https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
"https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets" | Select-Object -Expand Content
VOT3940649673949184000 is the operation ID of the long-running operation created for the request and provided in the response when you started the operation.
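Once the list request returns, the JSON response can be reduced to the dataset IDs and sample counts. The sketch below makes several assumptions: the response fields `datasets` and `exampleCount` reflect my understanding of the v1beta1 API and are not shown on this page, the sample payload is invented for illustration, and `summarize_datasets` is a hypothetical helper name.

```python
import json

# Invented sample payload; the field names are assumptions about the API response.
SAMPLE_RESPONSE = """
{
  "datasets": [
    {
      "name": "projects/1234/locations/us-east1/datasets/VOT0000000000000000001",
      "displayName": "my_dataset_01",
      "exampleCount": 12
    },
    {
      "name": "projects/1234/locations/us-east1/datasets/VOT0000000000000000002",
      "displayName": "my_dataset_02"
    }
  ]
}
"""


def summarize_datasets(response_text):
    """Return (dataset_id, display_name, example_count) tuples from a list response."""
    payload = json.loads(response_text)
    rows = []
    for dataset in payload.get("datasets", []):
        # The dataset ID is the last element of the resource name.
        dataset_id = dataset["name"].rsplit("/", 1)[-1]
        # Treat a missing count as zero imported videos.
        example_count = int(dataset.get("exampleCount", 0))
        rows.append((dataset_id, dataset.get("displayName", ""), example_count))
    return rows


if __name__ == "__main__":
    for dataset_id, display_name, example_count in summarize_datasets(SAMPLE_RESPONSE):
        print(f"{dataset_id}\t{display_name}\t{example_count} videos")
```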
Java
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Deleting a dataset
This section describes how to delete a dataset.
Web UI
- Navigate to the Datasets page in the AutoML Video Object Tracking UI.
- Click the three-dot menu at the far right of the row that you want to delete and select Delete dataset.
- Click Confirm in the confirmation dialog box.
REST
Before using any of the request data, make the following replacements:
- project-number: the number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
- dataset-id: the ID of your dataset.
HTTP method and URL:
DELETE https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id
To send your request, use curl or PowerShell with the same Authorization and x-goog-user-project headers as in the listing example.
You should receive a successful status code (2xx) and an empty response.
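The DELETE request described above can be sketched in Python with the standard library. This is an illustration under stated assumptions: `build_delete_dataset_request` is a hypothetical helper, the headers mirror those used in the other REST examples on this page, and the sketch builds the request without sending it.

```python
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_delete_dataset_request(project_number, location_id, dataset_id, access_token):
    """Build (but do not send) the DELETE request for one dataset."""
    url = (
        f"{API_ROOT}/projects/{project_number}/locations/{location_id}"
        f"/datasets/{dataset_id}"
    )
    # A DELETE request carries no body, so no Content-Type header is set.
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": str(project_number),
        },
    )


if __name__ == "__main__":
    # Placeholder values (assumptions); replace them with your own.
    request = build_delete_dataset_request(
        "project-number", "us-east1", "dataset-id", "ACCESS_TOKEN"
    )
    print(request.get_method(), request.full_url)
    # To send the request, pass it to urllib.request.urlopen(request).
```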
Java
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To authenticate to AutoML Video Object Tracking, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.