Managing datasets
A dataset contains representative samples of the type of content you want to classify, labeled with the category labels you want your custom model to use. The dataset serves as the input for training a model.
The main steps for building a dataset are:
- Create a dataset and specify whether to allow multiple labels on each item.
- Import data items into the dataset.
- Label the items.
A project can have multiple datasets, each used to train a separate model. You can get a list of the available datasets and can delete datasets you no longer need.
Creating a dataset
The first step in creating a custom model is to create an empty dataset that will eventually hold the training data for the model.
Web UI
The AutoML Video UI enables you to create a new dataset and import items into it from the same page.
- Open the AutoML Video UI.
The Datasets page shows the status of previously created datasets for the current project.
To add a dataset for a different project, select the project from the drop-down list in the upper right of the title bar.
- On the Datasets page, click Create Dataset.
- Enter information about the dataset:
- Specify a name for this dataset.
- Select Video Classification.
- Click Create Dataset.
- On the next screen, enter the following information:
- Provide the Cloud Storage URI of the CSV file that contains the URIs of your training data (see Prepare data).
For example, to use the sample data, enter:
automl-video-demo-data/hmdb_split1.csv
- Click Continue to begin importing your data.
The import process can take a while to complete, depending on the number and length of the videos that you've provided.
REST
Before using any of the request data, make the following replacements:
- dataset-name: name of the dataset to show in the interface
- project-number: number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, and asia-east1. If no region is specified, a region is determined based on video file location.
HTTP method and URL:
POST https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets
Request JSON body:
{ "displayName": "dataset-name", "videoClassificationDatasetMetadata": { } }
To send your request, choose one of these options:
curl
Save the request body in a file called request.json, and then execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets"
PowerShell
Save the request body in a file called request.json, and then execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets" | Select-Object -Expand Content
The response includes a name for your operation, in which project-number is the number of your project and operation-id is the ID of the long-running operation created for the request.
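The create request above can also be assembled programmatically. The following is a minimal sketch that uses only the Python standard library; the function name build_create_dataset_request and the placeholder values are hypothetical, while the URL, headers, and JSON body mirror the curl example. It only builds the request object, so you can inspect it before sending anything.

```python
import json
import urllib.request

# Base URL taken from the REST examples in this section.
API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_create_dataset_request(project_number, location_id, display_name, access_token):
    """Build, but do not send, a POST request that creates an empty
    video classification dataset (hypothetical helper)."""
    url = f"{API_ROOT}/projects/{project_number}/locations/{location_id}/datasets"
    # Same JSON body as request.json in the curl and PowerShell examples.
    body = {
        "displayName": display_name,
        "videoClassificationDatasetMetadata": {},
    }
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_number,
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical placeholder values; replace them with your own.
    req = build_create_dataset_request("123456789", "us-east1", "my_dataset", "ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
    # To send the request, supply a real token (for example, the output of
    # `gcloud auth print-access-token`) and call urllib.request.urlopen(req).
```

Separating request construction from sending keeps the request easy to check offline; actually sending it requires a valid access token.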
Importing items into a dataset
After you have created a dataset, you can import labeled data from CSV files stored in a Cloud Storage bucket. For details on preparing your data and creating CSV files for import, see Preparing your training data.
You can import items into an empty dataset or import additional items into an existing dataset.
Web UI
Your data is imported when you create your dataset.
REST
Before using any of the request data, make the following replacements:
- input-uri: the Cloud Storage path of the CSV file that you want to import, including the file name. Must start with gs://. For example:
"inputUris": ["gs://automl-video-demo-data/hmdb_split1.csv"]
- dataset-id: the identifier for your dataset (not the display name). For example: VCN4798585402963263488
- project-number: number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, and asia-east1. If no region is specified, a region is determined based on video file location.
HTTP method and URL:
POST https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id:importData
Request JSON body:
{ "inputConfig": { "gcsSource": { "inputUris": input-uri } } }
To send your request, choose one of these options:
curl
Save the request body in a file called request.json, and then execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-number" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id:importData"
PowerShell
Save the request body in a file called request.json, and then execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-number" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets/dataset-id:importData" | Select-Object -Expand Content
The response includes the operation ID of the import task, for example VCN7506374678919774208.
You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.
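A minimal standard-library sketch of the importData call follows. The helper names build_import_data_request and operation_id_from_name are hypothetical. The URL, headers, body, and the gs:// requirement come from the replacement notes above; the assumption that the operation ID is the last path segment of the operation name is labeled in the code.

```python
import json
import urllib.request

# Base URL taken from the REST examples in this section.
API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_import_data_request(project_number, location_id, dataset_id, input_uris, access_token):
    """Build, but do not send, a POST request to the importData method
    (hypothetical helper)."""
    if isinstance(input_uris, str):
        # Accept a single URI as a convenience; the API body expects a list.
        input_uris = [input_uris]
    # The replacement notes require every input URI to start with gs://.
    for uri in input_uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"input URI must start with gs://: {uri!r}")
    url = (
        f"{API_ROOT}/projects/{project_number}/locations/{location_id}"
        f"/datasets/{dataset_id}:importData"
    )
    body = {"inputConfig": {"gcsSource": {"inputUris": list(input_uris)}}}
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_number,
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def operation_id_from_name(operation_name):
    """Return the last path segment of an operation name.

    Assumption: the operation name ends in .../operations/operation-id.
    """
    return operation_name.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    # Demo CSV and example dataset ID from this page; the project is a placeholder.
    req = build_import_data_request(
        "123456789",
        "us-east1",
        "VCN4798585402963263488",
        ["gs://automl-video-demo-data/hmdb_split1.csv"],
        "ACCESS_TOKEN",
    )
    print(req.get_method(), req.full_url)
```

Rejecting URIs without the gs:// prefix locally surfaces a malformed request before it is sent.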
Labeling training items
To be useful for training a model, each item in a dataset must have at least one category label assigned to it. AutoML Video ignores items without a category label. You can provide labels for your training items in two ways:
- Include labels in your CSV file
- Label your items in the AutoML Video UI
For details about labeling items in your CSV file, see Preparing your training data.
To label items in the AutoML Video UI, select the dataset from the dataset listing page to see its details. The display name of the selected dataset appears in the title bar, and the page lists the individual items in the dataset along with their labels. The navigation bar along the left summarizes the number of labeled and unlabeled items. It also enables you to filter the item list by label.
To assign labels to unlabeled videos or change video labels, do the following:
- On the page for the dataset, click the video that you want to add or change labels for.
On the page for the video, do the following:
- Click Add Segment.
- Drag the arrows on either side of the video timeline to define the region that you want to label. By default, the entire duration of the video is selected.
- From the list of labels, click the labels that you want to apply to the video. The color bar for the label turns solid after you select it.
- Click Save.
To add a new label to the dataset, go to the page for the dataset, click the three dots next to Filter labels (above the list of existing labels), and then click Add new label.
Listing datasets
A project can include numerous datasets. This section describes how to retrieve a list of the available datasets for a project.
Web UI
To see a list of the available datasets using the AutoML Video UI, navigate to the Datasets page. To see the datasets for a different project, select the project from the drop-down list in the upper right of the title bar.
REST
Before using any of the request data, make the following replacements:
- project-number: number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, and asia-east1. If no region is specified, a region is determined based on video file location.
HTTP method and URL:
GET https://automl.googleapis.com/v1beta1/projects/project-number/locations/location-id/datasets
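As a minimal sketch, the list request can be built the same way with the Python standard library. The helper names are hypothetical, and the shape of the response (a "datasets" array whose entries carry a "name") is an assumption labeled in the code. This sketch reads a single response and does not handle pagination.

```python
import json
import urllib.request

# Base URL taken from the REST examples in this section.
API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_list_datasets_request(project_number, location_id, access_token):
    """Build, but do not send, a GET request that lists a project's datasets
    (hypothetical helper)."""
    url = f"{API_ROOT}/projects/{project_number}/locations/{location_id}/datasets"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_number,
    }
    # A GET request carries no body, so data is left as None.
    return urllib.request.Request(url, headers=headers, method="GET")


def dataset_names(response_body):
    """Extract dataset resource names from one list response.

    Assumption: the JSON response holds a "datasets" array whose entries each
    carry a "name" field; the array may be absent when there are no datasets.
    """
    payload = json.loads(response_body)
    return [dataset["name"] for dataset in payload.get("datasets", [])]


if __name__ == "__main__":
    # Hypothetical placeholder values; replace them with your own.
    req = build_list_datasets_request("123456789", "us-east1", "ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
    # With a real token: print(dataset_names(urllib.request.urlopen(req).read()))
```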
Deleting a dataset
This section describes how to delete a dataset.
Web UI
- Navigate to the Datasets page in the AutoML Video UI.
- Click the three-dot menu at the far right of the row that you want to delete and select Delete dataset.
- Click Confirm in the confirmation dialog box.
REST
Before using any of the request data, make the following replacements:
- dataset-name: the full name of your dataset, from the response when you created the dataset. The full name has the format: projects/project-number/locations/location-id/datasets/dataset-id
- project-number: number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, and asia-east1. If no region is specified, a region is determined based on video file location.
- dataset-id: the ID returned when you created the dataset
HTTP method and URL:
DELETE https://automl.googleapis.com/v1beta1/dataset-name
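A minimal standard-library sketch of the delete call follows. It checks that dataset-name has the full-name format given above before building the DELETE request. The helper names are hypothetical, and reusing the project segment of the name as the x-goog-user-project value is a choice made for this sketch, matching the project-number used in the curl examples.

```python
import re
import urllib.request

# Base URL taken from the REST examples in this section.
API_ROOT = "https://automl.googleapis.com/v1beta1"

# Full dataset name format given in the replacement notes above.
_DATASET_NAME = re.compile(r"projects/([^/]+)/locations/[^/]+/datasets/[^/]+")


def dataset_resource_name(project_number, location_id, dataset_id):
    """Assemble the full dataset name from its parts (hypothetical helper)."""
    return f"projects/{project_number}/locations/{location_id}/datasets/{dataset_id}"


def build_delete_dataset_request(dataset_name, access_token):
    """Build, but do not send, a DELETE request for a dataset (hypothetical helper)."""
    match = _DATASET_NAME.fullmatch(dataset_name)
    if match is None:
        # A bare dataset ID or display name is not a valid dataset-name here.
        raise ValueError(
            f"expected projects/.../locations/.../datasets/..., got {dataset_name!r}"
        )
    headers = {
        "Authorization": f"Bearer {access_token}",
        # Reuse the project number embedded in the dataset name.
        "x-goog-user-project": match.group(1),
    }
    return urllib.request.Request(f"{API_ROOT}/{dataset_name}", headers=headers, method="DELETE")


if __name__ == "__main__":
    # Example dataset ID from this page; the project number is a placeholder.
    name = dataset_resource_name("123456789", "us-east1", "VCN4798585402963263488")
    req = build_delete_dataset_request(name, "ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
```

Validating the name locally guards against the common mistake of passing only the dataset ID, which would produce a URL that does not match the documented format.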