A dataset contains representative samples of the type of content you want to classify, labeled with the category labels you want your custom model to use. The dataset serves as the input for training a model.
The main steps for building a dataset are:
1. Create a dataset and specify whether to allow multiple labels on each item.
2. Import data items into the dataset.
3. Label the items.
When you import items with already-assigned labels, steps 2 and 3 are combined.
Creating a dataset
The first step in creating a custom model is to create an empty dataset that will eventually hold the training data for the model. When you create a dataset, you specify the type of classification you want your custom model to perform:
- MULTICLASS assigns a single label to each classified image
- MULTILABEL allows an image to be assigned multiple labels
As of v1 of the AutoML API, this request returns the ID of a long-running operation.
After the long-running operation completes, you can import images into the dataset. The newly created dataset doesn't contain any data until you import images into it.
Save the dataset ID of the new dataset (from the response) for use with other operations, such as importing images into your dataset and training a model.
Web UI
Open the Vision Dashboard.
You can also access this page from the console via the left navigation menu item Artificial Intelligence > Vision. This will take you to the integrated Vision dashboard. Select the AutoML Vision card.
Select Datasets from the left navigation menu.
Select the New Dataset button at the top, update the dataset name (optional), and select single-label or multi-label classification based on the data you have.
After specifying the classification type, select Create Dataset.
On the Create Dataset page you can choose a CSV file from Google Cloud Storage, or local image files to import into the dataset.
Select Continue to begin image import into your dataset. While import occurs the dataset will show a status of Running: Importing images.
You receive an email when import has finished.
REST
The following example creates a dataset that supports one label per item (see MULTICLASS).
The newly created dataset doesn't contain any data until you import items into it.
Save the "name" of the new dataset (from the response) for use with other operations, such as importing items into your dataset and training a model.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
- DISPLAY_NAME: a string display name of your choosing.
HTTP method and URL:
POST https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets
Request JSON body:
{
  "displayName": "DISPLAY_NAME",
  "imageClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets" | Select-Object -Expand Content
You should see output similar to the following. You can use the operation ID (ICN3819960680614725486, in this case) to get the status of the task. For an example, see Working with long-running operations:
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/ICN3819960680614725486",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-14T16:49:13.667526Z",
    "updateTime": "2019-11-14T16:49:13.667526Z",
    "createDatasetDetails": {}
  }
}
After the long-running operation completes you can get the dataset's ID with the same operation status request. The response should look similar to the following:
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/ICN3819960680614725486",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-14T16:49:13.667526Z",
    "updateTime": "2019-11-14T16:49:17.975314Z",
    "createDatasetDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Dataset",
    "name": "projects/PROJECT_ID/locations/us-central1/datasets/ICN5496445433112696489"
  }
}
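As a rough illustration, the create-dataset REST request can also be assembled in Python with only the standard library. This is a minimal sketch, not the client library sample: the function name `build_create_dataset_request` and the values `my-project`, `my_dataset`, and `TOKEN` are placeholders chosen for this example, and the access token is assumed to come from `gcloud auth print-access-token`. The function only builds the request; sending it is left as a commented step.

```python
import json
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1"


def build_create_dataset_request(project_id, display_name, access_token,
                                 classification_type="MULTICLASS"):
    """Build (but do not send) the POST request that creates a dataset."""
    if classification_type not in ("MULTICLASS", "MULTILABEL"):
        raise ValueError("classification_type must be MULTICLASS or MULTILABEL")
    url = f"{API_ROOT}/projects/{project_id}/locations/us-central1/datasets"
    # Same JSON body as the REST example: a display name plus the
    # classification type (one label per image, or several).
    body = {
        "displayName": display_name,
        "imageClassificationDatasetMetadata": {
            "classificationType": classification_type
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = build_create_dataset_request("my-project", "my_dataset", "TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send the request: urllib.request.urlopen(req)
```

Passing `classification_type="MULTILABEL"` produces the same request for a dataset that allows multiple labels per image.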
Go
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Java
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for Ruby.
Importing items into a dataset
After you have created a dataset, you can import item URIs and labels for items from a CSV file stored in a Google Cloud Storage bucket. For details on preparing your data and creating a CSV file for import, see Preparing your training data.
You can import items into an empty dataset or import additional items into an existing dataset.
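To make the CSV input concrete, the sketch below generates import rows in Python. It assumes a simple row layout of a Cloud Storage image URI followed by zero or more labels; the authoritative format is described in Preparing your training data. The function name `build_import_csv` and the bucket paths are hypothetical.

```python
import csv
import io


def build_import_csv(rows):
    """Return CSV text with one line per image: its URI, then its labels.

    `rows` is an iterable of (uri, labels) pairs. The layout (URI first,
    labels after) is an assumption made for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for uri, labels in rows:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got {uri!r}")
        writer.writerow([uri, *labels])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket and labels, for illustration only.
    print(build_import_csv([
        ("gs://my-bucket/flowers/1.jpg", ["daisy"]),
        ("gs://my-bucket/flowers/2.jpg", ["rose", "red"]),
    ]))
```

The resulting text would then be uploaded to a Cloud Storage bucket and referenced by its `gs://` path in the import request.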
Web UI
The AutoML Vision UI enables you to create a new dataset and import items into it from the same page; see Creating a dataset. The steps below import items into an existing dataset.
Open the Vision Dashboard and select the dataset from the Datasets page.
On the Images page, click Add items in the title bar and select the import method from the drop-down list.
You can:
Upload a .csv file that contains the training images and their associated category labels from your local computer or from Google Cloud Storage.
Upload .txt or .zip files that contain the training images from your local computer.
Select the file(s) to import.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
- DATASET_ID: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
  - dataset name: projects/project-id/locations/location-id/datasets/3104518874390609379
  - dataset id: 3104518874390609379
- INPUT_STORAGE_PATH: the path to a CSV file stored on Google Cloud Storage. The requesting user must have at least read permission to the bucket.
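Because the ID is the last element of the resource name, it can be extracted with a one-line split. The helper name `resource_id` below is illustrative and not part of any client library.

```python
def resource_id(resource_name):
    """Return the last path element of a resource name such as
    projects/.../locations/.../datasets/3104518874390609379."""
    name = resource_name.rstrip("/")
    if "/" not in name:
        raise ValueError(f"not a full resource name: {resource_name!r}")
    return name.rsplit("/", 1)[1]


if __name__ == "__main__":
    print(resource_id(
        "projects/project-id/locations/location-id/datasets/3104518874390609379"
    ))
```

The same helper applies to operation names, whose ID is also the last element.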
HTTP method and URL:
POST https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID:importData
Request JSON body:
{
  "inputConfig": {
    "gcsSource": {
      "inputUris": ["INPUT_STORAGE_PATH"]
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID:importData"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID:importData" | Select-Object -Expand Content
You should see output similar to the following. You can use the operation ID (the last element of the "name" field, shown here as OPERATION_ID) to get the status of the task. For an example, see Working with long-running operations.
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T15:56:29.176485Z",
    "importDataDetails": {}
  }
}
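The import request can be sketched the same way in Python using only the standard library. This is an illustrative sketch under stated assumptions: the function name `build_import_request` and the example bucket path are hypothetical, and the access token is assumed to come from `gcloud auth print-access-token`. The function builds the request without sending it.

```python
import json
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1"


def build_import_request(project_id, dataset_id, csv_uris, access_token):
    """Build (but do not send) the POST request for the importData method."""
    for uri in csv_uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI, got {uri!r}")
    url = (f"{API_ROOT}/projects/{project_id}/locations/us-central1"
           f"/datasets/{dataset_id}:importData")
    # inputUris is a JSON array of quoted Cloud Storage paths.
    body = {"inputConfig": {"gcsSource": {"inputUris": list(csv_uris)}}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    req = build_import_request(
        "my-project", "3104518874390609379",
        ["gs://my-bucket/labels.csv"], "TOKEN",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send the request: urllib.request.urlopen(req)
```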
Go
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Java
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Labeling training items
To be useful for training a model, each item in a dataset must have at least one category label assigned to it. AutoML Vision ignores items without a category label. You can provide labels for your training items in three ways:
- Include labels in your .csv file
  - For details about labeling items in your .csv file, see Preparing your training data.
- Label your items in the AutoML Vision UI
- Request labeling from a human labeling service such as Google AI Platform Data Labeling Service.
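Since items without a label are ignored for training, it can help to check an import CSV for unlabeled rows before uploading it. The sketch below assumes each row holds an image URI followed by zero or more labels (see Preparing your training data for the authoritative format); the function name `split_by_label_status` and the sample paths are hypothetical.

```python
import csv
import io


def split_by_label_status(csv_text):
    """Return (labeled_uris, unlabeled_uris) for the rows in csv_text.

    Assumes the first column is the image URI and any remaining
    non-empty columns are labels.
    """
    labeled, unlabeled = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        uri = row[0]
        labels = [cell for cell in row[1:] if cell.strip()]
        (labeled if labels else unlabeled).append(uri)
    return labeled, unlabeled


if __name__ == "__main__":
    sample = "gs://my-bucket/1.jpg,daisy\ngs://my-bucket/2.jpg\n"
    labeled, unlabeled = split_by_label_status(sample)
    print(f"{len(labeled)} labeled, {len(unlabeled)} unlabeled")
```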
Labeling in the UI
Web UI
To label items in the AutoML Vision UI, select the dataset from the Datasets listing page to see its details.
The side bar summarizes the number of labeled and unlabeled items. Here you can filter the item list by label, or select Add new label to create a new label.
From this screen you can also add or change an image's label.
Select an image to add or change its label.
Request labeling
You can use Google's AI Platform Data Labeling Service to label your images. See the product documentation for more information.
Working with long-running operations
You can get the status of a long-running operation by using the following code samples.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
- OPERATION_ID: the ID of your operation. The ID is the last element of the name of your operation. For example:
  - operation name: projects/project-id/locations/location-id/operations/IOD5281059901324392598
  - operation id: IOD5281059901324392598
HTTP method and URL:
GET https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID" | Select-Object -Expand Content
You should see output similar to the following for a completed import operation:
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T16:10:41.326614Z",
    "importDataDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}
You should see output similar to the following for a completed create model operation:
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-07-22T18:35:06.881193Z",
    "updateTime": "2019-07-22T19:58:44.972235Z",
    "createModelDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Model",
    "name": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID"
  }
}
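Checking an operation repeatedly until "done" is true can be sketched as a small polling loop. This is a hedged illustration, not an official sample: `wait_for_operation` and its parameters are names invented here, and `fetch_status` stands for any callable that performs the GET request for the operation and returns the parsed JSON as a dict. The check for an "error" field assumes the standard long-running operation format, which reports failures there; the example outputs in this section do not show it.

```python
import time


def wait_for_operation(fetch_status, poll_seconds=5.0, max_polls=60,
                       sleep=time.sleep):
    """Poll until the operation reports "done", then return its response.

    fetch_status: callable returning the operation as a dict (assumed to
    wrap the operation status GET request).
    """
    for _ in range(max_polls):
        operation = fetch_status()
        if operation.get("done"):
            if "error" in operation:
                # Assumed failure shape: an "error" object on the operation.
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation.get("response", {})
        sleep(poll_seconds)
    raise TimeoutError("operation did not complete within the polling budget")


if __name__ == "__main__":
    # Simulated responses standing in for real status requests.
    fake = iter([
        {"name": "projects/p/locations/us-central1/operations/OP"},
        {"name": "projects/p/locations/us-central1/operations/OP",
         "done": True,
         "response": {"name": "projects/p/locations/us-central1/datasets/ID"}},
    ])
    result = wait_for_operation(lambda: next(fake), sleep=lambda s: None)
    print(result["name"])
```

For a create-dataset or create-model operation, the returned response carries the new resource's name; for an import, it is an empty message.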
Go
Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.
Java
Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the APIs & Reference > Client Libraries page.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for Ruby.