A project can have multiple datasets, each used to train a separate model. You can list the available datasets, get a specific dataset, export a dataset, and delete a dataset you no longer need.
Listing datasets
This section describes how to retrieve a list of the available datasets for a project.
Web UI
To see a list of the available datasets using the Vision Dashboard, click the Datasets link at the top of the left navigation menu.
To see the datasets for a different project, select the project from the list in the drop-down located in the upper right of the title bar.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
HTTP method and URL:
GET https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "datasets": [
    {
      "name": "projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID",
      "displayName": "my_new_dataset",
      "createTime": "2019-10-29T17:31:12.010290Z",
      "etag": "AB3BwFpNUaguCwKeQWtUKLBPQhZr7omCCUBz77pACPIINFpyFe7vbGhp9oZLEEGhIeM=",
      "exampleCount": 3667,
      "imageClassificationDatasetMetadata": {
        "classificationType": "MULTICLASS"
      }
    },
    {
      "name": "projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID",
      "displayName": "new_dataset",
      "createTime": "2019-10-02T00:44:57.821275Z",
      "etag": "AB3BwFpU_ueMZtTD_8dt-9r8BWqunqMC76YbAbmQYQsQEbtQTxs6U3rPpgAMDCXhYPGq",
      "imageClassificationDatasetMetadata": {
        "classificationType": "MULTICLASS"
      }
    }
  ]
}
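As an illustration of reading this response, the following minimal Python sketch pulls the dataset ID, display name, and example count out of each entry. The function name summarize_datasets and the IDs 111 and 222 in the sample are hypothetical. Treating a missing exampleCount (as in the second dataset above) or a missing datasets key as zero or empty is an assumption of this sketch, not documented API behavior.

```python
import json


def summarize_datasets(response_text):
    """Return (dataset_id, display_name, example_count) tuples from a list response."""
    payload = json.loads(response_text)
    summary = []
    # Assumption: a response without a "datasets" key means there are no datasets.
    for ds in payload.get("datasets", []):
        # The dataset ID is the last element of the resource name.
        dataset_id = ds["name"].rsplit("/", 1)[-1]
        # Assumption: a missing exampleCount is reported here as 0.
        example_count = int(ds.get("exampleCount", 0))
        summary.append((dataset_id, ds.get("displayName", ""), example_count))
    return summary


# Trimmed-down sample with hypothetical dataset IDs 111 and 222.
sample = """
{
  "datasets": [
    {"name": "projects/my-project/locations/us-central1/datasets/111",
     "displayName": "my_new_dataset", "exampleCount": 3667},
    {"name": "projects/my-project/locations/us-central1/datasets/222",
     "displayName": "new_dataset"}
  ]
}
"""

for row in summarize_datasets(sample):
    print(row)
```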
Go
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Java
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
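The sketch below is not the client library sample. It is a minimal standard-library illustration of the same GET request shown in the REST tab, assuming you already have an access token (for example, from gcloud auth print-access-token). The helper names datasets_url, build_list_request, and list_datasets are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1"


def datasets_url(project_id, location="us-central1"):
    """Build the datasets collection URL used by the REST example."""
    return f"{API_ROOT}/projects/{project_id}/locations/{location}/datasets"


def build_list_request(project_id, access_token):
    """Create (but do not send) the GET request with the same headers as the curl example."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_id,
    }
    return urllib.request.Request(
        datasets_url(project_id), headers=headers, method="GET"
    )


def list_datasets(project_id, access_token):
    """Send the request and return the decoded JSON body (requires network access)."""
    request = build_list_request(project_id, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling list_datasets("my-project", token) would return a dictionary shaped like the JSON response in the REST tab. Building the request separately from sending it keeps the URL and headers easy to inspect.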
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the AutoML Vision reference documentation for Ruby.
Get a dataset
You can also get a specific dataset using a dataset ID.
Web UI
To see a list of the available datasets using the AutoML Vision UI, click the Datasets link at the top of the left navigation menu.
To see the datasets for a different project, select the project from the drop-down list on the left side of the title bar.
Access a specific dataset by selecting its name from the list.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
- DATASET_ID: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
  - dataset name: projects/project-id/locations/location-id/datasets/3104518874390609379
  - dataset ID: 3104518874390609379
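The rule that the ID is the last element of the dataset name can be sketched in Python as follows. The function name dataset_id_from_name is hypothetical, and rejecting names that do not end in datasets/ID is a defensive choice of this sketch.

```python
def dataset_id_from_name(name):
    """Return the last path element of a dataset resource name."""
    parts = name.strip("/").split("/")
    # Expect .../datasets/<id>; raise instead of guessing for any other shape.
    if len(parts) < 2 or parts[-2] != "datasets":
        raise ValueError(f"not a dataset resource name: {name!r}")
    return parts[-1]


print(dataset_id_from_name(
    "projects/project-id/locations/location-id/datasets/3104518874390609379"
))
# prints 3104518874390609379
```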
HTTP method and URL:
GET https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID",
  "displayName": "DISPLAY_NAME",
  "createTime": "2019-10-29T17:31:12.010290Z",
  "etag": "AB3BwFoP09ffuRNnaWMx4UGi8uvYFctvOBjns84OercuMRIdXr0YINNiUqeW85SB3g4=",
  "exampleCount": 3667,
  "imageClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}
Go
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Java
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Export a dataset
You can export a CSV file containing all of a dataset's information to a Google Cloud Storage bucket.
Web UI
To export a non-empty dataset, complete the following steps:
Select the non-empty dataset from the Datasets page.
Selecting the non-empty dataset will take you to the Dataset details page.
Select the Export data option at the top of the Dataset details page.
This opens a window where you can choose a Google Cloud Storage bucket location, or create a new bucket and designate it as the location to store the CSV file.
Select Export CSV after you have selected a new or existing Google Cloud Storage bucket location.
You will receive an email when the data export process has completed.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
- DATASET_ID: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
  - dataset name: projects/project-id/locations/location-id/datasets/3104518874390609379
  - dataset ID: 3104518874390609379
- CLOUD_STORAGE_BUCKET: a Google Cloud Storage bucket/directory to save output files to, expressed in the following form: gs://bucket/directory/. The requesting user must have write permission to the bucket.
HTTP method and URL:
POST https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID:exportData
Request JSON body:
{
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "CLOUD_STORAGE_BUCKET"
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID:exportData"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID:exportData" | Select-Object -Expand Content
You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Working with long-running operations.
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-08-12T18:52:31.637075Z",
    "updateTime": "2019-08-12T18:52:31.637075Z",
    "exportDataDetails": {
      "outputInfo": {
        "gcsOutputDirectory": "CLOUD_STORAGE_BUCKET/export_data-DATASET_NAME-TIMESTAMP_OF_EXPORT_CALL/"
      }
    }
  }
}
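To illustrate how the operation ID in this response might be used, here is a small Python sketch that extracts the ID and builds a status URL. It assumes, as is conventional for long-running operations, that the operation can be read with a GET on the API root plus the operation name. The helper names are hypothetical, and treating a missing done field (as in the export response above) as "not finished" is an assumption of this sketch.

```python
API_ROOT = "https://automl.googleapis.com/v1"


def operation_id(operation):
    """Return the last path element of the operation name."""
    return operation["name"].rsplit("/", 1)[-1]


def operation_url(operation):
    # Assumption: the operation resource is readable at API_ROOT/<name>.
    return f"{API_ROOT}/{operation['name']}"


def is_done(operation):
    # Assumption: a response without a "done" field has not finished yet.
    return bool(operation.get("done", False))


# Hypothetical project and operation IDs, shaped like the response above.
export_operation = {
    "name": "projects/my-project/locations/us-central1/operations/ICN123",
    "metadata": {"exportDataDetails": {}},
}

print(operation_id(export_operation))   # prints ICN123
print(operation_url(export_operation))
print(is_done(export_operation))        # prints False
```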
Java
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
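The sketch below is not the client library sample. It is a minimal standard-library illustration of the exportData POST request from the REST tab, assuming you already have an access token. The function name build_export_request is hypothetical, and the gs:// prefix check is a convenience added by this sketch.

```python
import json
import urllib.request


def build_export_request(project_id, dataset_id, output_uri_prefix,
                         access_token, location="us-central1"):
    """Create (but do not send) the exportData POST request."""
    # The destination is expected in the form gs://bucket/directory/.
    if not output_uri_prefix.startswith("gs://"):
        raise ValueError("output_uri_prefix must look like gs://bucket/directory/")
    url = (
        f"https://automl.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/datasets/{dataset_id}:exportData"
    )
    # Same JSON body as request.json in the REST example.
    body = {"outputConfig": {"gcsDestination": {"outputUriPrefix": output_uri_prefix}}}
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_id,
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

Passing the returned object to urllib.request.urlopen would send the request. The response is a long-running operation like the one shown in the REST tab.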
Exported CSV format
The exported CSV file uses the same format as the training data import CSV:
set,path,label0[,label1,label2,...]
The CSV file is saved in a newly created export folder whose name includes a unique timestamp. The following sample lines are from the exported file my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/image_classification_1.csv:
TRAIN,gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files/img874.jpg,dandelion
VALIDATION,gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files/img447.jpg,roses
TRAIN,gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files/img672.jpg,dandelion
VALIDATION,gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files/img421.jpg,sunflowers
TRAIN,gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files/img495.jpg,tulips
TEST,gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files/img014.jpg,sunflowers
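As a rough illustration of the set,path,label0[,label1,label2,...] layout, the following Python sketch parses the sample lines above and counts images per set. The names parse_export_csv and PREFIX are hypothetical, and the sketch assumes the exported file has no header row, as in the sample.

```python
import csv
import io
from collections import Counter

PREFIX = "gs://my-storage-bucket/export_data-my_dataset_name-2019-11-08T22:28:13.081Z/files"


def parse_export_csv(text):
    """Parse rows of the form set,path,label0[,label1,...] into dictionaries."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        # The first two columns are fixed; any remaining columns are labels.
        # A row with fewer than two columns raises ValueError here.
        ml_set, path, *labels = record
        rows.append({"set": ml_set, "path": path, "labels": labels})
    return rows


sample = "\n".join([
    f"TRAIN,{PREFIX}/img874.jpg,dandelion",
    f"VALIDATION,{PREFIX}/img447.jpg,roses",
    f"TRAIN,{PREFIX}/img672.jpg,dandelion",
    f"VALIDATION,{PREFIX}/img421.jpg,sunflowers",
    f"TRAIN,{PREFIX}/img495.jpg,tulips",
    f"TEST,{PREFIX}/img014.jpg,sunflowers",
])

rows = parse_export_csv(sample)
counts = Counter(row["set"] for row in rows)
print(dict(counts))
# prints {'TRAIN': 3, 'VALIDATION': 2, 'TEST': 1}
```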
Deleting a dataset
You can delete a dataset you no longer need from the UI, or by passing the dataset's ID to the API as shown in the following samples.
Web UI
In the Vision Dashboard, click the Datasets link at the top of the left navigation menu to display the list of available datasets.
Click the three-dot menu at the far right of the row for the dataset you want to delete and select Delete dataset.
Click Delete in the confirmation dialog box.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: your GCP project ID.
- DATASET_ID: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
  - dataset name: projects/project-id/locations/location-id/datasets/3104518874390609379
  - dataset ID: 3104518874390609379
HTTP method and URL:
DELETE https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID
To send your request, choose one of these options:
curl
Execute the following command:
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: PROJECT_ID" \
"https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID"
PowerShell
Execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/datasets/DATASET_ID" | Select-Object -Expand Content
You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Working with long-running operations.
{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-08T22:37:19.822128Z",
    "updateTime": "2019-11-08T22:37:19.822128Z",
    "deleteDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}
Go
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Java
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Node.js
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.
Python
Before trying this sample, follow the setup instructions for this language on the Client Libraries page.