This page describes how to create a dataset and import your tabular data into it. You can then use AutoML Tables to train a model on that dataset.
Introduction
A dataset is a Google Cloud object that contains your source table data, along with schema information that determines model training parameters. The dataset serves as the input for training a model.
A project can have multiple datasets. You can get a list of the available datasets and can delete datasets you no longer need.
When you update a dataset or its schema information, you affect any future model that uses that dataset. Models that have already begun training are unaffected.
Before you begin
Before you can use AutoML Tables, you must have set up your project as described in Before you begin. Before you can create a dataset, you must have created your training data as described in Preparing your training data.
Creating a dataset
Console
Visit the AutoML Tables page in the Google Cloud console to begin the process of creating your dataset.
Select Datasets, and then select New dataset.
Enter the name of your dataset and specify the Region where the dataset will be created.
For more information, see Locations.
Click Create dataset.
The Import tab is displayed. You can now import your data.
REST
To create a dataset, you use the datasets.create method.
Before using any of the request data, make the following replacements:
- endpoint: automl.googleapis.com for the global location, or eu-automl.googleapis.com for the EU region.
- project-id: your Google Cloud project ID.
- location: the location for the resource: us-central1 for the global location, or eu for the European Union.
- dataset-display-name: the display name of your dataset.
HTTP method and URL:
POST https://endpoint/v1beta1/projects/project-id/locations/location/datasets
Request JSON body:
{
  "displayName": "dataset-display-name",
  "tablesDatasetMetadata": {}
}
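The substitutions above can be sketched in plain Python, assembling the request URL and JSON body from the endpoint, project, and location values (the project ID and display name below are placeholders):

```python
import json

# Placeholder values; substitute your own project ID and display name.
endpoint = "automl.googleapis.com"   # or "eu-automl.googleapis.com" for the EU
project_id = "my-project"
location = "us-central1"             # or "eu" for the European Union

url = f"https://{endpoint}/v1beta1/projects/{project_id}/locations/{location}/datasets"
body = json.dumps({
    "displayName": "sample_dataset",
    "tablesDatasetMetadata": {},
})

print(url)
# https://automl.googleapis.com/v1beta1/projects/my-project/locations/us-central1/datasets
```

The resulting `url` and `body` are what the curl and PowerShell commands below send.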
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://endpoint/v1beta1/projects/project-id/locations/location/datasets"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/1234/locations/us-central1/datasets/TBL6543",
  "displayName": "sample_dataset",
  "createTime": "2019-12-23T23:03:34.139313Z",
  "updateTime": "2019-12-23T23:03:34.139313Z",
  "etag": "AB3BwFq6VkX64fx7z2Y4T4z-0jUQLKgFvvtD1RcZ2oikA=",
  "tablesDatasetMetadata": {
    "areStatsFresh": true,
    "statsUpdateTime": "1970-01-01T00:00:00Z",
    "tablesDatasetType": "BASIC"
  }
}
Save the name of the new dataset (from the response) for use with other operations, such as importing items into your dataset and training a model.
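The dataset ID needed for those later calls is the final path segment of that name; for example, using the sample response above:

```python
# The dataset ID is the final segment of the resource name
# returned in the create response.
name = "projects/1234/locations/us-central1/datasets/TBL6543"
dataset_id = name.rsplit("/", 1)[-1]
print(dataset_id)  # TBL6543
```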
You can now import your data.
Java
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Node.js
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Python
The client library for AutoML Tables includes additional Python methods that simplify using the AutoML Tables API. These methods refer to datasets and models by name instead of id. Your dataset and model names must be unique. For more information, see the Client reference.
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Importing data into a dataset
You cannot import data into a dataset that already contains data. You must first create a new dataset.
Console
If needed, select your dataset from the list on the Datasets page to open its Import tab.
Choose the import source for your data: BigQuery, Cloud Storage, or your local computer. Provide the information required.
If you load your CSV files from your local computer, you must provide a Cloud Storage bucket. Your files are loaded to that bucket before they are imported into AutoML Tables. The files remain there after the data import unless you remove them.
The bucket must be in the same location as your dataset. Learn more.
Click Import to start the import process.
When the import process finishes, the Train tab is displayed, and you are ready to train your model.
REST
Import your data using the datasets.importData method.
Make sure your import source conforms to the requirements described in Preparing your import source.
Before using any of the request data, make the following replacements:
- endpoint: automl.googleapis.com for the global location, or eu-automl.googleapis.com for the EU region.
- project-id: your Google Cloud project ID.
- location: the location for the resource: us-central1 for the global location, or eu for the European Union.
- dataset-id: the ID of your dataset. For example, TBL6543.
- input-config: your data source location information:
  - For BigQuery: { "bigquerySource": { "inputUri": "bq://projectId.bqDatasetId.bqTableId" } }
  - For Cloud Storage: { "gcsSource": { "inputUris": ["gs://bucket-name/csv-file-name.csv"] } }
HTTP method and URL:
POST https://endpoint/v1beta1/projects/project-id/locations/location/datasets/dataset-id:importData
Request JSON body:
{ "inputConfig": input-config }
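The two input-config variants can be assembled like so (a minimal sketch; the table, bucket, and file names are placeholders):

```python
import json

# Placeholder source locations; use exactly one of these per request.
bigquery_config = {
    "bigquerySource": {"inputUri": "bq://my-project.my_dataset.my_table"}
}
gcs_config = {
    "gcsSource": {"inputUris": ["gs://my-bucket/my-file.csv"]}
}

# Wrap whichever source applies in the request body.
request_body = json.dumps({"inputConfig": gcs_config})
print(request_body)
```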
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "x-goog-user-project: project-id" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://endpoint/v1beta1/projects/project-id/locations/location/datasets/dataset-id:importData"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://endpoint/v1beta1/projects/project-id/locations/location/datasets/dataset-id:importData" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/292381/locations/us-central1/operations/TBL6543",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "2019-12-26T20:42:06.092180Z",
    "updateTime": "2019-12-26T20:42:06.092180Z",
    "cancellable": true,
    "worksOn": [
      "projects/292381/locations/us-central1/datasets/TBL6543"
    ],
    "importDataDetails": {},
    "state": "RUNNING"
  }
}
Importing data into a dataset is a long-running operation. You can poll for the operation status or wait for the operation to return. Learn more.
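One way to poll is to issue GET requests against the operation name from the response until the operation reports done (a google.longrunning Operation sets a top-level done field when finished). A minimal sketch with a stubbed fetch callable; a real fetch would GET https://endpoint/v1beta1/operation-name with the same auth headers as above:

```python
import time

def poll_operation(operation_name, fetch, interval_sec=5, max_polls=60):
    """Poll a long-running operation until it reports done.

    `fetch` is any callable that GETs the operation resource and
    returns its JSON as a dict (for example, via urllib with the
    same Authorization header used in the curl command).
    """
    for _ in range(max_polls):
        op = fetch(operation_name)
        if op.get("done"):
            return op
        time.sleep(interval_sec)
    raise TimeoutError(f"{operation_name} did not complete in time")

# Stubbed fetch: reports not-done once, then done.
responses = iter([{"done": False}, {"done": True, "response": {}}])
result = poll_operation(
    "projects/292381/locations/us-central1/operations/TBL6543",
    lambda name: next(responses),
    interval_sec=0,
)
print(result["done"])  # True
```

Separating the fetch callable from the loop keeps the retry logic testable without network access.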
When the import process is complete, you are ready to train your model.
Java
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Node.js
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
Python
The client library for AutoML Tables includes additional Python methods that simplify using the AutoML Tables API. These methods refer to datasets and models by name instead of id. Your dataset and model names must be unique. For more information, see the Client reference.
If your resources are located in the EU region, you must explicitly set the endpoint. Learn more.
What's next
- Train your model.
- Manage your datasets.
- Learn more about using long-running operations.