Method: projects.locations.datasets.importData

Imports data into a dataset.

You can only call this method for an empty Dataset.

For more information, see Importing items into a dataset.

HTTP request

POST https://automl.googleapis.com/v1beta1/{name}:importData

Path parameters

Parameters
name

string

Required. The name of the dataset. The dataset must already exist. All imported annotations and examples are added to it.

Authorization requires the following Google IAM permission on the specified resource name:

  • automl.datasets.import

Request body

The request body contains data with the following structure:

JSON representation
{
  "inputConfig": {
    object(InputConfig)
  }
}
Fields
inputConfig

object(InputConfig)

Required. The desired input location and its domain specific semantics, if any.

Response body

If successful, the response body contains an instance of Operation.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

InputConfig

Input configuration for the datasets.importData action.

The input format depends on the dataset_metadata of the Dataset into which the data is imported. The input source is expected to be gcsSource, unless specified otherwise. If a file with identical content is mentioned multiple times (even under a different GCS_FILE_PATH), its labels, bounding boxes, and so on are appended. Always provide the same file with the same ML_USE and GCS_FILE_PATH; if you do not, these values are selected nondeterministically from the ones given.

The formats are represented in EBNF, with commas being literal and with non-terminal symbols defined below each format.

See Preparing your training data for more information.

One or more CSV files with each line in the following format:

ML_USE,GCS_FILE_PATH
  • ML_USE - Identifies the data set that the current row (file) applies to. This value can be one of the following:

    • TRAIN - Rows in this file are used to train the model.
    • TEST - Rows in this file are used to test the model during training.
    • UNASSIGNED - Rows in this file are not categorized. They are automatically divided into train and test data: 80% for training and 20% for testing.
  • GCS_FILE_PATH - Identifies a file stored in Google Cloud Storage that contains the model training information.

After the training data set has been determined from the TRAIN and UNASSIGNED CSV files, the training data is divided into train and validation data sets: 70% for training and 30% for validation.
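The two-stage split described above can be worked through with a small calculation. The following is only a sketch: the function name split_counts is hypothetical, and rounding shares down with integer arithmetic is an assumption, since this page does not say how the service rounds or which rows it selects.

```python
def split_counts(train_rows, test_rows, unassigned_rows):
    """Approximate per-set row counts using the percentages stated above.

    Assumption: shares are rounded down with integer arithmetic; the
    service's actual rounding and row selection are not documented here.
    """
    auto_test = unassigned_rows * 20 // 100    # 20% of UNASSIGNED rows go to test
    auto_train = unassigned_rows - auto_test   # the remaining 80% join the training pool
    pool = train_rows + auto_train             # TRAIN rows plus the UNASSIGNED share
    validation = pool * 30 // 100              # 30% of the pool is held out for validation
    return {
        "train": pool - validation,
        "validation": validation,
        "test": test_rows + auto_test,
    }


print(split_counts(train_rows=0, test_rows=0, unassigned_rows=1000))
# {'train': 560, 'validation': 240, 'test': 200}
```

With 1000 UNASSIGNED rows and no explicit TRAIN or TEST rows, roughly 56% of the rows end up used for training, 24% for validation, and 20% for testing.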

Each CSV file specified using the GCS_FILE_PATH field has the following format:

GCS_FILE_PATH,LABEL,TIME_SEGMENT_START,TIME_SEGMENT_END
  • GCS_FILE_PATH - The path to a video stored in Google Cloud Storage. The video can be up to 1 hour in duration. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.

  • LABEL - A label that identifies the object of the video segment.

  • TIME_SEGMENT_START and TIME_SEGMENT_END - The start and end timestamps, in seconds, of the video segment to annotate. Both values must fall within the length of the video, and TIME_SEGMENT_END must be later than TIME_SEGMENT_START.

You can specify videos in the CSV file without any labels. You must then use the AutoML Video Intelligence UI to apply labels to the video before you train your model. To specify a video segment in this way, provide the Google Cloud Storage URI for the video followed by three commas.

Sample file:

TRAIN,gs://folder/train_videos.csv
TEST,gs://folder/test_videos.csv
UNASSIGNED,gs://folder/other_videos.csv
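Before uploading a top-level file, it can help to check each ML_USE,GCS_FILE_PATH row locally. The following is a minimal sketch, not part of the API: the function name parse_top_level_csv is hypothetical, and requiring a gs:// prefix on each path is an assumption about how Cloud Storage URIs are written.

```python
import csv
import io

# The three ML_USE values listed above.
ML_USES = {"TRAIN", "TEST", "UNASSIGNED"}


def parse_top_level_csv(text):
    """Parse ML_USE,GCS_FILE_PATH rows into a list of (ml_use, path) tuples."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(row)}")
        ml_use, path = (field.strip() for field in row)
        if ml_use not in ML_USES:
            raise ValueError(f"line {line_no}: unknown ML_USE {ml_use!r}")
        # Assumption: paths are full Cloud Storage URIs starting with gs://
        if not path.startswith("gs://"):
            raise ValueError(f"line {line_no}: not a Cloud Storage URI: {path!r}")
        rows.append((ml_use, path))
    return rows
```

A local check like this only catches formatting mistakes; it does not confirm that the referenced files exist or can be read by the service.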

The following is an example of one of the CSV files referenced by the top-level gcsSource file:

gs://folder/video1.avi,car,120,180.000021
gs://folder/video1.avi,bike,150,180.000021
gs://folder/vid2.avi,car,0,60.5
gs://folder/vid3.avi,,,
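The per-video row rules above (four fields, a supported extension, an end time later than the start time, and the unlabeled form with three trailing commas) can be checked locally before import. This is a sketch under stated assumptions: parse_video_rows is a hypothetical helper, extension matching is assumed to be case-insensitive, negative start times are assumed to be invalid, and the check that timestamps fall within the video length is omitted because it needs the video itself.

```python
import csv
import io

# Supported extensions listed above; case-insensitive matching is an assumption.
VIDEO_EXTENSIONS = (".mov", ".mpeg4", ".mp4", ".avi")


def parse_video_rows(text):
    """Parse GCS_FILE_PATH,LABEL,TIME_SEGMENT_START,TIME_SEGMENT_END rows."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(row)}")
        path, label, start, end = (field.strip() for field in row)
        if not path.lower().endswith(VIDEO_EXTENSIONS):
            raise ValueError(f"line {line_no}: unsupported extension in {path!r}")
        if not (label or start or end):
            rows.append((path, None, None, None))  # unlabeled video: "path,,,"
            continue
        if not (label and start and end):
            raise ValueError(f"line {line_no}: label and both timestamps are required")
        start_s, end_s = float(start), float(end)  # timestamps are in seconds
        if start_s < 0 or end_s <= start_s:
            raise ValueError(f"line {line_no}: TIME_SEGMENT_END must be after TIME_SEGMENT_START")
        rows.append((path, label, start_s, end_s))
    return rows
```

Unlabeled rows are returned with None in place of the label and timestamps, so a caller can separate videos that still need labeling from fully annotated segments.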

Errors:

If any of the provided CSV files cannot be parsed, or if more than a certain percentage of CSV rows cannot be processed, the operation fails and nothing is imported. Regardless of overall success or failure, the per-row failures, up to a certain count cap, are listed in Operation.metadata.partial_failures.

JSON representation
{
  "gcsSource": {
    object(GcsSource)
  }
}
Fields
gcsSource

object(GcsSource)

The Google Cloud Storage location for the input content.
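Putting the request URL and the two JSON representations together, a request could be assembled roughly as follows. This is a sketch only: build_import_request is a hypothetical helper, the resource name in the usage line is a made-up placeholder, and because the fields of GcsSource are not described in this section, that object is passed through unchanged rather than constructed. Sending the request and attaching OAuth credentials are left out.

```python
import json

API_ROOT = "https://automl.googleapis.com/v1beta1"


def build_import_request(name, gcs_source):
    """Return (url, body) for a datasets.importData POST request.

    name: dataset resource name, used as the {name} path parameter.
    gcs_source: dict holding the GcsSource object; its fields are not
        described in this section, so it is passed through as given.
    """
    if not name:
        raise ValueError("name is required")
    url = f"{API_ROOT}/{name}:importData"
    body = json.dumps({"inputConfig": {"gcsSource": gcs_source}})
    return url, body


# Hypothetical resource name, for illustration only.
url, body = build_import_request("projects/my-project/locations/my-location/datasets/my-dataset", {})
```

The returned body is a JSON string whose only top-level field is inputConfig, matching the request body representation shown earlier.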