Method: projects.locations.datasets.importData

Imports data into a dataset.

You can only call this method for an empty Dataset.

HTTP request

POST https://automl.googleapis.com/v1beta1/{name}:importData

Path parameters

Parameters
name

string

Required. The resource name of the dataset. The dataset must already exist. All imported annotations and examples are added to it.

Authorization requires the following Google IAM permission on the specified resource name:

  • automl.datasets.import
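The request URL is formed by substituting the dataset's resource name into the {name} path parameter. The sketch below assumes the name has the shape projects/{project}/locations/{location}/datasets/{dataset}, which is inferred from the method path projects.locations.datasets.importData. The helper names and the project, location, and dataset values are hypothetical.

```python
import re

# Base URL taken from the HTTP request line of this method.
BASE_URL = "https://automl.googleapis.com/v1beta1"

# Assumed resource-name shape, inferred from projects.locations.datasets.importData.
DATASET_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/datasets/[^/]+$")


def dataset_name(project: str, location: str, dataset_id: str) -> str:
    """Hypothetical helper that assembles a dataset resource name."""
    return f"projects/{project}/locations/{location}/datasets/{dataset_id}"


def import_data_url(name: str) -> str:
    """Return the importData endpoint for a full dataset resource name."""
    if not DATASET_NAME.match(name):
        raise ValueError(f"unexpected dataset name: {name!r}")
    return f"{BASE_URL}/{name}:importData"


# Hypothetical identifiers, for illustration only.
name = dataset_name("my-project", "us-central1", "VOT123")
print(import_data_url(name))
```

The caller still needs the automl.datasets.import permission on that resource; the sketch only builds the URL.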

Request body

The request body contains data with the following structure:

JSON representation
{
  "inputConfig": {
    object(InputConfig)
  }
}
Fields
inputConfig

object(InputConfig)

Required. The desired input location and its domain-specific semantics, if any.

Response body

If successful, the response body contains an instance of Operation.
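Because the import runs as a long-running Operation, a client typically checks whether it has finished and whether it ended in an error. The sketch below assumes the general google.longrunning.Operation JSON shape (done, error, response fields), which this page does not itself define; it only classifies an already-fetched Operation dictionary and performs no network calls.

```python
def operation_status(op: dict) -> str:
    """Classify an Operation returned by importData (assumed JSON shape)."""
    # `done` is assumed to be omitted or false while the import is still running.
    if not op.get("done", False):
        return "RUNNING"
    # A finished operation is assumed to carry either `error` or `response`.
    if "error" in op:
        err = op["error"]
        return f"FAILED: {err.get('code')} {err.get('message', '')}".rstrip()
    return "SUCCEEDED"


print(operation_status({"done": True, "response": {}}))  # SUCCEEDED
```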

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

InputConfig

Input configuration for datasets.importData action.

The format of the input depends on the dataset_metadata of the Dataset into which the import is happening. Unless specified otherwise, a gcsSource is expected as the input source. If a file with identical content (even with a different GCS_FILE_PATH) is mentioned multiple times, its labels, bounding boxes, and so on are appended. Always provide the same file with the same ML_USE and GCS_FILE_PATH; otherwise, these values are selected nondeterministically from the given ones.
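To avoid the nondeterministic selection described above, you could check your input before importing. The sketch below groups rows by a SHA-256 digest of the file content, which is an assumption standing in for "identical content" (this page does not say how the service compares files), and reports any content that appears with more than one (ML_USE, GCS_FILE_PATH) pair. File contents are passed in memory here; in practice they would have to be read from Cloud Storage.

```python
import hashlib
from collections import defaultdict


def find_conflicts(rows):
    """Report content that appears with more than one (ML_USE, path) pair.

    `rows` is an iterable of (ml_use, gcs_path, content_bytes) tuples.
    """
    groups = defaultdict(set)
    for ml_use, path, content in rows:
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].add((ml_use, path))
    # More than one distinct pair for the same content is the ambiguous case.
    return {d: sorted(pairs) for d, pairs in groups.items() if len(pairs) > 1}


rows = [
    ("TRAIN", "gs://bucket/a.mp4", b"AAA"),
    ("TEST", "gs://bucket/copy_of_a.mp4", b"AAA"),  # same bytes, different values
    ("TRAIN", "gs://bucket/c.mp4", b"CCC"),
]
for pairs in find_conflicts(rows).values():
    print(pairs)
```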

The formats are represented in EBNF, with commas being literal. See Preparing your training data for more information.

The input is one or more CSV files, with each line in the following format:

ML_USE,GCS_FILE_PATH
  • ML_USE - Identifies the data set that the current row (file) applies to. This value can be one of the following:

    • TRAIN - Rows in this file are used to train the model.
    • TEST - Rows in this file are used to test the model during training.
    • UNASSIGNED - Rows in this file are not categorized. They are automatically divided into train and test data: 80% for training and 20% for testing.
  • GCS_FILE_PATH - Identifies a file stored in Google Cloud Storage that contains the model training information.

For example:

TRAIN,gs://folder/train_videos.csv
TEST,gs://folder/test_videos.csv
UNASSIGNED,gs://folder/other_videos.csv
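A client-side check of the top-level CSV file can catch typos in ML_USE before the import is submitted. The sketch below parses ML_USE,GCS_FILE_PATH lines like those in the example above, accepts only the three documented ML_USE values, and skips blank lines. Requiring each path to be a full gs:// URI is an assumption about well-formed Cloud Storage paths.

```python
import csv
import io

VALID_ML_USE = {"TRAIN", "TEST", "UNASSIGNED"}


def parse_top_level(text: str):
    """Parse ML_USE,GCS_FILE_PATH lines into (ml_use, path) tuples."""
    entries = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(row)}")
        ml_use, path = (field.strip() for field in row)
        if ml_use not in VALID_ML_USE:
            raise ValueError(f"line {lineno}: unknown ML_USE {ml_use!r}")
        if not path.startswith("gs://"):  # assumed: full Cloud Storage URIs
            raise ValueError(f"line {lineno}: not a gs:// URI: {path!r}")
        entries.append((ml_use, path))
    return entries


print(parse_top_level("TRAIN,gs://folder/train_videos.csv\nTEST,gs://folder/test_videos.csv\n"))
```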

After the training data set has been determined from the TRAIN and UNASSIGNED CSV files, the training data is divided into train and validation data sets: 70% for training and 30% for validation.
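Combining the two splits gives the approximate size of each data set. The sketch below reads the text as follows: UNASSIGNED rows are split 80/20 into training and test data, and the resulting training data (TRAIN rows plus the 80% share) is then split 70/30 into train and validation. The counts are approximations; the actual assignment of individual rows may differ.

```python
def expected_split(n_train: int, n_test: int, n_unassigned: int) -> dict:
    """Approximate row counts per data set under the 80/20 and 70/30 rules."""
    # UNASSIGNED rows: 80% join the training data, 20% join the test data.
    training = n_train + 0.8 * n_unassigned
    test = n_test + 0.2 * n_unassigned
    # The training data is then split 70% train / 30% validation.
    return {"train": 0.7 * training, "validation": 0.3 * training, "test": test}


# 600 TRAIN + 100 TEST + 500 UNASSIGNED rows:
# training = 600 + 400 = 1000 -> about 700 train, 300 validation; test = 100 + 100 = 200.
print(expected_split(600, 100, 500))
```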

Each CSV file specified using the GCS_FILE_PATH field has the following format:

GCS_FILE_PATH,LABEL,[INSTANCE_ID],TIMESTAMP,BOUNDING_BOX
  • GCS_FILE_PATH - The path to a video stored in Google Cloud Storage. The video can be up to 1 hour long. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.

  • LABEL - A label that identifies the object of the video segment.

  • [INSTANCE_ID] - You can provide an instance ID or leave this field blank. Providing instance IDs can help you obtain a better model: you can identify a specific labeled entity in a video frame with an instance ID. If that entity leaves the video frame and shows up at a later timestamp, you can identify the entity with the same instance ID to help train a more accurate model.

  • TIMESTAMP - The time, in seconds, that identifies the frame of the video with the labeled object. TIMESTAMP must be greater than zero and less than or equal to the length of the video. AutoML Video Intelligence uses the video frame that is closest to the TIMESTAMP to train the model.

  • BOUNDING_BOX - The coordinates of the labeled object in the video frame. You can specify up to 500 bounding boxes per video frame. A bounding box consists of four pairs of horizontal and vertical (x,y) coordinates that form a rectangular region of the video frame containing the object to be tracked. For example: 0.8,0.8,0.9,0.8,0.9,0.9,0.8,0.9. An empty field is equivalent to a value of 0.
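The row format above can be sketched as a small parser. It expects 12 fields (path, label, optional instance ID, timestamp, and eight box coordinates), treats empty coordinate fields as 0 as described, and maps a blank INSTANCE_ID to None. The Annotation type and the optional video_length argument are hypothetical. Because the example file below uses a timestamp of 0, this sketch rejects only negative timestamps and, when a length is given, timestamps past the end of the video.

```python
import csv
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    video: str
    label: str
    instance_id: Optional[str]
    timestamp: float
    box: List[float]  # x,y for each of the four vertices, in file order


def parse_row(row: List[str], video_length: Optional[float] = None) -> Annotation:
    """Parse GCS_FILE_PATH,LABEL,[INSTANCE_ID],TIMESTAMP,BOUNDING_BOX."""
    if len(row) != 12:
        raise ValueError(f"expected 12 fields, got {len(row)}")
    video, label, instance_id, ts, *coords = row
    timestamp = float(ts)
    if timestamp < 0 or (video_length is not None and timestamp > video_length):
        raise ValueError(f"timestamp out of range: {timestamp}")
    # Empty coordinate fields are treated as 0.
    box = [float(c) if c.strip() else 0.0 for c in coords]
    return Annotation(video, label, instance_id or None, timestamp, box)


example = parse_row(next(csv.reader(["gs://folder/video1.avi,car,2,12.90,.8,.2,,,.9,.3,,"])))
print(example)
```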

Here is an example of one of the CSV files referenced by the top-level gcsSource file:

 gs://folder/video1.avi,car,1,12.10,0.8,0.8,0.9,0.8,0.9,0.9,0.8,0.9
 gs://folder/video1.avi,car,1,12.90,0.4,0.8,0.5,0.8,0.5,0.9,0.4,0.9
 gs://folder/video1.avi,car,2,12.10,.4,.2,.5,.2,.5,.3,.4,.3
 gs://folder/video1.avi,car,2,12.90,.8,.2,,,.9,.3,,
 gs://folder/video1.avi,bike,,12.50,.45,.45,,,.55,.55,,
 gs://folder/video2.avi,car,1,0,.1,.9,,,.9,.1,,
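Rows that share a video, label, and INSTANCE_ID describe the same entity at different times. The sketch below groups the first four rows of the example above into such tracks and lists each track's timestamps in order; rows without an instance ID are skipped. Keying a track on (video, label, instance ID) is an assumption made for illustration.

```python
import csv
from collections import defaultdict

ROWS = """\
gs://folder/video1.avi,car,1,12.10,0.8,0.8,0.9,0.8,0.9,0.9,0.8,0.9
gs://folder/video1.avi,car,1,12.90,0.4,0.8,0.5,0.8,0.5,0.9,0.4,0.9
gs://folder/video1.avi,car,2,12.10,.4,.2,.5,.2,.5,.3,.4,.3
gs://folder/video1.avi,car,2,12.90,.8,.2,,,.9,.3,,
"""


def group_tracks(text: str) -> dict:
    """Map (video, label, instance_id) to a sorted list of timestamps."""
    tracks = defaultdict(list)
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        video, label, instance_id, ts = row[:4]
        if not instance_id:
            continue  # rows without an instance ID are not part of a track
        tracks[(video, label, instance_id)].append(float(ts))
    return {key: sorted(times) for key, times in tracks.items()}


for key, times in group_tracks(ROWS).items():
    print(key, times)
```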

Errors:

If any of the provided CSV files can't be parsed, or if more than a certain percentage of CSV rows can't be processed, the operation fails and nothing is imported. Regardless of overall success or failure, the per-row failures, up to a certain count cap, are listed in Operation.metadata.partial_failures.
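The all-or-nothing rule can be mimicked locally to see how a batch would fare. In the sketch below, max_failure_ratio and failure_cap are hypothetical stand-ins for the unspecified percentage and count cap; the service's real values are not documented here. Failures are reported whether or not the batch succeeds, and nothing is returned as imported when the ratio is exceeded.

```python
def summarize_import(rows, parse, max_failure_ratio=0.1, failure_cap=5):
    """Apply an all-or-nothing rule with hypothetical threshold and cap."""
    parsed, failures = [], []
    for i, row in enumerate(rows):
        try:
            parsed.append(parse(row))
        except ValueError as exc:
            failures.append((i, str(exc)))
    # Fail when more than the allowed fraction of rows could not be processed.
    ok = not rows or len(failures) / len(rows) <= max_failure_ratio
    return {
        "succeeded": ok,
        "imported": parsed if ok else [],  # nothing is imported on failure
        "partial_failures": failures[:failure_cap],  # reported either way
    }


print(summarize_import(["1", "2", "x", "4"], float))
```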

JSON representation
{
  "gcsSource": {
    object(GcsSource)
  }
}
Fields
gcsSource

object(GcsSource)

The Google Cloud Storage location for the input content.
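Putting the pieces together, the request body wraps an InputConfig whose gcsSource points at the top-level CSV file. GcsSource is not defined on this page; the sketch assumes it exposes an inputUris list of gs:// URIs. The URI shown is hypothetical.

```python
import json


def import_request_body(*csv_uris: str) -> str:
    """Build an importData request body as a JSON string.

    Assumes GcsSource has an `inputUris` list field.
    """
    body = {"inputConfig": {"gcsSource": {"inputUris": list(csv_uris)}}}
    return json.dumps(body, indent=2)


# Hypothetical top-level CSV listing ML_USE,GCS_FILE_PATH rows.
print(import_request_body("gs://folder/import.csv"))
```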