Imports data into a dataset.
You can only call this method for an empty Dataset.
For more information, see Importing items into a dataset.
HTTP request
POST https://automl.googleapis.com/v1beta1/{name}:importData
Path parameters
| Parameters | |
|---|---|
| `name` | Required. Dataset name. Dataset must already exist. All imported annotations and examples will be added. Authorization requires a Google IAM permission on the specified resource `name`. |
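As a sketch, the request URL is formed by splicing the dataset's resource name into the v1beta1 path; the project, location, and dataset IDs below are hypothetical:

```python
# Build the importData URL from the dataset resource name (IDs are hypothetical).
name = "projects/my-project/locations/us-central1/datasets/VOT1234567890"
url = f"https://automl.googleapis.com/v1beta1/{name}:importData"
```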
Request body
The request body contains data with the following structure:
JSON representation

{
  "inputConfig": {
    object(InputConfig)
  }
}
| Fields | |
|---|---|
| `inputConfig` | Required. The desired input location and its domain specific semantics, if any. |
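As a hedged sketch, the request body can be assembled in Python; the bucket and CSV path are hypothetical, and the `inputUris` field is assumed from the `gcsSource` shape described under InputConfig below:

```python
import json

# Minimal sketch of an importData request body. The CSV path is hypothetical;
# inputConfig wraps a gcsSource pointing at the "top level" CSV file.
body = {
    "inputConfig": {
        "gcsSource": {
            "inputUris": ["gs://my-bucket/train_videos.csv"]
        }
    }
}
payload = json.dumps(body)
```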
Response body
If successful, the response body contains an instance of Operation.
Authorization Scopes
Requires the following OAuth scope:
https://www.googleapis.com/auth/cloud-platform
For more information, see the Authentication Overview.
InputConfig
Input configuration for the datasets.importData action.
The format of the input depends on the dataset_metadata of the Dataset into which the import is happening. Unless specified otherwise, gcsSource is expected as the input source. If a file with identical content (even under a different GCS_FILE_PATH) is mentioned multiple times, then its labels, bounding boxes, etc. are appended. The same file should always be provided with the same ML_USE and GCS_FILE_PATH; if it is not, these values are selected nondeterministically from the given ones.
The formats are represented in EBNF with commas being literal and with non-terminal symbols defined below. The formats are:
See Preparing your training data for more information.
A CSV file (or files) with each line in the following format:
ML_USE,GCS_FILE_PATH
- ML_USE - Identifies the data set that the current row (file) applies to. This value can be one of the following:
  - TRAIN - Rows in this file are used to train the model.
  - TEST - Rows in this file are used to test the model during training.
  - UNASSIGNED - Rows in this file are not categorized. They are automatically divided into train and test data: 80% for training and 20% for testing.
- GCS_FILE_PATH - Identifies a file stored in Google Cloud Storage that contains the model training information.
After the training data set has been determined from the TRAIN and UNASSIGNED CSV files, the training data is divided into train and validation data sets: 70% for training and 30% for validation.
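As a quick arithmetic check of the percentages above (assuming, for simplicity, that all rows come from UNASSIGNED files):

```python
# Rows from UNASSIGNED files are split 80% train / 20% test; the training
# portion is then divided 70% train / 30% validation.
total = 1.0
train_pool = total * 0.80
test = total * 0.20
train = train_pool * 0.70       # ~0.56 of all rows
validation = train_pool * 0.30  # ~0.24 of all rows
```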
Each CSV file specified using the GCS_FILE_PATH field has the following format:
GCS_FILE_PATH,LABEL,TIME_SEGMENT_START,TIME_SEGMENT_END
- GCS_FILE_PATH - The path to a video stored in Google Cloud Storage. The video can be up to 1h in duration. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.
- LABEL - A label that identifies the object in the video segment.
- TIME_SEGMENT_START and TIME_SEGMENT_END - The start and end timestamps in seconds for the segment of video to be annotated. The values must be within the length of the video, and TIME_SEGMENT_END must be after TIME_SEGMENT_START.
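The per-row rules above can be sketched as a small validator; the helper name and the case-insensitive extension check are this sketch's own, not part of the API:

```python
import csv
import io

# Extensions allowed for GCS_FILE_PATH, per the list above (checked
# case-insensitively here; that leniency is this sketch's own choice).
ALLOWED_EXTENSIONS = (".mov", ".mpeg4", ".mp4", ".avi")

def validate_row(row):
    """Check one GCS_FILE_PATH,LABEL,TIME_SEGMENT_START,TIME_SEGMENT_END row."""
    path, label, start, end = row
    if not path.startswith("gs://"):
        return False
    if not path.lower().endswith(ALLOWED_EXTENSIONS):
        return False
    if label:
        # Labeled segments need timestamps, and the end must be after the start.
        if not (start and end):
            return False
        if not float(end) > float(start):
            return False
    # Unlabeled rows (path followed by three commas) are allowed.
    return True

rows = list(csv.reader(io.StringIO(
    "gs://folder/video1.avi,car,120,180.000021\n"
    "gs://folder/vid3.avi,,,\n"
)))
```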
You can specify videos in the CSV file without any labels. You must then use the AutoML Video Intelligence UI to apply labels to the video before you train your model. To specify a video segment in this way, provide the Google Cloud Storage URI for the video followed by three commas.
Sample file:
TRAIN,gs://folder/train_videos.csv
TEST,gs://folder/test_videos.csv
UNASSIGNED,gs://folder/other_videos.csv
Here is an example of the format of one of the CSV files listed in the "top level" file identified by gcsSource:
gs://folder/video1.avi,car,120,180.000021
gs://folder/video1.avi,bike,150,180.000021
gs://folder/vid2.avi,car,0,60.5
gs://folder/vid3.avi,,,
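For illustration, the example rows above can be grouped per video; note that gs://folder/video1.avi appears twice, so its annotations accumulate, and the unlabeled vid3.avi row yields an entry with no segments:

```python
import csv
import io
from collections import defaultdict

example = """\
gs://folder/video1.avi,car,120,180.000021
gs://folder/video1.avi,bike,150,180.000021
gs://folder/vid2.avi,car,0,60.5
gs://folder/vid3.avi,,,
"""

# Group labeled segments by video path; repeated paths accumulate annotations.
segments = defaultdict(list)
for path, label, start, end in csv.reader(io.StringIO(example)):
    if label:
        segments[path].append((label, float(start), float(end)))
    else:
        segments[path]  # unlabeled video: present, but with no segments yet
```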
Errors:
If any of the provided CSV files can't be parsed, or if more than a certain percentage of the CSV rows cannot be processed, then the operation fails and nothing is imported. Regardless of overall success or failure, the per-row failures, up to a certain count cap, are listed in Operation.metadata.partial_failures.
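A sketch of reading those per-row failures from the returned operation; the metadata dict and error message below are hypothetical, using the camelCase JSON field name partialFailures:

```python
# Hypothetical long-running operation response. partialFailures lists
# per-row errors up to a count cap, regardless of overall success.
operation = {
    "metadata": {
        "partialFailures": [
            {"code": 3, "message": "Row 7: TIME_SEGMENT_END is before TIME_SEGMENT_START"},
        ]
    }
}

failures = operation.get("metadata", {}).get("partialFailures", [])
for failure in failures:
    print(f"code={failure['code']}: {failure['message']}")
```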
JSON representation

{
  "gcsSource": {
    object(GcsSource)
  }
}
| Fields | |
|---|---|
| `gcsSource` | The Google Cloud Storage location for the input content. |