Creating datasets and importing images

A dataset contains representative samples of the type of content you want to classify, labeled with the category labels you want your custom model to use. The dataset serves as the input for training a model.

The main steps for building a dataset are:

  1. Create a dataset and specify whether to allow multiple labels on each item.
  2. Import data items into the dataset.
  3. Label the items.

In many cases, steps 2 and 3 are combined: you import data items with their labels already assigned.

Creating a dataset

The first step in creating a custom model is to create an empty dataset that will eventually hold the training data for the model. When you create a dataset, you specify the type of classification you want your custom model to perform:

  • MULTICLASS assigns a single label to each classified image
  • MULTILABEL allows an image to be assigned multiple labels

As of the v1 version of the AutoML API this request returns the ID of a long-running operation.

After the long-running operation completes you can import images into it. The newly created dataset doesn't contain any data until you import images into it.

Save the dataset ID of the new dataset (from the response) for use with other operations, such as importing images into your dataset and training a model.

Web UI

The AutoML Vision UI enables you to create a new dataset and import items into it from the same page. If you would rather import items later, select Import images later at step 3 below.

  1. Open the AutoML Vision UI.

    The Datasets page shows the status of previously created datasets for the current project.

    Dataset list page

    To add a dataset for a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Click the New Dataset button in the title bar.

  3. On the Create dataset page, enter a name for the dataset and specify where to find the labeled images to use for training the model.

    You can:

    • Upload a .csv file that contains the training images and their associated category labels from your local computer or from Google Cloud Storage.

    • Upload a collection of .txt or .zip files that contain the training images from your local computer.

    • Postpone uploading images and labels until later. Use this option for manual labeling through the UI.

  4. Specify whether to enable multi-label classification.

    Click the check box if you want the model to assign multiple labels to a document.

  5. Click Create dataset.

    You're returned to the Datasets page; your dataset will show an in progress animation while your documents are being imported. This process should take approximately 10 minutes per 1000 documents, but may take more or less time.

    If the service returns a 405 error, reduce the number of documents you're uploading at once. You'll need to refresh the page before trying again.

Integrated UI

  1. Open the Vision Dashboard.

    You can also access this page from the console via the left navigation menu item Artificial Intelligence > Vision. This will take you to the integrated Vision dashboard. Select the AutoML Vision card.

    Integrated UI Vision dashboard

  2. Select Datasets from the left navigation menu.

  3. Select the New Dataset button at the top, update the dataset name (optional), and select radio_button_checked single-label or multi-label classification based on the data you have.

    select model type for dataset page

  4. After specifying the classification type, select Create Dataset.

  5. On the Create Dataset page you can choose a CSV file from Google Cloud Storage, or local image files to import into the dataset.

    select import csv window

    Select Continue to begin image import into your dataset. While import occurs the dataset will show a status of Running: Importing images.

  6. You receive an email when import has finished.

REST & CMD LINE

The following example creates a dataset that supports one label per item (see MULTICLASS).

The newly created dataset doesn't contain any data until you import items into it.

Save the "name" of the new dataset (from the response) for use with other operations, such as importing items into your dataset and training a model.

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • display-name: a string display name of your choosing.

HTTP method and URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets

Request JSON body:

{
  "displayName": "display-name",
  "imageClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets" | Select-Object -Expand Content

You should see output similar to the following. You can use the operation ID (ICN3819960680614725486, in this case) to get the status of the task. For an example, see Getting an operation's status:

{
  "name": "projects/project-id/locations/us-central1/operations/ICN3819960680614725486",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-14T16:49:13.667526Z",
    "updateTime": "2019-11-14T16:49:13.667526Z",
    "createDatasetDetails": {}
  }
}

After the long-running operation completes you can get the dataset's ID with the same operation status request. The response should look similar to the following:

{
  "name": "projects/project-id/locations/us-central1/operations/ICN3819960680614725486",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-14T16:49:13.667526Z",
    "updateTime": "2019-11-14T16:49:17.975314Z",
    "createDatasetDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Dataset",
    "name": "projects/project-id/locations/us-central1/datasets/ICN5496445433112696489"
  }
}

C#

/// <summary>
///  Demonstrates using the AutoML client to create a dataset.
/// </summary>
/// <param name="projectId">GCP Project ID.</param>
/// <param name="displayName">The name of the dataset to be created.</param>
public static void VisionClassificationCreateDataset(string projectId = "YOUR-PROJECT-ID",
    string displayName = "YOUR-DATASET-NAME")
{
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    AutoMlClient client = AutoMlClient.Create();

    // A resource that represents Google Cloud Platform location.
    string projectLocation = LocationName.Format(projectId, "us-central1");

    // Specify the classification type
    // Types:
    // MultiLabel: Multiple labels are allowed for one example.
    // MultiClass: At most one label is allowed per example.
    ClassificationType classificationType = ClassificationType.Multilabel;
    ImageClassificationDatasetMetadata metadata =
       new ImageClassificationDatasetMetadata
       {
           ClassificationType = classificationType
       };


    Dataset dataset = new
        Dataset
    {
        DisplayName = displayName,
        ImageClassificationDatasetMetadata = metadata
    };

    Dataset createdDataset = client
        .CreateDatasetAsync(projectLocation, dataset).Result.PollUntilCompleted().Result;

    // Display the dataset information.
    Console.WriteLine($"Dataset name: {createdDataset.Name}");
    // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
    // required for other methods.
    // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
    string[] names = createdDataset.Name.Split("/");
    string datasetId = names[names.Length - 1];
    Console.WriteLine($"Dataset id: {datasetId}");
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1beta1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1beta1"
)

// visionClassificationCreateDataset creates a dataset for image classification.
func visionClassificationCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_ImageClassificationDatasetMetadata{
				ImageClassificationDatasetMetadata: &automlpb.ImageClassificationDatasetMetadata{
					// Specify the classification type:
					// - MULTILABEL: Multiple labels are allowed for one example.
					// - MULTICLASS: At most one label is allowed per example.
					ClassificationType: automlpb.ClassificationType_MULTILABEL,
				},
			},
		},
	}

	dataset, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %v", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.ClassificationType;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.ImageClassificationDatasetMetadata;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;

import java.io.IOException;
import java.util.concurrent.ExecutionException;

class VisionClassificationCreateDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      // Specify the classification type
      // Types:
      // MultiLabel: Multiple labels are allowed for one example.
      // MultiClass: At most one label is allowed per example.
      ClassificationType classificationType = ClassificationType.MULTILABEL;
      ImageClassificationDatasetMetadata metadata =
          ImageClassificationDatasetMetadata.newBuilder()
              .setClassificationType(classificationType)
              .build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setImageClassificationDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
      // required for other methods.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require(`@google-cloud/automl`).v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  // Specify the classification type
  // Types:
  // MultiLabel: Multiple labels are allowed for one example.
  // MultiClass: At most one label is allowed per example.
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      imageClassificationDatasetMetadata: {
        classificationType: 'MULTILABEL',
      },
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  console.log(`
    Dataset id: ${
      response.name
        .split('/')
        [response.name.split('/').length - 1].split('\n')[0]
    }`);
}

createDataset();

PHP

use Google\Cloud\AutoMl\V1\AutoMlClient;
use Google\Cloud\AutoMl\V1\ClassificationType;
use Google\Cloud\AutoMl\V1\Dataset;
use Google\Cloud\AutoMl\V1\ImageClassificationDatasetMetadata;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $displayName = 'your_dataset_name';

$client = new AutoMlClient();

try {
    // resource that represents Google Cloud Platform location
    $formattedParent = $client->locationName(
        $projectId,
        $location
    );

    $metadata = (new ImageClassificationDatasetMetadata())
        ->setClassificationType(ClassificationType::MULTILABEL);
    $dataset = (new Dataset())
        ->setDisplayName($displayName)
        ->setImageClassificationDatasetMetadata($metadata);

    // create dataset with the above location and metadata
    $operationResponse = $client->createDataset($formattedParent, $dataset);
    $operationResponse->pollUntilComplete();
    if ($operationResponse->operationSucceeded()) {
        $result = $operationResponse->getResult();

        // display dataset information
        $splitName = explode('/', $result->getName());
        printf('Dataset name: %s' . PHP_EOL, $result->getName());
        printf('Dataset id: %s' . PHP_EOL, end($splitName));
    } else {
        $error = $operationResponse->getError();
        // handleError($error)
    }
} finally {
    $client->close();
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = 'YOUR_PROJECT_ID'
# display_name = 'YOUR_DATASET_NAME'

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = client.location_path(project_id, 'us-central1')
# Specify the classification type
# Types:
# MultiLabel: Multiple labels are allowed for one example.
# MultiClass: At most one label is allowed per example.
metadata = automl.types.ImageClassificationDatasetMetadata(
    classification_type=automl.enums.ClassificationType.MULTILABEL)
dataset = automl.types.Dataset(
    display_name=display_name,
    image_classification_dataset_metadata=metadata)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(project_location, dataset)

created_dataset = response.result()

# Display the dataset information
print(u'Dataset name: {}'.format(created_dataset.name))
print(u'Dataset id: {}'.format(created_dataset.name.split("/")[-1]))

Ruby

require "google/cloud/automl"

project_id = "YOUR_PROJECT_ID"
display_name = "YOUR_DATASET_NAME"

client = Google::Cloud::AutoML::AutoML.new

# A resource that represents Google Cloud Platform location.
project_location = client.class.location_path project_id, "us-central1"
dataset = {
  display_name:                          display_name,
  image_classification_dataset_metadata: {
    # Specify the classification type
    # Types:
    # MultiLabel: Multiple labels are allowed for one example.
    # MultiClass: At most one label is allowed per example.
    classification_type: :MULTILABEL
  }
}

# Create a dataset with the dataset metadata in the region.
created_dataset = client.create_dataset project_location, dataset

# Display the dataset information
puts "Dataset name: #{created_dataset.name}"
puts "Dataset id: #{created_dataset.name.split('/').last}"

Importing items into a dataset

After you have created a dataset, you can import item URIs and labels for items from a CSV file stored in a Google Cloud Storage bucket. For details on preparing your data and creating a CSV file for import, see Preparing your training data.

You can import items into an empty dataset or import additional items into an existing dataset.

Web UI

The AutoML Vision UI enables you to create a new dataset and import items into it from the same page; see Creating a dataset. The steps below import items into an existing dataset.

  1. Open the AutoML Vision UI and select the dataset from the Datasets page.

    Dataset list page

  2. On the Images page, click Add items in the title bar and select the import method from the drop-down list.

    You can:

    • Upload a .csv file that contains the training images and their associated category labels from your local computer or from Google Cloud Storage.

    • Upload .txt or .zip files that contain the training images from your local computer.

  3. Select the file(s) to import.

Integrated UI

The AutoML Vision UI enables you to create a new dataset and import items into it from the same page; see Creating a dataset. The steps below import items into an existing dataset.

  1. Open the Vision Dashboard and select the dataset from the Datasets page.

    Dataset list page

  2. On the Images page, click Add items in the title bar and select the import method from the drop-down list.

    You can:

    • Upload a .csv file that contains the training images and their associated category labels from your local computer or from Google Cloud Storage.

    • Upload .txt or .zip files that contain the training images from your local computer.

  3. Select the file(s) to import.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • dataset-id: the ID of your dataset. The ID is the last element of the name of your dataset. For example:
    • dataset name: projects/project-id/locations/location-id/datasets/3104518874390609379
    • dataset id: 3104518874390609379
  • input-storage-path: the path to a CSV file stored on Google Cloud Storage. The requesting user must have at least read permission to the bucket.

HTTP method and URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets/dataset-id:importData

Request JSON body:

{
  "inputConfig": {
    "gcsSource": {
      "inputUris": [input-storage-path]
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets/dataset-id:importData

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets/dataset-id:importData" | Select-Object -Expand Content

You should see output similar to the following. You can use the operation ID (ICN3819960680614725486, in this case) to get the status of the task. For an example, see Getting an operation's status.

{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T15:56:29.176485Z",
    "importDataDetails": {}
  }
}

C#

/// <summary>
/// Import labeled items.
/// </summary>
/// <param name="projectId">GCP Project ID.</param>
/// <param name="datasetId">the Id of the dataset.</param>
/// <param name="path">Google Cloud Storage URIs.
/// Target files must be in AutoML CSV format.</param>
public static object ImportDataset(string projectId = "YOUR-PROJECT-ID",
    string datasetId = "YOUR-DATASET-ID",
    string path = "gs://BUCKET_ID/path_to_training_data.csv")
{
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    AutoMlClient client = AutoMlClient.Create();

    // Get the complete path of the dataset.
    string datasetFullId = DatasetName.Format(projectId, "us-central1", datasetId);

    // Get multiple Google Cloud Storage URIs to import data from
    GcsSource gcsSource = new
        GcsSource
    {
        InputUris = { path.Split(",") }
    };

    // Import data from the input URI
    InputConfig inputConfig = new InputConfig
    {
        GcsSource = gcsSource
    };

    Console.WriteLine("Processing import...");

    Empty response = client.ImportDataAsync(datasetFullId, inputConfig).Result.PollUntilCompleted().Result;
    Console.WriteLine($"Data imported. {response}");
    return 0;
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1beta1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1beta1"
)

// importDataIntoDataset imports data into a dataset.
func importDataIntoDataset(w io.Writer, projectID string, location string, datasetID string, inputURI string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TRL123456789..."
	// inputURI := "gs://BUCKET_ID/path_to_training_data.csv"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.ImportDataRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/datasets/%s", projectID, location, datasetID),
		InputConfig: &automlpb.InputConfig{
			Source: &automlpb.InputConfig_GcsSource{
				GcsSource: &automlpb.GcsSource{
					InputUris: []string{inputURI},
				},
			},
		},
	}

	op, err := client.ImportData(ctx, req)
	if err != nil {
		return fmt.Errorf("ImportData: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	if err := op.Wait(ctx); err != nil {
		return fmt.Errorf("Wait: %v", err)
	}

	fmt.Fprintf(w, "Data imported.\n")

	return nil
}

Java

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.DatasetName;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.InputConfig;
import com.google.protobuf.Empty;

import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;

class ImportDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String path = "gs://BUCKET_ID/path_to_training_data.csv";
    importDataset(projectId, datasetId, path);
  }

  // Import a dataset
  static void importDataset(String projectId, String datasetId, String path)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the complete path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);

      // Get multiple Google Cloud Storage URIs to import data from
      GcsSource gcsSource =
          GcsSource.newBuilder().addAllInputUris(Arrays.asList(path.split(","))).build();

      // Import data from the input URI
      InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
      System.out.println("Processing import...");

      Empty response = client.importDataAsync(datasetFullId, inputConfig).get();
      System.out.format("Dataset imported. %s\n", response);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DISPLAY_ID';
// const path = 'gs://BUCKET_ID/path_to_training_data.csv';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require(`@google-cloud/automl`).v1;

// Instantiates a client
const client = new AutoMlClient();

async function importDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
    inputConfig: {
      gcsSource: {
        inputUris: path.split(','),
      },
    },
  };

  // Import dataset
  console.log(`Proccessing import`);
  const [operation] = await client.importData(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset imported: ${response}`);
}

importDataset();

PHP

use Google\Cloud\AutoMl\V1\AutoMlClient;
use Google\Cloud\AutoMl\V1\GcsSource;
use Google\Cloud\AutoMl\V1\InputConfig;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $datasetId = 'my_dataset_id_123';
// $gcsUri = 'gs://BUCKET_ID/path_to_training_data/'

$client = new AutoMlClient();

try {
    // get full path of dataset
    $formattedName = $client->datasetName(
        $projectId,
        $location,
        $datasetId
    );

    // set GCS uri
    $gcsSource = (new GcsSource())
        ->setInputUri($gcsUri);
    $inputConfig = (new InputConfig())
        ->setGcsSource($gcsSource);

    // import data from input uri
    $operationResponse = $client->importData($formattedName, $inputConfig);
    $operationResponse->pollUntilComplete();
    if ($operationResponse->operationSucceeded()) {
        $result = $operationResponse->getResult();
        printf('Dataset imported.' . PHP_EOL);
    } else {
        $error = $operationResponse->getError();
        // handleError($error)
    }
} finally {
    $client->close();
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = 'YOUR_PROJECT_ID'
# dataset_id = 'YOUR_DATASET_ID'
# path = 'gs://BUCKET_ID/path_to_training_data.csv'

client = automl.AutoMlClient()
# Get the full path of the dataset.
dataset_full_id = client.dataset_path(
    project_id, 'us-central1', dataset_id)
# Get the multiple Google Cloud Storage URIs
input_uris = path.split(',')
gcs_source = automl.types.GcsSource(
    input_uris=input_uris)
input_config = automl.types.InputConfig(
    gcs_source=gcs_source)
# Import data from the input URI
response = client.import_data(dataset_full_id, input_config)

print('Processing import...')
print(u'Data imported. {}'.format(response.result()))

Ruby

require "google/cloud/automl"

project_id = "YOUR_PROJECT_ID"
dataset_id = "YOUR_DATASET_ID"
path = "gs://BUCKET_ID/path_to_training_data.csv"

client = Google::Cloud::AutoML::AutoML.new

# Get the full path of the dataset.
dataset_full_id = client.class.dataset_path project_id, "us-central1", dataset_id
input_config = {
  gcs_source: {
    # Get the multiple Google Cloud Storage URIs
    input_uris: path.split(",")
  }
}

# Import data from the input URI
operation = client.import_data dataset_full_id, input_config

puts "Processing import..."

# Wait until the long running operation is done
operation.wait_until_done!

puts "Data imported."

Labeling training items

To be useful for training a model, each item in a dataset must have at least one category label assigned to it. AutoML Vision ignores items without a category label. You can provide labels for your training items in three ways:

  • Include labels in your .csv file
  • Label your items in the AutoML Vision UI
  • Request labeling from human labeling service such as Google AI Platform Data Labeling Service.

For details about labeling items in your .csv file, see Preparing your training data.

To label items in the AutoML Vision UI, select the dataset from the dataset listing page to see its details. The display name of the selected dataset appears in the title bar, and the page lists the individual items in the dataset along with their labels. The navigation bar along the left summarizes the number of labeled and unlabeled items and enables you to filter the item list by label.

Images page

To assign labels to unlabeled items or change item labels, select the items you want to update and the label(s) you want to assign to them.

Request labeling

You can leverage Google's AI Platform Data Labeling Service service to label your images. See the product documentation for more information.

Getting the status of an operation

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • project-id: your GCP project ID.
  • operation-id: the ID of your operation. The ID is the last element of the name of your operation. For example:
    • operation name: projects/project-id/locations/location-id/operations/IOD5281059901324392598
    • operation id: IOD5281059901324392598

HTTP method and URL:

GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id" | Select-Object -Expand Content
You should see output similar to the following for a completed import operation:
{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T16:10:41.326614Z",
    "importDataDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}

You should see output similar to the following for a completed create model operation:

{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-07-22T18:35:06.881193Z",
    "updateTime": "2019-07-22T19:58:44.972235Z",
    "createModelDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Model",
    "name": "projects/project-id/locations/us-central1/models/model-id"
  }
}

C#

/// <summary>
/// Demonstrates using the AutoML client to get operation status.
/// </summary>
/// <param name="operationFullId">the complete name of a operation. For example, the name of your
/// operation is projects/[projectId]/locations/us-central1/operations/[operationId].</param>
public static object GetOperationStatus(string operationFullId
    = "projects/[projectId]/locations/us-central1/operations/[operationId]")
{
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    AutoMlClient client = AutoMlClient.Create();

    // Get the latest state of a long-running operation.
    //TODO: I dont know why there is no 'GetOperationsClient'
    Operation operation = client.CreateModelOperationsClient.GetOperation(operationFullId);

    // Display operation details.
    Console.WriteLine("Operation details:");
    Console.WriteLine($"\tName: {operation.Name}");
    Console.WriteLine($"\tMetadata Type Url: {operation.Metadata.TypeUrl}");
    Console.WriteLine($"\tDone: {operation.Done}");
    if (operation.Response != null)
    {
        Console.WriteLine($"\tResponse Type Url: {operation.Response.TypeUrl}");
    }
    if (operation.Error != null)
    {
        Console.WriteLine("\tResponse:");
        Console.WriteLine($"\t\tError code: {operation.Error.Code}");
        Console.WriteLine($"\t\tError message: {operation.Error.Message}");
    }

    return 0;
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1beta1"
	"google.golang.org/genproto/googleapis/longrunning"
)

// getOperationStatus gets an operation's status.
func getOperationStatus(w io.Writer, projectID string, location string, operationID string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// operationID := "TRL123456789..."

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &longrunning.GetOperationRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/operations/%s", projectID, location, operationID),
	}

	op, err := client.LROClient.GetOperation(ctx, req)
	if err != nil {
		return fmt.Errorf("GetOperation: %v", err)
	}

	fmt.Fprintf(w, "Name: %v\n", op.GetName())
	fmt.Fprintf(w, "Operation details:\n")
	fmt.Fprintf(w, "%v", op)

	return nil
}

Java

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.longrunning.Operation;

import java.io.IOException;

class GetOperationStatus {

  static void getOperationStatus() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String operationFullId = "projects/[projectId]/locations/us-central1/operations/[operationId]";
    getOperationStatus(operationFullId);
  }

  // Get the status of an operation
  static void getOperationStatus(String operationFullId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the latest state of a long-running operation.
      Operation operation = client.getOperationsClient().getOperation(operationFullId);

      // Display operation details.
      System.out.println("Operation details:");
      System.out.format("\tName: %s\n", operation.getName());
      System.out.format("\tMetadata Type Url: %s\n", operation.getMetadata().getTypeUrl());
      System.out.format("\tDone: %s\n", operation.getDone());
      if (operation.hasResponse()) {
        System.out.format("\tResponse Type Url: %s\n", operation.getResponse().getTypeUrl());
      }
      if (operation.hasError()) {
        System.out.println("\tResponse:");
        System.out.format("\t\tError code: %s\n", operation.getError().getCode());
        System.out.format("\t\tError message: %s\n", operation.getError().getMessage());
      }
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const operationFullId = 'projects/[projectId]/locations/us-central1/operations/[operationId]';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require(`@google-cloud/automl`).v1;

// Instantiates a client
const client = new AutoMlClient();

async function getOperationStatus() {
  // Construct request
  const request = {
    name: operationFullId,
  };

  const [response] = await client.operationsClient.getOperation(request);

  console.log(`Name: ${response.name}`);
  console.log(`Operation details:`);
  console.log(`${response}`);
}

getOperationStatus();

PHP

use Google\ApiCore\LongRunning\OperationsClient;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $operationId = 'my_operation_id_123';

$client = new OperationsClient();

try {
    // full name of operation
    $formattedName = 'projects/' . $projectId . '/locations/us-central1/operations/' . $operationId;

    // get latest state of long running operation
    $operation = $client->getOperation($name);
    printf('Operation name: %s' . PHP_EOL, $operation->getName());
    print('Operation details: ');
    print($operation);
} finally {
    if (isset($client)) {
        $client->close();
    }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# operation_full_id = \
#     'projects/[projectId]/locations/us-central1/operations/[operationId]'

client = automl.AutoMlClient()
# Get the latest state of a long-running operation.
response = client.transport._operations_client.get_operation(
    operation_full_id
)

print(u'Name: {}'.format(response.name))
print('Operation details:')
print(response)
Was this page helpful? Let us know how we did:

Send feedback about...

Cloud AutoML Vision
Need help? Visit our support page.