创建数据集并导入图片

数据集包含您想要分类的内容类型的代表性样本,这些样本使用标签和边界框进行注释。数据集用作训练模型的输入。

构建数据集的主要步骤如下:

  1. 创建数据集并为其指定一个容易记住的名称。
  2. 导入数据示例到数据集中。
  3. 修改导入的图片注释(可选),以添加、删除或修改图片中的边界框和标签。

创建数据集

如需使用 AutoML API 创建自定义模型,首先需要创建一个空数据集,该数据集最终将保存模型的训练数据。

自 Cloud AutoML Vision Object Detection 正式版 (GA) 发布之日起,此请求会返回长时间运行的操作的 ID。

长时间运行的操作完成后,您可以将图片导入到该数据集中。 在您将图片导入到新创建的数据集之前,该数据集不包含任何数据。

保存响应中新数据集的 ID 以用于其他操作,例如,将图片导入到数据集中并训练模型。

网页界面

在 Cloud AutoML Vision Object Detection 界面中,您可以创建新数据集并在同一页面将图片导入其中。

  1. 打开 Cloud AutoML Vision 对象检测界面

    数据集页面会显示之前为当前项目创建的数据集的状态。 创建数据集图片

    如需为其他项目添加数据集,请从标题栏右上角的下拉列表中选择项目。

  2. 点击标题栏中的新建数据集按钮。

  3. 新建数据集弹出式窗口中,输入数据集的名称,然后选择“创建数据集”选项。

    创建数据集和新数据集名称的窗口

    创建空数据集后,您将进入数据集详情页面中的导入标签页。然后,您可以指定 .csv 文件的 Google Cloud Storage 位置,该文件列出了要包含在数据集中的训练图片。这些训练图片本身必须同样存储在 Google Cloud Storage 存储分区中。

    创建数据集上传 CSV 图片

    如需创建数据集,您必须从 Google Cloud Storage 中上传包含训练图片及其关联边界框和标签的 .csv 文件。

    导入完成后,您可以在界面中添加、移除或修改任何注释

  4. 选择导入

    此时您将返回到数据集页面;您的数据集会在图片导入期间显示一个进行中动画。此过程所需的时间大约为每 1000 个样本 10 分钟,但有可能更长或更短。

REST 和命令行

在使用下面的任何请求数据之前,请先进行以下替换:

  • project-id:您的 GCP 项目 ID。
  • display-name:您选择的字符串显示名。

HTTP 方法和网址:

POST https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets

请求 JSON 正文:

{
  "displayName": "display-name",
  "imageObjectDetectionDatasetMetadata": {
  }
}

如需发送请求,请选择以下方式之一:

curl

将请求正文保存在名为 request.json 的文件中,然后执行以下命令:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets

PowerShell

将请求正文保存在名为 request.json 的文件中,然后执行以下命令:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets" | Select-Object -Expand Content

您应该会看到类似如下所示的输出。可以使用操作 ID(本例中为 IOD3819960680614725486)来获取任务的状态。如需查看示例,请参阅处理长时间运行的操作

{
  "name": "projects/project-id/locations/us-central1/operations/IOD3819960680614725486",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-14T16:49:13.667526Z",
    "updateTime": "2019-11-14T16:49:13.667526Z",
    "createDatasetDetails": {}
  }
}

长时间运行的操作完成后,您可以使用同样的操作状态请求来获取数据集的 ID。响应应类似如下所示:

{
  "name": "projects/project-id/locations/us-central1/operations/IOD3819960680614725486",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-11-14T16:49:13.667526Z",
    "updateTime": "2019-11-14T16:49:17.975314Z",
    "createDatasetDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Dataset",
    "name": "projects/project-id/locations/us-central1/datasets/IOD5496445433112696489"
  }
}

C#

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

/// <summary>
/// Demonstrates using the AutoML client to create a dataset.
/// </summary>
/// <param name="projectId">GCP Project ID.</param>
/// <param name="displayName">the Id of the dataset.</param>
public static object VisionObjectDetectionCreateDataset(string projectId = "YOUR-PROJECT-ID",
    string displayName = "YOUR-DATASET-NAME")
{
    // Initialize the client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    AutoMlClient client = AutoMlClient.Create();

    // A resource that represents Google Cloud Platform location.
    string projectLocation = LocationName.Format(projectId, "us-central1");

    ImageObjectDetectionDatasetMetadata metadata =
       new ImageObjectDetectionDatasetMetadata
       { };

    Dataset dataset = new Dataset
    {
        DisplayName = displayName,
        ImageObjectDetectionDatasetMetadata = metadata
    };

    var result = Task.Run(() => client.CreateDatasetAsync(projectLocation, dataset)).Result;
    Dataset createdDataset = result.PollUntilCompleted().Result;

    // Display the dataset information.
    Console.WriteLine($"Dataset name: {createdDataset.Name}");
    // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
    // required for other methods.
    // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
    string[] names = createdDataset.Name.Split("/");
    string datasetId = names[names.Length - 1];
    Console.WriteLine($"Dataset id: {datasetId}");
    return 0;
}

Go

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// visionObjectDetectionCreateDataset creates a dataset for image object detection.
func visionObjectDetectionCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_ImageObjectDetectionDatasetMetadata{
				ImageObjectDetectionDatasetMetadata: &automlpb.ImageObjectDetectionDatasetMetadata{},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %v", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

Java

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.ImageObjectDetectionDatasetMetadata;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class VisionObjectDetectionCreateDataset {

  static void createDataset() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      ImageObjectDetectionDatasetMetadata metadata =
          ImageObjectDetectionDatasetMetadata.newBuilder().build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setImageObjectDetectionDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
      // required for other methods.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      imageObjectDetectionDatasetMetadata: {},
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  console.log(`
    Dataset id: ${
      response.name
        .split('/')
        [response.name.split('/').length - 1].split('\n')[0]
    }`);
}

createDataset();

PHP

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

use Google\Cloud\AutoMl\V1\AutoMlClient;
use Google\Cloud\AutoMl\V1\Dataset;
use Google\Cloud\AutoMl\V1\ImageObjectDetectionDatasetMetadata;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $displayName = 'your_dataset_name';

$client = new AutoMlClient();

try {
    // resource that represents Google Cloud Platform location
    $formattedParent = $client->locationName(
        $projectId,
        $location
    );

    $metadata = new ImageObjectDetectionDatasetMetadata();
    $dataset = (new Dataset())
        ->setDisplayName($displayName)
        ->setImageObjectDetectionDatasetMetadata($metadata);

    // create dataset with the above location and metadata
    $operationResponse = $client->createDataset($formattedParent, $dataset);
    $operationResponse->pollUntilComplete();
    if ($operationResponse->operationSucceeded()) {
        $result = $operationResponse->getResult();

        // display dataset information
        $splitName = explode('/', $result->getName());
        printf('Dataset name: %s' . PHP_EOL, $result->getName());
        printf('Dataset id: %s' . PHP_EOL, end($splitName));
    } else {
        $error = $operationResponse->getError();
        // handleError($error)
    }
} finally {
    $client->close();
}

Python

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "your_datasets_display_name"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = client.location_path(project_id, "us-central1")
metadata = automl.types.ImageObjectDetectionDatasetMetadata()
dataset = automl.types.Dataset(
    display_name=display_name,
    image_object_detection_dataset_metadata=metadata,
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(project_location, dataset)

created_dataset = response.result()

# Display the dataset information
print("Dataset name: {}".format(created_dataset.name))
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Ruby

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

require "google/cloud/automl"

project_id = "YOUR_PROJECT_ID"
display_name = "YOUR_DATASET_NAME"

client = Google::Cloud::AutoML::AutoML.new

# A resource that represents Google Cloud Platform location.
project_location = client.class.location_path project_id, "us-central1"
dataset = {
  display_name:                            display_name,
  image_object_detection_dataset_metadata: {}
}

# Create a dataset with the dataset metadata in the region.
created_dataset = client.create_dataset project_location, dataset

# Display the dataset information
puts "Dataset name: #{created_dataset.name}"
puts "Dataset id: #{created_dataset.name.split('/').last}"

将图片导入数据集

创建数据集后,您可以从存储在 Google Cloud Storage 存储分区的 CSV 文件中导入各图片的 URI 和带标签的边界框。

如需详细了解如何准备数据并创建 CSV 文件以供导入,请参阅准备训练数据。 如需详细了解如何在导入图片后修改图片注释,请参阅为导入的训练图片添加注释

您可以将图片导入到空数据集中或已包含训练图片的数据集中。

网页界面

在 AutoML Vision Object Detection 界面中,数据集创建步骤和图片导入步骤会按连续步骤组合执行。

将图片导入到空数据集中

对于后续的数据集创建过程,系统会在创建空数据集后提示您直接导入图片,但此时不需要执行这一导入步骤。

如需将图片导入到空数据集中,请完成以下步骤:

  1. 数据集页面中,选择空数据集。

    列出数据集图片

  2. 导入页面上,添加 .csv 文件的 Google Cloud Storage 位置。指明 .csv 文件在 Google Cloud Storage 中的位置后,选择导入以开始文件导入过程。

    创建数据集上传 CSV 图片

将图片导入到非空数据集中

您可以选择向已包含训练图片的数据集中添加更多训练图片。

如需将训练图片添加到非空数据集中,请完成以下步骤:

  1. 数据集页面中,选择非空数据集。

    列出数据集图片

    选择非空数据集,即可进入数据集详情 (Dataset details) 页面。

    为训练图片添加标签的界面

  2. 数据集详情页面上,选择导入标签页。

    导入到非空数据集

    选择导入标签页即可进入创建数据集页面。 然后,您可以指定 .csv 文件的 Google Cloud Storage 位置,并选择导入以开始图片导入过程。

    创建数据集上传 CSV 图片

REST 和命令行

在使用下面的任何请求数据之前,请先进行以下替换:

  • project-id:您的 GCP 项目 ID。
  • dataset-id:您的数据集的 ID。此 ID 是数据集名称的最后一个元素。例如:
    • 数据集名称:projects/project-id/locations/location-id/datasets/3104518874390609379
    • 数据集 ID:3104518874390609379
  • input-storage-path:存储在 Google Cloud Storage 中的 CSV 文件的路径。 发出请求的用户必须至少具有相应存储分区的读取权限。

HTTP 方法和网址:

POST https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets/dataset-id:importData

请求 JSON 正文:

{
  "inputConfig": {
    "gcsSource": {
       "inputUris": ["input-storage-path"]
    }
  }
}

如需发送请求,请选择以下方式之一:

curl

将请求正文保存在名为 request.json 的文件中,然后执行以下命令:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets/dataset-id:importData

PowerShell

将请求正文保存在名为 request.json 的文件中,然后执行以下命令:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets/dataset-id:importData" | Select-Object -Expand Content

您应该会看到类似如下所示的输出。可以使用操作 ID 来获取任务的状态。如需查看示例,请参阅处理长时间运行的操作

{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T15:56:29.176485Z",
    "importDataDetails": {}
  }
}

C#

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

/// <summary>
/// Import labeled items.
/// </summary>
/// <param name="projectId">GCP Project ID.</param>
/// <param name="datasetId">the Id of the dataset.</param>
/// <param name="path">Google Cloud Storage URIs.
/// Target files must be in AutoML CSV format.</param>
public static object ImportDataset(string projectId = "YOUR-PROJECT-ID",
    string datasetId = "YOUR-DATASET-ID",
    string path = "gs://BUCKET_ID/path_to_training_data.csv")
{
    // Initialize the client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    AutoMlClient client = AutoMlClient.Create();

    // Get the complete path of the dataset.
    string datasetFullId = DatasetName.Format(projectId, "us-central1", datasetId);

    // Get multiple Google Cloud Storage URIs to import data from
    GcsSource gcsSource = new GcsSource
    {
        InputUris = { path.Split(",") }
    };

    // Import data from the input URI
    InputConfig inputConfig = new InputConfig
    {
        GcsSource = gcsSource
    };

    var result = Task.Run(() => client.ImportDataAsync(datasetFullId, inputConfig)).Result;
    Console.WriteLine("Processing import...");
    result.PollUntilCompleted();
    Console.WriteLine($"Data imported.");
    return 0;
}

Go

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// importDataIntoDataset imports data into a dataset.
func importDataIntoDataset(w io.Writer, projectID string, location string, datasetID string, inputURI string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TRL123456789..."
	// inputURI := "gs://BUCKET_ID/path_to_training_data.csv"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.ImportDataRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/datasets/%s", projectID, location, datasetID),
		InputConfig: &automlpb.InputConfig{
			Source: &automlpb.InputConfig_GcsSource{
				GcsSource: &automlpb.GcsSource{
					InputUris: []string{inputURI},
				},
			},
		},
	}

	op, err := client.ImportData(ctx, req)
	if err != nil {
		return fmt.Errorf("ImportData: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	if err := op.Wait(ctx); err != nil {
		return fmt.Errorf("Wait: %v", err)
	}

	fmt.Fprintf(w, "Data imported.\n")

	return nil
}

Java

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.DatasetName;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.InputConfig;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ImportDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String path = "gs://BUCKET_ID/path_to_training_data.csv";
    importDataset(projectId, datasetId, path);
  }

  // Import a dataset
  static void importDataset(String projectId, String datasetId, String path)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the complete path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);

      // Get multiple Google Cloud Storage URIs to import data from
      GcsSource gcsSource =
          GcsSource.newBuilder().addAllInputUris(Arrays.asList(path.split(","))).build();

      // Import data from the input URI
      InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
      System.out.println("Processing import...");

      // Start the import job
      OperationFuture<Empty, OperationMetadata> operation =
          client.importDataAsync(datasetFullId, inputConfig);

      System.out.format("Operation name: %s%n", operation.getName());

      // If you want to wait for the operation to finish, adjust the timeout appropriately. The
      // operation will still run if you choose not to wait for it to complete. You can check the
      // status of your operation using the operation's name.
      Empty response = operation.get(45, TimeUnit.MINUTES);
      System.out.format("Dataset imported. %s%n", response);
    } catch (TimeoutException e) {
      System.out.println("The operation's polling period was not long enough.");
      System.out.println("You can use the Operation's name to get the current status.");
      System.out.println("The import job is still running and will complete as expected.");
      throw e;
    }
  }
}

Node.js

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DISPLAY_ID';
// const path = 'gs://BUCKET_ID/path_to_training_data.csv';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function importDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
    inputConfig: {
      gcsSource: {
        inputUris: path.split(','),
      },
    },
  };

  // Import dataset
  console.log('Proccessing import');
  const [operation] = await client.importData(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset imported: ${response}`);
}

importDataset();

PHP

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

use Google\Cloud\AutoMl\V1\AutoMlClient;
use Google\Cloud\AutoMl\V1\GcsSource;
use Google\Cloud\AutoMl\V1\InputConfig;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $datasetId = 'my_dataset_id_123';
// $gcsUri = 'gs://BUCKET_ID/path_to_training_data/'

$client = new AutoMlClient();

try {
    // get full path of dataset
    $formattedName = $client->datasetName(
        $projectId,
        $location,
        $datasetId
    );

    // set GCS uri
    $gcsSource = (new GcsSource())
        ->setInputUri($gcsUri);
    $inputConfig = (new InputConfig())
        ->setGcsSource($gcsSource);

    // import data from input uri
    $operationResponse = $client->importData($formattedName, $inputConfig);
    $operationResponse->pollUntilComplete();
    if ($operationResponse->operationSucceeded()) {
        $result = $operationResponse->getResult();
        printf('Dataset imported.' . PHP_EOL);
    } else {
        $error = $operationResponse->getError();
        // handleError($error)
    }
} finally {
    $client->close();
}

Python

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# path = "gs://YOUR_BUCKET_ID/path/to/data.csv"

client = automl.AutoMlClient()
# Get the full path of the dataset.
dataset_full_id = client.dataset_path(
    project_id, "us-central1", dataset_id
)
# Get the multiple Google Cloud Storage URIs
input_uris = path.split(",")
gcs_source = automl.types.GcsSource(input_uris=input_uris)
input_config = automl.types.InputConfig(gcs_source=gcs_source)
# Import data from the input URI
response = client.import_data(dataset_full_id, input_config)

print("Processing import...")
print("Data imported. {}".format(response.result()))

Ruby

在试用此示例之前,请按照客户端库页面中与此编程语言对应的设置说明执行操作。

require "google/cloud/automl"

project_id = "YOUR_PROJECT_ID"
dataset_id = "YOUR_DATASET_ID"
path = "gs://BUCKET_ID/path_to_training_data.csv"

client = Google::Cloud::AutoML::AutoML.new

# Get the full path of the dataset.
dataset_full_id = client.class.dataset_path project_id, "us-central1", dataset_id
input_config = {
  gcs_source: {
    # Get the multiple Google Cloud Storage URIs
    input_uris: path.split(",")
  }
}

# Import data from the input URI
operation = client.import_data dataset_full_id, input_config

puts "Processing import..."

# Wait until the long running operation is done
operation.wait_until_done!

puts "Data imported."

为导入的训练图片添加注释主题介绍了如何在界面中手动为图片添加边界框和标签,以及如何列出标签统计信息。

管理数据集主题包含有关如何使用数据集资源的详细信息,例如如何列出、获取、导出或删除数据集。

处理长时间运行的操作

REST 和命令行

在使用下面的任何请求数据之前,请先进行以下替换:

  • project-id:您的 GCP 项目 ID。
  • operation-id:您的操作的 ID。此 ID 是操作名称的最后一个元素。例如:
    • 操作名称:projects/project-id/locations/location-id/operations/IOD5281059901324392598
    • 操作 ID:IOD5281059901324392598

HTTP 方法和网址:

GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id

PowerShell

执行以下命令:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/operations/operation-id" | Select-Object -Expand Content
完成导入操作后,您应该会看到类似如下所示的输出:
{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-10-29T15:56:29.176485Z",
    "updateTime": "2018-10-29T16:10:41.326614Z",
    "importDataDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}

完成创建模型操作后,您应该会看到类似如下所示的输出:

{
  "name": "projects/project-id/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-07-22T18:35:06.881193Z",
    "updateTime": "2019-07-22T19:58:44.972235Z",
    "createModelDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.Model",
    "name": "projects/project-id/locations/us-central1/models/model-id"
  }
}

C#

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

/// <summary>
/// Demonstrates using the AutoML client to get operation status.
/// </summary>
/// <param name="operationFullId">the complete name of a operation. For example, the name of your
/// operation is projects/[projectId]/locations/us-central1/operations/[operationId].</param>
public static object GetOperationStatus(string operationFullId
    = "projects/[projectId]/locations/us-central1/operations/[operationId]")
{
    // Initialize the client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    AutoMlClient client = AutoMlClient.Create();

    // Get the latest state of a long-running operation.
    Operation operation = client.CreateModelOperationsClient.GetOperation(operationFullId);

    // Display operation details.
    Console.WriteLine("Operation details:");
    Console.WriteLine($"\tName: {operation.Name}");
    Console.WriteLine($"\tMetadata Type Url: {operation.Metadata.TypeUrl}");
    Console.WriteLine($"\tDone: {operation.Done}");
    if (operation.Response != null)
    {
        Console.WriteLine($"\tResponse Type Url: {operation.Response.TypeUrl}");
    }
    if (operation.Error != null)
    {
        Console.WriteLine("\tResponse:");
        Console.WriteLine($"\t\tError code: {operation.Error.Code}");
        Console.WriteLine($"\t\tError message: {operation.Error.Message}");
    }

    return 0;
}

Go

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"google.golang.org/genproto/googleapis/longrunning"
)

// getOperationStatus gets an operation's status.
func getOperationStatus(w io.Writer, projectID string, location string, operationID string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// operationID := "TRL123456789..."

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &longrunning.GetOperationRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/operations/%s", projectID, location, operationID),
	}

	op, err := client.LROClient.GetOperation(ctx, req)
	if err != nil {
		return fmt.Errorf("GetOperation: %v", err)
	}

	fmt.Fprintf(w, "Name: %v\n", op.GetName())
	fmt.Fprintf(w, "Operation details:\n")
	fmt.Fprintf(w, "%v", op)

	return nil
}

Java

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.longrunning.Operation;
import java.io.IOException;

class GetOperationStatus {

  static void getOperationStatus() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String operationFullId = "projects/[projectId]/locations/us-central1/operations/[operationId]";
    getOperationStatus(operationFullId);
  }

  // Get the status of an operation
  static void getOperationStatus(String operationFullId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the latest state of a long-running operation.
      Operation operation = client.getOperationsClient().getOperation(operationFullId);

      // Display operation details.
      System.out.println("Operation details:");
      System.out.format("\tName: %s\n", operation.getName());
      System.out.format("\tMetadata Type Url: %s\n", operation.getMetadata().getTypeUrl());
      System.out.format("\tDone: %s\n", operation.getDone());
      if (operation.hasResponse()) {
        System.out.format("\tResponse Type Url: %s\n", operation.getResponse().getTypeUrl());
      }
      if (operation.hasError()) {
        System.out.println("\tResponse:");
        System.out.format("\t\tError code: %s\n", operation.getError().getCode());
        System.out.format("\t\tError message: %s\n", operation.getError().getMessage());
      }
    }
  }
}

Node.js

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const operationId = 'YOUR_OPERATION_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function getOperationStatus() {
  // Construct request
  const request = {
    name: `projects/${projectId}/locations/${location}/operations/${operationId}`,
  };

  const [response] = await client.operationsClient.getOperation(request);

  console.log(`Name: ${response.name}`);
  console.log('Operation details:');
  console.log(`${response}`);
}

getOperationStatus();

PHP

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

use Google\ApiCore\LongRunning\OperationsClient;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $operationId = 'my_operation_id_123';

$client = new OperationsClient();

try {
    // full name of operation
    $formattedName = 'projects/' . $projectId . '/locations/us-central1/operations/' . $operationId;

    // get latest state of long running operation
    $operation = $client->getOperation($name);
    printf('Operation name: %s' . PHP_EOL, $operation->getName());
    print('Operation details: ');
    print($operation);
} finally {
    if (isset($client)) {
        $client->close();
    }
}

Python

在试用此示例之前,请按照 API 与参考文档 > 客户端库页面上与此编程语言对应的设置说明进行操作。

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# operation_full_id = \
#     "projects/[projectId]/locations/us-central1/operations/[operationId]"

client = automl.AutoMlClient()
# Get the latest state of a long-running operation.
response = client.transport._operations_client.get_operation(
    operation_full_id
)

print("Name: {}".format(response.name))
print("Operation details:")
print(response)