创建和管理数据集

数据集包含要翻译的内容类型的代表性样本,即源语言和目标语言中的匹配句对。数据集用作训练模型的输入。

构建数据集的主要步骤如下:

  1. 创建数据集并确定源语言和目标语言。
  2. 导入句对到数据集中。

项目可以有多个数据集,每个数据集用于训练单独的模型。您可以获取可用数据集列表,并且可以删除不再需要的数据集

创建数据集

要创建自定义模型,首先需要创建一个空数据集,该数据集最终将保存模型的训练数据。创建数据集时,需要确定模型的源语言和目标语言。 如需详细了解支持的语言和变体,请参阅自定义模型的语言支持

网页界面

在 AutoML Translation 界面中,您可以创建新数据集并在同一页面将训练项导入其中。

  1. 访问 AutoML Translation 界面

  2. 从标题栏右上角的下拉列表中选择您为其启用了 AutoML Translation 的项目。

  3. 数据集标签页上,点击创建数据集

    显示了一个数据集的“数据集”页面

  4. 创建数据集对话框中,执行以下操作:

    • 输入数据集的名称。
    • 从下拉列表中选择源语言和目标语言。选择源语言后,系统就会显示可用的目标语言

    • 点击创建。系统会打开导入标签页。

REST 和命令行

发送创建数据集请求

下面演示了如何向 project.locations.datasets/create 方法发送 POST 请求。本示例针对通过 Cloud SDK 为项目设置的服务帐号使用访问令牌。

在使用任何请求数据之前,请先进行以下替换:

  • project-id:您的 Google Cloud Platform 项目 ID
  • dataset-name:新数据集的名称
  • source-language-code:您要翻译的语言,以 ISO 639-1 代码形式表示,例如“en”
  • target-language-code:要翻译为的目标语言,如 ISO 639-1 代码,例如“es”

HTTP 方法和网址:

POST https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets

请求 JSON 正文:

{
    "displayName": "dataset-name",
    "translationDatasetMetadata": {
       "sourceLanguageCode": "source-language-code",
       "targetLanguageCode": "target-language-code"
     }
}

如需发送您的请求,请展开以下选项之一:

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/project-number/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-10-01T22:13:48.155710Z",
    "updateTime": "2019-10-01T22:13:48.155710Z",
    "createDatasetDetails": {}
  }
}

获取结果

如需获取请求的结果,您必须向 operations 资源发送 GET 请求。下面演示了如何发送此类请求。

在使用任何请求数据之前,请先进行以下替换:

  • operation-name:在对 API 的原始调用的响应中返回的操作名称

HTTP 方法和网址:

GET https://automl.googleapis.com/v1/operation-name

如需发送您的请求,请展开以下选项之一:

您应该收到类似以下内容的 JSON 响应:

{
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-10-01T22:13:48.155710Z",
    "updateTime": "2019-10-01T22:13:52.321072Z",
    ...
  },
  "done": true,
  "response": {
    "@type": "resource-type",
    "name": "resource-name"
  }
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// translateCreateDataset creates a dataset for translate.
func translateCreateDataset(w io.Writer, projectID string, location string, datasetName string, sourceLanguageCode string, targetLanguageCode string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	// Supported languages:
	//   https://cloud.google.com/translate/automl/docs/languages
	// sourceLanguageCode := "en"
	// targetLanguageCode := "ja"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TranslationDatasetMetadata{
				TranslationDatasetMetadata: &automlpb.TranslationDatasetMetadata{
					SourceLanguageCode: sourceLanguageCode,
					TargetLanguageCode: targetLanguageCode,
				},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %v", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TranslationDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class TranslateCreateDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      // Specify the source and target language.
      TranslationDatasetMetadata translationDatasetMetadata =
          TranslationDatasetMetadata.newBuilder()
              .setSourceLanguageCode("en")
              .setTargetLanguageCode("ja")
              .build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTranslationDatasetMetadata(translationDatasetMetadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
      // required for other methods.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      translationDatasetMetadata: {
        sourceLanguageCode: 'en',
        targetLanguageCode: 'ja',
      },
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  console.log(`
    Dataset id: ${
      response.name
        .split('/')
        [response.name.split('/').length - 1].split('\n')[0]
    }`);
}

createDataset();

Python

在运行此代码示例之前,您必须先安装 Python 客户端库

保存新数据集的全名以用于其他操作(例如,将训练项导入数据集)。

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# For a list of supported languages, see:
# https://cloud.google.com/translate/automl/docs/languages
dataset_metadata = automl.TranslationDatasetMetadata(
    source_language_code="en", target_language_code="ja"
)
dataset = automl.Dataset(
    display_name=display_name,
    translation_dataset_metadata=dataset_metadata,
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print("Dataset name: {}".format(created_dataset.name))
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

其他语言

C#:请按照客户端库页面上的 C# 设置说明操作,然后访问 .NET 版 AutoML Translation 参考文档

PHP:请按照客户端库页面上的 PHP 设置说明操作,然后访问 PHP 版 AutoML Translation 参考文档

Ruby:请按照客户端库页面上的 Ruby 设置说明操作,然后访问 Ruby 版 AutoML Translation 参考文档

将训练项导入数据集

创建数据集后,您可以将训练句对导入其中。 如需详细了解如何准备训练数据,请参阅准备训练数据

网页界面

在 AutoML Translation 界面中,您可以创建新数据集并在同一页面将训练项导入其中(请参阅创建数据集)。要将训练项导入现有数据集,请按以下步骤操作。

创建数据集文件夹之后,您就可以上传数据了。

  1. 上传要用于训练模型的句对。

    导入标签页上,您可以从本地计算机或 Cloud Storage 上传 TSV 或 TMX 文件。对于本地导入的文件,请在选择文件后点击浏览。系统会显示文件夹列表。选择您要将文件上传到其中的目标文件夹。为了保证数据驻留,必须在 Cloud Storage 上托管此目录。

    如果您要上传包含句对的不同文件,请选中使用不同的文件进行训练、验证和测试(高级)复选框。如果数据集中的句对数超过 100000,则建议使用此选项。您最多只能为验证集和测试集分配 10000 个句对;否则,AutoML Translation 会返回错误。

    “导入”标签页

  2. 点击继续

    您会返回到数据集页面。您的数据集会在文档导入期间显示一个进行中动画。成功上传数据集后,我们会向您注册程序时使用的电子邮件地址发送一封邮件。

  3. 查看数据集。

    成功导入数据后,请从数据集标签页选择相应数据集以查看其详细信息。句子 (Sentence) 标签页已启用,并显示了该数据集的名称。系统会列出句对。每个句对都带有“训练”、“验证”或“测试”标识,指示将使用该句对的处理阶段。

REST 和命令行

在使用任何请求数据之前,请先进行以下替换:

  • dataset-name:创建数据集时由 API 返回的数据集名称
  • bucket-name:包含描述数据集的输入 CSV 的 Cloud Storage 存储分区
  • csv-file-name:描述您的数据集的 inpug CSV 文件的名称

HTTP 方法和网址:

POST https://automl.googleapis.com/v1/dataset-name:import

请求 JSON 正文:

{
  "inputUris": "gs://bucket-name/csv-file-name"
}

如需发送您的请求,请展开以下选项之一:

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/project-number/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "2018-04-27T01:28:36.128120Z",
    "updateTime": "2018-04-27T01:28:36.128150Z",
    "cancellable": true
  }
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// importDataIntoDataset imports data into a dataset.
func importDataIntoDataset(w io.Writer, projectID string, location string, datasetID string, inputURI string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TRL123456789..."
	// inputURI := "gs://BUCKET_ID/path_to_training_data.csv"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.ImportDataRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/datasets/%s", projectID, location, datasetID),
		InputConfig: &automlpb.InputConfig{
			Source: &automlpb.InputConfig_GcsSource{
				GcsSource: &automlpb.GcsSource{
					InputUris: []string{inputURI},
				},
			},
		},
	}

	op, err := client.ImportData(ctx, req)
	if err != nil {
		return fmt.Errorf("ImportData: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	if err := op.Wait(ctx); err != nil {
		return fmt.Errorf("Wait: %v", err)
	}

	fmt.Fprintf(w, "Data imported.\n")

	return nil
}

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.DatasetName;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.InputConfig;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ImportDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String path = "gs://BUCKET_ID/path_to_training_data.csv";
    importDataset(projectId, datasetId, path);
  }

  // Import a dataset
  static void importDataset(String projectId, String datasetId, String path)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the complete path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);

      // Get multiple Google Cloud Storage URIs to import data from
      GcsSource gcsSource =
          GcsSource.newBuilder().addAllInputUris(Arrays.asList(path.split(","))).build();

      // Import data from the input URI
      InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
      System.out.println("Processing import...");

      // Start the import job
      OperationFuture<Empty, OperationMetadata> operation =
          client.importDataAsync(datasetFullId, inputConfig);

      System.out.format("Operation name: %s%n", operation.getName());

      // If you want to wait for the operation to finish, adjust the timeout appropriately. The
      // operation will still run if you choose not to wait for it to complete. You can check the
      // status of your operation using the operation's name.
      Empty response = operation.get(45, TimeUnit.MINUTES);
      System.out.format("Dataset imported. %s%n", response);
    } catch (TimeoutException e) {
      System.out.println("The operation's polling period was not long enough.");
      System.out.println("You can use the Operation's name to get the current status.");
      System.out.println("The import job is still running and will complete as expected.");
      throw e;
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DISPLAY_ID';
// const path = 'gs://BUCKET_ID/path_to_training_data.csv';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function importDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
    inputConfig: {
      gcsSource: {
        inputUris: path.split(','),
      },
    },
  };

  // Import dataset
  console.log('Proccessing import');
  const [operation] = await client.importData(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset imported: ${response}`);
}

importDataset();

Python

在运行此代码示例之前,您必须先安装 Python 客户端库。 AutoML API 仅支持使用 Google Cloud Storage 存储分区中的 .csv 文件导入数据。

  • dataset_full_id 是数据集的全名,格式为 projects/{project-id}/locations/us-central1/datasets/{dataset-id}

  • input_uris 的值必须是与此项目关联的 Google Cloud Storage 存储分区中 CSV 文件的路径。格式为 gs://{project-id}-lcm/{document-name}.csv

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# path = "gs://YOUR_BUCKET_ID/path/to/data.csv"

client = automl.AutoMlClient()
# Get the full path of the dataset.
dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
# Get the multiple Google Cloud Storage URIs
input_uris = path.split(",")
gcs_source = automl.GcsSource(input_uris=input_uris)
input_config = automl.InputConfig(gcs_source=gcs_source)
# Import data from the input URI
response = client.import_data(name=dataset_full_id, input_config=input_config)

print("Processing import...")
print("Data imported. {}".format(response.result()))

其他语言

C#:请按照客户端库页面上的 C# 设置说明操作,然后访问 .NET 版 AutoML Translation 参考文档

PHP:请按照客户端库页面上的 PHP 设置说明操作,然后访问 PHP 版 AutoML Translation 参考文档

Ruby:请按照客户端库页面上的 Ruby 设置说明操作,然后访问 Ruby 版 AutoML Translation 参考文档

创建并填充数据集后,即可训练模型(请参阅创建和管理模型)。

管理数据集

列出数据集

一个项目可以包含许多数据集。本部分介绍如何检索项目的可用数据集列表。

网页界面

如需使用 AutoML Translation 界面查看可用数据集的列表,请点击左侧导航菜单顶部的数据集链接。

显示了一个数据集的“数据集”页面

要查看其他项目的数据集,请从标题栏右上角的下拉列表中选择该项目。

REST 和命令行

在使用任何请求数据之前,请先进行以下替换:

  • project-id:您的 Google Cloud Platform 项目 ID

HTTP 方法和网址:

GET https://automl.googleapis.com/v1/projects/project-id/locations/us-central1/datasets

如需发送您的请求,请展开以下选项之一:

您应该收到类似以下内容的 JSON 响应:

{
  "datasets": [
    {
      "name": "projects/project-number/locations/us-central1/datasets/dataset-id",
      "displayName": "dataset-display-name",
      "createTime": "2019-10-01T22:47:38.347689Z",
      "etag": "AB3BwFpPWn6klFqJ867nz98aXr_JHcfYFQBMYTf7rcO-JMi8Ez4iDSNrRW4Vv501i488",
      "translationDatasetMetadata": {
        "sourceLanguageCode": "source-language",
        "targetLanguageCode": "target-language"
      }
    },
    ...
  ]
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"google.golang.org/api/iterator"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// listDatasets lists existing datasets.
func listDatasets(w io.Writer, projectID string, location string) error {
	// projectID := "my-project-id"
	// location := "us-central1"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.ListDatasetsRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
	}

	it := client.ListDatasets(ctx, req)

	// Iterate over all results
	for {
		dataset, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return fmt.Errorf("ListGlossaries.Next: %v", err)
		}

		fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())
		fmt.Fprintf(w, "Dataset display name: %v\n", dataset.GetDisplayName())
		fmt.Fprintf(w, "Dataset create time:\n")
		fmt.Fprintf(w, "\tseconds: %v\n", dataset.GetCreateTime().GetSeconds())
		fmt.Fprintf(w, "\tnanos: %v\n", dataset.GetCreateTime().GetNanos())

		// Translate
		if metadata := dataset.GetTranslationDatasetMetadata(); metadata != nil {
			fmt.Fprintf(w, "Translation dataset metadata:\n")
			fmt.Fprintf(w, "\tsource_language_code: %v\n", metadata.GetSourceLanguageCode())
			fmt.Fprintf(w, "\ttarget_language_code: %v\n", metadata.GetTargetLanguageCode())
		}

	}

	return nil
}

Java

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.ListDatasetsRequest;
import com.google.cloud.automl.v1.LocationName;
import java.io.IOException;

class ListDatasets {

  static void listDatasets() throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    listDatasets(projectId);
  }

  // List the datasets
  static void listDatasets(String projectId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      ListDatasetsRequest request =
          ListDatasetsRequest.newBuilder().setParent(projectLocation.toString()).build();

      // List all the datasets available in the region by applying filter.
      System.out.println("List of datasets:");
      for (Dataset dataset : client.listDatasets(request).iterateAll()) {
        // Display the dataset information
        System.out.format("\nDataset name: %s\n", dataset.getName());
        // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
        // required for other methods.
        // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
        String[] names = dataset.getName().split("/");
        String retrievedDatasetId = names[names.length - 1];
        System.out.format("Dataset id: %s\n", retrievedDatasetId);
        System.out.format("Dataset display name: %s\n", dataset.getDisplayName());
        System.out.println("Dataset create time:");
        System.out.format("\tseconds: %s\n", dataset.getCreateTime().getSeconds());
        System.out.format("\tnanos: %s\n", dataset.getCreateTime().getNanos());
        System.out.println("Translation dataset metadata:");
        System.out.format(
            "\tSource language code: %s\n",
            dataset.getTranslationDatasetMetadata().getSourceLanguageCode());
        System.out.format(
            "\tTarget language code: %s\n",
            dataset.getTranslationDatasetMetadata().getTargetLanguageCode());
      }
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function listDatasets() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    filter: 'translation_dataset_metadata:*',
  };

  const [response] = await client.listDatasets(request);

  console.log('List of datasets:');
  for (const dataset of response) {
    console.log(`Dataset name: ${dataset.name}`);
    console.log(
      `Dataset id: ${
        dataset.name.split('/')[dataset.name.split('/').length - 1]
      }`
    );
    console.log(`Dataset display name: ${dataset.displayName}`);
    console.log('Dataset create time');
    console.log(`\tseconds ${dataset.createTime.seconds}`);
    console.log(`\tnanos ${dataset.createTime.nanos / 1e9}`);
    if (dataset.translationDatasetMetadata !== undefined) {
      console.log('Translation dataset metadata:');
      console.log(
        `\tSource language code: ${dataset.translationDatasetMetadata.sourceLanguageCode}`
      );
      console.log(
        `\tTarget language code: ${dataset.translationDatasetMetadata.targetLanguageCode}`
      );
    }
  }
}

listDatasets();

Python

在运行此代码示例之前,您必须先安装 Python 客户端库
from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"

client = automl.AutoMlClient()
# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"

# List all the datasets available in the region.
request = automl.ListDatasetsRequest(parent=project_location, filter="")
response = client.list_datasets(request=request)

print("List of datasets:")
for dataset in response:
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset create time: {}".format(dataset.create_time))
    print("Translation dataset metadata:")
    print(
        "\tsource_language_code: {}".format(
            dataset.translation_dataset_metadata.source_language_code
        )
    )
    print(
        "\ttarget_language_code: {}".format(
            dataset.translation_dataset_metadata.target_language_code
        )
    )

其他语言

C#:请按照客户端库页面上的 C# 设置说明操作,然后访问 .NET 版 AutoML Translation 参考文档

PHP:请按照客户端库页面上的 PHP 设置说明操作,然后访问 PHP 版 AutoML Translation 参考文档

Ruby:请按照客户端库页面上的 Ruby 设置说明操作,然后访问 Ruby 版 AutoML Translation 参考文档

删除数据集

网页界面

  1. AutoML Translation 界面中,点击左侧导航菜单顶部的数据集链接以显示可用数据集的列表。

    显示了一个数据集的“数据集”页面

  2. 点击要删除的行最右侧的三点状菜单,然后选择删除

  3. 在确认对话框中点击确认

REST 和命令行

  • 在创建数据集时,将 dataset-name 替换为数据集的完整名称。全名的格式为 projects/{project-id}/locations/us-central1/datasets/{dataset-id}

在使用任何请求数据之前,请先进行以下替换:

  • dataset-name:要删除的数据集的名称,格式为 project/project-id/locations/us-central1/datasets/dataset-id

HTTP 方法和网址:

DELETE https://automl.googleapis.com/v1/dataset-name

如需发送您的请求,请展开以下选项之一:

您应该收到类似以下内容的 JSON 响应:

{
  "name": "projects/project-number/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2019-10-02T16:43:03.923442Z",
    "updateTime": "2019-10-02T16:43:03.923442Z",
    "deleteDetails": {}
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.protobuf.Empty"
  }
}

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// deleteDataset deletes a dataset.
func deleteDataset(w io.Writer, projectID string, location string, datasetID string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TRL123456789..."

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.DeleteDatasetRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/datasets/%s", projectID, location, datasetID),
	}

	op, err := client.DeleteDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("DeleteDataset: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	if err := op.Wait(ctx); err != nil {
		return fmt.Errorf("Wait: %v", err)
	}

	fmt.Fprintf(w, "Dataset deleted.\n")

	return nil
}

Java

import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.DatasetName;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class DeleteDataset {

  static void deleteDataset() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    deleteDataset(projectId, datasetId);
  }

  // Delete a dataset
  static void deleteDataset(String projectId, String datasetId)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the full path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);
      Empty response = client.deleteDatasetAsync(datasetFullId).get();
      System.out.format("Dataset deleted. %s\n", response);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function deleteDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
  };

  const [operation] = await client.deleteDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset deleted: ${response}`);
}

deleteDataset();

PHP

use Google\Cloud\AutoMl\V1\AutoMlClient;

/** Uncomment and populate these variables in your code */
// $projectId = '[Google Cloud Project ID]';
// $location = 'us-central1';
// $datasetId = 'my_dataset_id_123';

$client = new AutoMlClient();

try {
    // get full path of dataset
    $formattedName = $client->datasetName(
        $projectId,
        $location,
        $datasetId
    );

    $operationResponse = $client->deleteDataset($formattedName);
    $operationResponse->pollUntilComplete();
    if ($operationResponse->operationSucceeded()) {
        $result = $operationResponse->getResult();
        printf('Dataset deleted.' . PHP_EOL);
    } else {
        $error = $operationResponse->getError();
        // handleError($error)
    }
} finally {
    $client->close();
}

Python

在运行此代码示例之前,您必须先安装 Python 客户端库

  • dataset_id 是数据集的全名,格式为 projects/{project-id}/locations/us-central1/datasets/{dataset-id}
from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"

client = automl.AutoMlClient()
# Get the full path of the dataset
dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
response = client.delete_dataset(name=dataset_full_id)

print("Dataset deleted. {}".format(response.result()))

其他语言

C#:请按照客户端库页面上的 C# 设置说明操作,然后访问 .NET 版 AutoML Translation 参考文档

PHP:请按照客户端库页面上的 PHP 设置说明操作,然后访问 PHP 版 AutoML Translation 参考文档

Ruby:请按照客户端库页面上的 Ruby 设置说明操作,然后访问 Ruby 版 AutoML Translation 参考文档

导入问题:

创建数据集时,如果句子对过长或源语言和目标语言完全相同,则 AutoML Translation 可能会丢弃这些句对。

对于过长的句对,我们建议将其分解为大约 200 个字或更小的句对,然后重新创建包含已删除句对的数据集。在处理数据时,AutoML Translation 使用内部流程对输入数据进行令牌化,这可能会增加句子的大小。AutoML Translation 可使用此令牌化数据测量数据大小。因此,200 字的上限是最大长度的估算值。

您可以从数据集中移除在源语言和目标语言中相同的句对。如果您希望这些句子保持未翻译状态,请使用术语表资源构建用于定义 AutoML Translation 处理特定术语方式的自定义字典。