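This legacy version of AutoML Natural Language is deprecated, and you will no longer be able to train or deploy new models on the legacy platform. Models that are already deployed will stop working on May 30, 2024. All the features of the legacy AutoML Natural Language, as well as new features, are available on the Vertex AI platform. To learn how to migrate your resources, see Migrate to Vertex AI.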

Creating datasets and importing data

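A dataset contains representative samples of the type of content you want to classify, labeled with the category labels you want your custom model to use. The dataset serves as the input for training a model.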

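The main steps for building a dataset are:

1. Create a dataset resource.
2. Import training data into the dataset.
3. Label the documents or identify the entities.

For classification and sentiment analysis, steps 2 and 3 are often performed at the same time by importing data items that already have labels assigned.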

Creating a dataset

To create a custom model, you first create an empty dataset that will eventually hold the training data for that model. A newly created dataset contains no data until you import documents into it.

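Web UI

To create a dataset:

1. Open the AutoML Natural Language UI and click [Get started] in the box for the type of model you want to train.

   The [Datasets] page opens and shows the status of the datasets previously created in the current project.

   To add a dataset to a different project, select that project from the drop-down list in the upper right of the title bar.
2. Click the [New dataset] button in the title bar.
3. Enter a name for the dataset and specify the geographic location where the dataset will be stored.

   For details, see Locations.
4. Select the model objective. The objective specifies the type of analysis performed by the model you train with this dataset:
   - [Single-label classification] assigns one label to each classified document.
   - [Multi-label classification] lets you assign multiple labels to a single document.
   - [Entity extraction] identifies entities in documents.
   - [Sentiment analysis] analyzes the emotional tone of documents.
5. Click [Create dataset].

The [Import] page for the new dataset appears. For import instructions, see Importing training data into a dataset.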
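The model objective you select in the UI corresponds to the dataset metadata field sent in the REST request bodies on this page. The following Python sketch collects that mapping in one place, using only the field names and values that appear in those request bodies. The objective strings (`single_label`, `multi_label`, `entity_extraction`, `sentiment`) and the function names are hypothetical; the 1 to 10 range for `sentimentMax` comes from the comments in the sentiment analysis samples.

```python
from typing import Any, Dict, Optional


def dataset_metadata(objective: str, sentiment_max: Optional[int] = None) -> Dict[str, Any]:
    """Map a (hypothetical) objective name to the metadata field of a datasets.create body."""
    if objective == "single_label":
        # At most one label per document.
        return {"textClassificationDatasetMetadata": {"classificationType": "MULTICLASS"}}
    if objective == "multi_label":
        # Several labels may be assigned to one document.
        return {"textClassificationDatasetMetadata": {"classificationType": "MULTILABEL"}}
    if objective == "entity_extraction":
        # Entity extraction datasets take an empty metadata message.
        return {"textExtractionDatasetMetadata": {}}
    if objective == "sentiment":
        # The samples on this page document a maximum sentiment score of 1-10.
        if sentiment_max is None or not 1 <= sentiment_max <= 10:
            raise ValueError("sentiment_max must be between 1 and 10")
        return {"textSentimentDatasetMetadata": {"sentimentMax": sentiment_max}}
    raise ValueError(f"unknown objective: {objective!r}")


def create_dataset_body(display_name: str, objective: str,
                        sentiment_max: Optional[int] = None) -> Dict[str, Any]:
    """Build the JSON body for POST .../locations/{location}/datasets."""
    body: Dict[str, Any] = {"displayName": display_name}
    body.update(dataset_metadata(objective, sentiment_max))
    return body
```

Passing the result of `create_dataset_body("test_dataset", "single_label")` to `json.dumps` yields the same body as the classification REST example.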

Code samples

Classification

REST

Before using any of the request data, make the following replacements:

project-id: your project ID
location-id: the location of the resource: us-central1 for the global location, or eu for the EU.

HTTP method and URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets

Request JSON body:

{
  "displayName": "test_dataset",
  "textClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}

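To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which logs you in to the gcloud CLI automatically. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: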

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets"
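If you prefer to script the call, the following Python sketch builds the same POST request as the curl command using only the standard library (`urllib.request`, `json`, `subprocess`). It sends the same three headers and obtains the token by running `gcloud auth print-access-token`, as the curl example does. The function names are hypothetical, and the location check only allows the two values listed in the replacements above (`us-central1` and `eu`).

```python
import json
import subprocess
import urllib.request
from typing import Any, Dict

AUTOML_V1 = "https://automl.googleapis.com/v1"
LOCATIONS = {"us-central1", "eu"}  # global and EU locations


def access_token() -> str:
    """Return an access token the same way the curl example does (needs the gcloud CLI)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_create_dataset_request(project_id: str, location_id: str,
                                 body: Dict[str, Any], token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request issued by the curl command."""
    if location_id not in LOCATIONS:
        raise ValueError(f"unsupported location: {location_id!r}")
    url = f"{AUTOML_V1}/projects/{project_id}/locations/{location_id}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To actually send the request, pass it to `urllib.request.urlopen` with a token from `access_token()` and decode the body with `json.load`. Building the request separately from sending it keeps the URL and headers easy to inspect.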

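PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: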

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/434039606874/locations/us-central1/datasets/356587829854924648",
  "displayName": "test_dataset",
  "createTime": "2018-04-26T18:02:59.825060Z",
  "textClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}
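Other methods take the dataset ID rather than the full resource name, which is why the client library samples split `name` on `/`. The Python sketch below is a slightly stricter variant that assumes the name form `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}` quoted in the Java sample and validates that shape before extracting the ID. As the response above shows, the project segment of a returned name can be a project number rather than the project ID. `parse_dataset_name` is a hypothetical helper.

```python
import re
from typing import Dict

# Name form quoted in the Java sample:
# projects/{project_id}/locations/{location_id}/datasets/{dataset_id}
_DATASET_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/datasets/(?P<dataset_id>[^/]+)"
)


def parse_dataset_name(name: str) -> Dict[str, str]:
    """Split a dataset resource name into its project, location, and dataset ID."""
    match = _DATASET_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a dataset resource name: {name!r}")
    return match.groupdict()
```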

Python

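For information about installing and using the client library for AutoML Natural Language, see AutoML Natural Language client libraries. For more information, see the AutoML Natural Language Python API reference documentation.

To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.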

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Specify the classification type
# Types:
# MultiLabel: Multiple labels are allowed for one example.
# MultiClass: At most one label is allowed per example.
metadata = automl.TextClassificationDatasetMetadata(
    classification_type=automl.ClassificationType.MULTICLASS
)
dataset = automl.Dataset(
    display_name=display_name,
    text_classification_dataset_metadata=metadata,
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print(f"Dataset name: {created_dataset.name}")
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Java

For information about installing and using the client library for AutoML Natural Language, see AutoML Natural Language client libraries. For more information, see the AutoML Natural Language Java API reference documentation.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.ClassificationType;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextClassificationDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageTextClassificationCreateDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      // Specify the classification type
      // Types:
      // MultiLabel: Multiple labels are allowed for one example.
      // MultiClass: At most one label is allowed per example.
      ClassificationType classificationType = ClassificationType.MULTILABEL;

      // Specify the text classification type for the dataset.
      TextClassificationDatasetMetadata metadata =
          TextClassificationDatasetMetadata.newBuilder()
              .setClassificationType(classificationType)
              .build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTextClassificationDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // The dataset ID is required by other methods, so parse it out of the `name` field.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

For information about installing and using the client library for AutoML Natural Language, see AutoML Natural Language client libraries. For more information, see the AutoML Natural Language Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      textClassificationDatasetMetadata: {
        classificationType: 'MULTICLASS',
      },
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  // The dataset ID is the last segment of the resource name.
  console.log(`Dataset id: ${response.name.split('/').pop()}`);
}

createDataset();

Go

For information about installing and using the client library for AutoML Natural Language, see AutoML Natural Language client libraries. For more information, see the AutoML Natural Language Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// languageTextClassificationCreateDataset creates a dataset for text classification.
func languageTextClassificationCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TextClassificationDatasetMetadata{
				TextClassificationDatasetMetadata: &automlpb.TextClassificationDatasetMetadata{
					// Specify the classification type:
					// - MULTILABEL: Multiple labels are allowed for one example.
					// - MULTICLASS: At most one label is allowed per example.
					ClassificationType: automlpb.ClassificationType_MULTICLASS,
				},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

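Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for Ruby.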

Entity extraction

REST

Before using any of the request data, make the following replacements:

project-id: your project ID
location-id: the location of the resource: us-central1 for the global location, or eu for the EU.

HTTP method and URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets

Request JSON body:

{
  "displayName": "test_dataset",
  "textExtractionDatasetMetadata": {}
}

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets"

PowerShell (Windows)

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/000000000000/locations/us-central1/datasets/TEN5582774688079151104",
  "displayName": "test_dataset",
  "createTime": "2018-10-18T18:14:11.757650Z",
  "textExtractionDatasetMetadata": {}
}
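Some client libraries and older samples print `createTime` as a protobuf `Timestamp` with `seconds` and `nanos` fields, whereas REST responses carry an RFC 3339 string, as in the classification response earlier on this page. The Python sketch below converts the former to the latter following the proto3 JSON mapping for `Timestamp` (UTC with a `Z` suffix, and 0, 3, 6, or 9 fractional digits). The function name is hypothetical.

```python
from datetime import datetime, timezone


def timestamp_to_rfc3339(seconds: int, nanos: int = 0) -> str:
    """Render protobuf Timestamp fields as an RFC 3339 UTC string (proto3 JSON style)."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be between 0 and 999,999,999")
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    # Use the shortest of 0, 3, 6, or 9 fractional digits that loses no precision.
    if nanos == 0:
        fraction = ""
    elif nanos % 1_000_000 == 0:
        fraction = f".{nanos // 1_000_000:03d}"
    elif nanos % 1_000 == 0:
        fraction = f".{nanos // 1_000:06d}"
    else:
        fraction = f".{nanos:09d}"
    return f"{base}{fraction}Z"
```

For example, `seconds: 1539886451` with `nanos: 757650000` corresponds to `2018-10-18T18:14:11.757650Z`.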

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
metadata = automl.TextExtractionDatasetMetadata()
dataset = automl.Dataset(
    display_name=display_name, text_extraction_dataset_metadata=metadata
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print(f"Dataset name: {created_dataset.name}")
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextExtractionDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageEntityExtractionCreateDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      TextExtractionDatasetMetadata metadata = TextExtractionDatasetMetadata.newBuilder().build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTextExtractionDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // The dataset ID is required by other methods, so parse it out of the `name` field.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      textExtractionDatasetMetadata: {},
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  // The dataset ID is the last segment of the resource name.
  console.log(`Dataset id: ${response.name.split('/').pop()}`);
}

createDataset();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// languageEntityExtractionCreateDataset creates a dataset for text entity extraction.
func languageEntityExtractionCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TextExtractionDatasetMetadata{
				TextExtractionDatasetMetadata: &automlpb.TextExtractionDatasetMetadata{},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

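Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for Ruby.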

Sentiment analysis

REST

Before using any of the request data, make the following replacements:

project-id: your project ID
location-id: the location of the resource: us-central1 for the global location, or eu for the EU.

HTTP method and URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets

Request JSON body:

{
  "displayName": "test_dataset",
  "textSentimentDatasetMetadata": {
    "sentimentMax": 4
  }
}
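The `sentimentMax` value sets the top of the sentiment scale. As we read the `TextSentimentDatasetMetadata` reference linked from the Python sample, sentiment labels are integers from 0 to `sentimentMax` inclusive, `sentimentMax` must be between 1 and 10, and every value in that range should appear in the dataset before a model can be trained; treat those rules as assumptions to verify. The Python sketch below (hypothetical helper names) builds the request body and reports which scores are still missing from a set of labels.

```python
from typing import Any, Dict, Iterable, List


def sentiment_dataset_body(display_name: str, sentiment_max: int) -> Dict[str, Any]:
    """Build a datasets.create body for sentiment analysis."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(sentiment_max, bool) or not isinstance(sentiment_max, int):
        raise TypeError("sentiment_max must be an integer")
    if not 1 <= sentiment_max <= 10:
        raise ValueError("sentiment_max must be between 1 and 10")
    return {
        "displayName": display_name,
        "textSentimentDatasetMetadata": {"sentimentMax": sentiment_max},
    }


def missing_sentiment_values(labels: Iterable[int], sentiment_max: int) -> List[int]:
    """Return the scores in 0..sentiment_max that no labeled document uses.

    Assumption: labels outside 0..sentiment_max are invalid, so they raise ValueError.
    """
    present = set()
    for value in labels:
        if not 0 <= value <= sentiment_max:
            raise ValueError(f"sentiment {value} is outside 0..{sentiment_max}")
        present.add(value)
    return [v for v in range(sentiment_max + 1) if v not in present]
```

Running `missing_sentiment_values` over your planned labels before import is a cheap way to catch gaps in the scale.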

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets"

PowerShell (Windows)

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/000000000000/locations/us-central1/datasets/TST8962998974766436002",
  "displayName": "test_dataset",
  "createTime": "2018-10-06T19:54:22.051542Z",
  "textSentimentDatasetMetadata": {
    "sentimentMax": 4
  }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"

# Each dataset requires a sentiment score with a defined sentiment_max
# value, for more information on TextSentimentDatasetMetadata, see:
# https://cloud.google.com/natural-language/automl/docs/prepare#sentiment-analysis
# https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#textsentimentdatasetmetadata
metadata = automl.TextSentimentDatasetMetadata(
    sentiment_max=4
)  # Possible max sentiment score: 1-10

dataset = automl.Dataset(
    display_name=display_name, text_sentiment_dataset_metadata=metadata
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print(f"Dataset name: {created_dataset.name}")
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextSentimentDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageSentimentAnalysisCreateDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // Specify the maximum sentiment score for the dataset.
      TextSentimentDatasetMetadata metadata =
          TextSentimentDatasetMetadata.newBuilder()
              .setSentimentMax(4) // Possible max sentiment score: 1-10
              .build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTextSentimentDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // The dataset ID is required by other methods, so parse it out of the `name` field.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      textSentimentDatasetMetadata: {
        sentimentMax: 4, // Possible max sentiment score: 1-10
      },
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  // The dataset ID is the last segment of the resource name.
  console.log(`Dataset id: ${response.name.split('/').pop()}`);
}

createDataset();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// languageSentimentAnalysisCreateDataset creates a dataset for text sentiment analysis.
func languageSentimentAnalysisCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TextSentimentDatasetMetadata{
				TextSentimentDatasetMetadata: &automlpb.TextSentimentDatasetMetadata{
					SentimentMax: 4, // Possible max sentiment score: 1-10
				},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

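Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for Ruby.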

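Importing training data into a dataset

After you create a dataset, you can import document URIs and document labels from a CSV file stored in a Cloud Storage bucket. For details on preparing your data and creating a CSV file for import, see Preparing your training data.

You can import documents into an empty dataset, or import additional documents into an existing dataset.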
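The CSV layout itself is defined on the Preparing your training data page. As an illustration only, the Python sketch below assumes the layout described there for classification: an optional split column (`TRAIN`, `VALIDATION`, or `TEST`), then either a `gs://` URI or inline document text, then one or more labels. Verify the column order and allowed values against that page before relying on it; the helper name and bucket paths are hypothetical. The standard `csv` module handles quoting, so inline text that contains commas is wrapped in double quotes.

```python
import csv
import io
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed values for the optional first (split) column; confirm them on the
# "Preparing your training data" page.
ML_USE_VALUES = {"TRAIN", "VALIDATION", "TEST"}

# (split value or None, gs:// URI or inline text, labels)
Row = Tuple[Optional[str], str, Sequence[str]]


def build_import_csv(rows: Iterable[Row]) -> str:
    """Render rows in the assumed [ML_USE,]document,label[,label...] layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for ml_use, document, labels in rows:
        if not labels:
            raise ValueError(f"document {document!r} has no labels")
        row: List[str] = []
        if ml_use is not None:
            if ml_use not in ML_USE_VALUES:
                raise ValueError(f"unknown split value {ml_use!r}")
            row.append(ml_use)
        row.append(document)
        row.extend(labels)
        writer.writerow(row)
    return buffer.getvalue()
```

The resulting text would be uploaded to a Cloud Storage bucket, and its `gs://` URI passed in `inputUris` when importing.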

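Web UI

To import documents into a dataset:

1. On the [Datasets] page, select the dataset you want to import documents into.
2. On the [Import] tab, specify the location of the training documents.

   You can:
   - Upload a .csv file that contains the training documents and their associated category labels, from your local computer or from Cloud Storage.
   - Upload .txt, .tif, .pdf, or .zip files that contain the training documents from your local computer.
3. Select the files to import and the Cloud Storage path where the imported documents will be stored.
4. Click [Import].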

Code samples

REST

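Before using any of the request data, make the following replacements:

project-id: your project ID
location-id: the location of the resource: us-central1 for the global location, or eu for the EU.
dataset-id: your dataset ID
bucket-name: your Cloud Storage bucket
csv-file-name: the name of your CSV training data file, without the .csv extension

HTTP method and URL: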

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets/dataset-id:importData

Request JSON body:

{
  "inputConfig": {
    "gcsSource": {
      "inputUris": ["gs://bucket-name/csv-file-name.csv"]
    }
  }
}
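The Python, Java, and Node.js samples below accept several Cloud Storage URIs in one comma-separated `path` string and pass them as `inputUris`. The Python sketch below does the same for the REST body and adds checks of its own: it trims surrounding spaces, and each entry must start with `gs://` and end with `.csv`. The `.csv` check reflects this page's CSV-based import and is an assumption of the sketch, not a documented API rule; the function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple

AUTOML_V1 = "https://automl.googleapis.com/v1"


def import_data_request(project_id: str, location_id: str, dataset_id: str,
                        path: str) -> Tuple[str, Dict[str, Any]]:
    """Return the importData URL and JSON body for comma-separated gs:// CSV URIs."""
    uris: List[str] = [uri.strip() for uri in path.split(",") if uri.strip()]
    if not uris:
        raise ValueError("path must contain at least one URI")
    for uri in uris:
        if not uri.startswith("gs://") or not uri.endswith(".csv"):
            raise ValueError(f"expected a gs://.../*.csv URI, got {uri!r}")
    url = (f"{AUTOML_V1}/projects/{project_id}/locations/{location_id}"
           f"/datasets/{dataset_id}:importData")
    body = {"inputConfig": {"gcsSource": {"inputUris": uris}}}
    return url, body
```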

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets/dataset-id:importData"

PowerShell (Windows)

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets/dataset-id:importData" | Select-Object -Expand Content

The output is similar to the following. You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.

{
  "name": "projects/434039606874/locations/us-central1/operations/1979469554520650937",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1.OperationMetadata",
    "createTime": "2018-04-27T01:28:36.128120Z",
    "updateTime": "2018-04-27T01:28:36.128150Z",
    "cancellable": true
  }
}
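The `name` field in this response identifies a long-running operation. The Python sketch below polls it by requesting `https://automl.googleapis.com/v1/{name}` until the operation's `done` field is true, and raises if the finished operation carries an `error`. The HTTP call is injected as a `fetch` callable (hypothetical; for example, a function that sends an authorized GET with the same headers as the curl command and decodes the JSON), which keeps the sketch testable without network access. The default interval and timeout are arbitrary choices, not service limits.

```python
import time
from typing import Any, Callable, Dict

AUTOML_V1 = "https://automl.googleapis.com/v1"
Operation = Dict[str, Any]


def wait_for_operation(
    fetch: Callable[[str], Operation],
    operation_name: str,
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Operation:
    """Poll GET {AUTOML_V1}/{operation_name} until done is true, then return the operation."""
    url = f"{AUTOML_V1}/{operation_name}"
    deadline = clock() + timeout_s
    while True:
        operation = fetch(url)
        if operation.get("done"):
            # A finished operation holds either "response" or "error".
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if clock() >= deadline:
            raise TimeoutError(f"{operation_name} still running after {timeout_s} s")
        sleep(interval_s)
```

As with the Java sample's timeout, giving up on polling does not cancel the import; the operation keeps running on the server.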

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# path = "gs://YOUR_BUCKET_ID/path/to/data.csv"

client = automl.AutoMlClient()
# Get the full path of the dataset.
dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
# Get the multiple Google Cloud Storage URIs
input_uris = path.split(",")
gcs_source = automl.GcsSource(input_uris=input_uris)
input_config = automl.InputConfig(gcs_source=gcs_source)
# Import data from the input URI
response = client.import_data(name=dataset_full_id, input_config=input_config)

print("Processing import...")
print(f"Data imported. {response.result()}")

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.DatasetName;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.InputConfig;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ImportDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String path = "gs://BUCKET_ID/path_to_training_data.csv";
    importDataset(projectId, datasetId, path);
  }

  // Import a dataset
  static void importDataset(String projectId, String datasetId, String path)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the complete path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);

      // Get multiple Google Cloud Storage URIs to import data from
      GcsSource gcsSource =
          GcsSource.newBuilder().addAllInputUris(Arrays.asList(path.split(","))).build();

      // Import data from the input URI
      InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
      System.out.println("Processing import...");

      // Start the import job
      OperationFuture<Empty, OperationMetadata> operation =
          client.importDataAsync(datasetFullId, inputConfig);

      System.out.format("Operation name: %s%n", operation.getName());

      // If you want to wait for the operation to finish, adjust the timeout appropriately. The
      // operation will still run if you choose not to wait for it to complete. You can check the
      // status of your operation using the operation's name.
      Empty response = operation.get(45, TimeUnit.MINUTES);
      System.out.format("Dataset imported. %s%n", response);
    } catch (TimeoutException e) {
      System.out.println("The operation's polling period was not long enough.");
      System.out.println("You can use the Operation's name to get the current status.");
      System.out.println("The import job is still running and will complete as expected.");
      throw e;
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';
// const path = 'gs://BUCKET_ID/path_to_training_data.csv';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function importDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
    inputConfig: {
      gcsSource: {
        inputUris: path.split(','),
      },
    },
  };

  // Import dataset
  console.log('Processing import');
  const [operation] = await client.importData(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  // The import operation returns an empty message, so there is nothing useful to print.
  console.log('Dataset imported.');
}

importDataset().catch(console.error);

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// importDataIntoDataset imports data into a dataset.
func importDataIntoDataset(w io.Writer, projectID string, location string, datasetID string, inputURI string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TRL123456789..."
	// inputURI := "gs://BUCKET_ID/path_to_training_data.csv"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.ImportDataRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/datasets/%s", projectID, location, datasetID),
		InputConfig: &automlpb.InputConfig{
			Source: &automlpb.InputConfig_GcsSource{
				GcsSource: &automlpb.GcsSource{
					InputUris: []string{inputURI},
				},
			},
		},
	}

	op, err := client.ImportData(ctx, req)
	if err != nil {
		return fmt.Errorf("ImportData: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	if err := op.Wait(ctx); err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Data imported.\n")

	return nil
}
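The Java and Node.js samples accept several Cloud Storage URIs as one comma-separated string, while the Go sample passes a single URI. The sketch below builds the equivalent REST call as a URL plus a plain dict body. The field names mirror those in the Node.js request (`inputConfig.gcsSource.inputUris`), and the URL assumes the v1 `importData` method is POSTed to `{dataset name}:importData`. The helper name and the whitespace trimming are illustrative choices, not part of the client libraries.

```python
from typing import Any, Dict, List, Tuple


def build_import_request(
    project_id: str, location: str, dataset_id: str, path: str
) -> Tuple[str, Dict[str, Any]]:
    """Return (url, body) for an importData call (hypothetical helper).

    path may hold one or more gs:// URIs separated by commas,
    as in the Java and Node.js samples.
    """
    name = f"projects/{project_id}/locations/{location}/datasets/{dataset_id}"
    # Split on commas, trimming stray whitespace and dropping empty entries.
    uris: List[str] = [u.strip() for u in path.split(",") if u.strip()]
    if not uris:
        raise ValueError("path must contain at least one gs:// URI")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
    url = f"https://automl.googleapis.com/v1/{name}:importData"
    body = {"inputConfig": {"gcsSource": {"inputUris": uris}}}
    return url, body
```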

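Other languages

C#: Follow the C# setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then see the AutoML Natural Language reference documentation for Ruby.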

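Labeling training documents

For a dataset to be useful for training, each document in it must be labeled so that AutoML Natural Language can learn to label similar documents. The quality of your training data strongly affects the effectiveness of the model you create and, in turn, the quality of the predictions that model returns. AutoML Natural Language ignores unlabeled documents during training.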

You can label training documents in three ways:

Include labels in your .csv file (classification and sentiment analysis only)
Label documents in the AutoML Natural Language UI
Request labeling from human labelers through AI Platform Data Labeling Service

The AutoML API does not include any methods for labeling.

For details on labeling documents in a .csv file, see Preparing your training data.
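The Python sketch below illustrates the .csv route. It writes rows in the layout we understand Preparing your training data to describe. The first column holds TRAIN, VALIDATION, or TEST; the guide treats it as optional, but this sketch always writes it. Next comes the document content, either inline text or a gs:// URI, followed by one or more labels. For sentiment analysis, the label column holds an integer sentiment value. Treat this column layout as an assumption and check it against the guide before importing. `labeled_rows_to_csv` is a hypothetical name.

```python
import csv
import io
from typing import Iterable, Sequence, Tuple

# Hypothetical helper for illustration; not part of any Google client library.
ML_USES = {"TRAIN", "VALIDATION", "TEST"}


def labeled_rows_to_csv(rows: Iterable[Tuple[str, str, Sequence[str]]]) -> str:
    """Serialize (ml_use, content, labels) tuples as import CSV text.

    content is inline document text or a gs:// URI; for sentiment
    analysis, labels holds one integer sentiment value as a string.
    """
    buf = io.StringIO()
    # The csv module quotes fields that contain commas or quotes and
    # doubles any embedded quote characters.
    writer = csv.writer(buf, lineterminator="\n")
    for ml_use, content, labels in rows:
        if ml_use not in ML_USES:
            raise ValueError(f"unexpected ML_USE value: {ml_use!r}")
        if not labels:
            # Unlabeled documents are ignored during training, so flag them early.
            raise ValueError(f"document has no label: {content[:30]!r}")
        writer.writerow([ml_use, content, *labels])
    return buf.getvalue()
```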

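Labeling for classification and sentiment analysis

To label documents in the AutoML Natural Language UI, select the dataset from the dataset listing page to see its details. The display name of the selected dataset appears in the title bar, and the page lists the individual documents in the dataset along with their current labels. The navigation bar on the left shows the number of labeled and unlabeled documents and lets you filter the document list by label or sentiment value.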


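To assign labels or sentiment values to unlabeled documents, or to change existing labels, select the documents you want to update and the label or sentiment value to apply. You can update document labels in two ways:

Select the checkbox next to each document you want to update, and then choose the label to apply from the [Label] drop-down list that appears above the document list.
Click the row of the item you want to update, and then choose the label to apply from the list on the [Text detail] page.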
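When you assign sentiment values, a quick local check can catch out-of-range scores. This sketch assumes the `sentimentMax` rules in the AutoML v1 API reference, as we understand them. Sentiment is an integer from 0 to `sentimentMax` inclusive, and `sentimentMax` itself must be between 1 and 10. Every value in that range must also appear in the dataset before a model can be created. `check_sentiments` is a hypothetical name.

```python
from typing import Iterable, List


def check_sentiments(values: Iterable[int], sentiment_max: int) -> List[int]:
    """Validate sentiment values; return the values in 0..sentiment_max never used.

    Hypothetical helper based on our reading of the sentimentMax rules.
    """
    if not 1 <= sentiment_max <= 10:
        raise ValueError("sentiment_max must be between 1 and 10")
    seen = set()
    for v in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, int) or not 0 <= v <= sentiment_max:
            raise ValueError(f"invalid sentiment value: {v!r}")
        seen.add(v)
    # Each value in the range needs at least one example before training.
    return [v for v in range(sentiment_max + 1) if v not in seen]
```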

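Identifying entities for entity extraction

Before you train a custom model, you must annotate the training documents in your dataset. You can add annotations to the training documents before you import them, or add them later in the AutoML Natural Language UI.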
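For annotating documents before import, the sketch below emits one JSON Lines record per document. The record follows the shape we believe the entity-extraction training-data format uses: `text_snippet.content` plus `annotations` entries, each with `text_extraction.text_segment` character offsets (exclusive end) and a `display_name`. Verify these field names against Preparing your training data. `entity_jsonl_line` is a hypothetical helper. It annotates every occurrence of each given phrase and does not check for overlapping phrases.

```python
import json
from typing import Dict, List


def entity_jsonl_line(content: str, phrases: Dict[str, str]) -> str:
    """Build one JSONL record annotating every occurrence of each phrase.

    Hypothetical helper: phrases maps exact entity text to its label.
    Offsets are character indices; end_offset is exclusive.
    """
    annotations: List[dict] = []
    for phrase, label in phrases.items():
        if not phrase:
            raise ValueError("phrase must be non-empty")
        start = content.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        while start != -1:
            annotations.append({
                "text_extraction": {
                    "text_segment": {"start_offset": start, "end_offset": start + len(phrase)}
                },
                "display_name": label,
            })
            start = content.find(phrase, start + len(phrase))
    # List annotations in document order for readability.
    annotations.sort(key=lambda a: a["text_extraction"]["text_segment"]["start_offset"])
    record = {"annotations": annotations, "text_snippet": {"content": content}}
    return json.dumps(record, ensure_ascii=False)
```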

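To add annotations in the AutoML Natural Language UI, select the dataset from the dataset listing page to see its details. The display name of the selected dataset appears in the title bar, and the page lists the individual documents in the dataset along with their annotations. The navigation bar on the left shows each label and the number of times it appears. You can also filter the document list by label.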


To add or delete annotations in a document, double-click the document you want to update. The [Edit] page shows the full text of the selected document, with all existing annotations highlighted.
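The [Edit] page highlights existing annotations in the document text. To preview the same view outside the UI, the hypothetical helper below wraps each annotated span as `[text|label]`. It takes (start, end, label) tuples with an exclusive end offset and assumes spans do not overlap; it raises an error if they do.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper; a span is (start, end, label) with an exclusive end.
Span = Tuple[int, int, str]


def highlight(content: str, spans: Sequence[Span]) -> str:
    """Return content with each annotated span rendered as [text|label]."""
    out: List[str] = []
    pos = 0
    for start, end, label in sorted(spans):
        if start < pos or end > len(content) or start >= end:
            raise ValueError(f"bad or overlapping span: {(start, end, label)}")
        out.append(content[pos:start])           # unannotated text before the span
        out.append(f"[{content[start:end]}|{label}]")
        pos = end
    out.append(content[pos:])                    # trailing unannotated text
    return "".join(out)
```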


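For PDF training documents and for documents imported with layout information, the [Edit] page has two tabs: [Plain text] and [Structured text]. The [Plain text] tab shows the content of the training document without formatting. The [Structured text] tab recreates the basic layout of the training document. (A link to the original PDF file is also available on the [Plain text] tab.)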


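To add a new annotation, highlight the text that represents the entity, select a label in the [Annotate] dialog box, and click [Save]. When you add an annotation on the [Structured text] tab, AutoML Natural Language records the annotation's position on the page as a factor to consider during training.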


To delete an annotation, find its text in the list of labels on the right, and then click the trash can icon next to it.
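Adding and deleting annotations in the editor amounts to editing a document's list of labeled text segments. If you keep annotations in a JSONL file instead, the same two operations might look like the minimal sketch below. It assumes a record shaped like `{"annotations": [...], "text_snippet": {"content": ...}}`, where each annotation has `text_extraction.text_segment` offsets (exclusive end) and a `display_name`. That shape is our reading of the entity-extraction training-data format. Both function names are hypothetical.

```python
from typing import Any, Dict, List


def add_annotation(record: Dict[str, Any], start: int, end: int, label: str) -> None:
    """Label content[start:end], like highlighting text and clicking Save (hypothetical)."""
    content = record["text_snippet"]["content"]
    if not 0 <= start < end <= len(content):
        raise ValueError("offsets out of range")
    record.setdefault("annotations", []).append({
        "text_extraction": {"text_segment": {"start_offset": start, "end_offset": end}},
        "display_name": label,
    })


def remove_annotation(record: Dict[str, Any], text: str, label: str) -> int:
    """Delete annotations whose covered text and label match; return how many (hypothetical)."""
    content = record["text_snippet"]["content"]
    kept: List[Dict[str, Any]] = []
    removed = 0
    for ann in record.get("annotations", []):
        seg = ann["text_extraction"]["text_segment"]
        covered = content[seg["start_offset"]:seg["end_offset"]]
        if ann["display_name"] == label and covered == text:
            removed += 1
        else:
            kept.append(ann)
    record["annotations"] = kept
    return removed
```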
