This document applies to AutoML Natural Language, which is different from Vertex AI. If you are using Vertex AI, see the Vertex AI documentation.

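Training models

Once you have a dataset with a solid set of labeled training documents, you are ready to create and train the custom model.

Training a model can take several hours to complete. The required training time depends on several factors, such as the size of the dataset, the nature of the training items, and the complexity of the model. AutoML Natural Language uses early stopping to produce the best possible model without overfitting.

For classification models, the average training time is about 6 hours, with a maximum of 24 hours. For entity extraction and sentiment analysis models, the average training time is 5 hours, with a maximum of 6 hours.

After a model is successfully trained, you receive a message at the email address associated with your project.

The maximum lifespan of a custom model is 18 months. After that period, you must create and train a new model to continue making predictions.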
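To make the 18-month limit concrete, here is a minimal Python sketch that estimates the date by which a replacement model should be ready. It assumes the lifespan is counted from the model's creation time, which this page does not state explicitly. The helper names `add_months` and `retrain_deadline` are hypothetical.

```python
import calendar
from datetime import datetime, timezone

# Maximum lifespan of a custom model, as stated above.
MAX_LIFETIME_MONTHS = 18


def add_months(moment: datetime, months: int) -> datetime:
    """Return `moment` shifted forward by whole calendar months.

    The day is clamped to the last day of the target month
    (for example, Aug 31 plus 18 months becomes Feb 29 or Feb 28).
    """
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def retrain_deadline(create_time: datetime) -> datetime:
    """Assumption: the 18 months start at the model's creation time."""
    return add_months(create_time, MAX_LIFETIME_MONTHS)


created = datetime(2018, 4, 27, 1, 28, 41, tzinfo=timezone.utc)
print(retrain_deadline(created).date())  # 2019-10-27
```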

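Web UI

To train a model:

  1. Open the AutoML Natural Language UI and select Get started in the box that corresponds to the type of model you plan to train.

    The Datasets page appears, showing the status of previously created datasets for the current project. To train with a dataset from a different project, select the project from the drop-down list in the upper right of the title bar.

  2. Select the dataset you want to use to train the custom model.

    The display name of the selected dataset appears in the title bar, and the page lists the individual documents in the dataset along with their labels.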

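  3. When you are done reviewing the dataset, click the Train tab just below the title bar.

    If you are training the first model from this dataset, the training page provides a basic analysis of your dataset and advises you on whether it is adequate for training. If AutoML Natural Language suggests changes, consider returning to the Text items page and adding documents or labels.

    If you have trained other models from this dataset, the training page displays basic evaluation metrics for those models.

  4. Click Start training.

  5. Enter a name for the model.

    The model name can be up to 32 characters long and can contain only letters, numbers, and underscores. The first character must be a letter.

  6. (Optional) To train an entity extraction model for healthcare terms, select Enable Healthcare Entity Extraction (Beta). This option lets you start with a healthcare model that is optimized for healthcare data. For more information, see AutoML Entity Extraction for Healthcare.

  7. If you want to deploy the model automatically, select the Deploy model after training finishes checkbox.

  8. Click Start training.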
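The naming rule in step 5 can be checked locally before you submit a training request. The following is a minimal sketch: the function name `is_valid_model_name` is hypothetical, and it assumes "letters" means the ASCII letters A-Z and a-z.

```python
import re

# Rule from step 5: at most 32 characters, only letters, digits, and
# underscores, and the first character must be a letter.
# Assumption: "letters" means ASCII letters only.
_MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")


def is_valid_model_name(name: str) -> bool:
    """Return True if `name` satisfies the model-name rule described above."""
    return _MODEL_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("test_model", "1st_model", "my-model", "m" * 33):
    print(f"{candidate!r}: {is_valid_model_name(candidate)}")
```

Of the four candidates, only `test_model` passes: the others start with a digit, contain a hyphen, or exceed 32 characters.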

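Code samples

Classification

REST & command line

Before using any of the request data, make the following replacements:

  • project-id: your project ID
  • location-id: the location for the resource, us-central1 for the Global location or eu for the European Union
  • dataset-id: your dataset ID

HTTP method and URL: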

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models

Request JSON body:

{
  "displayName": "test_model",
  "dataset_id": "dataset-id",
  "textClassificationModelMetadata": {
  }
}
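As a rough illustration of how the URL and JSON body above fit together, the following Python sketch assembles the request for a classification model without sending it. The helper name `build_create_model_request` and the placeholder values are hypothetical. It assumes you already hold an OAuth 2.0 access token (for example, one printed by `gcloud auth print-access-token`).

```python
import json
import urllib.request


def build_create_model_request(project_id, location_id, dataset_id,
                               display_name, access_token):
    """Build (but do not send) the POST request that starts training."""
    url = (
        "https://automl.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/models"
    )
    body = {
        "displayName": display_name,
        "dataset_id": dataset_id,
        # Empty metadata selects a text classification model.
        "textClassificationModelMetadata": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder values for illustration only.
request = build_create_model_request(
    "my-project", "us-central1", "TCN123", "test_model", "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit it. The sketch stops short of that so it can run without credentials.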

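You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.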

{
  "name": "projects/434039606874/locations/us-central1/operations/1979469554520652445",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "2018-04-27T01:28:41.338120Z",
    "updateTime": "2018-04-27T01:28:41.338120Z",
    "cancellable": true
  }
}
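The operation ID is the last segment of the `name` field in the response. The sketch below extracts it and builds a status URL. Both helper names are hypothetical, and the status URL assumes the operation can be read with a GET on the same v1 host followed by the full operation name.

```python
def operation_id_from_name(operation_name: str) -> str:
    """Return the final path segment of an operation resource name."""
    return operation_name.rstrip("/").rsplit("/", 1)[-1]


def operation_status_url(operation_name: str) -> str:
    """Assumption: status is read with GET on the v1 host plus the full name."""
    return f"https://automl.googleapis.com/v1/{operation_name}"


name = "projects/434039606874/locations/us-central1/operations/1979469554520652445"
print(operation_id_from_name(name))  # 1979469554520652445
print(operation_status_url(name))
```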

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# display_name = "YOUR_MODEL_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Leave model unset to use the default base model provided by Google
metadata = automl.TextClassificationModelMetadata()
model = automl.Model(
    display_name=display_name,
    dataset_id=dataset_id,
    text_classification_model_metadata=metadata,
)

# Create a model with the model metadata in the region.
response = client.create_model(parent=project_location, model=model)

print("Training operation name: {}".format(response.operation.name))
print("Training started...")

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.Model;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextClassificationModelMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageTextClassificationCreateModel {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String displayName = "YOUR_MODEL_NAME";
    createModel(projectId, datasetId, displayName);
  }

  // Create a model
  static void createModel(String projectId, String datasetId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // Set model metadata.
      TextClassificationModelMetadata metadata =
          TextClassificationModelMetadata.newBuilder().build();
      Model model =
          Model.newBuilder()
              .setDisplayName(displayName)
              .setDatasetId(datasetId)
              .setTextClassificationModelMetadata(metadata)
              .build();

      // Create a model with the model metadata in the region.
      OperationFuture<Model, OperationMetadata> future =
          client.createModelAsync(projectLocation, model);
      // OperationFuture.get() will block until the model is created, which may take several hours.
      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Training operation name: %s\n", future.getInitialFuture().get().getName());
      System.out.println("Training started...");
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createModel() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    model: {
      displayName: displayName,
      datasetId: datasetId,
      textClassificationModelMetadata: {}, // Leave unset, to use the default base model
    },
  };

  // Don't wait for the LRO
  const [operation] = await client.createModel(request);
  console.log('Training started...');
  console.log(`Training operation name: ${operation.name}`);
}

createModel();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// languageTextClassificationCreateModel creates a model for text classification.
func languageTextClassificationCreateModel(w io.Writer, projectID string, location string, datasetID string, modelName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TCN123456789..."
	// modelName := "model_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateModelRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Model: &automlpb.Model{
			DisplayName: modelName,
			DatasetId:   datasetID,
			ModelMetadata: &automlpb.Model_TextClassificationModelMetadata{
				TextClassificationModelMetadata: &automlpb.TextClassificationModelMetadata{},
			},
		},
	}

	op, err := client.CreateModel(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateModel: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())
	fmt.Fprintf(w, "Training started...\n")

	return nil
}

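Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for Ruby.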

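Entity extraction

REST & command line

Before using any of the request data, make the following replacements:

  • project-id: your project ID
  • location-id: the location for the resource, us-central1 for the Global location or eu for the European Union
  • dataset-id: your dataset ID
  • model-hint: the baseline model to use, such as default or healthcare (Beta)

HTTP method and URL: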

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models

Request JSON body:

{
  "displayName": "test_model",
  "dataset_id": "dataset-id",
  "textExtractionModelMetadata": {
    "model_hint": "model-hint"
  }
}
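The only difference from the other model types is the `model_hint` field inside `textExtractionModelMetadata`. The following sketch builds that body in Python. The helper name `entity_extraction_body` is hypothetical, and restricting the hint to the two values named above is an assumption made for illustration; the service may accept others.

```python
import json

# Assumption: only the two hints named in the text are checked here.
KNOWN_MODEL_HINTS = ("default", "healthcare")


def entity_extraction_body(display_name, dataset_id, model_hint="default"):
    """Return the request JSON body for an entity extraction model."""
    if model_hint not in KNOWN_MODEL_HINTS:
        raise ValueError(
            f"model_hint must be one of {KNOWN_MODEL_HINTS}, got {model_hint!r}"
        )
    return {
        "displayName": display_name,
        "dataset_id": dataset_id,
        "textExtractionModelMetadata": {"model_hint": model_hint},
    }


# Placeholder values for illustration only.
body = entity_extraction_body("test_model", "TEN123", model_hint="healthcare")
print(json.dumps(body, indent=2))
```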

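You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.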

{
  "name": "projects/434039606874/locations/us-central1/operations/1979469554520652445",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "2018-04-27T01:28:41.338120Z",
    "updateTime": "2018-04-27T01:28:41.338120Z",
    "cancellable": true
  }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# display_name = "YOUR_MODEL_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Leave model unset to use the default base model provided by Google
metadata = automl.TextExtractionModelMetadata()
model = automl.Model(
    display_name=display_name,
    dataset_id=dataset_id,
    text_extraction_model_metadata=metadata,
)

# Create a model with the model metadata in the region.
response = client.create_model(parent=project_location, model=model)

print("Training operation name: {}".format(response.operation.name))
print("Training started...")

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.Model;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextExtractionModelMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageEntityExtractionCreateModel {

  static void createModel() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String displayName = "YOUR_MODEL_NAME";
    createModel(projectId, datasetId, displayName);
  }

  // Create a model
  static void createModel(String projectId, String datasetId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // Set model metadata.
      TextExtractionModelMetadata metadata = TextExtractionModelMetadata.newBuilder().build();
      Model model =
          Model.newBuilder()
              .setDisplayName(displayName)
              .setDatasetId(datasetId)
              .setTextExtractionModelMetadata(metadata)
              .build();

      // Create a model with the model metadata in the region.
      OperationFuture<Model, OperationMetadata> future =
          client.createModelAsync(projectLocation, model);
      // OperationFuture.get() will block until the model is created, which may take several hours.
      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Training operation name: %s\n", future.getInitialFuture().get().getName());
      System.out.println("Training started...");
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createModel() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    model: {
      displayName: displayName,
      datasetId: datasetId,
      textExtractionModelMetadata: {}, // Leave unset, to use the default base model
    },
  };

  // Don't wait for the LRO
  const [operation] = await client.createModel(request);
  console.log('Training started...');
  console.log(`Training operation name: ${operation.name}`);
}

createModel();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// languageEntityExtractionCreateModel creates a model for text entity extraction.
func languageEntityExtractionCreateModel(w io.Writer, projectID string, location string, datasetID string, modelName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TEN123456789..."
	// modelName := "model_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateModelRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Model: &automlpb.Model{
			DisplayName: modelName,
			DatasetId:   datasetID,
			ModelMetadata: &automlpb.Model_TextExtractionModelMetadata{
				TextExtractionModelMetadata: &automlpb.TextExtractionModelMetadata{},
			},
		},
	}

	op, err := client.CreateModel(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateModel: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())
	fmt.Fprintf(w, "Training started...\n")

	return nil
}

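Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for Ruby.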

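Sentiment analysis

REST & command line

Before using any of the request data, make the following replacements:

  • project-id: your project ID
  • location-id: the location for the resource, us-central1 for the Global location or eu for the European Union
  • dataset-id: your dataset ID

HTTP method and URL: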

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/models

Request JSON body:

{
  "displayName": "test_model",
  "dataset_id": "dataset-id",
  "textSentimentModelMetadata": {
  }
}

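You should see output similar to the following. You can use the operation ID to get the status of the task. For an example, see Getting the status of an operation.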

{
  "name": "projects/434039606874/locations/us-central1/operations/1979469554520652445",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "2018-04-27T01:28:41.338120Z",
    "updateTime": "2018-04-27T01:28:41.338120Z",
    "cancellable": true
  }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# display_name = "YOUR_MODEL_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Leave model unset to use the default base model provided by Google
metadata = automl.TextSentimentModelMetadata()
model = automl.Model(
    display_name=display_name,
    dataset_id=dataset_id,
    text_sentiment_model_metadata=metadata,
)

# Create a model with the model metadata in the region.
response = client.create_model(parent=project_location, model=model)

print("Training operation name: {}".format(response.operation.name))
print("Training started...")

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.Model;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextSentimentModelMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageSentimentAnalysisCreateModel {

  static void createModel() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String displayName = "YOUR_MODEL_NAME";
    createModel(projectId, datasetId, displayName);
  }

  // Create a model
  static void createModel(String projectId, String datasetId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // Set model metadata.
      TextSentimentModelMetadata metadata = TextSentimentModelMetadata.newBuilder().build();
      Model model =
          Model.newBuilder()
              .setDisplayName(displayName)
              .setDatasetId(datasetId)
              .setTextSentimentModelMetadata(metadata)
              .build();

      // Create a model with the model metadata in the region.
      OperationFuture<Model, OperationMetadata> future =
          client.createModelAsync(projectLocation, model);
      // OperationFuture.get() will block until the model is created, which may take several hours.
      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Training operation name: %s\n", future.getInitialFuture().get().getName());
      System.out.println("Training started...");
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DATASET_ID';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createModel() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    model: {
      displayName: displayName,
      datasetId: datasetId,
      textSentimentModelMetadata: {}, // Leave unset, to use the default base model
    },
  };

  // Don't wait for the LRO
  const [operation] = await client.createModel(request);
  console.log('Training started...');
  console.log(`Training operation name: ${operation.name}`);
}

createModel();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	automlpb "google.golang.org/genproto/googleapis/cloud/automl/v1"
)

// languageSentimentAnalysisCreateModel creates a model for text sentiment analysis.
func languageSentimentAnalysisCreateModel(w io.Writer, projectID string, location string, datasetID string, modelName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TST123456789..."
	// modelName := "model_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %v", err)
	}
	defer client.Close()

	req := &automlpb.CreateModelRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Model: &automlpb.Model{
			DisplayName: modelName,
			DatasetId:   datasetID,
			ModelMetadata: &automlpb.Model_TextSentimentModelMetadata{
				TextSentimentModelMetadata: &automlpb.TextSentimentModelMetadata{},
			},
		},
	}

	op, err := client.CreateModel(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateModel: %v", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())
	fmt.Fprintf(w, "Training started...\n")

	return nil
}

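Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the AutoML Natural Language reference documentation for Ruby.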