Diese Legacy-Version von AutoML Natural Language wurde verworfen und ist nach dem 15. März 2024 nicht mehr in Google Cloud verfügbar. Alle Funktionen der Legacy-Version von AutoML Natural Language sowie neue Features sind auf der Vertex AI-Plattform verfügbar. Informationen zum Migrieren Ihrer Ressourcen finden Sie unter Zu Vertex AI migrieren.

Diese Seite wurde von der Cloud Translation API übersetzt.

Datasets erstellen und Daten importieren

Ein Dataset enthält repräsentative Beispiele für den zu klassifizierenden Inhaltstyp und ist mit den Kategorielabels gekennzeichnet, die das benutzerdefinierte Modell verwenden soll. Das Dataset dient als Eingabe zum Trainieren eines Modells.

Die wesentlichen Schritte zum Erstellen eines Datasets sind:

Dataset-Ressource erstellen
Trainingsdaten in das Dataset importieren
Dokumente mit Labels versehen oder Entitäten identifizieren

Bei der Klassifizierung und Sentimentanalyse werden die Schritte 2 und 3 häufig kombiniert: Sie können Dokumente importieren, für die Labels bereits zugewiesen sind.

Dataset erstellen

Der erste Schritt zum Erstellen eines benutzerdefinierten Modells besteht darin, ein leeres Dataset zu erstellen, das mit den Trainingsdaten für das Modell gefüllt wird. Das neu erstellte Dataset enthält keine Daten, solange noch keine Dokumente darin importiert wurden.

Web-UI

So erstellen Sie ein Dataset:

Öffnen Sie die AutoML Natural Language UI und wählen Sie Erste Schritte in dem Feld für den Modelltyp aus, den Sie trainieren möchten.

Auf der Seite Datasets wird der Status zuvor erstellter Datasets für das aktuelle Projekt angezeigt.

Wenn Sie ein Dataset für ein anderes Projekt hinzufügen möchten, wählen Sie das Projekt in der Drop-down-Liste oben rechts in der Titelleiste aus.
Klicken Sie in der Titelleiste auf die Schaltfläche Neues Dataset.
Geben Sie einen Namen für das Dataset ein und legen Sie fest, an welchem geografischen Standort das Dataset gespeichert werden soll.

Weitere Informationen finden Sie unter Standorte.
Wählen Sie Ihr Modellziel aus, das angibt, welche Art von Analyse Sie mit dem Modell durchführen möchten, das Sie mit diesem Dataset trainieren.
- Die Klassifizierung mit einem einzigen Label weist jedem klassifizierten Dokument ein einzelnes Label zu
- Durch die Klassifizierung mit mehreren Labels können einem Dokument mehrere Labels zugewiesen werden
- Die Entitätsextraktion erkennt Entitäten in Dokumenten
- Die Sentimentanalyse ermittelt subjektive Einstellungen in Dokumenten
Klicken Sie auf Dataset erstellen.

Die Seite Importieren wird für das neue Dataset angezeigt. Weitere Informationen finden Sie unter Daten in ein Dataset importieren.

Codebeispiele

Klassifizierung

REST

Bevor Sie die Anfragedaten verwenden, ersetzen Sie die folgenden Werte:

project-id: Ihre Projekt-ID.
location-id: Der Standort für die Ressource, us-central1 für den globalen Standort oder eu für die EU

HTTP-Methode und URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets

JSON-Text der Anfrage:

{
  "displayName": "test_dataset",
  "textClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Hinweis: Der folgende Befehl setzt voraus, dass Sie sich mit Ihrem Nutzerkonto in der gcloud CLI angemeldet haben. Dazu haben Sie gcloud init oder gcloud auth login ausgeführt oder die Cloud Shell genutzt, die Sie automatisch in der gcloud CLI anmeldet. Um herauszufinden, welches Konto gerade aktiv ist, führen Sie gcloud auth list aus.

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets"

PowerShell (Windows)

Hinweis: Der folgende Befehl setzt voraus, dass Sie sich mit Ihrem Nutzerkonto in der gcloud-Befehlszeile angemeldet haben. Dazu führen Sie gcloud init oder gcloud auth login aus. Um herauszufinden, welches Konto gerade aktiv ist, führen Sie gcloud auth list aus.

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets" | Select-Object -Expand Content

Sie sollten in etwa folgende JSON-Antwort erhalten:

{
  "name": "projects/434039606874/locations/us-central1/datasets/356587829854924648",
  "displayName": "test_dataset",
  "createTime": "2018-04-26T18:02:59.825060Z",
  "textClassificationDatasetMetadata": {
    "classificationType": "MULTICLASS"
  }
}

Python

Informationen zum Installieren und Verwenden der Clientbibliothek für AutoML Natural Language finden Sie unter AutoML Natural Language-Clientbibliotheken. Weitere Informationen finden Sie in der Referenzdokumentation zur AutoML Natural Language Python API.

Richten Sie zur Authentifizierung bei AutoML Natural Language Standardanmeldedaten für Anwendungen ein. Weitere Informationen finden Sie unter Authentifizierung für eine lokale Entwicklungsumgebung einrichten.

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Specify the classification type
# Types:
# MultiLabel: Multiple labels are allowed for one example.
# MultiClass: At most one label is allowed per example.
metadata = automl.TextClassificationDatasetMetadata(
    classification_type=automl.ClassificationType.MULTICLASS
)
dataset = automl.Dataset(
    display_name=display_name,
    text_classification_dataset_metadata=metadata,
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print(f"Dataset name: {created_dataset.name}")
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.ClassificationType;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextClassificationDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageTextClassificationCreateDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      // Specify the classification type
      // Types:
      // MultiLabel: Multiple labels are allowed for one example.
      // MultiClass: At most one label is allowed per example.
      ClassificationType classificationType = ClassificationType.MULTILABEL;

      // Specify the text classification type for the dataset.
      TextClassificationDatasetMetadata metadata =
          TextClassificationDatasetMetadata.newBuilder()
              .setClassificationType(classificationType)
              .build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTextClassificationDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
      // required for other methods.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      textClassificationDatasetMetadata: {
        classificationType: 'MULTICLASS',
      },
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  console.log(`
    Dataset id: ${
      response.name
        .split('/')
        [response.name.split('/').length - 1].split('\n')[0]
    }`);
}

createDataset();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// languageTextClassificationCreateDataset creates a dataset for text classification.
func languageTextClassificationCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TextClassificationDatasetMetadata{
				TextClassificationDatasetMetadata: &automlpb.TextClassificationDatasetMetadata{
					// Specify the classification type:
					// - MULTILABEL: Multiple labels are allowed for one example.
					// - MULTICLASS: At most one label is allowed per example.
					ClassificationType: automlpb.ClassificationType_MULTICLASS,
				},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

Weitere Sprachen

C# Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für Ruby auf.

Entitätsextraktion

REST

Bevor Sie die Anfragedaten verwenden, ersetzen Sie die folgenden Werte:

project-id: Ihre Projekt-ID.
location-id: Der Standort für die Ressource, us-central1 für den globalen Standort oder eu für die EU

HTTP-Methode und URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets

JSON-Text der Anfrage:

{
  "displayName": "test_dataset",
  "textExtractionDatasetMetadata": {
   }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets"

PowerShell (Windows)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets" | Select-Object -Expand Content

Sie sollten in etwa folgende JSON-Antwort erhalten:

{
  name: "projects/000000000000/locations/us-central1/datasets/TEN5582774688079151104"
  display_name: "test_dataset"
  create_time {
     seconds: 1539886451
     nanos: 757650000
   }
   text_extraction_dataset_metadata {
   }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
metadata = automl.TextExtractionDatasetMetadata()
dataset = automl.Dataset(
    display_name=display_name, text_extraction_dataset_metadata=metadata
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print(f"Dataset name: {created_dataset.name}")
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextExtractionDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageEntityExtractionCreateDataset {

  static void createDataset() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");

      TextExtractionDatasetMetadata metadata = TextExtractionDatasetMetadata.newBuilder().build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTextExtractionDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
      // required for other methods.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      textExtractionDatasetMetadata: {},
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  console.log(`
    Dataset id: ${
      response.name
        .split('/')
        [response.name.split('/').length - 1].split('\n')[0]
    }`);
}

createDataset();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// languageEntityExtractionCreateDataset creates a dataset for text entity extraction.
func languageEntityExtractionCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TextExtractionDatasetMetadata{
				TextExtractionDatasetMetadata: &automlpb.TextExtractionDatasetMetadata{},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

Weitere Sprachen

C# Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für Ruby auf.

Sentimentanalyse

REST

Bevor Sie die Anfragedaten verwenden, ersetzen Sie die folgenden Werte:

project-id: Ihre Projekt-ID.
location-id: Der Standort für die Ressource, us-central1 für den globalen Standort oder eu für die EU

HTTP-Methode und URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets

JSON-Text der Anfrage:

{
  "displayName": "test_dataset",
  "textSentimentDatasetMetadata": {
    "sentimentMax": 4
  }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets"

PowerShell (Windows)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets" | Select-Object -Expand Content

Sie sollten in etwa folgende JSON-Antwort erhalten:

{
  name: "projects/000000000000/locations/us-central1/datasets/TST8962998974766436002"
  display_name: "test_dataset_name"
  create_time {
    seconds: 1538855662
    nanos: 51542000
  }
  text_sentiment_dataset_metadata {
    sentiment_max: 7
  }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "YOUR_DATASET_NAME"

client = automl.AutoMlClient()

# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"

# Each dataset requires a sentiment score with a defined sentiment_max
# value, for more information on TextSentimentDatasetMetadata, see:
# https://cloud.google.com/natural-language/automl/docs/prepare#sentiment-analysis
# https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#textsentimentdatasetmetadata
metadata = automl.TextSentimentDatasetMetadata(
    sentiment_max=4
)  # Possible max sentiment score: 1-10

dataset = automl.Dataset(
    display_name=display_name, text_sentiment_dataset_metadata=metadata
)

# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)

created_dataset = response.result()

# Display the dataset information
print(f"Dataset name: {created_dataset.name}")
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.Dataset;
import com.google.cloud.automl.v1.LocationName;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.cloud.automl.v1.TextSentimentDatasetMetadata;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

class LanguageSentimentAnalysisCreateDataset {

  static void createDataset() throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String displayName = "YOUR_DATASET_NAME";
    createDataset(projectId, displayName);
  }

  // Create a dataset
  static void createDataset(String projectId, String displayName)
      throws IOException, ExecutionException, InterruptedException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // A resource that represents Google Cloud Platform location.
      LocationName projectLocation = LocationName.of(projectId, "us-central1");
      // Specify the text classification type for the dataset.
      TextSentimentDatasetMetadata metadata =
          TextSentimentDatasetMetadata.newBuilder()
              .setSentimentMax(4) // Possible max sentiment score: 1-10
              .build();
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(displayName)
              .setTextSentimentDatasetMetadata(metadata)
              .build();
      OperationFuture<Dataset, OperationMetadata> future =
          client.createDatasetAsync(projectLocation, dataset);

      Dataset createdDataset = future.get();

      // Display the dataset information.
      System.out.format("Dataset name: %s\n", createdDataset.getName());
      // To get the dataset id, you have to parse it out of the `name` field. As dataset Ids are
      // required for other methods.
      // Name Form: `projects/{project_id}/locations/{location_id}/datasets/{dataset_id}`
      String[] names = createdDataset.getName().split("/");
      String datasetId = names[names.length - 1];
      System.out.format("Dataset id: %s\n", datasetId);
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const displayName = 'YOUR_DISPLAY_NAME';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function createDataset() {
  // Construct request
  const request = {
    parent: client.locationPath(projectId, location),
    dataset: {
      displayName: displayName,
      textSentimentDatasetMetadata: {
        sentimentMax: 4, // Possible max sentiment score: 1-10
      },
    },
  };

  // Create dataset
  const [operation] = await client.createDataset(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Dataset name: ${response.name}`);
  console.log(`
    Dataset id: ${
      response.name
        .split('/')
        [response.name.split('/').length - 1].split('\n')[0]
    }`);
}

createDataset();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// languageSentimentAnalysisCreateDataset creates a dataset for text sentiment analysis.
func languageSentimentAnalysisCreateDataset(w io.Writer, projectID string, location string, datasetName string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetName := "dataset_display_name"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.CreateDatasetRequest{
		Parent: fmt.Sprintf("projects/%s/locations/%s", projectID, location),
		Dataset: &automlpb.Dataset{
			DisplayName: datasetName,
			DatasetMetadata: &automlpb.Dataset_TextSentimentDatasetMetadata{
				TextSentimentDatasetMetadata: &automlpb.TextSentimentDatasetMetadata{
					SentimentMax: 4, // Possible max sentiment score: 1-10
				},
			},
		},
	}

	op, err := client.CreateDataset(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateDataset: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	dataset, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Dataset name: %v\n", dataset.GetName())

	return nil
}

Weitere Sprachen

C# Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für Ruby auf.

Trainingsdaten in ein Dataset importieren

Nachdem Sie ein Dataset erstellt haben, können Sie Dokument-URIs und Labels für Dokumente aus einer CSV-Datei importieren, die in einem Cloud Storage-Bucket gespeichert ist. Weitere Informationen zum Vorbereiten von Daten und Erstellen einer CSV-Datei zum Importieren finden Sie unter Trainingsdaten vorbereiten.

Sie können Dokumente in ein leeres Dataset importieren oder zusätzliche Dokumente in ein vorhandenes Dataset importieren.

Web-UI

So importieren Sie Dokumente in ein Dataset:

Wählen Sie auf der Seite Datasets das Dataset aus, in das Sie Dokumente importieren möchten.
Geben Sie auf dem Tab Import an, wo die Trainingsdokumente zu finden sind.

Sie haben folgende Möglichkeiten:
- Laden Sie eine CSV-Datei mit den Trainingsdokumenten und den zugehörigen Kategorielabels von Ihrem lokalen Computer oder aus dem Cloud Storage hoch.
- Sie können eine Sammlung von TXT-, PDF-, TIF- oder ZIP-Dateien mit den Trainingsdokumenten von Ihrem lokalen Computer hochladen.
Wählen Sie die zu importierenden Dateien und den Cloud Storage-Pfad für die importierten Dokumente aus.
Klicken Sie auf Importieren.

Codebeispiele

REST

Bevor Sie die Anfragedaten verwenden, ersetzen Sie die folgenden Werte:

project-id: Ihre Projekt-ID.
location-id: Der Standort für die Ressource, us-central1 für den globalen Standort oder eu für die EU
dataset-id: Ihre Dataset-ID
bucket-name: Ihr Cloud Storage-Bucket
csv-file-name: Ihre CSV-Trainingsdatendatei

HTTP-Methode und URL:

POST https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets/dataset-id:importData

JSON-Text der Anfrage:

{
  "inputConfig": {
    "gcsSource": {
      "inputUris": ["gs://bucket-name/csv-file-name.csv"]
      }
  }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: project-id" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets/dataset-id:importData"

PowerShell (Windows)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "project-id" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1/projects/project-id/locations/location-id/datasets/dataset-id:importData" | Select-Object -Expand Content

Die Ausgabe sieht in etwa so aus: Sie können den Status der Aufgabe anhand der Vorgangs-ID abrufen. Ein Beispiel finden Sie unter Status eines Vorgangs abrufen.

{
  "name": "projects/434039606874/locations/us-central1/operations/1979469554520650937",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "2018-04-27T01:28:36.128120Z",
    "updateTime": "2018-04-27T01:28:36.128150Z",
    "cancellable": true
  }
}

Python

from google.cloud import automl

# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# dataset_id = "YOUR_DATASET_ID"
# path = "gs://YOUR_BUCKET_ID/path/to/data.csv"

client = automl.AutoMlClient()
# Get the full path of the dataset.
dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)
# Get the multiple Google Cloud Storage URIs
input_uris = path.split(",")
gcs_source = automl.GcsSource(input_uris=input_uris)
input_config = automl.InputConfig(gcs_source=gcs_source)
# Import data from the input URI
response = client.import_data(name=dataset_full_id, input_config=input_config)

print("Processing import...")
print(f"Data imported. {response.result()}")

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.automl.v1.AutoMlClient;
import com.google.cloud.automl.v1.DatasetName;
import com.google.cloud.automl.v1.GcsSource;
import com.google.cloud.automl.v1.InputConfig;
import com.google.cloud.automl.v1.OperationMetadata;
import com.google.protobuf.Empty;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ImportDataset {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String path = "gs://BUCKET_ID/path_to_training_data.csv";
    importDataset(projectId, datasetId, path);
  }

  // Import a dataset
  static void importDataset(String projectId, String datasetId, String path)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (AutoMlClient client = AutoMlClient.create()) {
      // Get the complete path of the dataset.
      DatasetName datasetFullId = DatasetName.of(projectId, "us-central1", datasetId);

      // Get multiple Google Cloud Storage URIs to import data from
      GcsSource gcsSource =
          GcsSource.newBuilder().addAllInputUris(Arrays.asList(path.split(","))).build();

      // Import data from the input URI
      InputConfig inputConfig = InputConfig.newBuilder().setGcsSource(gcsSource).build();
      System.out.println("Processing import...");

      // Start the import job
      OperationFuture<Empty, OperationMetadata> operation =
          client.importDataAsync(datasetFullId, inputConfig);

      System.out.format("Operation name: %s%n", operation.getName());

      // If you want to wait for the operation to finish, adjust the timeout appropriately. The
      // operation will still run if you choose not to wait for it to complete. You can check the
      // status of your operation using the operation's name.
      Empty response = operation.get(45, TimeUnit.MINUTES);
      System.out.format("Dataset imported. %s%n", response);
    } catch (TimeoutException e) {
      System.out.println("The operation's polling period was not long enough.");
      System.out.println("You can use the Operation's name to get the current status.");
      System.out.println("The import job is still running and will complete as expected.");
      throw e;
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1';
// const datasetId = 'YOUR_DISPLAY_ID';
// const path = 'gs://BUCKET_ID/path_to_training_data.csv';

// Imports the Google Cloud AutoML library
const {AutoMlClient} = require('@google-cloud/automl').v1;

// Instantiates a client
const client = new AutoMlClient();

async function importDataset() {
  // Construct request
  const request = {
    name: client.datasetPath(projectId, location, datasetId),
    inputConfig: {
      gcsSource: {
        inputUris: path.split(','),
      },
    },
  };

  // Import dataset
  console.log('Proccessing import');
  const [operation] = await client.importData(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();
  console.log(`Dataset imported: ${response}`);
}

importDataset();

Go

import (
	"context"
	"fmt"
	"io"

	automl "cloud.google.com/go/automl/apiv1"
	"cloud.google.com/go/automl/apiv1/automlpb"
)

// importDataIntoDataset imports data into a dataset.
func importDataIntoDataset(w io.Writer, projectID string, location string, datasetID string, inputURI string) error {
	// projectID := "my-project-id"
	// location := "us-central1"
	// datasetID := "TRL123456789..."
	// inputURI := "gs://BUCKET_ID/path_to_training_data.csv"

	ctx := context.Background()
	client, err := automl.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	req := &automlpb.ImportDataRequest{
		Name: fmt.Sprintf("projects/%s/locations/%s/datasets/%s", projectID, location, datasetID),
		InputConfig: &automlpb.InputConfig{
			Source: &automlpb.InputConfig_GcsSource{
				GcsSource: &automlpb.GcsSource{
					InputUris: []string{inputURI},
				},
			},
		},
	}

	op, err := client.ImportData(ctx, req)
	if err != nil {
		return fmt.Errorf("ImportData: %w", err)
	}
	fmt.Fprintf(w, "Processing operation name: %q\n", op.Name())

	if err := op.Wait(ctx); err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	fmt.Fprintf(w, "Data imported.\n")

	return nil
}

Weitere Sprachen

C# Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die AutoML Natural Language-Referenzdokumentation für Ruby auf.

Trainingsdokumente mit Labels versehen

Damit die einzelnen Dokumente eines Datasets beim Training eines Modells von Nutzen sind, müssen diesen Dokumenten Labels so zugewiesen werden, wie auch AutoML Natural Language ähnliche Dokumente mit Labels versehen soll. Die Qualität der Trainingsdaten beeinflusst stark die Effizienz des von Ihnen erstellten Modells und folglich die Qualität der damit generierten Vorhersagen. AutoML Natural Language ignoriert während des Trainings Dokumente ohne Labels.

Sie können Labels für Ihre Trainingsdokumente auf drei verschiedene Arten angeben:

Fügen Sie Labels in die CSV-Datei ein (nur bei der Klassifizierung und Sentimentanalyse).
Weisen Sie Ihren Dokumenten Labels in der AutoML Natural Language UI zu
Anfordern von Labeling durch Menschen über den AI Platform Data Labeling Service

Die AutoML API enthält keine Methoden für das Labeling.

Weitere Informationen zum Labeling von Dokumenten in einer CSV-Datei finden Sie unter Trainingsdaten vorbereiten.

Labeling bei Klassifizierung und Sentimentanalyse

Wählen Sie auf der Seite mit der Dataset-Liste das Dataset aus, um Dokumente in der AutoML Natural Language UI mit einem Label zu versehen und Details zum Dataset anzeigen zu lassen. Der Anzeigename des ausgewählten Datasets wird in der Titelleiste angezeigt. Die einzelnen Dokumente im Dataset werden zusammen mit ihren Labels aufgelistet. Die Navigationsleiste auf der linken Seite fasst die Anzahl der Dokumente mit und ohne Label zusammen und ermöglicht, die Liste der Dokumente nach Label oder Sentimentwert zu filtern.

Text items page

Wählen Sie die Dokumente aus, die Sie aktualisieren möchten, und die Labels oder Werte, die Sie ihnen zuweisen möchten, um Labels oder Sentimentwerte Elementen ohne Labels zuzuweisen oder Labels für Dokumente zu ändern. Es gibt zwei Möglichkeiten, das Label eines Dokuments zu aktualisieren:

Klicken Sie auf das Kästchen neben den Dokumenten, die Sie aktualisieren möchten, und wählen Sie die anzuwendenden Labels aus der Drop-down-Liste Label aus, die oben in der Liste der Dokumente angezeigt wird.
Klicken Sie auf die Zeile des Dokuments, das Sie aktualisieren möchten, und wählen Sie die anzuwendenden Labels oder Werte aus der Liste auf der Seite Textdetail aus.

Entitäten für die Entitätsextraktion identifizieren

Vor dem Training Ihres benutzerdefinierten Modells müssen Sie die Trainingsdokumente im Dataset annotieren. Sie können Trainingsdokumente annotieren, bevor Sie sie importieren, oder Sie können Annotationen in der AutoML Natural Language UI hinzufügen.

Wählen Sie auf der Seite mit der Dataset-Liste das Dataset aus, um in der AutoML Natural Language UI Annotationen hinzuzufügen und Details zum Dataset anzeigen zu lassen. Der Anzeigename des ausgewählten Datasets wird in der Titelleiste angezeigt. Die einzelnen Dokumente im Dataset werden zusammen mit ihren Annotationen aufgelistet. Die Navigationsleiste auf der linken Seite fasst die Labels und die Häufigkeit der einzelnen Labels zusammen. Sie können die Liste der Dokumente auch nach Label filtern.

Annotationsliste

Wenn Sie Annotationen in einem Dokument hinzufügen oder löschen möchten, doppelklicken Sie auf das Dokument, das Sie aktualisieren möchten. Die Seite Bearbeiten zeigt den vollständigen Text des ausgewählten Dokuments an, wobei alle vorherigen Annotationen hervorgehoben sind.

Entitäts-Editor

Bei PDF-Trainingsdokumenten oder Dokumenten, die mit Layoutinformationen importiert wurden, hat die Seite Bearbeiten zwei Tabs: Nur Text und Strukturierter Text. Der Tab Nur Text zeigt den Rohinhalt des Trainingsdokuments ohne Formatierung an. Der Tab Strukturierter Text erstellt das grundlegende Layout des Trainingsdokuments neu. (Der Tab Nur Text enthält auch einen Link zur ursprünglichen PDF-Datei.)

Editor für strukturierten Text

Um eine neue Anmerkung hinzuzufügen, markieren Sie den Text, der die Entität darstellt, wählen Sie das Label aus dem Dialogfeld Annotation und klicken Sie auf Speichern. Wenn Sie Anmerkungen auf dem Tab Strukturierter Text hinzufügen, erfasst AutoML Natural Language die Position der Anmerkung auf der Seite als einen Faktor, der während des Trainings berücksichtigt wird.

Anmerkung hinzufügen

Um eine Anmerkung zu entfernen, suchen Sie den Text in der Liste der Labels auf der rechten Seite und klicken Sie daneben auf das Papierkorbsymbol.