Quickstart

This quickstart shows you how to use AutoML Natural Language Entity Extraction to create a custom machine learning model for identifying entities in text content. It trains a custom model using a corpus of biomedical research abstracts that mention hundreds of diseases and concepts. The resulting model identifies these medical entities in new text. The sample training data is hosted in a public Google Cloud Storage bucket, at gs://cloud-ml-data/NL-entity/dataset.csv.

This dataset is in the public domain as a "United States Government Work" under the terms of the United States Copyright Act.

Set up your project

Before you can use AutoML Natural Language Entity Extraction, you must create a Google Cloud project and enable AutoML Natural Language Entity Extraction for that project.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the GCP Console, go to the Manage resources page and select or create a project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your Google Cloud Platform project.

    Learn how to enable billing

  4. Enable the Cloud AutoML and Storage APIs.

    Enable the APIs

Create a dataset

  1. Open the AutoML Natural Language Entity Extraction UI and select your project from the drop-down list in the title bar.

    The first time you open the AutoML Natural Language Entity Extraction UI, you'll need to read and accept the terms of use for the EAP program.

  2. Click the New Dataset button in the title bar.

  3. On the Create dataset page, enter a name for the dataset, then click Create dataset.

  4. Enter the location of the training data to import.

    In the Select a CSV file on Cloud Storage text box, enter the path for the sample CSV file:

    cloud-ml-data/NL-entity/dataset.csv
    

    (The gs:// prefix is added automatically.) Alternatively, you can click Browse and navigate to the CSV file.

    This quickstart uses sample data staged in a public Google Cloud Storage bucket. The training data consists of JSONL files that contain sample text documents annotated with the locations of the entities you want the model to learn to extract. To import the training data into the dataset, you use a CSV file that points to the JSONL files; see Preparing your training data for information about the format.

  5. Click Import.

    You're returned to the Datasets page; your dataset will show a Status of Running:importing data while your documents are being imported. This process should take just a few minutes.

When your training data has been successfully imported, the Status column says Success:importing data and the UI shows the generated ID for the dataset (used when making AutoML API calls) as well as the number of items imported.
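As context for the import format described above, here is an illustrative sketch of a single JSONL training line. The field names (text_snippet, annotations, text_extraction, text_segment) and the example sentence are assumptions for illustration only; see Preparing your training data for the authoritative schema.

```python
import json

# Illustrative sketch of one JSONL training line for entity extraction.
# The field names and example text below are assumptions; consult
# "Preparing your training data" for the authoritative format.
content = "Pneumonia is an infection that inflames the air sacs in the lungs."

line = {
    "text_snippet": {"content": content},
    "annotations": [
        {
            "display_name": "DiseaseClass",
            "text_extraction": {
                "text_segment": {"start_offset": 0, "end_offset": 9}
            },
        }
    ],
}

# The character offsets must point at the entity mention inside the text.
segment = line["annotations"][0]["text_extraction"]["text_segment"]
start, end = segment["start_offset"], segment["end_offset"]
print(content[start:end])   # the annotated span: "Pneumonia"
print(json.dumps(line))     # one line of a JSONL training file
```

The CSV file you import simply lists the Cloud Storage paths of JSONL files like this one, one per row.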

Train your model

  1. From the Datasets listing page, click the dataset name.

  2. Click Start Training.

  3. Enter a name for your custom model or accept the default name.

  4. Select the Deploy model after training finishes checkbox to deploy your model automatically when training is complete.

  5. Click Start Training.

Training a model can take several hours to complete. Typical training time using the example dataset is about three hours. After the model is successfully trained, you will receive a message at the email address you used to sign up for the program.

Evaluate the custom model

After training a model, AutoML Natural Language Entity Extraction evaluates the quality and accuracy of the new model. To see the evaluation metrics for your model:

  1. Open the AutoML Natural Language Entity Extraction UI and click the Models tab (with lightbulb icon) in the left navigation bar.

  2. Click the name of the model you want to evaluate.

  3. If necessary, click the Evaluate tab just below the title bar.

    If training has been completed for the model, AutoML Natural Language Entity Extraction shows its evaluation metrics. It provides precision and recall scores for the model as a whole and for each extracted entity (referred to as a label in the UI). To see the metrics for a particular entity, choose the entity from the filter list of labels.

    Evaluation page

Precision and recall measure how well the model is capturing information, and how much it's leaving out. Precision indicates how many of the items identified as a particular entity were correctly identified. Recall indicates how many of the items that should have been identified as a particular entity actually were identified.

Use this data to evaluate your model's readiness. Low precision or recall scores can indicate that your model needs additional training data. Perfect precision and recall can indicate that the data is too easy and may not generalize well.
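The two definitions above can be sketched directly in code, using hypothetical counts for a single entity label:

```python
# Minimal sketch of the precision/recall definitions above, using
# made-up counts for a single entity label.
true_positives = 80    # spans correctly identified as the entity
false_positives = 20   # spans identified as the entity, but wrongly
false_negatives = 40   # spans that should have been identified, but weren't

# Precision: of everything identified as the entity, how much was correct?
precision = true_positives / (true_positives + false_positives)

# Recall: of everything that should have been identified, how much was found?
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.2f}")  # 0.80
print(f"recall    = {recall:.2f}")     # 0.67
```

A model with high precision but low recall is conservative (it misses entities); the reverse is over-eager (it flags spans that aren't entities).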

Use the custom model

After your model has been successfully trained, you can use it to extract entities from new text. Click the Test and Use tab, enter text into the text box, and click Predict. AutoML Natural Language Entity Extraction analyzes the text using your model and displays the annotations.

Prediction results
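Outside the UI, the same prediction can be requested through the AutoML REST API. The sketch below builds the request body, assuming the v1 payload/textSnippet shape; the project ID and model ID are placeholders, not values from this quickstart.

```python
import json

# Sketch of a request to the AutoML :predict REST method for text.
# The payload/textSnippet shape is an assumption based on the v1 API;
# project_id and model_id below are placeholders.
project_id = "your-project-id"
model_id = "your-model-id"
endpoint = (
    f"https://automl.googleapis.com/v1/projects/{project_id}"
    f"/locations/us-central1/models/{model_id}:predict"
)

body = {
    "payload": {
        "textSnippet": {
            "content": "The patient presented with symptoms of pneumonia.",
            "mime_type": "text/plain",
        }
    }
}

# POST this JSON to the endpoint with an OAuth bearer token, e.g.
# curl -H "Authorization: Bearer $(gcloud auth print-access-token)" ...
print(json.dumps(body, indent=2))
```

The response lists the extracted entities with their text segments and confidence scores, mirroring what the Test and Use tab displays.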

Clean up

To avoid unnecessary Google Cloud Platform charges, use the GCP Console to delete your project if you do not need it.
