Training entity extraction models for healthcare

AutoML Entity Extraction for Healthcare provides a starting point for training custom Healthcare Natural Language models. After you train a model, you can request predictions from it. A prediction occurs when you submit medical text to the model for entity extraction.

AutoML supports the following prediction modes:

  • Online prediction, where you submit a single document and the model returns the analysis synchronously.
  • Batch prediction, where you submit a collection of documents that the model analyzes asynchronously.
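
For orientation, the following Python sketch assembles an online prediction request for a single document. It only builds the URL and JSON body and sends nothing. The `:predict` path and the `payload.textSnippet` body shape are assumptions based on the public AutoML REST API, and the `v1beta1` version mirrors the URLs used later on this page. `build_online_predict_request` is a hypothetical helper name.

```python
import json


def build_online_predict_request(project_id, region, model_id, text):
    """Return (url, body) for a single-document prediction request.

    Assumption: the model is served at the models/MODEL_ID:predict path.
    """
    url = (
        "https://automl.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{region}/models/{model_id}:predict"
    )
    # Assumption: plain text is wrapped in payload.textSnippet.
    body = {"payload": {"textSnippet": {"content": text, "mimeType": "text/plain"}}}
    return url, json.dumps(body)


url, body = build_online_predict_request(
    "PROJECT_ID", "REGION", "MODEL_ID", "Blood pressure is normal."
)
print(url)
print(body)
```

The placeholders follow the same naming as the rest of this page and must be replaced before the request can be sent.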

Enable the AutoML API

Before you train a model using AutoML Entity Extraction for Healthcare, you must enable the AutoML API for your Google Cloud project.

To enable the AutoML API, complete the following steps:

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Create a service account:

    1. In the Google Cloud console, go to the Create service account page.

    2. Select your project.
    3. In the Service account name field, enter a name. The Google Cloud console fills in the Service account ID field based on this name.

      In the Service account description field, enter a description. For example, Service account for quickstart.

    4. Click Create and continue.
    5. Grant the Project > Owner role to the service account.

      To grant the role, find the Select a role list, then select Project > Owner.

    6. Click Continue.
    7. Click Done to finish creating the service account.

      Do not close your browser window. You will use it in the next step.

  5. Create a service account key:

    1. In the Google Cloud console, click the email address for the service account that you created.
    2. Click Keys.
    3. Click Add key, and then click Create new key.
    4. Click Create. A JSON key file is downloaded to your computer.
    5. Click Close.
  6. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your credentials. This variable applies only to your current shell session, so if you open a new session, set the variable again.

  7. Enable the AutoML Natural Language API.

  8. Install the Google Cloud CLI.
  9. To initialize the gcloud CLI, run the following command:

    gcloud init
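
Before sending any requests, it can help to confirm that the GOOGLE_APPLICATION_CREDENTIALS variable set in the steps above points at a usable key file. The following Python sketch is one way to do that. `check_credentials` is a hypothetical helper, and it checks only the two JSON fields that service account key files contain (`type` and `client_email`), not whether the key is still valid.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the key file path if it looks like a service account key."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"Key file not found: {path}")
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Service account key files carry a "type" of "service_account"
    # and the account's email address in "client_email".
    if key.get("type") != "service_account" or "client_email" not in key:
        raise RuntimeError(f"{path} does not look like a service account key")
    return path
```

Calling `check_credentials()` at the start of a script fails early with a readable message when the variable was set in a different shell session.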

Set up permissions

To train custom models that use AutoML Entity Extraction for Healthcare as a base model, you must use a service account that has the healthcare.nlpservice.analyzeEntities permission, which is included in the healthcare.nlpServiceViewer role.

To assign this role, run the gcloud projects add-iam-policy-binding command:

gcloud projects add-iam-policy-binding PROJECT_ID --member serviceAccount:SERVICE_ACCOUNT_ID --role roles/healthcare.nlpServiceViewer
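
If you script the role assignment, the following Python sketch builds the same command as an argument list. It assumes that SERVICE_ACCOUNT_ID is the service account's email address, which is the form IAM expects after the `serviceAccount:` prefix. The project ID and email below are made-up examples, and `iam_binding_command` is a hypothetical helper.

```python
import shlex


def iam_binding_command(project_id, service_account_email):
    """Build the argument list for gcloud projects add-iam-policy-binding."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", f"serviceAccount:{service_account_email}",
        "--role", "roles/healthcare.nlpServiceViewer",
    ]


cmd = iam_binding_command(
    "my-project", "trainer@my-project.iam.gserviceaccount.com"
)
# Print a copy-pasteable command line; pass cmd to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems when the values come from variables.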

Training a model

Training a model using the AutoML UI

To train a model using the AutoML UI, complete the following steps:

  1. Open the AutoML Natural Language UI and then click Get started under AutoML Entity Extraction.

    The Datasets page appears, showing the status of any previously created datasets for the current project. To train using a dataset for a different project, select the project from the list in the upper right of the title bar.

  2. Create a dataset or select the dataset that you want to use to train the custom model.

    The display name of the selected dataset appears in the title bar, and the page lists the individual documents in the dataset along with their labels.

  3. Import a CSV file that lists a dataset of text or documents in a structured JSONL format.

  4. After you have reviewed the dataset, click the Train tab under the title bar.

    If you are training the first model from this dataset, the training page provides a basic analysis of the dataset and advises you if it is adequate for training. If AutoML Natural Language suggests changes, consider returning to the Text items page and adding documents or labels.

    If you have trained other models from this dataset, the training page displays the basic evaluation metrics for those models.

  5. Click Start Training.

  6. Enter a name for the model.

    The model name can be up to 32 characters and contain only letters, numbers, and underscores. The first character must be a letter.

  7. If you want to deploy the model automatically, select the Deploy model after training finishes option.

  8. Select the Enable Healthcare Entity Extraction option.

  9. Click Start Training.

Training can take several hours. After your model is trained, you receive an email notification.
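
The model naming rule in step 6 can be checked before you start training. The following Python sketch encodes that rule as a regular expression. It assumes that "letters" means ASCII letters, and `is_valid_model_name` is a hypothetical helper, not part of AutoML.

```python
import re

# Letter first, then up to 31 more letters, digits, or underscores.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")


def is_valid_model_name(name):
    """Return True if name satisfies the naming rule from step 6."""
    return MODEL_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["clinical_notes_v1", "1st_model", "has-hyphen", "m" * 33]:
    print(candidate, is_valid_model_name(candidate))
```

Only the first candidate passes: the others start with a digit, contain a hyphen, or exceed 32 characters.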

Training a model using the AutoML API

To train a model using the AutoML API, use the projects.locations.models.create method.

  1. Save the request body below to a file named request.json. Provide the following information in the request:

    • DISPLAY_NAME, a display name for the model
    • DATASET_ID, the dataset ID

    {
      "displayName": "DISPLAY_NAME",
      "dataset_id": "DATASET_ID",
      "textExtractionModelMetadata": {
        "model_hint": "healthcare"
      }
    }
    
  2. Run the projects.locations.models.create command.

    curl

    To make the POST request using curl, run the following command:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models
    

    PowerShell

    To make the POST request using Windows PowerShell, run the following command:

    $cred = gcloud auth application-default print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }
    
    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models" | Select-Object -Expand Content
    

    The output of the command should be similar to the following sample. You can use the operation ID to get the status of the task. For more information, see Getting the status of an operation.

    {
      "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
        "createTime": "CREATE_TIME",
        "updateTime": "UPDATE_TIME",
        "cancellable": true
      }
    }
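
The request above can also be assembled programmatically. The following Python sketch builds the same URL and body as the request.json file and the curl command, and pulls the operation ID out of the returned operation name. It sends nothing itself. Both helper names are hypothetical, and the field names are copied from the request body shown above.

```python
import json


def build_create_model_request(project_id, region, display_name, dataset_id):
    """Return (url, body) matching request.json and the curl command above."""
    url = (
        "https://automl.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{region}/models"
    )
    body = {
        "displayName": display_name,
        "dataset_id": dataset_id,
        "textExtractionModelMetadata": {"model_hint": "healthcare"},
    }
    return url, json.dumps(body, indent=2)


def operation_id(operation_name):
    """Extract OPERATION_ID from a name like projects/../operations/ID."""
    return operation_name.rsplit("/", 1)[-1]


url, body = build_create_model_request(
    "PROJECT_ID", "REGION", "DISPLAY_NAME", "DATASET_ID"
)
print(url)
print(body)
print(operation_id("projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID"))
```

The extracted operation ID is the value you pass when you check the status of the training operation.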
    

Making predictions

Making predictions using the AutoML Natural Language UI

You can use AutoML Entity Extraction for Healthcare to make predictions on files in Cloud Storage or text you enter in the AutoML Natural Language UI.

To make a prediction using the AutoML Natural Language UI, complete the following steps:

  1. Open the AutoML Natural Language UI and then click Models.

  2. Click the row for the model that you want to use to analyze the document.

  3. Click the Test & Use tab below the title bar.

  4. Click Select a file on Cloud Storage and then enter the Cloud Storage path for a PDF file, or click Input text below and then enter medical text to use for prediction.

  5. Click Predict.

Making predictions using the batchPredict method

To use your model for high-throughput asynchronous prediction on a corpus of documents, you can use the batchPredict method. To use the batch prediction method, you specify input and output URIs that point to locations in Cloud Storage buckets.

The input URI points to a JSONL file that specifies the content to analyze. The output URI specifies a location where AutoML saves the results of the batch prediction.
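
As a rough illustration of these two pieces, the following Python sketch generates JSONL records for inline text and the request body that names the input file and output location. The record and body shapes follow the samples in the steps in this section. The helper names and the `gs://my-bucket/...` paths are hypothetical, and uploading the JSONL file to Cloud Storage is left out.

```python
import json


def inline_jsonl(texts):
    """Return JSONL text with one record per snippet and a unique string id."""
    lines = [
        json.dumps({"id": str(index), "text_snippet": {"content": text}})
        for index, text in enumerate(texts)
    ]
    return "\n".join(lines) + "\n"


def batch_predict_body(jsonl_uri, output_prefix):
    """Return the request body that points at the input file and output location."""
    for uri in (jsonl_uri, output_prefix):
        if not uri.startswith("gs://"):
            raise ValueError(f"Expected a Cloud Storage URI, got: {uri}")
    return {
        "input_config": {"gcs_source": {"input_uris": [jsonl_uri]}},
        "output_config": {"gcs_destination": {"output_uri_prefix": output_prefix}},
    }


print(inline_jsonl(["Blood pressure is normal.", "Pulse: 80. BP: 110/70."]), end="")
print(json.dumps(batch_predict_body("gs://my-bucket/input.jsonl", "gs://my-bucket/output/")))
```

Generating the IDs from the list position keeps them unique within the file, which the inline format requires.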

To make predictions using the batchPredict method, complete the following steps:

  1. Create a JSONL file that contains the content to analyze, either inline or as links to files that are stored in a Cloud Storage bucket.

    The following sample shows inline content that is included in the JSONL file, with each item including the required unique ID.

    { "id": "0", "text_snippet": { "content": "Insulin regimen human 5 units IV administered." } }
    { "id": "1", "text_snippet": { "content": "Blood pressure is normal." } }
    ...
    { "id": "n", "text_snippet": { "content": "Pulse: 80. BP: 110/70. Respirations: 16. Temp: 97.4." } }
    

    The following sample shows a JSONL file that contains links to input files, which must be in Cloud Storage buckets:

    { "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://FOLDER/FILENAME1" ] } } } }
    { "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://FOLDER/FILENAME2" ] } } } }
    ...
    
  2. Create a JSON file that specifies the location of the JSONL input file and the output directory in a Cloud Storage bucket.

    {
    "input_config": { "gcs_source": { "input_uris": [ "gs://JSONL_FILE_LOCATION"] } },
    "output_config": { "gcs_destination": { "output_uri_prefix": "gs://OUTPUT_DIR" } }
    }
    
  3. To make predictions, run the batchPredict method:

    curl

    In the batchPredict command, make the following substitutions:

    • Replace REQUEST_FILENAME with the location of your request JSON file.
    • Replace PROJECT_ID/locations/REGION/models/MODEL_ID with the fully-qualified name of your model. To find the model ID, go to the Models page in the AutoML UI.

    The following sample shows a POST request using curl:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @REQUEST_FILENAME \
    https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID:batchPredict
    

    The response to the command is similar to the following sample:

    {
    "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
    "metadata": {
      "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
      "createTime": "CREATE_TIME",
      "updateTime": "UPDATE_TIME",
      "batchPredictDetails": {
        "inputConfig": {
          "gcsSource": {
            "inputUris": [
              "gs://INPUT_URI"
            ]
          }
        }
      }
    }
    }
    

    To check if the prediction is complete, run the following command:

    curl -X GET \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json" \
     https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID
    

    In the response, which will be similar to the following sample, look for "done": true to confirm that the operation is complete:

    {
    "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
    "metadata": {
      "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
      "createTime": "CREATE_TIME",
      "updateTime": "UPDATE_TIME",
      "batchPredictDetails": {
        "inputConfig": {
          "gcsSource": {
            "inputUris": [
              "gs://JSONL_FILE_LOCATION"
            ]
          }
        },
        "outputInfo": {
          "gcsOutputDirectory": "gs://OUTPUT_DIR/PREDICTION_FILENAME"
        }
      }
    },
    "done": true,
    "response": {
      "@type": "type.googleapis.com/google.cloud.automl.v1beta1.BatchPredictResult"
    }
    }
    

    In the output location you specified, a JSONL file contains the results of the predictions.

    PowerShell

    In the batchPredict command, make the following substitutions:

    • Replace REQUEST_FILENAME with the location where you stored your request JSON file.
    • Replace PROJECT_ID/locations/REGION/models/MODEL_ID with the fully-qualified name of your model. To find the model ID, go to the Models page in the AutoML UI.

    The following sample shows a POST request using Windows PowerShell:

    $cred = gcloud auth application-default print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }
    
    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile REQUEST_FILENAME `
    -Uri "https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID:batchPredict" | Select-Object -Expand Content
    

    The response to the command is similar to the following sample:

    {
    "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
    "metadata": {
      "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
      "createTime": "CREATE_TIME",
      "updateTime": "UPDATE_TIME",
      "batchPredictDetails": {
        "inputConfig": {
          "gcsSource": {
            "inputUris": [
              "gs://INPUT_URI"
            ]
          }
        }
      }
    }
    }
    

    To check if the prediction is complete, run the following command:

    $cred = gcloud auth application-default print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID" | Select-Object -Expand Content
    

    In the response, which will be similar to the following sample, look for "done": true to confirm that the operation is complete:

    {
    "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
    "metadata": {
      "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
      "createTime": "CREATE_TIME",
      "updateTime": "UPDATE_TIME",
      "batchPredictDetails": {
        "inputConfig": {
          "gcsSource": {
            "inputUris": [
              "gs://JSONL_FILE_LOCATION"
            ]
          }
        },
        "outputInfo": {
          "gcsOutputDirectory": "gs://OUTPUT_DIR/PREDICTION_FILENAME"
        }
      }
    },
    "done": true,
    "response": {
      "@type": "type.googleapis.com/google.cloud.automl.v1beta1.BatchPredictResult"
    }
    }
    

    In the output location you specified, a JSONL file contains the results of the predictions.
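
Because batch prediction is asynchronous, the status check above usually runs in a loop. The following Python sketch shows one way to poll until the operation reports "done": true. `wait_for_operation` is a hypothetical helper that takes any callable returning the parsed JSON of the status request, so the HTTP call itself is left to you. Treating an `error` field as failure is an assumption based on how Google long-running operations typically report errors.

```python
import time


def wait_for_operation(fetch_operation, poll_seconds=30.0, max_polls=120):
    """Poll until the operation reports "done": true, then return it.

    fetch_operation is any zero-argument callable that returns the parsed
    JSON of the operations GET request as a dict.
    """
    for attempt in range(max_polls):
        operation = fetch_operation()
        if operation.get("done"):
            # Assumption: a failed operation carries an "error" field
            # instead of "response", as in Google long-running operations.
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError("Operation did not finish within the polling budget")


# Stand-in for the HTTP call: reports completion on the third poll.
responses = iter([{}, {}, {"done": True, "response": {}}])
result = wait_for_operation(lambda: next(responses), poll_seconds=0)
print(result["done"])
```

The default interval and poll limit are arbitrary starting values. Adjust them to the size of your corpus.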