AutoML Entity Extraction for Healthcare provides a starting point for training custom Healthcare Natural Language models. After you train a model, you can request predictions from it. A prediction occurs when you submit medical text to the model for entity extraction.
AutoML supports the following prediction modes:
- Online prediction, where you submit a single document and the model returns the analysis synchronously.
- Batch prediction, where you submit a collection of documents that the model analyzes asynchronously.
Enable the AutoML API
Before you train a model using AutoML Entity Extraction for Healthcare, you must enable the AutoML API for your Google Cloud project.
To enable the AutoML API, complete the following steps:
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Set up authentication:
  - In the Cloud Console, go to the Create service account key page.
  - From the Service account list, select New service account.
  - In the Service account name field, enter a name.
  - From the Role list, select Project > Owner.
  - Click Create. A JSON file that contains your key downloads to your computer.
- Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable applies only to your current shell session, so if you open a new session, set the variable again.
- Enable the AutoML Natural Language API.
- Install and initialize the Cloud SDK.
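Because GOOGLE_APPLICATION_CREDENTIALS lasts only for the current shell session, it can help to confirm that it is set before running the commands on this page. The following Python sketch is one way to do that. It assumes the key file contains the fields that downloaded service account keys normally carry (type, project_id, private_key, client_email); the function name load_service_account_key is hypothetical.

```python
import json
import os
from pathlib import Path

# Fields that a downloaded service account key file normally contains.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_key(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Read the key file named by env_var and check that it is a service account key."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set in this shell session")
    key_path = Path(path)
    if not key_path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {key_path}")
    key = json.loads(key_path.read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service account key, got type {key['type']!r}")
    return key
```

In a new session where the variable has not been set again, load_service_account_key() raises RuntimeError immediately, which makes the missing step obvious.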
Set up permissions
To train custom models that use AutoML Entity Extraction for Healthcare as a base model, you must use a service account that has the healthcare.nlpservice.analyzeEntities permission, which is included in the healthcare.nlpServiceViewer role.
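To assign this role, run the gcloud projects add-iam-policy-binding command. Replace PROJECT_ID with your project ID and SERVICE_ACCOUNT_ID with the email address of the service account, for example NAME@PROJECT_ID.iam.gserviceaccount.com: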
gcloud projects add-iam-policy-binding PROJECT_ID --member serviceAccount:SERVICE_ACCOUNT_ID --role roles/healthcare.nlpServiceViewer
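If you script this step, you can build the same gcloud invocation from Python. This sketch assumes the Cloud SDK is installed and that you pass the service account's email address; the helper names iam_binding_command and grant_role are hypothetical.

```python
import subprocess

ROLE = "roles/healthcare.nlpServiceViewer"


def iam_binding_command(project_id: str, service_account_email: str) -> list[str]:
    """Build the gcloud argument list that grants the Healthcare NL viewer role."""
    if "@" not in service_account_email:
        raise ValueError("pass the service account's email address, not its display name")
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", f"serviceAccount:{service_account_email}",
        "--role", ROLE,
    ]


def grant_role(project_id: str, service_account_email: str) -> None:
    """Run the command; requires permission to edit the project's IAM policy."""
    subprocess.run(iam_binding_command(project_id, service_account_email), check=True)
```

Passing an argument list to subprocess.run, rather than a single string, avoids shell quoting problems with the member value.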
Training a model
Training a model using the AutoML UI
To train a model using the AutoML UI, complete the following steps:
Open the AutoML Natural Language UI and then click Get started under AutoML Entity Extraction.
The Datasets page appears, showing the status of any previously created datasets for the current project. To train using a dataset for a different project, select the project from the list in the upper right of the title bar.
Create a dataset or select the dataset that you want to use to train the custom model.
The display name of the selected dataset appears in the title bar, and the page lists the individual documents in the dataset along with their labels.
To add data to a new dataset, import a CSV file that lists the dataset's text or documents, which must be in structured JSONL format.
After you have reviewed the dataset, click the Train tab under the title bar.
If you are training the first model from this dataset, the training page provides a basic analysis of the dataset and advises you if it is adequate for training. If AutoML Natural Language suggests changes, consider returning to the Text items page and adding documents or labels.
If you have trained other models from this dataset, the training page displays the basic evaluation metrics for those models.
Click Start Training.
Enter a name for the model.
The model name can be up to 32 characters and contain only letters, numbers, and underscores. The first character must be a letter.
If you want to deploy the model automatically, select the Deploy model after training finishes option.
Select the Enable Healthcare Entity Extraction option.
Click Start Training.
Training can take several hours. After your model is trained, you receive an email notification.
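Because training can take hours, it is worth checking the model naming rule from the steps above before you start. The following Python sketch encodes that rule as a regular expression, assuming that "letters" means ASCII letters; the function name is_valid_model_name is hypothetical.

```python
import re

# 1-32 characters: a leading letter, then letters, digits, or underscores.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,31}")


def is_valid_model_name(name: str) -> bool:
    """Return True if name satisfies the AutoML model naming rule."""
    return MODEL_NAME_PATTERN.fullmatch(name) is not None
```

For example, clinical_ner_v1 passes, while 1model (leading digit) and model-name (hyphen) are rejected.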
Training a model using the AutoML API
To train a model using the AutoML API, use the projects.locations.models.create method.
Save the request body below to a file named request.json, replacing the following placeholders:
- DISPLAY_NAME: a display name for the model
- DATASET_ID: the dataset ID
{
  "displayName": "DISPLAY_NAME",
  "dataset_id": "DATASET_ID",
  "textExtractionModelMetadata": {
    "model_hint": "healthcare"
  }
}
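Generating request.json from code avoids hand-editing the placeholders. This minimal Python sketch writes the same body as the sample; the field names are copied verbatim from it, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def build_create_model_request(display_name: str, dataset_id: str) -> dict:
    """Return the models.create request body, with field names copied from the sample."""
    return {
        "displayName": display_name,
        "dataset_id": dataset_id,
        "textExtractionModelMetadata": {"model_hint": "healthcare"},
    }


def write_request_file(body: dict, path: str = "request.json") -> Path:
    """Write the body as pretty-printed JSON and return the file path."""
    out = Path(path)
    out.write_text(json.dumps(body, indent=2), encoding="utf-8")
    return out
```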
Send the request to the projects.locations.models.create method.
curl
To make the POST request using curl, run the following command:
curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models
PowerShell
To make the POST request using Windows PowerShell, run the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models" | Select-Object -Expand Content
The output of the command should be similar to the following sample. You can use the operation ID to get the status of the task. For more information, see Getting the status of an operation.
{
  "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "CREATE_TIME",
    "updateTime": "UPDATE_TIME",
    "cancellable": true
  }
}
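The curl and PowerShell requests can also be expressed with the Python standard library. The following sketch assumes the Cloud SDK is installed (it shells out to the same gcloud auth application-default print-access-token command for a token) and uses urllib.request rather than a Google client library; all function names are hypothetical. create_model_request builds the request without sending it, and operation_id extracts OPERATION_ID from the name field of the response.

```python
import json
import re
import subprocess
import urllib.request

API_ROOT = "https://automl.googleapis.com/v1beta1"
# Matches names such as projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID.
OPERATION_NAME = re.compile(r"projects/[^/]+/locations/[^/]+/operations/(?P<operation_id>[^/]+)")


def access_token() -> str:
    """Fetch a token the same way the curl and PowerShell samples do."""
    result = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def create_model_request(project_id: str, region: str, body: dict, token: str) -> urllib.request.Request:
    """Build, but do not send, the POST request for projects.locations.models.create."""
    return urllib.request.Request(
        f"{API_ROOT}/projects/{project_id}/locations/{region}/models",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON operation returned by the API."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def operation_id(operation: dict) -> str:
    """Extract OPERATION_ID from the "name" field of an operation response."""
    match = OPERATION_NAME.fullmatch(operation["name"])
    if match is None:
        raise ValueError(f"unexpected operation name: {operation['name']!r}")
    return match.group("operation_id")
```

A typical call is send(create_model_request("PROJECT_ID", "REGION", body, access_token())), followed by operation_id() on the result to get the ID for status checks.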
Making predictions
Making predictions using the AutoML Natural Language UI
You can use AutoML Entity Extraction for Healthcare to make predictions on files in Cloud Storage or text you enter in the AutoML Natural Language UI.
To make a prediction using the AutoML Natural Language UI, complete the following steps:
Open the AutoML Natural Language UI and then click Models.
Click the row for the model that you want to use to analyze the document.
Click the Test & Use tab below the title bar.
Click Select a file on Cloud Storage and then enter the Cloud Storage path for a PDF file, or click Input text below and then enter medical text to use for prediction.
Click Predict.
Making predictions using the batchPredict method
To use your model for high-throughput asynchronous prediction on a corpus of documents, use the batchPredict method. To use this method, you specify input and output URIs that point to locations in Cloud Storage buckets.
The input URI points to a JSONL file that specifies the content to analyze. The output URI specifies the location where AutoML saves the results of the batch prediction.
To make predictions using the batchPredict method, complete the following steps:
Create a JSONL file that contains the content to analyze, either inline or as links to files that are stored in a Cloud Storage bucket.
The following sample shows inline content that is included in the JSONL file, with each item including the required unique ID.
{ "id": "0", "text_snippet": { "content": "Insulin regimen human 5 units IV administered." } }
{ "id": "1", "text_snippet": { "content": "Blood pressure is normal." } }
...
{ "id": "n", "text_snippet": { "content": "Pulse: 80. BP: 110/70. Respirations: 16. Temp: 97.4." } }
The following sample shows a JSONL file that contains links to input files, which must be in Cloud Storage buckets:
{ "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://FOLDER/FILENAME1" ] } } } }
{ "document": { "input_config": { "gcs_source": { "input_uris": [ "gs://FOLDER/FILENAME2" ] } } } }
...
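If the documents come from another system, you can generate the JSONL file instead of writing it by hand. This Python sketch produces both forms shown above, one JSON object per line as JSONL requires. Assigning sequential string IDs to inline snippets is an assumption of this sketch, one simple way to keep IDs unique; the function names are hypothetical.

```python
import json
from typing import Iterable


def inline_items(snippets: Iterable[str]) -> list[dict]:
    """Wrap each text snippet with a unique ID, matching the inline sample."""
    return [
        {"id": str(index), "text_snippet": {"content": text}}
        for index, text in enumerate(snippets)
    ]


def gcs_items(uris: Iterable[str]) -> list[dict]:
    """Reference documents stored in Cloud Storage, one document per line."""
    items = []
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"input files must be in Cloud Storage: {uri}")
        items.append({"document": {"input_config": {"gcs_source": {"input_uris": [uri]}}}})
    return items


def to_jsonl(items: Iterable[dict]) -> str:
    """Serialize items as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(item) + "\n" for item in items)
```

Write the returned string to a .jsonl file and upload it to Cloud Storage; its gs:// path is the input URI for the request in the next step.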
Create a JSON file that specifies the location of the JSONL input file and the output directory in a Cloud Storage bucket.
{
  "input_config": {
    "gcs_source": {
      "input_uris": [
        "gs://JSONL_FILE_LOCATION"
      ]
    }
  },
  "output_config": {
    "gcs_destination": {
      "output_uri_prefix": "gs://OUTPUT_DIR"
    }
  }
}
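A small builder can catch a common mistake, passing a local path instead of a Cloud Storage URI, before you submit the job. This Python sketch returns the same structure as the sample request and saves it to disk; the function names are hypothetical.

```python
import json


def build_batch_predict_request(jsonl_uri: str, output_prefix: str) -> dict:
    """Return the batchPredict request body in the shape of the sample."""
    for uri in (jsonl_uri, output_prefix):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a Cloud Storage URI (gs://...), got {uri!r}")
    return {
        "input_config": {"gcs_source": {"input_uris": [jsonl_uri]}},
        "output_config": {"gcs_destination": {"output_uri_prefix": output_prefix}},
    }


def write_batch_request(body: dict, path: str) -> None:
    """Save the body as the request JSON file passed to batchPredict."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle, indent=2)
```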
To make predictions, send the request to the batchPredict method.
curl
In the batchPredict command, make the following substitutions:
- Replace REQUEST_FILENAME with the location of your request JSON file.
- Replace PROJECT_ID/locations/REGION/models/MODEL_ID with the fully-qualified name of your model. To find the model ID, go to the Models page in the AutoML UI.
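The fully-qualified model name and the batchPredict endpoint are built from the same three placeholders. This Python sketch shows how they combine; the function names are hypothetical, and the check that each value is a single path segment is an added safeguard, not an API rule.

```python
API_ROOT = "https://automl.googleapis.com/v1beta1"


def model_name(project_id: str, region: str, model_id: str) -> str:
    """Return projects/PROJECT_ID/locations/REGION/models/MODEL_ID."""
    for label, value in (("project_id", project_id), ("region", region), ("model_id", model_id)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a single non-empty path segment, got {value!r}")
    return f"projects/{project_id}/locations/{region}/models/{model_id}"


def batch_predict_url(project_id: str, region: str, model_id: str) -> str:
    """Return the endpoint that the batchPredict POST requests target."""
    return f"{API_ROOT}/{model_name(project_id, region, model_id)}:batchPredict"
```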
The following sample shows a POST request using curl:
curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @REQUEST_FILENAME \
  https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID:batchPredict
The response to the command is similar to the following sample:
{
  "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "CREATE_TIME",
    "updateTime": "UPDATE_TIME",
    "batchPredictDetails": {
      "inputConfig": {
        "gcsSource": {
          "inputUris": [
            "gs://JSONL_FILE_LOCATION"
          ]
        }
      }
    }
  }
}
To check whether the prediction is complete, run the following command, replacing OPERATION_ID with the operation ID from the response:
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID
In the response, which is similar to the following sample, look for "done": true to confirm that the operation is complete:
{
  "name": "projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.OperationMetadata",
    "createTime": "CREATE_TIME",
    "updateTime": "UPDATE_TIME",
    "batchPredictDetails": {
      "inputConfig": {
        "gcsSource": {
          "inputUris": [
            "gs://JSONL_FILE_LOCATION"
          ]
        }
      },
      "outputInfo": {
        "gcsOutputDirectory": "gs://OUTPUT_DIR/PREDICTION_FILENAME"
      }
    }
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.automl.v1beta1.BatchPredictResult"
  }
}
In the output location you specified, a JSONL file contains the results of the predictions.
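Because batch prediction runs as a long-running operation, you typically poll until "done" is true. This Python sketch implements that loop. get_operation is a hypothetical callable that you supply, for example one that sends the GET request shown above and decodes the JSON; the 60-second interval and 6-hour timeout are arbitrary defaults, not documented limits. output_directory reads the metadata.batchPredictDetails.outputInfo.gcsOutputDirectory field shown in the sample response.

```python
import time
from typing import Callable, Optional


def wait_for_operation(
    get_operation: Callable[[], dict],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_operation until the result contains "done": true, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = get_operation()
        if operation.get("done"):
            # A finished operation carries either "response" or "error".
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the timeout")
        sleep(poll_seconds)


def output_directory(operation: dict) -> Optional[str]:
    """Read metadata.batchPredictDetails.outputInfo.gcsOutputDirectory, if present."""
    return (
        operation.get("metadata", {})
        .get("batchPredictDetails", {})
        .get("outputInfo", {})
        .get("gcsOutputDirectory")
    )
```

The sleep parameter exists so the loop can be exercised without real waiting; in normal use, leave it at time.sleep.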
PowerShell
In the batchPredict command, make the following substitutions:
- Replace REQUEST_FILENAME with the location where you stored your request JSON file.
- Replace PROJECT_ID/locations/REGION/models/MODEL_ID with the fully-qualified name of your model. To find the model ID, go to the Models page in the AutoML UI.
The following sample shows a POST request using Windows PowerShell:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile REQUEST_FILENAME `
  -Uri "https://automl.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/models/MODEL_ID:batchPredict" | Select-Object -Expand Content
The response, the command for checking the operation status, and the location of the prediction results are the same as in the curl example.