This tutorial demonstrates how to create a custom model for classifying content using AutoML Natural Language. The application trains a custom model using a corpus of crowd-sourced "happy moments" from the Kaggle open-source dataset HappyDB. The resulting model classifies happy moments into categories reflecting the causes of happiness.
The data is made available through a Creative Commons CC0: Public Domain license.
The tutorial covers training the custom model, evaluating its performance, and classifying new content.
Prerequisites
Configure your project environment
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
- Enable the AutoML Natural Language APIs.
- Install the Google Cloud CLI.
- Follow the instructions to create a service account and download a key file.
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key file that you downloaded when you created the service account. For example:
  export GOOGLE_APPLICATION_CREDENTIALS=key-file
- Add your new service account to the AutoML Editor IAM role with the following commands. Replace project-id with the name of your Google Cloud project and replace service-account-name with the name of your new service account, for example service-account1@myproject.iam.gserviceaccount.com:
  gcloud auth login
  gcloud config set project project-id
  gcloud projects add-iam-policy-binding project-id \
    --member=serviceAccount:service-account-name \
    --role='roles/automl.editor'
- Allow the AutoML Natural Language service accounts to access your Google Cloud project resources:
  gcloud projects add-iam-policy-binding project-id \
    --member="serviceAccount:custom-vision@appspot.gserviceaccount.com" \
    --role="roles/storage.admin"
- Install the client library.
- Set the PROJECT_ID and REGION_NAME environment variables. Replace project-id with the Project ID of your Google Cloud Platform project. AutoML Natural Language currently requires the location us-central1.
  export PROJECT_ID="project-id"
  export REGION_NAME="us-central1"
- Create a Google Cloud Storage bucket to store the documents that you will use to train your custom model. The bucket name must be in the format $PROJECT_ID-lcm. The following command creates a storage bucket named $PROJECT_ID-lcm in the us-central1 region:
  gsutil mb -p $PROJECT_ID -c regional -l $REGION_NAME gs://$PROJECT_ID-lcm/
- Copy the happiness.csv file from the public bucket to your Google Cloud Storage bucket. The happiness.csv file is in the NL-classification folder in the public bucket cloud-ml-data.
Source code file locations
The source code is available here. You can copy the source code files into your Google Cloud Platform project folder, or copy the code directly from this page as you reach each step.
Python
The tutorial consists of these Python programs:
- language_text_classification_create_dataset.py – Includes functionality to create a dataset
- import_dataset.py – Includes functionality to import a dataset
- language_text_classification_create_model.py – Includes functionality to create a model
- list_model_evaluations.py – Includes functionality to list model evaluations
- language_text_classification_predict.py – Includes functionality related to prediction
- delete_model.py – Includes functionality to delete a model
Java
The tutorial consists of these Java files:
- LanguageTextClassificationCreateDataset.java – Includes functionality to create a dataset
- ImportDataset.java – Includes functionality to import a dataset
- LanguageTextClassificationCreateModel.java – Includes functionality to create a model
- ListModelEvaluations.java – Includes functionality to list model evaluations
- LanguageTextClassificationPredict.java – Includes functionality related to prediction
- DeleteModel.java – Includes functionality to delete a model
Node.js
The tutorial consists of these Node.js programs:
- language_text_classification_create_dataset.js – Includes functionality to create a dataset
- import_dataset.js – Includes functionality to import a dataset
- language_text_classification_create_model.js – Includes functionality to create a model
- list_model_evaluations.js – Includes functionality to list model evaluations
- language_text_classification_predict.js – Includes functionality related to prediction
- delete_model.js – Includes functionality to delete a model
Running the application
Step 1: Create a dataset
The first step in creating a custom model is to create an empty dataset that will eventually hold the training data for the model. When you create a dataset, you specify the type of classification you want your custom model to perform:
- MULTICLASS assigns a single label to each classified document
- MULTILABEL allows a document to be assigned multiple labels
This tutorial creates a dataset named happydb and uses MULTICLASS classification.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
Run the create_dataset function to create an empty dataset. You'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the display_name for the dataset (happydb)
Python
python language_text_classification_create_dataset.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.LanguageTextClassificationCreateDataset"
Node.js
node language_text_classification_create_dataset.js
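Conceptually, the create_dataset function assembles a request against the AutoML v1 API. The following sketch builds the equivalent request body as a plain dict (the helper name build_create_dataset_request is ours; the field names follow the AutoML v1 REST API for projects.locations.datasets.create):

```python
def build_create_dataset_request(project_id, display_name):
    """Sketch of the request body that create_dataset sends.

    Field names follow the AutoML v1 REST API; this helper is for
    illustration only and does not call the service.
    """
    return {
        "parent": f"projects/{project_id}/locations/us-central1",
        "dataset": {
            "displayName": display_name,
            # MULTICLASS: each classified document gets exactly one label.
            "textClassificationDatasetMetadata": {
                "classificationType": "MULTICLASS",
            },
        },
    }

request = build_create_dataset_request("my-project", "happydb")
print(request["parent"])  # → projects/my-project/locations/us-central1
```

Swapping MULTICLASS for MULTILABEL here is all it takes to allow multiple labels per document.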
Response
The response includes the details of the newly created dataset, including the
Dataset ID that you'll use to reference the dataset in future requests. We
recommend that you set an environment variable DATASET_ID
to the returned
Dataset ID value.
Dataset name: projects/216065747626/locations/us-central1/datasets/TCN7372141011130533778
Dataset id: TCN7372141011130533778
Dataset display name: happydb
Text classification dataset specification:
 classification_type: MULTICLASS
Dataset example count: 0
Dataset create time:
 seconds: 1530251987
 nanos: 216586000
Step 2: Import training items into the dataset
The next step is to populate the dataset with a list of training content items labeled using the target categories.
The import_dataset function takes as input a .csv file that lists the locations of all training documents and the proper label for each training document. (See Preparing your training data for details about the required format.) For this tutorial, we will be using happiness.csv, which you uploaded to Google Cloud Storage above.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
Run the import_data function to import the training content. The first piece of code to change is the Dataset ID from the previous step and the second is the URI of happiness.csv. You'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the dataset_id for the dataset (from the output of the previous step)
- Set the path which is the URI of the file (gs://YOUR_PROJECT_ID-lcm/csv/happiness.csv)
Python
python import_dataset.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.ImportDataset"
Node.js
node import_dataset.js
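The import request pairs the dataset's full resource name with a Cloud Storage input config. As a rough offline sketch (the helper name is ours; the nested field names follow the AutoML v1 REST API for datasets.importData):

```python
def build_import_data_request(project_id, dataset_id, csv_uri):
    """Sketch of the request body that import_data sends.

    Illustration only: the resource-name format and the
    inputConfig/gcsSource/inputUris fields follow the AutoML v1 REST API.
    """
    return {
        # Full resource name of the dataset created in Step 1.
        "name": f"projects/{project_id}/locations/us-central1/datasets/{dataset_id}",
        # The CSV in Cloud Storage that lists each document and its label.
        "inputConfig": {"gcsSource": {"inputUris": [csv_uri]}},
    }

req = build_import_data_request(
    "my-project", "TCN7372141011130533778",
    "gs://my-project-lcm/csv/happiness.csv",
)
```

The operation returned by the real call completes asynchronously, which is why the sample prints "Processing import..." before "Dataset imported."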
Response
Processing import... Dataset imported.
Step 3: Create (train) the model
Now that you have a dataset of labeled training documents, you can train a new model.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
Call the create_model function to create a model. The Dataset ID is from the previous steps. You'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the dataset_id for the dataset (from the output of the previous step)
- Set the display_name for your model (happydb_model)
Python
python language_text_classification_create_model.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.LanguageTextClassificationCreateModel"
Node.js
node language_text_classification_create_model.js
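The training request ties the new model's display name to the dataset from Step 2. A minimal sketch of the request body (helper name ours; field names follow the AutoML v1 REST API for projects.locations.models.create):

```python
def build_create_model_request(project_id, dataset_id, display_name):
    """Sketch of the request body that create_model sends.

    An empty textClassificationModelMetadata object selects a text
    classification model with default training settings.
    """
    return {
        "parent": f"projects/{project_id}/locations/us-central1",
        "model": {
            "displayName": display_name,
            "datasetId": dataset_id,
            "textClassificationModelMetadata": {},
        },
    }

req = build_create_model_request(
    "my-project", "TCN7372141011130533778", "happydb_model"
)
```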
Response
The create_model
function kicks off a training operation and prints the operation
name. Training happens asynchronously and can take a while to complete, so you can
use the operation ID to check training status.
When training is complete, create_model
returns the Model ID. As with the Dataset
ID, you might want to set an environment variable MODEL_ID
to the returned
Model ID value.
Training operation name: projects/216065747626/locations/us-central1/operations/TCN3007727620979824033
Training started...
Model name: projects/216065747626/locations/us-central1/models/TCN7683346839371803263
Model id: TCN7683346839371803263
Model display name: happydb_model
Model create time:
 seconds: 1529649600
 nanos: 966000000
Model deployment state: deployed
Step 4: Evaluate the model
After training, you can evaluate your model's readiness by reviewing its precision, recall, and F1 score.
The display_evaluation
function takes the Model ID as a parameter.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
Make a request to display the overall evaluation performance of the model by executing the following request. You'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the model_id to your model's id
Python
python list_model_evaluations.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.ListModelEvaluations"
Node.js
node list_model_evaluations.js
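The F1 score reported in the evaluation is the harmonic mean of precision and recall at the given score threshold. A quick check of that relationship against the sample numbers in the response:

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall: high only
    # when both are high, so a low value in either drags it down.
    return 2 * precision * recall / (precision + recall)

# Precision 96.3% and recall 95.7% yield an F1 of roughly 96.0%.
print(round(f1_score(0.963, 0.957), 3))  # → 0.96
```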
Response
If the precision and recall scores are too low, you can strengthen the training dataset and re-train your model. For more information, see Evaluating models.
Precision and recall are based on a score threshold of 0.5
Model Precision: 96.3%
Model Recall: 95.7%
Model F1 score: 96.0%
Model Precision@1: 96.33%
Model Recall@1: 95.74%
Model F1 score@1: 96.04%
Step 5: Deploy the model
When your custom model meets your quality standards, you can deploy it and then make prediction requests.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
For the deploy_model function you'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the model_id to your model's id
Python
python deploy_model.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.DeployModel"
Node.js
node deploy_model.js
Response
Model deployment finished.
Step 6: Use the model to make a prediction
After you deploy your model, you can use it to classify novel content.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
For the predict function you'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the model_id to your model's id
- Set the content you want to predict
Python
python language_text_classification_predict.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.LanguageTextClassificationPredict"
Node.js
node language_text_classification_predict.js
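Because this is a MULTICLASS model, you typically keep only the highest-scoring category from the response. The sketch below assumes a payload shaped like the AutoML v1 predict response's payload field (the top_prediction helper and the sample data are ours):

```python
def top_prediction(payload):
    """Return (display_name, score) of the highest-scoring class.

    payload: a list of annotation dicts shaped like the AutoML v1
    predict response, e.g. {"displayName": ..., "classification": {"score": ...}}.
    """
    best = max(payload, key=lambda p: p["classification"]["score"])
    return best["displayName"], best["classification"]["score"]

# Hypothetical response payload for a "happy moment" about family.
sample = [
    {"displayName": "affection", "classification": {"score": 0.97}},
    {"displayName": "leisure", "classification": {"score": 0.02}},
]
print(top_prediction(sample))  # → ('affection', 0.97)
```

For a MULTILABEL model you would instead keep every class whose score clears your chosen threshold.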
Response
The function returns the classification score for how well the content matches each category.
Prediction results: Predicted class name: affection Predicted class score: 0.9702693223953247
Step 7: Delete the model
When you are done using this sample model, you can delete it permanently. You will no longer be able to use the model for prediction.
Copy the Code
Python
Java
Node.js
To authenticate to AutoML Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Request
Make a request with operation type delete_model to delete a model that you created. You'll need to modify the following lines of code:
- Set the project_id to your PROJECT_ID
- Set the model_id to your model's id
Python
python delete_model.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.DeleteModel"
Node.js
node delete_model.js
Response
Model deleted.