Prerequisites
- Complete the instructions on the Before you begin page.
- Make sure that the roles assigned to your service account grant write access to the project that you intend to use for topic modeling (Project > Owner or Project > Editor). The service account also needs read access to Cloud Storage.
Import conversation data
Data requirements
V1 models require more data to perform well.
- At least 10k conversations are required.
- At least 5k conversations must have at least 5 back-and-forth turns between an agent and a customer.
- We recommend using about 100k conversations for training.
V2 models can work with smaller datasets.
- At least 1k conversations with at least 5 back-and-forth turns between an agent and a customer are required.
- We recommend using about 10k conversations for training.
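The thresholds above can be checked before you start training. The following Python sketch is a minimal illustration, not part of the Insights API. It assumes each conversation is represented as a list of speaker role values in turn order, and it counts a "back-and-forth turn" as each switch between the agent side and the customer side. Both the representation and that counting rule are assumptions; Insights may count turns differently.

```python
from dataclasses import dataclass

# Role values from the training data best practices, grouped by side.
AGENT_ROLES = {"AGENT", "AUTOMATED_AGENT"}
CUSTOMER_ROLES = {"END_USER", "CUSTOMER"}


@dataclass(frozen=True)
class Requirements:
    min_conversations: int  # minimum total number of conversations
    min_multi_turn: int     # minimum conversations with enough back-and-forth turns
    recommended: int        # recommended training set size


# Thresholds copied from the data requirements lists above.
REQUIREMENTS = {
    "V1": Requirements(min_conversations=10_000, min_multi_turn=5_000, recommended=100_000),
    # V2 only states a multi-turn minimum; the total minimum is implied by it.
    "V2": Requirements(min_conversations=1_000, min_multi_turn=1_000, recommended=10_000),
}


def _side(role):
    """Maps a role value to 'agent', 'customer', or None for unknown roles."""
    if role in AGENT_ROLES:
        return "agent"
    if role in CUSTOMER_ROLES:
        return "customer"
    return None


def count_back_and_forth(roles):
    """Counts switches between agent and customer (hypothetical turn definition)."""
    switches = 0
    previous = None
    for role in roles:
        current = _side(role)
        if current is None:
            continue  # ignore messages with unrecognized roles
        if previous is not None and current != previous:
            switches += 1
        previous = current
    return switches


def check_dataset(conversations, version, min_turns=5):
    """Returns (errors, warnings) for a list of conversations, each a list of roles."""
    req = REQUIREMENTS[version]
    total = len(conversations)
    multi_turn = sum(1 for roles in conversations if count_back_and_forth(roles) >= min_turns)
    errors, warnings = [], []
    if total < req.min_conversations:
        errors.append(f"{version}: need at least {req.min_conversations} conversations, found {total}")
    if multi_turn < req.min_multi_turn:
        errors.append(f"{version}: need at least {req.min_multi_turn} multi-turn conversations, found {multi_turn}")
    if total < req.recommended:
        warnings.append(f"{version}: about {req.recommended} conversations are recommended, found {total}")
    return errors, warnings
```

For example, 1,000 conversations that each alternate CUSTOMER and AGENT three times pass the V2 minimum (with a warning about the recommended size) but fail both V1 minimums.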
You can provide your conversation data either as audio data (such as phone call recordings) or as JSON-formatted text files. For details on the format and instructions for uploading it to Cloud Storage, see the conversation data reference.
You can upload these example conversation files to Cloud Storage:
- conversation-01.json
- conversation-02.json
- conversation-03.json
- conversation-04.json
- conversation-05.json
When you have imported your conversation data, you can list and filter the conversations using the API.
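Conversations are listed through the conversations resource of the Insights API. The Python sketch below only builds the request URL with the standard library. It assumes the us-central1 location used elsewhere on this page and the filter, pageSize, and pageToken query parameters of the list method; the filter expression medium="CHAT" is borrowed from the model request example in this section. Sending the request also requires an OAuth access token in an Authorization header, which is not shown.

```python
from urllib.parse import urlencode

API_ROOT = "https://contactcenterinsights.googleapis.com/v1"


def list_conversations_url(project_id, location="us-central1", filter_expr=None,
                           page_size=None, page_token=None):
    """Builds a GET URL for listing conversations, optionally filtered and paged."""
    parent = f"projects/{project_id}/locations/{location}"
    params = {}
    if filter_expr:
        params["filter"] = filter_expr  # for example: medium="CHAT"
    if page_size:
        params["pageSize"] = str(page_size)
    if page_token:
        params["pageToken"] = page_token  # taken from a previous response
    url = f"{API_ROOT}/{parent}/conversations"
    if params:
        url += "?" + urlencode(params)  # percent-encodes quotes and '='
    return url
```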
Training data best practices
Make sure that nearly all transcripts are in the same language, because mixing languages in one model leads to poor results. A few transcripts in a different language are fine.
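One way to check this before training is to measure the share of the most common language in your dataset. The Python sketch below assumes you already have a language code for each conversation (for example, from the conversation's language code field, or from your own detection step). It compares only the primary subtag, so en-US and en-GB count as the same language. The 95% threshold is an arbitrary, hypothetical cutoff for "nearly all", not a documented limit.

```python
from collections import Counter


def primary_language(language_code):
    """Reduces a code such as 'en-US' to its primary subtag, 'en'."""
    return language_code.split("-")[0].lower()


def dominant_language_share(language_codes):
    """Returns (most common primary language, its share of conversations that have a code)."""
    counts = Counter(primary_language(code) for code in language_codes if code)
    if not counts:
        return None, 0.0
    language, count = counts.most_common(1)[0]
    return language, count / sum(counts.values())


def is_mostly_one_language(language_codes, threshold=0.95):
    """True when the dominant language meets the hypothetical threshold."""
    _, share = dominant_language_share(language_codes)
    return share >= threshold
```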
Make sure that speaker roles are assigned correctly when conversations are ingested. Each conversation turn should be accurately labeled as coming from either the customer or the agent. For agent messages, specify whether the message comes from a human agent or a virtual agent: use AGENT for human agents and AUTOMATED_AGENT for virtual (bot) agents. For customer roles, you can use either END_USER or CUSTOMER.
Make sure that most conversations have transcripts from both the customer and agent channels. Conversations with only one channel aren't used in training.
We recommend that you check the Cloud Data Loss Prevention redaction quality, if applicable. Sometimes the redaction is overly aggressive and removes important information from the transcripts.
Create a model from the Insights console
Follow these steps to create a model from the Insights console:
1. Go to the Insights console and select your project from the drop-down list.
2. Navigate to topic models and click Create New.
3. Make sure that Topic Model V2 is selected as the model training strategy.
4. Optional: To train a non-English model, select a language from the Language drop-down list. Insights supports French, German, Italian, Spanish, and Portuguese. Only conversations in the selected language are then used for training.
5. Select the conversations for training. You can filter the training conversations by any available field using the drop-down list.
6. Click Start Training to begin training a new topic model.
Create a model with the REST API
To create a V2 topic model with the REST API, first complete the prerequisites and import your conversation data as described earlier on this page.
To create a model, send a create request with a model definition to the Insights API. In addition to a display name and a training data configuration, the request must include the model type TYPE_V2. You can optionally set the language_code field to train a model for a specific language.
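The model definition can be assembled programmatically. This Python sketch is a hypothetical helper that builds a plain dictionary using the snake_case field names from the example request body in this section. It always sets model_type to TYPE_V2, adds a training filter only when one is given, and sets language_code only when requested. It does not cover the other optional fields in that example.

```python
def build_issue_model(display_name, filter_expr=None, language_code=None):
    """Builds a V2 issue model definition as a dict ready for JSON encoding."""
    if not display_name:
        raise ValueError("display_name is required")
    model = {
        "display_name": display_name,
        "input_data_config": {},
        "model_type": "TYPE_V2",  # required for V2 topic models
    }
    if filter_expr:
        # Restricts which imported conversations are used for training.
        model["input_data_config"]["filter"] = filter_expr
    if language_code:
        # Optional: train a model for a specific language.
        model["language_code"] = language_code
    return model
```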
REST
To create a topic model, call the create method on the issueModel resource.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- MODEL_NAME: A human-readable name for the new issue model.
HTTP method and URL:
POST https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/issueModels
Request JSON body:
{
  "display_name": "MODEL_NAME",
  "input_data_config": {
    "filter": "medium=\"CHAT\"",
    "custom_taxonomy": {
      "taxonomy_entries": [
        { "display_name": "reschedule car service" },
        { "display_name": "problem with windshield wipers" }
      ]
    },
    "industry": "auto",
    "issue_granularity": "STANDARD"
  },
  "model_type": "TYPE_V2",
  "language_code": "en-US"
}
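The request can also be sent from Python using only the standard library. This is a minimal sketch, not an official client: it assumes you obtain an access token separately (for example, from the output of gcloud auth print-access-token) and that the body is a dictionary shaped like the JSON above. build_create_request only constructs the request, so it can be inspected without network access; send performs the call and returns the parsed operation.

```python
import json
import urllib.request

API_ROOT = "https://contactcenterinsights.googleapis.com/v1"


def build_create_request(project_id, body, access_token, location="us-central1"):
    """Builds (but does not send) the POST request that creates an issue model."""
    url = f"{API_ROOT}/projects/{project_id}/locations/{location}/issueModels"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def send(request):
    """Sends the request and returns the parsed JSON response (the operation)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The name field of the returned dictionary identifies the long-running operation to poll.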
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID" }
Operation status
Creating a topic model is a long-running operation, so it can take a long time to complete. You can poll the status of the operation to check whether it has completed.
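A polling loop can be sketched in Python as follows. It assumes the operation is fetched with a GET on its name under the API root (operation_url builds that URL) and follows the standard long-running operation shape, where done becomes true and either error or response is set. The caller supplies fetch_operation, a hypothetical function that performs the authenticated GET and returns the parsed JSON; the delays and timeout are arbitrary choices, with exponential backoff to avoid polling too often.

```python
import time

API_ROOT = "https://contactcenterinsights.googleapis.com/v1"


def operation_url(operation_name):
    """Builds the GET URL for an operation name returned by the create call."""
    return f"{API_ROOT}/{operation_name}"


def wait_for_operation(fetch_operation, initial_delay=5.0, max_delay=60.0,
                       timeout=3600.0, sleep=time.sleep):
    """Polls fetch_operation() until the operation is done; returns its response."""
    delay = initial_delay
    waited = 0.0
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"Operation failed: {operation['error']}")
            return operation.get("response", {})
        if waited >= timeout:
            raise TimeoutError(f"Operation not done after {waited} seconds")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Injecting sleep makes the loop easy to test without real waiting.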