Prerequisites
- Complete the instructions on the before you begin page.
- Make sure that the roles assigned to your service account allow write access to the project that you intend to use for topic modeling (Project > Owner or Project > Editor). The service account should also have read access to the Cloud Storage API.
- Make sure that the Data Labeling API is enabled in your project. For details on enabling an API, see the cloud endpoints guide.
Import conversation data
To create a topic model, you need at least 10k conversations. Requests with fewer conversations are rejected. For information on bulk importing conversation data, see the client tooling documentation.
You can provide your conversation data as either audio data (like phone call recordings) or as JSON-formatted text files. For details on the format and instructions for uploading it to Cloud Storage, see the conversation data reference.
You can upload these example conversation files to Cloud Storage:
- conversation-01.json
- conversation-02.json
- conversation-03.json
- conversation-04.json
- conversation-05.json
Training data best practices
Make sure all the transcripts are mostly in English (a few non-English words/sentences are fine). Topic modeling currently supports English conversations only.
Make sure that the conversation's speaker roles are assigned properly when the conversation is ingested. Each conversation turn should be accurately labeled as coming from either the customer or the agent. For messages from agents, specify whether the message is from a human agent or a bot/system agent. Use AGENT for human agent roles and AUTOMATED_AGENT for bot agents. You can use either END_USER or CUSTOMER for customer roles.
Make sure most conversations have transcripts from both customer and agent channels. Conversations with only one channel won't be used in training.
We recommend that you check the Cloud Data Loss Prevention redaction quality, if applicable. Sometimes the redaction is overly aggressive and removes important information from the transcripts.
Provide 100k or more transcripts for training. The system works with a smaller number of transcripts, but more transcripts lead to better performance.
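The speaker-role and two-channel checks described above can be automated before upload. The following is a minimal sketch, not part of the product tooling. It assumes each conversation file parses to a JSON object with an "entries" list whose items carry a "role" field; that layout and the helper name check_transcript are assumptions for illustration, so verify them against the conversation data reference.

```python
from typing import Any

# Role values named in the best practices above.
AGENT_ROLES = {"AGENT", "AUTOMATED_AGENT"}
CUSTOMER_ROLES = {"END_USER", "CUSTOMER"}


def check_transcript(conversation: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one parsed conversation file.

    Assumes (hypothetically) that turns live under an "entries" key and
    that each turn has a "role" field.
    """
    problems = []
    roles_seen = set()
    for index, entry in enumerate(conversation.get("entries", [])):
        role = entry.get("role")
        if role not in AGENT_ROLES | CUSTOMER_ROLES:
            problems.append(f"entry {index}: unexpected role {role!r}")
        else:
            roles_seen.add(role)
    # A conversation with only one channel won't be used in training.
    if not roles_seen & AGENT_ROLES:
        problems.append("no agent turns")
    if not roles_seen & CUSTOMER_ROLES:
        problems.append("no customer turns")
    return problems
```

An empty list means the conversation passed both checks. Running this over a sample of files before a bulk import gives a rough estimate of how many conversations would be skipped during training.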
Create a model
To create a new model, you must define your model and send a creation request to the CCAI Insights API. In the definition for your model, you must provide the following information:
- A display name for your model.
- A training information configuration for your data. You can specify either CHAT data or PHONE_CALL data, depending on the data source of your transcripts. By default, CCAI Insights uses all conversations in your Google Cloud Platform project to create the topic model.
REST
To create a topic model, call the create method on the issueModel resource.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID.
- LOCATION_ID: the location you chose for your Cloud Storage bucket. The only location currently available is us-central1.
- MODEL_NAME: a human-readable name for the new issue model.
HTTP method and URL:
POST https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/issueModels
Request JSON body:
{ "display_name": "MODEL_NAME", "input_data_config": { "filter": "medium=\"CHAT\"" } }
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID" }
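As an illustration of the REST call above, the following sketch assembles the same URL and JSON body with the Python standard library. It only builds the request and does not send it. The helper name build_create_request and the access_token parameter are assumptions for illustration; one way to obtain a token is the gcloud auth print-access-token command.

```python
import json
import urllib.request

API_ROOT = "https://contactcenterinsights.googleapis.com/v1"


def build_create_request(project_id, location_id, model_name, medium, access_token):
    """Build (but do not send) the POST request that creates an issue model.

    medium is "CHAT" or "PHONE_CALL"; access_token is an OAuth 2.0 bearer
    token supplied by the caller.
    """
    url = f"{API_ROOT}/projects/{project_id}/locations/{location_id}/issueModels"
    body = {
        "display_name": model_name,
        # Same filter syntax as the request body shown above.
        "input_data_config": {"filter": f'medium="{medium}"'},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. The response body should then contain the operation name shown in the sample response.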
Python
To authenticate to CCAI Insights, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To authenticate to CCAI Insights, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To authenticate to CCAI Insights, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Operation status
Creating a topic model is a long-running operation, so it might take a substantial amount of time to complete. You can poll the status of the operation to see if it has completed.
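A polling loop for the operation might look like the following sketch. It assumes the operation resource follows the usual long-running operation shape, with a "done" flag and an optional "error" field, and it takes a caller-supplied fetch function (hypothetical) that returns the current operation as a dict, for example by sending a GET request for the operation name returned by the create call. The function name, the polling interval, and the poll limit are illustrative choices, not documented values.

```python
import time


def wait_for_operation(fetch, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll until the operation reports done, then return it.

    fetch: zero-argument callable returning the operation as a dict.
    sleep: injectable for testing; defaults to time.sleep.
    """
    for _ in range(max_polls):
        operation = fetch()
        if operation.get("done"):
            # A finished operation carries either an error or a response.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        sleep(poll_seconds)
    raise TimeoutError("operation did not finish within the polling budget")
```

Because model training can take a long time, a generous interval between polls is usually more appropriate than a tight loop.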