Create a topic model


  1. Complete the instructions on the before you begin page.
  2. Make sure that the roles assigned to your service account allow write access to the project that you intend to use for topic modeling (Project > Owner or Project > Editor). The service account also needs read access to the Cloud Storage API.

Import conversation data

Data Requirements

  1. V1 models require more data to perform well.

    1. A minimum of 10k conversations is required.
    2. Of those, a minimum of 5k conversations must each contain at least 5 back-and-forth turns between an agent and a customer.
    3. We recommend using about 100k conversations for training.
  2. V2 models can work with smaller datasets.

    1. A minimum of 1k conversations, each with at least 5 back-and-forth turns between an agent and a customer, is required.
    2. We recommend using about 10k conversations for training.
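
The thresholds above can be checked before you upload data. The following is a minimal sketch, not part of CCAI Insights: the function names are hypothetical, each conversation is represented only by its ordered list of speaker roles, and a "back-and-forth turn" is assumed to mean one customer run paired with one agent run (consecutive messages from the same side count once). Adjust that definition if your data calls for a different one.

```python
from typing import Dict, List

# Thresholds copied from the data requirements listed above.
REQUIREMENTS = {
    "V1": {"min_total": 10_000, "min_with_turns": 5_000, "recommended": 100_000},
    "V2": {"min_total": 1_000, "min_with_turns": 1_000, "recommended": 10_000},
}
MIN_TURNS = 5

# Map each speaker role to the side of the conversation it belongs to.
SIDE = {
    "AGENT": "agent",
    "AUTOMATED_AGENT": "agent",
    "CUSTOMER": "customer",
    "END_USER": "customer",
}


def back_and_forth_turns(roles: List[str]) -> int:
    """Count exchanges, assuming one turn = one customer run plus one agent run."""
    runs = {"agent": 0, "customer": 0}
    previous = None
    for role in roles:
        side = SIDE.get(role)
        if side is None:
            continue  # ignore unknown roles in this sketch
        if side != previous:
            runs[side] += 1
            previous = side
    return min(runs.values())


def check_dataset(conversations: List[List[str]], version: str) -> Dict[str, object]:
    """Compare a dataset against the minimum and recommended sizes."""
    req = REQUIREMENTS[version]
    total = len(conversations)
    qualifying = sum(
        1 for roles in conversations if back_and_forth_turns(roles) >= MIN_TURNS
    )
    return {
        "total": total,
        "qualifying": qualifying,
        "meets_minimum": total >= req["min_total"]
        and qualifying >= req["min_with_turns"],
        "meets_recommended": total >= req["recommended"],
    }
```

For example, `check_dataset(all_role_lists, "V2")` returns a small report that you can inspect before starting a training run.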

You can provide your conversation data as either audio data (like phone call recordings) or as JSON-formatted text files. For details on the file format and instructions for uploading the data to Cloud Storage, see the conversation data reference.

You can upload these example conversation files to Cloud Storage:

Training data best practices

  1. Make sure all the transcripts are mostly in English (a few non-English words/sentences are fine). Topic modeling currently supports English conversations only.

  2. Make sure that each conversation's speaker roles are assigned properly when the conversation is ingested. Each conversation turn should be accurately labeled as coming from either the customer or the agent. For messages from agents, specify whether the message is from a human agent or a bot/system agent: use AGENT for human agents and AUTOMATED_AGENT for bot agents. You can use either END_USER or CUSTOMER for customer roles.

  3. Make sure most conversations have transcripts from both customer and agent channels. Conversations with only one channel won't be used in training.

  4. We recommend that you check the Cloud Data Loss Prevention redaction quality, if applicable. Sometimes the redaction is overly aggressive and removes important information from the transcripts.
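
The role and channel checks above can be automated before ingestion. The following is a minimal sketch under stated assumptions: it assumes each transcript is a JSON object with an `entries` list whose items carry a `role` field, which you should confirm against the conversation data reference. The function name `check_transcript` is hypothetical.

```python
from typing import Any, Dict, List

# Role values named in the best practices above.
AGENT_ROLES = {"AGENT", "AUTOMATED_AGENT"}
CUSTOMER_ROLES = {"END_USER", "CUSTOMER"}


def check_transcript(conversation: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one transcript (empty if none)."""
    problems: List[str] = []
    seen_agent = False
    seen_customer = False
    # Assumption: messages live under an "entries" key, each with a "role".
    for index, entry in enumerate(conversation.get("entries", [])):
        role = entry.get("role")
        if role in AGENT_ROLES:
            seen_agent = True
        elif role in CUSTOMER_ROLES:
            seen_customer = True
        else:
            problems.append(f"entry {index}: unexpected role {role!r}")
    # Conversations with only one channel won't be used in training.
    if not seen_agent:
        problems.append("no agent messages")
    if not seen_customer:
        problems.append("no customer messages")
    return problems
```

Running this over every transcript and counting the non-empty results gives a rough estimate of how many conversations would be excluded from training.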

Create a model from the CCAI Insights console

Follow these steps to create a model from the CCAI Insights console:

  1. Go to the CCAI Insights console, and select your project from the dropdown.

  2. Navigate to Topic models, and click Create New.

  3. Ensure Topic Model V2 is selected as the model training strategy.

  4. Optional: Select a language from the Language dropdown to train a non-English model. CCAI Insights supports French, German, Italian, Spanish, and Portuguese. Selecting a language automatically filters the training data to conversations in that language.

  5. Select the conversations for training. You can filter the training conversations by any available field using the dropdown.

  6. Click Start Training to begin training a new topic model.

Create a model with the REST API

To create a V2 topic model from the REST API, follow the prerequisite and data import steps for topic model creation.

Send a creation request to the CCAI Insights API with a model definition to create a model. In addition to a display name and training data configuration, you must include the model type TYPE_V2 in your request. You can optionally specify the language_code field in the request to train a model for a specific language.
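
The fields described above can also be assembled programmatically. The following is a minimal sketch: the helper name `build_issue_model_body` is hypothetical, the field names mirror the request body shown on this page, and sending the request (authentication, endpoint) is deliberately left out.

```python
import json
from typing import Any, Dict, Optional


def build_issue_model_body(
    display_name: str,
    conversation_filter: str,
    language_code: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body for a V2 topic model creation request."""
    body: Dict[str, Any] = {
        "display_name": display_name,
        # Training data configuration: which conversations to train on.
        "input_data_config": {"filter": conversation_filter},
        # Required for V2 topic models.
        "model_type": "TYPE_V2",
    }
    # language_code is optional, so include it only when provided.
    if language_code is not None:
        body["language_code"] = language_code
    return body


if __name__ == "__main__":
    body = build_issue_model_body("my new test model", 'medium="CHAT"', "en-US")
    # json.dumps escapes the inner quotes of the filter automatically.
    print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids hand-escaping the quotes inside the filter expression.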


To create a topic model, call the create method on the issueModel resource.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your project ID.
  • MODEL_NAME: A human-readable name for the new issue model.

HTTP method and URL:


Request JSON body:

  {
    "display_name": "my new test model",
    "input_data_config": {
      "filter": "medium=\"CHAT\"",
      "custom_taxonomy": {
        "taxonomy_entries": [
          { "display_name": "reschedule car service" },
          { "display_name": "problem with windshield wipers" }
        ]
      },
      "industry": "auto",
      "issue_granularity": "STANDARD"
    },
    "model_type": "TYPE_V2",
    "language_code": "en-US"
  }

You should receive a JSON response similar to the following:

  {
    "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID"
  }

Operation status

Creating a topic model is a long-running operation, so it might take a substantial amount of time to complete. You can poll the status of the operation to see if it has completed.
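
Polling can be sketched as a loop with a capped, doubling delay. The following is a minimal sketch, not an official client: `wait_for_operation` and its parameters are hypothetical, and `get_operation` stands in for whatever call you use to fetch the operation resource. It relies on the standard long-running operation shape, in which a `done` field becomes true on completion and an `error` field is present on failure.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    get_operation: Callable[[], Dict[str, Any]],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the operation reports done, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        operation = get_operation()
        if operation.get("done"):
            # A finished operation carries either an error or a response.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if waited >= timeout:
            raise TimeoutError(f"operation not done after {waited} seconds")
        sleep(delay)
        waited += delay
        # Back off, but never wait longer than max_delay between polls.
        delay = min(delay * 2, max_delay)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting; in normal use the default `time.sleep` applies.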