Creating a conversation dataset

A conversation dataset contains conversation transcript data. This data is used to train a Smart Reply model, which recommends text responses to human agents who are conversing with an end user. See the data overview page for more details on the types of data that Agent Assist can use. If you would like to test API integration or feature functionality without uploading your own data, you can use the conversation data provided by Agent Assist.

Before you begin

  1. Follow the Dialogflow setup instructions to enable Dialogflow on a Google Cloud Platform (GCP) project.
  2. Enable the Data Labeling API for your project.
  3. We recommend that you read the Agent Assist basics page before starting this tutorial.
  4. (Optional) Review the documentation about using the publicly available conversation data and model if you would like to test Smart Reply functionality without providing your own data. If you choose this option, you can skip ahead and create a conversation profile using the publicly available conversation dataset and pre-trained model.
  5. If you are implementing Smart Reply using your own conversation data, make sure that your transcripts are in JSON in the specified format and stored in a Google Cloud Storage bucket. A conversation dataset must contain at least 30,000 conversations; otherwise, model training will fail. The maximum number of conversations in a conversation dataset is 1,000,000. As a general rule, the more conversations you have, the better your model quality will be. We recommend that you upload at least 3 months of conversations to ensure coverage of as many use cases as possible. We suggest that you remove any conversations with fewer than 20 messages or fewer than 3 conversation turns (a turn is a change in which participant is making an utterance).

  6. Navigate to the Agent Assist Console. Select your GCP project, then click the Data menu option on the far left margin of the page.

    The Data menu displays all of your data. There are two tabs, one each for conversation datasets and knowledge bases.

  7. Click the conversation datasets tab, then click the +Create new button at the top right of the conversation datasets page.
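
The size limits and filtering suggestions in step 5 can be checked locally before you upload your transcripts. The following Python sketch is illustrative only and is not part of Agent Assist. It assumes that each conversation has already been loaded into a dictionary with an entries list, and that each entry has a role field identifying the speaker. These field names, and the helper names count_turns, is_long_enough, and dataset_size_ok, are assumptions made for this example, so adjust them to match the specified transcript format.

```python
# Thresholds taken from the guidance in step 5.
MIN_CONVERSATIONS = 30_000
MAX_CONVERSATIONS = 1_000_000
MIN_MESSAGES = 20
MIN_TURNS = 3


def count_turns(entries):
    """Count how many times the speaking participant changes.

    Assumes each entry is a dict with a "role" key (hypothetical field name).
    """
    roles = [entry["role"] for entry in entries]
    return sum(1 for prev, cur in zip(roles, roles[1:]) if prev != cur)


def is_long_enough(conversation):
    """Return True if the conversation meets both suggested minimums."""
    entries = conversation.get("entries", [])
    return len(entries) >= MIN_MESSAGES and count_turns(entries) >= MIN_TURNS


def dataset_size_ok(conversation_count):
    """Return True if the dataset size is within the documented limits."""
    return MIN_CONVERSATIONS <= conversation_count <= MAX_CONVERSATIONS
```

For example, you might apply is_long_enough to every conversation first, then pass the number of remaining conversations to dataset_size_ok to confirm that the filtered dataset is still large enough to train a model.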

Create a conversation dataset

  1. After you click +Create new, the page for creating a new conversation dataset appears.

  2. Enter a Name and an optional Description for your new dataset. In the Conversation data field, enter the Cloud Storage URI of the object or objects that contain your conversation transcripts. Agent Assist supports the * symbol for wildcard matching. The URI should have the following format:

    gs://<bucket name>/<object name>
    

    For example:

    gs://mydata/conversationjsons/conv0*.json
    gs://mydatabucket/test/conv.json
    
  3. At the bottom of the page is a drop-down Objective menu.

    If you already know that your dataset will be used to train a Smart Reply model, you can make that selection now. Otherwise, you can create a dataset without assigning it to a model type. Make your selection and click Create. Your new dataset now appears in the dataset list on the Data menu page under the Conversation datasets tab.
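
To sanity-check a URI before you paste it into the Conversation data field, you can split it into its bucket and object parts and preview which object names a * pattern would select. The following Python sketch is illustrative only. The helper names parse_gcs_uri and matching_objects are hypothetical, and the matching uses Python's fnmatch module as a stand-in: the exact wildcard rules that Agent Assist applies are not described here and may differ (for example, fnmatch lets * match the / character and also treats ? and [ as special).

```python
from fnmatch import fnmatchcase


def parse_gcs_uri(uri):
    """Split gs://<bucket name>/<object name> into (bucket, object name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"URI must start with {prefix!r}: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri!r}")
    return bucket, object_name


def matching_objects(uri, object_names):
    """Return the object names that the URI's pattern selects.

    Local approximation only; uses fnmatch semantics, not Agent Assist's.
    """
    _, pattern = parse_gcs_uri(uri)
    return [name for name in object_names if fnmatchcase(name, pattern)]
```

Here object_names stands for a list of object names that you have already retrieved from your bucket by some other means; the sketch itself does not contact Cloud Storage.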

What's next

Train a Smart Reply model on one or more conversation datasets using the Agent Assist Console.