Create a conversation dataset

A conversation dataset contains conversation transcript data. Agent Assist uses this data to train a Smart Reply model, which recommends text responses to human agents conversing with an end-user. See the data overview page for more details on the types of data that Agent Assist can use.

If you want to see how Smart Reply works, or to test API integration or feature functionality before uploading your own data, you can create a conversation dataset from the publicly available conversation data that Agent Assist provides. See the conversation data format documentation for more information about using the test data.

Before you begin

  1. Follow the Dialogflow setup instructions to enable Dialogflow on a Google Cloud Platform (GCP) project.
  2. Enable the Data Labeling API for your project. Agent Assist uses the Data Labeling API to create your dataset.
  3. We recommend that you read the Agent Assist basics page before starting this tutorial.
  4. (Optional) If you would like to test Smart Reply functionality without providing your own data, review the documentation about using the publicly available conversation data and model. If you choose this option, you can skip ahead and create a conversation profile using the publicly available conversation dataset and pre-trained model.
  5. If you are implementing Smart Reply using your own conversation data, make sure your transcripts are in JSON in the specified format and stored in a Google Cloud Storage bucket. A conversation dataset must contain at least 30,000 conversations; otherwise, model training fails. The maximum number of conversations in a conversation dataset is 1,000,000. As a general rule, the more conversations you have, the better your model quality will be.

    We suggest that you remove any conversations with fewer than 20 messages or fewer than 3 conversation turns (a turn is a change in which participant is making an utterance). We also suggest that you remove any bot messages and any messages automatically generated by systems (for example, "Agent enters the chat room"). We recommend that you upload at least 3 months of conversations to ensure coverage of as many use cases as possible.

  6. Navigate to the Agent Assist Console. Select your GCP project, then click the Data menu option in the far-left margin of the page.

    The Data menu displays all of your data. There are two tabs, one each for conversation datasets and knowledge bases.

  7. Click the conversation datasets tab, then click the +Create new button at the top right of the conversation datasets page.
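The cleanup suggested in step 5 can be scripted before you upload your transcripts. The following Python sketch is illustrative only: it assumes each conversation is a dictionary with an `entries` list whose items carry `role` and `text` fields, which may not match the specified JSON format exactly, so adjust the field names to your data. The `SYSTEM_MESSAGES` set and all function names are hypothetical.

```python
from typing import Dict, List

# Thresholds taken from the guidance in step 5.
MIN_MESSAGES = 20
MIN_TURNS = 3
MIN_CONVERSATIONS = 30_000
MAX_CONVERSATIONS = 1_000_000

# Hypothetical list of system-generated messages; extend it for your own logs.
SYSTEM_MESSAGES = {"Agent enters the chat room"}


def count_turns(entries: List[Dict]) -> int:
    """Count how often the speaking participant changes between messages."""
    roles = [entry["role"] for entry in entries]
    return sum(1 for prev, cur in zip(roles, roles[1:]) if prev != cur)


def clean_conversation(conversation: Dict) -> Dict:
    """Return a copy of the conversation without system-generated messages."""
    entries = [
        entry
        for entry in conversation["entries"]
        if entry["text"].strip() not in SYSTEM_MESSAGES
    ]
    return {**conversation, "entries": entries}


def keep_conversation(conversation: Dict) -> bool:
    """Keep only conversations with enough messages and enough turns."""
    entries = conversation["entries"]
    return len(entries) >= MIN_MESSAGES and count_turns(entries) >= MIN_TURNS


def prepare_dataset(conversations: List[Dict]) -> List[Dict]:
    """Strip system messages first, then drop conversations that are too short."""
    cleaned = [clean_conversation(c) for c in conversations]
    return [c for c in cleaned if keep_conversation(c)]


def dataset_size_ok(conversation_count: int) -> bool:
    """Check the count against the documented minimum and maximum."""
    return MIN_CONVERSATIONS <= conversation_count <= MAX_CONVERSATIONS
```

This sketch removes system messages before applying the length checks, so a conversation that only reaches 20 messages because of automated entries is dropped. Call `dataset_size_ok(len(prepare_dataset(conversations)))` after filtering to confirm that enough conversations remain for training.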

Create a conversation dataset

  1. After you click +Create new, the page for creating a new conversation dataset appears.

  2. Enter a Name and optional Description for your new dataset. In the Conversation data field, enter the URI of the storage bucket that contains your conversation transcripts. Agent Assist supports use of the * symbol for wildcard matching. The URI should have the following format:

    gs://<bucket name>/<object name>
    

    For example:

    gs://mydata/conversationjsons/conv0*.json
    gs://mydatabucket/test/conv.json
    
  3. Click Create. Your new dataset now appears in the dataset list on the Data menu page under the Conversation datasets tab.
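Before you enter a URI in the Conversation data field, it can help to check its shape and preview which objects a wildcard would select. The following Python sketch is a local approximation only: the function names are hypothetical, and it assumes that `*` matches any run of characters, including `/`, which may differ from how Agent Assist resolves wildcards.

```python
import re
from typing import List, Tuple

# Expected shape from the format above: gs://<bucket name>/<object name>
GCS_URI = re.compile(r"gs://(?P<bucket>[^/]+)/(?P<object>.+)")


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into (bucket name, object name), or raise ValueError."""
    match = GCS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"Expected gs://<bucket name>/<object name>, got: {uri}")
    return match.group("bucket"), match.group("object")


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern in which only * is special into a regex."""
    # Assumption: * may also match "/" characters.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def matching_objects(uri: str, object_names: List[str]) -> List[str]:
    """Return the object names that the URI's object pattern would select."""
    _, pattern = parse_gcs_uri(uri)
    regex = wildcard_to_regex(pattern)
    return [name for name in object_names if regex.fullmatch(name)]
```

For example, `matching_objects("gs://mydata/conversationjsons/conv0*.json", names)` keeps only the entries of `names` that start with `conversationjsons/conv0` and end with `.json`, where `names` is a list of object names that you supply from your own bucket listing.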

What's next

Train a Smart Reply model on one or more conversation datasets using the Agent Assist console.