Conversation data is accepted as transcripts (Smart Reply) and transcripts plus annotation data (Summarization). Optionally, you can use Agent Assist-provided conversation data and demo models to test functionality or integration without having to provide your own data. To use Smart Reply and Summarization at runtime, you must provide your own conversation data.
This page guides you through the steps required to use the public datasets as well as to format your own data for upload to Cloud Storage. You must provide your conversation data as JSON-formatted text files.
Smart Reply data format
Smart Reply can be used in conjunction with any Agent Assist feature, or as a stand-alone feature. In order to implement Smart Reply, you must provide Agent Assist with conversation data.
Agent Assist provides sample conversation data that you can use to train a model, plus a demo model and allowlist. You can use these resources to create a conversation profile and test feature functionality without needing to provide your own data. If you do provide your own data, it must be in the specified format.
Use the Smart Reply sample conversation data
The sample conversation dataset is derived from an external source and is stored in a Cloud Storage bucket. The data contains task-oriented dialogues covering six domains: booking, restaurant, hotel, attraction, taxi, and train. To train your own model using this dataset, follow the steps to create a conversation dataset using the Agent Assist Console. In the Conversation data field, enter gs://smart_messaging_integration_test_data/*.json to use the test dataset. If you are making direct API calls instead of using the Console, you can create a conversation dataset by pointing the API to the Cloud Storage bucket above.
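When calling the API directly, pointing the dataset at the bucket comes down to a small JSON payload. The following Python sketch builds one; the request shape (`inputConfig.gcsSource.uris`) is assumed from the Dialogflow v2 conversation data import format, and the exact field layout may vary by API version:

```python
import json

# Sketch of an import payload referencing the sample dataset.
# The inputConfig/gcsSource/uris shape is an assumption based on the
# Dialogflow v2 API; verify against the current API reference.
import_request = {
    "inputConfig": {
        "gcsSource": {
            # Wildcard matches every conversation file in the bucket.
            "uris": ["gs://smart_messaging_integration_test_data/*.json"]
        }
    }
}

print(json.dumps(import_request, indent=2))
```

The same payload shape applies if you substitute your own bucket URI in place of the sample dataset.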
Use the demo Smart Reply model and allowlist
To test out the demo Smart Reply model and allowlist using the Console (no need for a dataset), navigate to the Agent Assist Console and click the Get started button under the Smart Reply feature. The Console tutorials give you options for using your own data, provided data, or the demo model.
If you are making calls to the API directly instead of using the Console, the model and allowlist can be found in the following locations:
- Model:
projects/ccai-shared-external/conversationModels/c671dd72c5e4656f
- Allowlist:
projects/ccai-shared-external/knowledgeBases/smart_messaging_kb/documents/NzU1MDYzOTkxNzU0MjQwODE5Mg
To test feature functionality, we suggest that you start by using the following end-user messages to trigger a response:
- "Can you find me an expensive place to stay that is located in the east?"
- "I'm looking for an expensive restaurant that serves Thai food, please."
- "Hi, I need a hotel that includes free wifi in the north of Cambridge."
Summarization data format
Summarization can be used in conjunction with any Agent Assist feature, or as a stand-alone feature. In order to implement Summarization, you must provide Agent Assist with conversation data that includes annotations. An annotation is a summary of an associated conversation transcript. Annotations are used to train a model that you can use to generate summaries for your agents at the end of each conversation with an end-user.
Use the sample Summarization conversation data and demo model
Agent Assist also provides sample annotated conversation data that you can use to train a model. We recommend this option if you would like to test the Summarization feature before formatting your own dataset. The test dataset is located in the following Cloud Storage bucket: gs://summarization_integration_test_data/data. If you use the sample data, you can train a Summarization model using either the Console or the API. Enter gs://summarization_integration_test_data/data/* in the dataset URI field to use the sample dataset.
To test out the demo Summarization model (no need for a dataset), navigate to the Agent Assist Console and click the Get started button under the Summarization feature. The Console tutorials give you options for using your own data, provided data, or the demo model.
Format annotations
Agent Assist Summarization custom models are trained using conversation datasets. A conversation dataset contains your own uploaded transcript and annotation data.
Before you can begin uploading data, you must make sure that each conversation transcript is in JSON format, has an associated annotation, and is stored in a Cloud Storage bucket. To create annotations, add the expected key and value strings to the annotation field associated with each conversation in your dataset. For best results, annotation training data should adhere to the following guidelines:
- The recommended minimum number of training annotations is 1000. The enforced minimum number is 100.
- Training data should not contain PII.
- Annotations should not include any information about gender, race or age.
- Annotations should not use toxic or profane language.
- Annotations should not contain any information that can't be inferred from the corresponding conversation transcript.
- Each annotation can contain up to 3 sections. You can choose your own section names.
- Annotations should have correct spelling and grammar.
The following is an example demonstrating the format of a conversation transcript with associated annotation:
{
  "entries": [
    { "text": "How can I help?", "role": "AGENT" },
    { "text": "I cannot login", "role": "CUSTOMER" },
    { "text": "Ok, let me confirm. Are you experiencing issues accessing your account", "role": "AGENT" },
    { "text": "Yes", "role": "CUSTOMER" },
    { "text": "Got it. Do you still have access to the registered email for the account", "role": "AGENT" },
    { "text": "Yes", "role": "CUSTOMER" },
    { "text": "I have sent an email with reset steps. You can follow the instructions in the email to reset your login password", "role": "AGENT" },
    { "text": "That's nice", "role": "CUSTOMER" },
    { "text": "Is there anything else I can help", "role": "AGENT" },
    { "text": "No that's all", "role": "CUSTOMER" },
    { "text": "Thanks for calling. You have a nice day", "role": "AGENT" }
  ],
  "conversation_info": {
    "annotations": [
      {
        "annotation": {
          "conversation_summarization_suggestion": {
            "text_sections": [
              { "key": "Situation", "value": "Customer was unable to login to account" },
              { "key": "Action", "value": "Agent sent an email with password reset instructions" },
              { "key": "Outcome", "value": "Problem was resolved" }
            ]
          }
        }
      }
    ]
  }
}
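As a sanity check before uploading, the per-conversation annotation rules above can be verified with a short script. This helper is purely illustrative (it is not part of any Agent Assist API) and checks that an annotation exists, has at most 3 sections, and that every section carries a key and a value:

```python
def check_annotation(conversation: dict) -> list[str]:
    """Return a list of guideline violations for one conversation
    (illustrative helper, not part of the Agent Assist API)."""
    problems = []
    annotations = (conversation.get("conversation_info", {})
                               .get("annotations", []))
    if not annotations:
        problems.append("conversation has no annotation")
    for ann in annotations:
        sections = (ann.get("annotation", {})
                       .get("conversation_summarization_suggestion", {})
                       .get("text_sections", []))
        if len(sections) > 3:
            problems.append("annotation has more than 3 sections")
        for section in sections:
            if not section.get("key") or not section.get("value"):
                problems.append("each section needs a key and a value")
    return problems
```

Running it over the example above returns an empty list; a conversation with four sections, or a section with an empty value, would be flagged.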
Conversation transcript data
Text conversation data must be supplied in JSON-formatted files, where each file contains data for a single conversation. The following describes the required JSON format.
Conversation
The top-level object for conversation data.
Field | Type | Description |
---|---|---|
conversation_info | ConversationInfo { } | Optional. Metadata for the conversation. |
entries | Entry [ ] | Required. The chronologically ordered conversation messages. |
ConversationInfo
The metadata for a conversation.
Field | Type | Description |
---|---|---|
categories | Category [ ] | Optional. Custom categories for the conversation data. |
Category
Conversation data category. If you provide categories with your conversation data, they will be used to identify topics in your conversations. If you do not provide categories, the system will automatically categorize conversations based on the content.
Field | Type | Description |
---|---|---|
display_name | string | Required. A display name for the category. |
Entry
Data for a single conversation message.
Field | Type | Description |
---|---|---|
text | string | Required. The text for this conversation message. All text should be capitalized properly. Model quality can be significantly impacted if all letters in the text are either capitalized or lowercase. An error will be returned if this field is left empty. |
user_id | integer | Optional. A number that identifies the conversation participant. Each participant should have a single user_id, reused if they participate in multiple conversations. |
role | string | Required. The conversation participant role. One of: "AGENT", "CUSTOMER". |
start_timestamp_usec | integer | Required, unless the conversation is used only for FAQ Assist, Article Suggestion, or Summarization, in which case it is optional. The timestamp for the start of this conversation turn, in microseconds. |
Example
The following shows an example of a conversation data file.
{
  "conversation_info": {
    "categories": [
      { "display_name": "Category 1" }
    ]
  },
  "entries": [
    { "start_timestamp_usec": 1000000, "text": "Hello, I'm calling in regards to ...", "role": "CUSTOMER", "user_id": 1 },
    { "start_timestamp_usec": 5000000, "text": "Yes, I can answer your question ...", "role": "AGENT", "user_id": 2 },
    ...
  ]
}
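The field requirements in the tables above can also be expressed as a small validation helper. This is an illustrative sketch, not an official tool; it enforces the documented constraints on each entry (non-empty text and a role of AGENT or CUSTOMER) and on the top-level object (entries is required):

```python
VALID_ROLES = {"AGENT", "CUSTOMER"}

def validate_entry(entry: dict) -> list[str]:
    """Check one conversation message against the documented Entry
    field requirements (illustrative, not an official tool)."""
    problems = []
    if not entry.get("text"):
        problems.append("text is required and must be non-empty")
    if entry.get("role") not in VALID_ROLES:
        problems.append('role must be "AGENT" or "CUSTOMER"')
    if "user_id" in entry and not isinstance(entry["user_id"], int):
        problems.append("user_id must be an integer")
    return problems

def validate_conversation(conversation: dict) -> list[str]:
    """Check the top-level Conversation object."""
    entries = conversation.get("entries")
    if not entries:
        return ["entries is required and must be non-empty"]
    problems = []
    for i, entry in enumerate(entries):
        problems += [f"entry {i}: {p}" for p in validate_entry(entry)]
    return problems
```

Running `validate_conversation` over each JSON file before upload catches empty messages and misspelled roles early, rather than at dataset-creation time.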
Upload conversations to Cloud Storage
You must provide your conversation data in a Cloud Storage bucket contained within your Google Cloud Platform project. When creating the bucket:
- Be sure that you have selected the Google Cloud Platform project you use for Dialogflow.
- Use the Standard Storage class.
- Set the bucket location to the location nearest to you. You will need the location ID (for example, us-west1) when providing the conversation data, so take note of your choice.
- You will also need the bucket name when providing the conversation data.
Follow the Cloud Storage quickstart instructions to create a bucket and upload files.