Assessing the quality of training phrases in Dialogflow intents

This tutorial shows you how to analyze and evaluate the quality of the training phrases supplied to your Dialogflow agent's intents. The purpose of this analysis is to avoid confusing the agent with training phrases that are irrelevant to the intents they are supplied to, or that are more relevant to other intents.

The approach you take is to generate semantic embeddings of the training phrases by using the TensorFlow Hub (tf.Hub) Universal Sentence Encoder module. You then compute cohesion and separation measurements based on the similarity between embeddings within the same intent and across different intents. The tutorial also identifies "confusing" training phrases: phrases that are nearer, in the embedding space, to a different intent than the one they were supplied for.

You can find the code for this tutorial in this Colab notebook. The article assumes that you have a basic background knowledge of Dialogflow. To learn more about Dialogflow, see this multi-part tutorial on how to build, secure, and scale a chatbot by using Dialogflow Enterprise Edition on Google Cloud.


Dialogflow lets you build conversational interfaces on top of your products and services by providing a powerful natural-language understanding (NLU) engine to process and understand natural language input. Use cases for Dialogflow include:

  • Building booking and reservation bots for airlines, cinemas, and so on.
  • Simplifying a system for ordering fast food for delivery.
  • Enabling efficient customer service through semi-automated call centers.

Although you can implement complex conversational flows to handle a user utterance, Dialogflow fundamentally performs the following steps:

  1. The user asks questions like, "What is the total of my bill for the last month?"
  2. The agent parses the input and matches it to an intent such as bill_value_inquiry.
  3. The agent also extracts entities information, like "last month".
  4. Given the intent and the extracted entities, the agent then invokes a fulfillment to respond to the user's request.

The following table describes the key concepts in the Dialogflow platform.

Term | Description
agent | Agents are best described as NLU modules that can be integrated into your system. An agent converts text or spoken user requests into actionable data when a user's input matches an intent in your agent.
intent | In a conversation, intents map user input to responses. In each intent, you define examples (training phrases) of user utterances that can trigger the intent, what to extract from each utterance, and how to respond.
entities | Where intents allow your agent to understand the motivation behind a particular user input, entities are used to pick out specific pieces of information that your users mention. For example, street addresses, product names, or amounts with units can be used to fulfill the user's request.
fulfillment | Fulfillment allows you to use the entity information extracted by the agent to generate dynamic responses or trigger actions on your backend on an intent-by-intent basis.

For more details on Dialogflow concepts, see the Dialogflow documentation.

Intents are essential to a Dialogflow system, because they link the user request to the right business logic to fulfill it. For example, a Dialogflow system for a telecom services provider might have intents like bill_value_inquiry, pay_bill, upgrade_contract, cancel_contract, and add_service. However, in order to match the user utterance (text or speech) to the right intent, intents need to be trained with a set of relevant training phrases. For example, for a weather inquiry intent, training phrases might be:

  • "What is the weather like right now?"
  • "What is the temperature in Cairo tomorrow?"
  • "Do I need to take an umbrella with me to Zurich next week?"

When you create several intents in your system, some phrases you supply to an intent might be confusing or misleading—for example, a phrase that's more relevant to another intent might be used to train the wrong intent. For example, suppose you have a Dialogflow agent that serves as the source of truth for a sales organization. You might have two intents for fetching contacts: one for the internal account teams and one for the customer. You might call these get_internal_contacts and get_external_contacts. A typical training phrase for each intent would be:

  • get_internal_contacts: "Who is the point of contact for Customer X?"
  • get_external_contacts: "How do I get in contact with Customer X?"

Suppose that your users supplied the following request as they were looking for the external contacts: "Contacts for Customer X". This request can confuse the Dialogflow agent because the phrase can match both intents. If the wrong intent matches, users will have a poor experience because they must change the formulation of the request, which is annoying and time consuming.

Therefore, you want to make sure that phrases within the same intent are more similar to each other, while phrases in different intents are less similar. The rest of the tutorial explains how to evaluate the quality of the training phrases supplied for each intent, and how to identify potentially confusing training phrases.


Approach

The approach used in this tutorial is to compute the similarity between two phrases and, by extension, to compute the similarity matrix for all the training phrases. Once you have that matrix, you can compute the following:

  • Cohesion: The average similarity value between each pair of phrases in the same intent. That value is computed for each intent. The higher the intent cohesion value, the better the intent training phrases.
  • Separation: Given two intents, the average distance between each pair of training phrases in the two intents.
  • Confusing phrases: Training phrases that are highly similar to training phrases in other intents.

To compute a similarity value between two phrases, you must convert each phrase to a real-valued feature vector (an embedding) that represents the semantics of the phrase. To help with this task, the tutorial uses TensorFlow Hub (tf.Hub), a library used for the publication, discovery, and consumption of reusable modules of machine learning models. These modules can be pre-trained models or embeddings that are extracted from text, images, and so on. You can browse the available text embeddings. The tutorial uses the Universal Sentence Encoder (v2) module, which encodes text into 512-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural-language tasks.

In this tutorial, you use cosine similarity as a proximity metric between two embedding vectors. Given two real-value vectors (in our example, two embedding vectors extracted from two training phrases), cosine similarity calculates the cosine of the angle between them, using the following formula:

$$ \cos(A,B) = \frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}{A_i^2}}\sqrt{\sum_{i=1}^{n}{B_i^2}}} $$

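In this formula, n is the number of elements in each vector. The smaller the angle between the vectors, the bigger the cosine value of this angle, indicating higher similarity. The cosine similarity value between any two nonzero vectors is always between -1 and 1, where 1 means that the vectors point in the same direction, 0 means that they are orthogonal, and -1 means that they point in opposite directions.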
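As a minimal, dependency-free sketch of the formula above (the notebook itself relies on sklearn for this computation), the following Python function computes the cosine of the angle between two vectors. The function name and the sample vectors are illustrative assumptions, not part of the notebook.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Illustrative vectors only; real embeddings have 512 elements.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
print(same_direction, orthogonal, opposite)
```

The three calls cover the extremes: vectors that point in the same direction give a value of approximately 1, orthogonal vectors give 0, and opposite vectors give a value of approximately -1.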

Figure 1 shows an overview of the approach:


Figure 1: Overview of evaluating intents cohesion and separation

The figure illustrates the following sequence:

  1. Import the intents and their training phrases.
  2. Generate embeddings for the training phrases using the tf.Hub Universal Sentence Encoder pre-trained module.
  3. Create a visualization of the generated embeddings in a two-dimensional space.
  4. Compute the embeddings cosine similarity matrix containing the pairwise similarity values between all the training phrases in different intents.
  5. Calculate the cohesion and separation metrics.
  6. Identify the confusing phrases.
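The cohesion and separation steps in the sequence above can be sketched on toy data as follows. This is a simplified illustration under stated assumptions: the two-dimensional vectors, intent names, and helper functions are hypothetical stand-ins for the 512-dimensional embeddings and the pandas-based code used in the notebook.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Hypothetical 2-D "embeddings"; real ones come from the sentence encoder.
toy_embeddings = {
    "greet": {"Hi!": [1.0, 0.1], "Hello!": [0.9, 0.2]},
    "bye": {"Goodbye": [0.1, 1.0], "See you": [0.2, 0.9]},
}

def cohesion(phrases):
    """Average similarity over every pair of phrases in one intent.

    Requires at least two phrases in the intent.
    """
    pairs = list(combinations(phrases.values(), 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def separation(phrases_a, phrases_b):
    """One minus the average similarity over every cross-intent pair."""
    sims = [cosine(a, b)
            for a in phrases_a.values()
            for b in phrases_b.values()]
    return 1 - sum(sims) / len(sims)

cohesion_by_intent = {name: cohesion(p) for name, p in toy_embeddings.items()}
greet_bye_separation = separation(toy_embeddings["greet"], toy_embeddings["bye"])
print(cohesion_by_intent, greet_bye_separation)
```

In this toy data the phrases within each intent point in nearly the same direction, so both cohesion values are high, while the two intents point in different directions, so their separation is also high. That combination is the outcome you want for a well-designed agent.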


Objectives

  • (Optional) Create a Dialogflow agent.
  • Import intents with training phrases.
  • Run the Colab notebook for intent quality assessment.


This tutorial uses the following billable components of Google Cloud:

  • Dialogflow: Standard Edition is free, while Enterprise Edition offers paid enterprise support. You can choose which edition to use when you create your Dialogflow agent. Your account can include agents from both editions. For more details, refer to the Dialogflow Pricing page.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the Dialogflow API.

    Enable the API

  5. Create a service account to call the Dialogflow API.

    Create service account
  6. In the Service account details dialog, enter the account name and description as shown in the following screenshot, and then click Create:

    Screenshot of service account details dialog
  7. Set the role to Dialogflow API Client and click Continue.

    Screenshot of service account permissions dialog

Completing the tutorial in the Colab notebook

The following sections walk through the steps discussed in the approach section to calculate the cohesion and separation metrics and to identify confusing phrases.

Getting started with the Colab notebook

  1. Go to the Colab notebook:

  2. Make a local copy to your Google Drive.

    Copying the notebook to your Google Drive

  3. In a notebook cell, install the Python libraries needed for the rest of the tutorial, and then import the required libraries and modules.

    !pip install --quiet --upgrade tensorflow dialogflow scipy tensorflow-hub seaborn
  4. Set your Google Cloud PROJECT_ID and the SERVICE_ACCOUNT_EMAIL that you created in the Before you begin section.

    Set your Google Cloud PROJECT_ID and SERVICE_ACCOUNT_EMAIL

  5. Authenticate your session to create a key for your service account:

    !gcloud config set project {PROJECT_ID}
    !gcloud iam service-accounts keys create sa-key.json \
        --iam-account={SERVICE_ACCOUNT_EMAIL} --project={PROJECT_ID}

    After you run these commands, a link is displayed.

  6. Follow the link to authenticate your user account.

  7. Copy the authentication code from the web page, and paste it in the Enter verification code field in the notebook.

Setting up a Dialogflow agent

If you already have a Dialogflow agent that you want to use in this tutorial, you can skip this step. However, if you don't have an agent, or you want to set up a new one, you can download a zip file with the content of an exported Dialogflow agent, called intents-healthcheck. You import this agent into your Dialogflow account as follows:

  1. Download the zip file of the exported agent:

    gsutil cp gs://dialogflow-intent-health-check/ .
  2. Go to

  3. Click the Go to Console button on the top right.

  4. In the left menu, click Create new agent.

    Create new agent

  5. Enter the agent name: intents-healthcheck.

  6. Select your Google Cloud project from the Google Project list.

    • A Google Cloud project can have only one Dialogflow agent. So if you don't find your Google Cloud project in the list, an agent is already associated with your project.
    • If you select Create a new project, Dialogflow creates a Google Cloud project with the same name as your agent.
  7. Click Create.

    Entering information about the agent

  8. In the left-hand menu, select the new agent and then click the Settings icon. Then in the menu in the middle of the page, select Export and Import.

    Export and Import dialog

  9. Click Restore from zip:

    1. Select the file you downloaded in step 1.
    2. Type RESTORE in the text box at the bottom of the form to confirm.
    3. Click Restore.

    Restore the agent from a zip file

    After restoring the agent, Dialogflow creates five intents.

  10. Verify the imported intents by selecting Intents from the menu on the left. You find the following intents:

    Verifying the imported intents

You use this restored agent for the rest of the tutorial.

Walking through the code in the Colab notebook

The sections that follow describe what the code in the notebook does when you run it.

Fetching your intents

The following code fetches the intents and their training phrases from the Dialogflow agent by using the fetch_intents_training_phrases method. This method returns a dictionary, where the keys are the names of the intents in your Dialogflow agent, and each value is a list of the training phrases in that intent. Where a training phrase contains an annotated entity, the code substitutes a randomly chosen synonym of that entity. In the code, project references the project to which your agent belongs, and service_account_file references the key file (sa-key.json) that you created earlier.

from collections import defaultdict

import dialogflow
import numpy as np

def fetch_intents_training_phrases(service_account_file, project):

    # Fetch the entity types defined in the agent.
    dialogflow_entity_client = dialogflow.EntityTypesClient.from_service_account_file(service_account_file)
    parent = dialogflow_entity_client.project_agent_path(project)
    entities = list(dialogflow_entity_client.list_entity_types(parent))

    # Fetch the intents. The arguments are reconstructed (the source line was
    # truncated); the full intent view is assumed so that training phrases
    # are included in the response.
    dialogflow_intents_client = dialogflow.IntentsClient.from_service_account_file(service_account_file)
    parent = dialogflow_intents_client.project_agent_path(project)
    intents = list(dialogflow_intents_client.list_intents(
        parent=parent,
        intent_view=dialogflow.enums.IntentView.INTENT_VIEW_FULL))

    # Pick one random synonym for each entity that the intents use.
    entities_name_to_value = {}
    for intent in intents:
        entities_used = {entity.display_name
            for entity in intent.parameters}

        for entity in entities:
            if entity.display_name in entities_used \
                    and entity.display_name not in entities_name_to_value:
                entities_name_to_value[entity.display_name] = np.random.choice(
                    np.random.choice(entity.entities).synonyms, replace=False)

    # Rebuild each training phrase as plain text, substituting entity values.
    intent_to_training_phrases = defaultdict(list)
    for intent in intents:
        for training_phrase in intent.training_phrases:
            parts = [entities_name_to_value[part.alias] if part.entity_type else part.text
                for part in training_phrase.parts]
            intent_to_training_phrases[intent.display_name].append("".join(parts))
        # Remove intents with no training phrases
        if not intent_to_training_phrases[intent.display_name]:
            del intent_to_training_phrases[intent.display_name]
    return intent_to_training_phrases
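To make the entity substitution step more concrete, the following sketch shows how the parts of a single training phrase can be joined into plain text. The Part tuple, the render_training_phrase helper, and the sample values are hypothetical stand-ins for the objects that the Dialogflow client returns; they are assumptions for illustration and not part of the notebook.

```python
from collections import namedtuple

# Hypothetical stand-in for one part of a Dialogflow training phrase.
Part = namedtuple("Part", ["text", "entity_type", "alias"])

def render_training_phrase(parts, entities_name_to_value):
    """Join the parts, replacing annotated parts with a chosen entity value."""
    return "".join(
        entities_name_to_value[part.alias] if part.entity_type else part.text
        for part in parts)

# "Cairo" is annotated as a city entity; the other parts are plain text.
parts = [
    Part("What is the temperature in ", "", ""),
    Part("Cairo", "@sys.geo-city", "city"),
    Part(" tomorrow?", "", ""),
]
rendered = render_training_phrase(parts, {"city": "Zurich"})
print(rendered)
```

Substituting a concrete value for each annotated part means that the embedding is computed on a natural sentence rather than on a phrase that contains a placeholder.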

The following code verifies the retrieved intents:

intent_training_phrases = fetch_intents_training_phrases("sa-key.json", PROJECT_ID)
for intent in intent_training_phrases:
    print("{}:{}".format(intent, len(intent_training_phrases[intent])))

This code prints each intent in the demo intents-healthcheck agent, followed by the count of the training phrases available in that intent.


Generating embeddings for the training phrases

The following code downloads the tf.Hub Universal Sentence Encoder pre-trained module:

import tensorflow as tf
import tensorflow_hub as hub

embed_module = hub.Module("")

After the first use, the module is cached locally.

The following code implements a method that accepts a list of sentences and returns a list of embeddings based on the tf.Hub module:

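def make_embeddings_fn():
    # Build the graph once: a string placeholder fed into the encoder module.
    placeholder = tf.placeholder(dtype=tf.string)
    embed = embed_module(placeholder)
    # Create the session once and initialize variables and lookup tables.
    session = tf.Session()
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    def _embeddings_fn(sentences):
        computed_embeddings = session.run(
            embed, feed_dict={placeholder: sentences})
        return computed_embeddings
    return _embeddings_fn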

generate_embeddings = make_embeddings_fn()

This method ensures that the tf.Session is created and that the embedding module is loaded only once, not every time the method is called.
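The one-time initialization described above is an instance of the closure (factory function) pattern. The following sketch isolates that pattern with a hypothetical toy model in place of the tf.Hub module and the tf.Session; the load_toy_model function and its length-based "embedding" are assumptions made purely for illustration.

```python
def make_embeddings_fn(load_model):
    """Build an embedding function that loads its model exactly once."""
    model = load_model()  # Runs once, when the factory is called.

    def _embeddings_fn(sentences):
        # Reuses the model captured by the closure on every call.
        return [model(sentence) for sentence in sentences]

    return _embeddings_fn

load_calls = {"count": 0}

def load_toy_model():
    """Hypothetical stand-in for loading a module and creating a session."""
    load_calls["count"] += 1
    return lambda sentence: [float(len(sentence))]

generate_embeddings = make_embeddings_fn(load_toy_model)
first = generate_embeddings(["Hi!", "Hello!"])
second = generate_embeddings(["Howdy!"])
print(load_calls["count"])
```

Although generate_embeddings is called twice, the loader runs only once, because the inner function keeps a reference to the already-loaded model.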

The following code generates embeddings for the training phrases in the intents:

# Resulting structure:
# {
#     intent: {
#         training_phrase: [embedding_array]
#     }
# }
training_phrases_with_embeddings = defaultdict(list)
for intent_name, training_phrases_list in intent_training_phrases.items():
    computed_embeddings = generate_embeddings(training_phrases_list)
    training_phrases_with_embeddings[intent_name] = dict(zip(training_phrases_list, computed_embeddings))

This code snippet creates the training_phrases_with_embeddings nested dictionary.

The following code verifies the generated embeddings:

# Print each phrase in one intent with the first five elements of its embedding.
for phrase, embedding in training_phrases_with_embeddings["start_conversation"].items():
    print("{}:{}".format(phrase, embedding[:5]))

The following output shows each training phrase in the start_conversation intent, along with the first five elements of the embedding vector of each phrase. The Universal Sentence Encoder generates a 512-dimensional embedding vector for each training phrase.

Ciao!:[-0.03649221  0.02498418 -0.03456857  0.02827227  0.00471277]
Howdy!:[-0.02732556 -0.00821852 -0.00794602  0.06356855 -0.03726532]
Hello!:[-0.0255452   0.00690543 -0.00611844  0.05633081 -0.0142823 ]
Hi!:[-0.03227544 -0.00985429 -0.01329378  0.06012927 -0.03646606]

Visualizing embeddings in two-dimensional space

The following code reduces the dimensionality of the embeddings from 512 to 2 by using Principal Component Analysis to compute the principal components:

from sklearn.decomposition import PCA

# Collect the embeddings of all intents into a single array.
embedding_vectors = None

for intent in training_phrases_with_embeddings:
    embeddings = list(training_phrases_with_embeddings[intent].values())
    if embedding_vectors is None:
        embedding_vectors = embeddings
    else:
        embedding_vectors = np.concatenate((embedding_vectors, embeddings))

# Fit a two-component PCA so that phrases can be plotted in 2D.
pca = PCA(n_components=2)
pca.fit(embedding_vectors)

This code snippet uses the PCA class in sklearn to generate a 2D representation of the training phrases embeddings.

The following code generates a visualization of the phrase embeddings with the reduced dimensionality:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)

legend = []

for color, intent in enumerate(training_phrases_with_embeddings):
    phrases = list(training_phrases_with_embeddings[intent].keys())
    embeddings = list(training_phrases_with_embeddings[intent].values())
    # Project the embeddings of this intent onto the two principal components.
    points = pca.transform(embeddings)
    xs = points[:,0]
    ys = points[:,1]
    ax.scatter(xs, ys, marker='o', s=100, c="C"+str(color))
    for i, phrase in enumerate(phrases):
        ax.annotate(phrase, (xs[i], ys[i]))
    legend.append(intent)

ax.legend(legend)
plt.show()


The following figure shows the resulting visualization:

Visualizing the phrase embeddings with the reduced dimensionality

Computing pairwise similarity between phrases

The following code computes the pairwise cosine similarity for the training phrase embeddings, using sklearn.metrics.pairwise.cosine_similarity. The code creates a DataFrame, similarity_df, with the pairwise similarity values.

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Flatten the nested dictionary into (intent, phrase, embedding) tuples.
flatten = []
for intent in training_phrases_with_embeddings:
    for phrase in training_phrases_with_embeddings[intent]:
        flatten.append((intent, phrase, training_phrases_with_embeddings[intent][phrase]))

# Compute the similarity for each unordered pair of phrases.
data = []
for i in range(len(flatten)):
    for j in range(i+1, len(flatten)):
        intent_1 = flatten[i][0]
        phrase_1 = flatten[i][1]
        embedd_1 = flatten[i][2]
        intent_2 = flatten[j][0]
        phrase_2 = flatten[j][1]
        embedd_2 = flatten[j][2]
        similarity = cosine_similarity([embedd_1], [embedd_2])[0][0]
        record = [intent_1, phrase_1, intent_2, phrase_2, similarity]
        data.append(record)

similarity_df = pd.DataFrame(data,
    columns=["Intent A", "Phrase A", "Intent B", "Phrase B", "Similarity"])

The following code displays sample similarity records:

different_intent = similarity_df['Intent A'] != similarity_df['Intent B']
print(similarity_df[different_intent].sort_values('Similarity', ascending=False).head())

The following output shows the most similar training phrases that don't belong to the same intent:

The most similar training phrases that don't belong to the same intent

Phrases in different intents that have high similarity value can be confusing to the Dialogflow agent, and could lead to directing the user input to the wrong intent.
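One possible way to flag such phrases, sketched here on toy data, is to compare each phrase's average similarity to the other phrases in its own intent with its average similarity to the phrases of every other intent. The helper names, the two-dimensional vectors, and two of the sample phrases are hypothetical; this is an illustrative assumption about how the check could be done, not the notebook's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def mean_similarity(vector, others):
    return sum(cosine(vector, other) for other in others) / len(others)

def find_confusing_phrases(embeddings_by_intent):
    """Return (phrase, own_intent, nearer_intent) for misplaced phrases."""
    confusing = []
    for intent, phrases in embeddings_by_intent.items():
        for phrase, vector in phrases.items():
            own = [v for p, v in phrases.items() if p != phrase]
            if not own:
                continue  # A single-phrase intent has nothing to compare.
            own_score = mean_similarity(vector, own)
            for other_intent, other_phrases in embeddings_by_intent.items():
                if other_intent == intent:
                    continue
                other_score = mean_similarity(vector, list(other_phrases.values()))
                if other_score > own_score:
                    confusing.append((phrase, intent, other_intent))
    return confusing

# Hypothetical 2-D embeddings; the third phrase is deliberately misplaced.
toy = {
    "get_internal_contacts": {
        "Who is the point of contact for Customer X?": [1.0, 0.2],
        "Who owns the Customer X account?": [0.9, 0.1],
        "Contacts for Customer X": [0.3, 0.9],
    },
    "get_external_contacts": {
        "How do I get in contact with Customer X?": [0.2, 1.0],
        "Phone number for Customer X": [0.1, 0.9],
    },
}
confusing_phrases = find_confusing_phrases(toy)
print(confusing_phrases)
```

With these toy vectors, only "Contacts for Customer X" is flagged, because it lies closer to the get_external_contacts phrases than to the other phrases in its own intent.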

Measuring cohesion and separation of intents

The following code computes a cohesion value for each intent, as described in the Approach section.

same_intent = similarity_df['Intent A'] == similarity_df['Intent B']
cohesion_df = pd.DataFrame(similarity_df[same_intent].groupby('Intent A', as_index=False)['Similarity'].mean())
cohesion_df.columns = ['Intent', 'Cohesion']

The result is a cohesion value for each intent:

Computing a cohesion value for each intent

The following code computes the pairwise separation between intents, as described in the Approach section.

different_intent = similarity_df['Intent A'] != similarity_df['Intent B']
separation_df = pd.DataFrame(similarity_df[different_intent].groupby(['Intent A', 'Intent B'], as_index=False)['Similarity'].mean())
separation_df['Separation'] = 1 - separation_df['Similarity']
del separation_df['Similarity']

The result is the pairwise separation between intents:

Computing the pairwise separation between intents

Further improvements

To improve the quality of the training phrases for your intents, consider the following approaches:

  • Find the phrases in different intents with high similarity, and change or remove them.
  • Find the phrases whose most similar phrases belong to different intents, and consider moving them to those intents.
  • Add more training phrases in intents with low cohesion, and investigate training phrases in intents with low separation.
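As a rough sketch of how you might act on these suggestions, the following function lists the intents whose cohesion falls below a threshold and the intent pairs whose separation falls below a threshold. The triage function, the threshold values, and the metric values are hypothetical assumptions for illustration; suitable thresholds depend on your agent.

```python
def triage(cohesion_by_intent, separation_by_pair,
           min_cohesion=0.5, min_separation=0.5):
    """Return intents with low cohesion and intent pairs with low separation."""
    low_cohesion = sorted(
        intent for intent, value in cohesion_by_intent.items()
        if value < min_cohesion)
    low_separation = sorted(
        pair for pair, value in separation_by_pair.items()
        if value < min_separation)
    return low_cohesion, low_separation

# Hypothetical metric values for three intents.
cohesion_values = {
    "bill_value_inquiry": 0.72,
    "pay_bill": 0.41,
    "cancel_contract": 0.65,
}
separation_values = {
    ("bill_value_inquiry", "pay_bill"): 0.28,
    ("bill_value_inquiry", "cancel_contract"): 0.81,
    ("pay_bill", "cancel_contract"): 0.77,
}
needs_more_phrases, needs_review = triage(cohesion_values, separation_values)
print(needs_more_phrases, needs_review)
```

In this example, pay_bill is a candidate for more training phrases, and the bill_value_inquiry and pay_bill pair is a candidate for a closer look at phrases that overlap.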

Cleaning up

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete.
  3. In the dialog, type the project ID and then click Shut down to delete the project.

What's next