
How to build a conversational app using Cloud Machine Learning APIs Part 1 of 3

Author(s): @PokerChang, Published: 2018-06-19


For consumers, conversational apps (such as chatbots) are among the most visible examples of machine learning in action. For developers, building a conversational app is instructive for understanding the value that machine-learning APIs bring to the process of creating completely new user experiences.

In this three-part post, we will show you how to build an example tour guide app for Apple iOS that can see, listen, talk, and translate using API.AI (a developer platform for creating conversational experiences) and the Google Cloud ML APIs for Speech, Vision, and Translation. You will also see how easy it is to support multiple languages on these platforms.

[Video: English demo]

The three parts will focus on the following topics:

Part 1

  • Overview
  • Architecture
  • API.AI intents
  • API.AI contexts

Part 2

  • Webhook integration with Cloud Functions
  • Cloud Vision, Speech, and Translation API integration (including Chinese support)

Part 3

  • Support the Google Assistant via Actions on Google Integration

Architecture

[Image: Tour guide app architecture diagram]

Using API.AI

API.AI is a platform for building natural and rich conversational experiences. For our example, it will handle all core conversation flows in the tour guide app. (Note that API.AI provides great documentation and a sample app for its iOS SDK. SDKs for other platforms are also available, so you could easily extend this tour guide app to support Android.)
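To give a feel for the client side, here is a minimal sketch of sending a text query to an agent through the API.AI iOS SDK. It assumes the api-ai-ios-sdk pod is installed, and "CLIENT_ACCESS_TOKEN" is a placeholder for your own agent's client access token:

```swift
import ApiAI

// Configure the SDK with your agent's client access token (placeholder here).
let configuration = AIDefaultConfiguration()
configuration.clientAccessToken = "CLIENT_ACCESS_TOKEN"

let apiai = ApiAI()
apiai.configuration = configuration

// Send one text query to the agent and print its reply.
let request = apiai.textRequest()
request?.query = "hello"
request?.setCompletionBlockSuccess({ _, response in
    // The v1 response nests the agent's reply under result.fulfillment.speech.
    if let json = response as? [String: Any],
       let result = json["result"] as? [String: Any],
       let fulfillment = result["fulfillment"] as? [String: Any] {
        print(fulfillment["speech"] as? String ?? "")
    }
}, failure: { _, error in
    print(error?.localizedDescription ?? "request failed")
})
apiai.enqueue(request)
```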

Create Agent

The first step is to create a Tour Guide Agent.

Create Intents

[Image: Creating intents in the API.AI console]

To engage users in a conversation, we first need to understand what users are saying to the agent, and we do that with intents and entities. Intents map what your users say to what your conversational experience should do. Entities extract parameter values from user queries.

Each intent contains a set of examples of user input and the desired automated response. You need to predict what users will say to open the conversation and enter those phrases in the "Add user expression" box. This list doesn't need to be comprehensive: API.AI uses machine learning to train the agent to understand more variations of these examples, and you can train the agent on further variations later. For example, go to the Default Welcome Intent and add user expressions such as "how are you", "hello", and "hi" to open the conversation.

Next, add some text responses for the agent to reply with.

[Screenshot: Default Welcome Intent]

Next, it's time to work on contexts.

Contexts

Contexts represent the current context of a user's request. They are helpful for differentiating phrases that may be vague or have different meanings depending on the user's preferences or geographic location, the current page in an app, or the topic of conversation. Let's look at an example.

User: Where am I?

Bot: Please upload a nearby picture and I can help find out where you are.

[User uploads a picture of Golden Gate Bridge.]

Bot: You are near Golden Gate Bridge.

User: How much is the ticket?

Bot: Golden Gate Bridge is free to visit.

User: When does it close today?

Bot: Golden Gate Bridge is open 24 hours a day, 7 days a week.

User: How do I get there?

[Bot shows a map to Golden Gate Bridge.]

In the above conversation, when the user asks "How much is the ticket?" and "When does it close today?" or "How do I get there?", the bot understands that the context is around Golden Gate Bridge.

The next thing to do is to weave intents and contexts together. For our example, each box in the diagram below is an intent and a context; the arrows indicate the relationships between them.

[Image: Diagram of intents, contexts, and the relationships between them]

Output Contexts

Contexts are tied to user sessions (identified by a session ID that you pass in API calls). If a user expression is matched to an intent, the intent can set an output context that is shared with future expressions in the same session. You can also add a context when you send the user's request to your API.AI agent. In our example, the where intent sets the where output context so that the location intent can be matched in the future.
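Under the hood, contexts and the session ID are just fields on the request. As an illustration, here is a sketch that calls API.AI's v1 /query REST endpoint directly and attaches a context; the token, session ID, and context name are placeholders, not values from the sample app:

```swift
import Foundation

// Sketch: call API.AI's v1 /query endpoint directly, attaching a context.
// "CLIENT_ACCESS_TOKEN", the session ID, and the context name are placeholders.
func send(query: String, context: String, sessionId: String) {
    var request = URLRequest(url: URL(string: "https://api.api.ai/v1/query?v=20150910")!)
    request.httpMethod = "POST"
    request.setValue("Bearer CLIENT_ACCESS_TOKEN", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "query": query,
        "lang": "en",
        "sessionId": sessionId,  // contexts are scoped to this session
        "contexts": [["name": context, "lifespan": 5]]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)

    URLSession.shared.dataTask(with: request) { data, _, _ in
        guard let data = data,
              let json = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any],
              let result = json["result"] as? [String: Any],
              let fulfillment = result["fulfillment"] as? [String: Any]
        else { return }
        print(fulfillment["speech"] as? String ?? "")
    }.resume()
}

// e.g., activate the "where" context so the location intent can match:
send(query: "Golden Gate Bridge", context: "where", sessionId: "tour-guide-demo-1")
```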

Input Contexts

Input contexts limit intents to be matched only when certain contexts are set. In our example, location's input context is set to where, so the location intent is matched only while the where context is active.

Here are the steps to generate these intents and contexts:

  1. Create the where intent and add the where output context. This is the root of the context tree and has no input context.
     [Screenshot: where intent]
  2. Create the location intent.
    1. Add the where input context.
    2. Reset the where output context and add the location output context. Note: in our tour guide app, the input context of location is where. When the location intent is detected, the where context needs to be reset so that subsequent conversation won't trigger it again. This is done by setting the lifespan of the where output context to 0 requests. (By default, a context has a lifespan of 5 requests or 10 minutes.)
     [Screenshot: location intent]
  3. Create the ticket intent.
    1. Add the location input context.
    2. Add the location output context so that the hours and map intents can continue to use location as their input context.

You can pass a parameter from the input context using the format #context.parameter; e.g., pass the location string from the intent inquiry-where-location to inquiry.where.location.ticket as #inquiry-where-location.location.

[Screenshot: ticket intent]

Finally, create the hours and map intents the same way as the ticket intent.

Next time

In Part 2, we'll cover how to use Webhook integrations in API.AI to pass information from a matched intent into a Cloud Functions web service and then get a result. Finally, we'll cover how to integrate the Cloud Vision, Speech, and Translation APIs, including support for the Chinese language.

In Part 3, we'll cover how to support the Google Assistant via Actions on Google integration.

You can download the source code from GitHub.


