How to build a conversational app using Cloud Machine Learning APIs, Part 1
Chang Luo
Software Engineer
Bob Liu
Software Engineer
For consumers, conversational apps (such as chatbots) are among the most visible examples of machine learning in action. For developers, building a conversational app is instructive for understanding the value that machine-learning APIs bring to the process of creating completely new user experiences.
In this three-part post, we'll show you how to build an example “tour guide” app for Apple iOS that can see, listen, talk and translate via API.AI (a developer platform for creating conversational experiences) and Google Cloud Machine Learning APIs for Speech, Vision and Translate. You'll also see how easy it is to support multiple languages on these platforms.

The three parts will focus on the following topics:
Part 1
- Overview
- Architecture
- API.AI intents
- API.AI contexts
Part 2
- API.AI webhook with Cloud Functions
- Cloud Vision API
- Cloud Speech API
- Cloud Translation API
- Support multiple languages
- Support the Google Assistant via Actions on Google Integration
This post is Part 1. Parts 2 and 3 will be published in the following weeks.
Architecture
Using API.AI
API.AI is a platform for building natural and rich conversational experiences. For our example, it will handle all core conversation flows in the tour guide app. (Note that API.AI provides great documentation and a sample app for its iOS SDK. SDKs for other platforms are also available, so you could easily extend this tour guide app to support Android.)
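Whether you use the iOS SDK or call API.AI's REST endpoint directly, sending a user query takes only a few lines of code. Below is a minimal Swift sketch (not the app's actual code) that posts a text query to the v1 /query endpoint with URLSession; the client access token and session ID are placeholders for your own agent's values.

```swift
import Foundation

// A minimal sketch of sending a text query to an API.AI agent through the
// v1 /query REST endpoint. CLIENT_ACCESS_TOKEN is a placeholder for your
// agent's client access token.
func sendQuery(_ text: String, sessionId: String) {
    var request = URLRequest(url: URL(string: "https://api.api.ai/v1/query?v=20150910")!)
    request.httpMethod = "POST"
    request.setValue("Bearer CLIENT_ACCESS_TOKEN", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "query": text,          // what the user said
        "lang": "en",           // the agent's language
        "sessionId": sessionId  // ties contexts to this conversation
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)

    URLSession.shared.dataTask(with: request) { data, _, error in
        guard let data = data, error == nil else { return }
        // The response includes result.fulfillment.speech (the text reply),
        // the matched intent, parameters and the active contexts.
        print(String(data: data, encoding: .utf8) ?? "")
    }.resume()
}

sendQuery("Where am I?", sessionId: UUID().uuidString)
```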
Create Agent
The first step is to create a “Tour Guide Agent.”
Create Intents
To engage users in a conversation, we first need to understand what users are saying to the agent. We do that with intents and entities. Intents map what your users say to what your conversational experience should do. Entities are used to extract parameter values from user queries.
Each intent contains a set of examples of user input and the desired automated response. Start by predicting what users will say to open the conversation, and enter those phrases in the “Add user expression” box. This list doesn’t need to be comprehensive: API.AI uses machine learning to train the agent to understand more variations of these examples, and you can train it on additional variations later. For example, go to the Default Welcome Intent and add user expressions such as “how are you,” “hello” and “hi” to open the conversation.
The next step is to add some text responses.
Next, it’s time to work on contexts.
Contexts
Contexts represent the current context of a user’s request. They're helpful for differentiating phrases that may be vague or have different meanings depending on the user’s preferences or geographic location, the current page in an app or the topic of conversation. Let’s look at an example.
User: Where am I?
Bot: Please upload a nearby picture and I can help find out where you are.
[User uploads a picture of Golden Gate Bridge.]
Bot: You are near Golden Gate Bridge.
User: How much is the ticket?
Bot: Golden Gate Bridge is free to visit.
User: When does it close today?
Bot: Golden Gate Bridge is open 24 hours a day, 7 days a week.
User: How do I get there?
[Bot shows a map to Golden Gate Bridge.]
In the above conversation, when the user asks “How much is the ticket?”, “When does it close today?” or “How do I get there?”, the bot understands that the context is Golden Gate Bridge.
The next thing to do is to weave intents and contexts together. For our example, each box in the diagram below is an intent and a context; the arrows indicate the relationships between them.
Output Contexts
Contexts are tied to user sessions (the session ID that you pass in API calls). If a user expression is matched to an intent, the intent can set an output context that is shared with future expressions in the same session. You can also attach a context yourself when you send the user request to your API.AI agent. In our example, the where intent sets the where output context so that the location intent can be matched later.
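As a hedged sketch of that last point, the v1 /query request body accepts a contexts array alongside the query, so you can attach a context (with an optional lifespan and parameters) yourself when you call the agent. The context name and parameter values below are illustrative.

```swift
import Foundation

// Extending the request body from the earlier sketch: attach a context
// explicitly when sending the user request.
let sessionId = UUID().uuidString
let contexts: [[String: Any]] = [
    ["name": "location",
     "lifespan": 5,
     "parameters": ["location": "Golden Gate Bridge"]]
]
let body: [String: Any] = [
    "query": "How much is the ticket?",
    "lang": "en",
    "sessionId": sessionId,
    "contexts": contexts
]
```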
Input Contexts
Input contexts limit intents to be matched only when certain contexts are set. In our example, the location intent’s input context is set to where, so the location intent is matched only when the where context is active.
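Because contexts live on the session, the client mostly just needs to reuse the same session ID for every turn; API.AI then carries the output context of one intent into the matching of the next. The sketch below walks through the tour guide conversation using the hypothetical sendQuery helper from the first snippet (how the place name is obtained from the photo is covered in Part 2, and the exact phrasing of each turn is illustrative).

```swift
// One session ID for the whole conversation: API.AI tracks the active
// contexts (where, location, ...) for that session on its side.
let session = UUID().uuidString

// Matches the "where" intent, which sets the "where" output context.
sendQuery("Where am I?", sessionId: session)

// After the photo is recognized (Part 2), the app sends the place name.
// This matches the "location" intent, allowed only while "where" is active;
// it resets "where" and sets "location" as the new output context.
sendQuery("Golden Gate Bridge", sessionId: session)

// Matches the "ticket" intent, whose input context is "location".
sendQuery("How much is the ticket?", sessionId: session)
```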
Here are the steps to generate these intents and contexts:
First, create the where intent and add a where output context. This is the root in the context tree and has no input context.
Second, create the location intent and add where as its input context. Reset the where output context and add a location output context. In our tour guide app, the input context of location is where. When the location intent is detected, the where context needs to be reset so that subsequent conversation won’t trigger this context again. This is done by setting the lifespan of the where output context to 0. By default, a context has a lifespan of 5 requests or 10 minutes.
Next, create the ticket intent. Add location as its input context and also as its output context, so that the hours and map intents can continue to use the location context as their input context.
You can pass a parameter from the input context with the format #context.parameter; e.g., pass the location string from the intent inquiry-where-location to inquiry.where.location.ticket in the format #inquiry-where-location.location.
Finally, create the hours and map intents in the same way as the ticket intent.
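With these intents and contexts in place, every /query response reports the matched intent’s action, the extracted parameters (such as the location passed along via #inquiry-where-location.location) and the contexts that are still active. The sketch below shows one way to decode the fields this app cares about; the struct names are our own, the JSON keys follow API.AI’s v1 response format, and parameters are assumed to be simple strings.

```swift
import Foundation

// Minimal models for the parts of an API.AI v1 /query response the app uses.
// Struct names are our own; JSON keys follow the v1 response format.
struct QueryResponse: Decodable {
    let result: QueryResult
}

struct QueryResult: Decodable {
    let action: String?               // action defined on the matched intent
    let parameters: [String: String]? // e.g. ["location": "Golden Gate Bridge"]
    let contexts: [ActiveContext]?    // output contexts active after this turn
    let fulfillment: Fulfillment?
}

struct ActiveContext: Decodable {
    let name: String                  // e.g. "location"
    let lifespan: Int?                // requests remaining before it expires
}

struct Fulfillment: Decodable {
    let speech: String                // the text response to show or speak
}

// Usage inside the dataTask completion handler from the first sketch:
// if let reply = try? JSONDecoder().decode(QueryResponse.self, from: data) {
//     print(reply.result.fulfillment?.speech ?? "")
// }
```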
Next time
In Part 2, we’ll cover how to use webhook integrations in API.AI to pass information from a matched intent into a Cloud Functions web service and then get a result. We’ll also cover how to integrate the Cloud Vision, Speech and Translation APIs, including support for the Chinese language. In Part 3, we’ll cover how to support the Google Assistant via Actions on Google Integration.
You can download the source code from GitHub.