How to build a conversational app using Cloud Machine Learning APIs - part 1 of 3
Contributed by Google employees.
For consumers, conversational apps (such as chatbots) are among the most visible examples of machine learning in action. For developers, building a conversational app is instructive for understanding the value that machine-learning APIs bring to the process of creating completely new user experiences.
In this three-part post, we will show you how to build an example tour guide app for Apple iOS that can see, listen, talk and translate via API.AI (a developer platform for creating conversational experiences) and Google Cloud ML APIs for Speech, Vision and Translate. You will also see how easy it is to support multiple languages on these platforms.
The three parts will focus on the following topics:
Part 1

- Overview
- Architecture
- API.AI intents
- API.AI contexts

Part 2

- API.AI webhook with Cloud Functions
- Cloud Vision API
- Cloud Speech API
- Cloud Translation API
- Support for multiple languages

Part 3

- Support for the Google Assistant through Actions on Google integration
Architecture
Using API.AI
API.AI is a platform for building natural and rich conversational experiences. For our example, it will handle all core conversation flows in the tour guide app. (Note that API.AI provides great documentation and a sample app for its iOS SDK. SDKs for other platforms are also available, so you could easily extend this tour guide app to support Android.)
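If you want to exercise the agent from code rather than the console's test pane, one option is to call API.AI's query endpoint directly. The following is a minimal Swift sketch, assuming the legacy API.AI v1 REST API; the endpoint URL, version parameter, JSON field names, and the CLIENT_ACCESS_TOKEN placeholder are assumptions, and the iOS SDK wraps an equivalent call for you.

```swift
import Foundation

// Minimal sketch of sending a text query to an API.AI agent over REST.
// Assumption: the legacy API.AI v1 /query endpoint and request format.
// CLIENT_ACCESS_TOKEN is a placeholder for the agent's client access token.
let clientAccessToken = "CLIENT_ACCESS_TOKEN"

func queryAgent(_ text: String, sessionId: String,
                completion: @escaping (Data?, Error?) -> Void) {
    var request = URLRequest(url: URL(string: "https://api.api.ai/v1/query?v=20150910")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(clientAccessToken)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")

    // The agent matches `query` against the user expressions of each intent.
    let body: [String: Any] = [
        "query": text,
        "lang": "en",
        "sessionId": sessionId   // contexts are tied to this session ID
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)

    URLSession.shared.dataTask(with: request) { data, _, error in
        completion(data, error)
    }.resume()
}

// Example: open the conversation, which should match the Default Welcome Intent.
queryAgent("hello", sessionId: UUID().uuidString) { data, _ in
    if let data = data, let json = String(data: data, encoding: .utf8) {
        print(json)   // raw agent response, including the matched intent and reply
    }
}
```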
Create agent
The first step is to create a Tour Guide Agent.
Create intents
To engage users in a conversation, we first need to understand what users are saying to the agent, and we do that with intents and entities. Intents map what your users say to what your conversational experience should do. Entities extract parameter values from user queries.
Each intent contains a set of examples of user input and the desired automated response. To build one, predict what users will say to open the conversation and enter those phrases in the "Add user expression" box. This list doesn't need to be comprehensive: API.AI uses machine learning to train the agent to understand more variations of these examples, and you can keep training the agent on additional variations later. For example, go to the Default Welcome Intent and add user expressions such as "how are you", "hello", and "hi" to open the conversation.
The next step is to add some text responses for the agent to reply with.
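In the app, the matched intent and its text response come back in the agent's reply. Here is a minimal Swift sketch of decoding that reply; the field names (`result`, `metadata.intentName`, `fulfillment.speech`) and the sample payload are assumptions based on the API.AI v1 response format, not output copied from the console.

```swift
import Foundation

// Sketch of decoding the agent's reply. The field names are assumptions
// based on the API.AI v1 response format; adjust them to match what your
// agent actually returns.
struct AgentResponse: Decodable {
    struct Result: Decodable {
        struct Metadata: Decodable { let intentName: String? }
        struct Fulfillment: Decodable { let speech: String? }
        let metadata: Metadata
        let fulfillment: Fulfillment
    }
    let result: Result
}

// Hypothetical payload of the kind the Default Welcome Intent might produce.
let sample = """
{"result": {"metadata": {"intentName": "Default Welcome Intent"},
            "fulfillment": {"speech": "Hi, I am your tour guide. How can I help?"}}}
""".data(using: .utf8)!

// Force-try is acceptable here because the sample payload is fixed.
let response = try! JSONDecoder().decode(AgentResponse.self, from: sample)
print(response.result.metadata.intentName ?? "no intent matched")
print(response.result.fulfillment.speech ?? "no text response")
```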
Next, it's time to work on contexts.
Contexts
Contexts represent the current context of a user's request. They are helpful for differentiating phrases that may be vague or have different meanings depending on the user's preferences or geographic location, the current page in an app, or the topic of conversation. Let's look at an example.
User: Where am I?
Bot: Please upload a nearby picture and I can help find out where you are.
[User uploads a picture of Golden Gate Bridge.]
Bot: You are near Golden Gate Bridge.
User: How much is the ticket?
Bot: Golden Gate Bridge is free to visit.
User: When does it close today?
Bot: Golden Gate Bridge is open 24 hours a day, 7 days a week.
User: How do I get there?
[Bot shows a map to Golden Gate Bridge.]
In the above conversation, when the user asks "How much is the ticket?", "When does it close today?", or "How do I get there?", the bot understands that the context is the Golden Gate Bridge.
The next thing to do is to weave intents and contexts together. For our example, each box in the diagram below is an intent and a context; the arrows indicate the relationships between them.
Output contexts
Contexts are tied to user sessions (identified by a session ID that you pass in API calls). If a user expression is matched to an intent, the intent can set an output context to be shared by future expressions in the same session. You can also add a context when you send the user request to your API.AI agent. In our example, the `where` intent sets the `where` output context so that the `location` intent can be matched later in the conversation.
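You can observe which output contexts are active by inspecting the agent's response for the session. A minimal sketch, assuming the v1 response carries a `contexts` array (name, lifespan, parameters) under `result`:

```swift
import Foundation

// Sketch of inspecting the output contexts returned with a matched intent.
// Assumption: the API.AI v1 response includes result.contexts.
let sample = """
{"result": {"contexts": [{"name": "where", "lifespan": 5, "parameters": {}}]}}
""".data(using: .utf8)!

if let json = try? JSONSerialization.jsonObject(with: sample) as? [String: Any],
   let result = json["result"] as? [String: Any],
   let contexts = result["contexts"] as? [[String: Any]] {
    for context in contexts {
        // After the `where` intent matches, the `where` output context is
        // active for this session, so the `location` intent can match next.
        print(context["name"] ?? "", context["lifespan"] ?? "")
    }
}
```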
Input contexts
Input contexts limit intents to be matched only when certain contexts are set. In our example, the `location` intent's input context is set to `where`, so the `location` intent is matched only when the conversation is in the `where` context.
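You can also set a context yourself when you send a request, which is handy for testing context-restricted intents. Below is a minimal sketch of such a request body, assuming the v1 `/query` request accepts a `contexts` field; the parameter values are made up for illustration.

```swift
import Foundation

// Sketch of attaching a context to a query so that intents restricted by an
// input context can match. Assumption: the API.AI v1 /query request accepts
// a `contexts` array; the parameter values are hypothetical.
let body: [String: Any] = [
    "query": "How much is the ticket?",
    "lang": "en",
    "sessionId": "my-session-id",
    // With the `location` context active, the `ticket` intent (whose input
    // context is `location`) becomes eligible to match.
    "contexts": [["name": "location",
                  "parameters": ["location": "Golden Gate Bridge"]]]
]
let payload = try! JSONSerialization.data(withJSONObject: body, options: [.prettyPrinted])
print(String(data: payload, encoding: .utf8)!)
```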
Here are the steps to generate these intents and contexts:
1. Create the `where` intent and add the `where` output context. This is the root of the context tree and has no input context.
2. Create the `location` intent.
   1. Add the `where` input context.
   2. Reset the `where` output context and add the `location` output context.

   Note: In our tour guide app, the input context of `location` is `where`. When the `location` intent is detected, the `where` context needs to be reset so that subsequent conversation won't trigger this context again. This is done by setting the lifespan of the `where` output context to 0 requests. By default, a context has a lifespan of 5 requests or 10 minutes.
3. Create the `ticket` intent.
   - Add the `location` input context.
   - Add the `location` output context so that the `hours` and `map` intents can continue to use the `location` context as their input context.
   - You can pass a parameter from the input context with the format `#context.parameter`; e.g., pass the location string from the `inquiry-where-location` intent to `inquiry.where.location.ticket` as `#inquiry-where-location.location`.

Finally, create the `hours` and `map` intents similar to the `ticket` intent.
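To see this context chain in action from the app, send successive queries with the same session ID. The sketch below again assumes the legacy API.AI v1 REST API and a placeholder CLIENT_ACCESS_TOKEN; the second question only matches the `ticket` intent if the `location` context is already active for the session (it is set when the `location` intent matches after the photo step, which we wire up with the Vision API in Part 2).

```swift
import Foundation

// Sketch of a multi-turn conversation that relies on the context chain above.
// Both queries share the same sessionId, so any contexts set by earlier
// matched intents remain active. Requires iOS 15+/macOS 12+ for async URLSession.
func ask(_ text: String, sessionId: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.api.ai/v1/query?v=20150910")!)
    request.httpMethod = "POST"
    request.setValue("Bearer CLIENT_ACCESS_TOKEN", forHTTPHeaderField: "Authorization")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject:
        ["query": text, "lang": "en", "sessionId": sessionId])
    let (data, _) = try await URLSession.shared.data(for: request)
    return String(data: data, encoding: .utf8) ?? ""
}

let sessionId = UUID().uuidString
Task {
    // "Where am I?" matches the `where` intent and sets the `where` context.
    print(try await ask("Where am I?", sessionId: sessionId))
    // This matches `ticket` only if the `location` context has been set for
    // this session (normally after the picture-upload step).
    print(try await ask("How much is the ticket?", sessionId: sessionId))
}
```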
Next time
In Part 2, we'll cover how to use webhook integrations in API.AI to pass information from a matched intent into a Cloud Functions web service and get a result back. We'll also cover how to integrate the Cloud Vision, Speech, and Translation APIs, including support for Chinese.
In Part 3, we'll cover how to support the Google Assistant via Actions on Google integration.
You can download the source code from GitHub.