How to build a conversational app using Cloud Machine Learning APIs, Part 1 of 3
In this three-part post, we will show you how to build an example tour guide app for Apple iOS that can see, listen, talk and translate via API.AI (a developer platform for creating conversational experiences) and Google Cloud ML APIs for Speech, Vision and Translate. You will also see how easy it is to support multiple languages on these platforms.
The three parts will focus on the following topics:
- API.AI intents
- API.AI contexts
- API.AI webhook with Cloud Functions
- Cloud Vision API
- Cloud Speech API
- Cloud Translation API
- Support for multiple languages
- Support for the Google Assistant via Actions on Google integration
API.AI is a platform for building natural and rich conversational experiences. For our example, it will handle all core conversation flows in the tour guide app. (Note that API.AI provides great documentation and a sample app for its iOS SDK. SDKs for other platforms are also available, so you could easily extend this tour guide app to support Android.)
The first step is to create a Tour Guide Agent.
To engage users in a conversation, we first need to understand what users are saying to the agent, and we do that with intents and entities. Intents map what your users say to what your conversational experience should do. Entities extract parameter values from user queries.
Each intent contains a set of example user inputs and the desired automated response. To build one, predict what users are likely to say to open the conversation, and enter those phrases in the "Add user expression" box. The list doesn't need to be comprehensive: API.AI uses machine learning to train the agent to understand variations of these examples, and you can keep training the agent on more variations later. For example, go to the Default Welcome Intent and add the user expressions "how are you", "hello", and "hi" to open the conversation.
Next, add some text responses for the agent to reply with.
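To make the intent idea concrete, here is a toy Python model of an intent: a list of user expressions plus text responses. It is only a sketch. API.AI's real matching uses trained machine learning models, while the hypothetical match_intent function below simply scores word overlap (Jaccard similarity) against each expression. The response texts are invented for illustration, and respond picks one of them at random, which is how we understand API.AI to choose among several text responses.

```python
from __future__ import annotations

import random
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    user_expressions: list[str]
    responses: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words, dropping punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(query: str, intents: list[Intent], threshold: float = 0.5) -> Intent | None:
    """Toy stand-in for API.AI's ML: return the intent whose expression best overlaps the query."""
    query_words = tokens(query)
    best, best_score = None, 0.0
    for intent in intents:
        for expression in intent.user_expressions:
            expr_words = tokens(expression)
            union = query_words | expr_words
            if not union:
                continue
            score = len(query_words & expr_words) / len(union)  # Jaccard similarity
            if score > best_score:
                best, best_score = intent, score
    return best if best_score >= threshold else None


def respond(intent: Intent) -> str:
    """Pick one of the intent's text responses at random."""
    return random.choice(intent.responses)


# The Default Welcome Intent from the post; the response texts are hypothetical.
welcome = Intent(
    "Default Welcome Intent",
    user_expressions=["how are you", "hello", "hi"],
    responses=["Hi! Where would you like to go today?", "Hello! I'm your tour guide."],
)
INTENTS = [welcome]

print(match_intent("Hello!", INTENTS).name)  # Default Welcome Intent
print(match_intent("Where am I?", INTENTS))  # None: no overlap with any expression
```

"Hi there" also matches here, because half of its words overlap with "hi". A real agent generalizes much further than this threshold rule.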
Next, it's time to work on contexts.
Contexts represent the current context of a user's request. They are helpful for differentiating phrases that may be vague or have different meanings depending on the user's preferences or geographic location, the current page in an app, or the topic of conversation. Let's look at an example.
User: Where am I?
Bot: Please upload a nearby picture and I can help find out where you are.
[User uploads a picture of Golden Gate Bridge.]
Bot: You are near Golden Gate Bridge.
User: How much is the ticket?
Bot: Golden Gate Bridge is free to visit.
User: When does it close today?
Bot: Golden Gate Bridge is open 24 hours a day, 7 days a week.
User: How do I get there?
[Bot shows a map to Golden Gate Bridge.]
In the conversation above, when the user asks "How much is the ticket?", "When does it close today?", or "How do I get there?", the bot understands that the context is the Golden Gate Bridge.
The next thing to do is to weave intents and contexts together. For our example, each box in the diagram below is an intent and a context; the arrows indicate the relationships between them.
Contexts are tied to user sessions (identified by a session ID that you pass in API calls). If a user expression is matched to an intent, the intent can then set an output context that carries over to later requests in the same session. You can also add a context when you send the user request to your API.AI agent. In our example, the where intent sets the where output context so that the location intent can be matched later.
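Because contexts are scoped to the session ID, a client sends that ID with every query, optionally along with contexts to add. The Python sketch below builds such a request but does not send it. It assumes the JSON shape of API.AI's v1 /query REST endpoint as we recall it: query, lang, and sessionId fields, a contexts list of name/lifespan objects, and a client access token in a Bearer header. The URL, the version parameter, and the field names are all assumptions to check against API.AI's documentation. The iOS app itself would go through the API.AI SDK rather than raw HTTP.

```python
import json
import urllib.request

# Assumption: API.AI's v1 query endpoint and protocol version, as we recall them.
QUERY_URL = "https://api.api.ai/v1/query?v=20150910"


def build_query_request(text, session_id, token, contexts=None, lang="en"):
    """Build (but do not send) a POST request carrying the session ID and optional contexts."""
    body = {"query": text, "lang": lang, "sessionId": session_id}
    if contexts:
        # Each entry adds a context to this session along with the request.
        body["contexts"] = [
            {"name": name, "lifespan": lifespan} for name, lifespan in contexts.items()
        ]
    return urllib.request.Request(
        QUERY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # client access token (placeholder)
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Follow-up question in the same session, re-sending the "location" context explicitly.
request = build_query_request(
    "How much is the ticket?",
    session_id="user-1234",  # hypothetical ID; reuse it for every turn from this user
    token="YOUR_CLIENT_ACCESS_TOKEN",
    contexts={"location": 5},
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Reusing the same session ID on every turn is what lets contexts set by earlier intents apply to later queries. A new ID starts with no contexts.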
Input contexts limit an intent so that it is matched only when certain contexts are set. In our example, the location intent's input context is set to where, so the location intent is matched only when we are under the where context.
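The matching rule in the two paragraphs above fits in a few lines of code. In this hypothetical Python sketch, each session ID maps to its own set of active contexts. An intent can match only if all of its input contexts are active in that session, and a match adds the intent's output contexts. Lifespans are deliberately left out here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str
    input_contexts: frozenset[str] = frozenset()
    output_contexts: frozenset[str] = frozenset()


WHERE = Intent("where", output_contexts=frozenset({"where"}))
LOCATION = Intent(
    "location",
    input_contexts=frozenset({"where"}),
    output_contexts=frozenset({"location"}),
)

# Active contexts are kept per session ID, so one user's contexts never leak to another.
sessions: dict[str, set[str]] = {}


def try_match(session_id: str, intent: Intent) -> bool:
    """Match intent only if all its input contexts are active; then set its output contexts."""
    active = sessions.setdefault(session_id, set())
    if not intent.input_contexts <= active:
        return False  # a required input context is missing in this session
    active |= intent.output_contexts
    return True


print(try_match("alice", LOCATION))  # False: "where" is not set yet
print(try_match("alice", WHERE))     # True: sets "where"
print(try_match("alice", LOCATION))  # True: "where" is now active
print(try_match("bob", LOCATION))    # False: bob's session has no contexts
```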
Here are the steps to generate these intents and contexts:
- Create the where intent and add the where output context. This is the root of the context tree and has no input context.
- Create the location intent with where as its input context. Reset the where output context and add the location output context. Note: When the location intent is detected, the where context needs to be reset so that any subsequent conversation won't trigger this context again. This is done by setting the lifespan of the where output context to 0 requests. By default, a context has a lifespan of 5 requests or 10 minutes.
- Create the ticket intent with location as its input context, and add location as its output context so that the other follow-up intents, such as map, can continue to use the location context as input context.
You can pass a parameter from the input context using the format #context.parameter; e.g., pass the location string detected by the location intent to the inquiry.where.location.ticket intent in this format.
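The #context.parameter syntax is essentially a lookup into an active context's parameters. The Python sketch below resolves such references in a hypothetical parameter table for the ticket intent. The following are our assumptions for illustration: the context name location, the parameter name location, context names that contain no dots, and unknown references being left untouched.

```python
from __future__ import annotations

import re

# Matches "#<context>.<parameter>"; assumes context names contain no dots.
REFERENCE = re.compile(r"#([\w-]+)\.([\w-]+)")


def resolve_parameters(table: dict[str, str], contexts: dict[str, dict[str, str]]) -> dict[str, str]:
    """Replace each #context.parameter reference with the value stored in that active context."""

    def lookup(match: re.Match) -> str:
        context, parameter = match.group(1), match.group(2)
        # Leave the reference as-is if the context or parameter is not available.
        return contexts.get(context, {}).get(parameter, match.group(0))

    return {name: REFERENCE.sub(lookup, value) for name, value in table.items()}


# Active contexts after the location intent matched (hypothetical values).
active_contexts = {"location": {"location": "Golden Gate Bridge"}}

# Hypothetical parameter table for the inquiry.where.location.ticket intent.
ticket_parameters = {"location": "#location.location"}

print(resolve_parameters(ticket_parameters, active_contexts))  # {'location': 'Golden Gate Bridge'}
```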
Finally, create the map intent and the other follow-up intents in a similar way.
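The Python sketch below puts the steps together. It encodes the where, location, ticket, and map intents and replays the Golden Gate Bridge conversation at the level of intents. This is a simplified model, not API.AI's implementation, and it rests on several assumptions:
- Every request uses up one unit of each active context's lifespan.
- A lifespan of 0 removes the context, which is the reset applied to where.
- The follow-up intents re-set location with the default lifespan of 5 requests.
- The 10-minute timeout is ignored.

Intent classification is skipped: each call names the intent the agent would have matched.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_LIFESPAN = 5  # requests, per the post's stated default; the 10-minute timeout is not modeled


@dataclass(frozen=True)
class Intent:
    name: str
    input_contexts: tuple[str, ...] = ()
    # (context name, lifespan) pairs; a lifespan of 0 resets (removes) the context.
    output_contexts: tuple[tuple[str, int], ...] = ()


# The context tree from the steps above. Re-setting "location" on the follow-up
# intents is our assumption about how they keep the context alive.
INTENTS = {
    "where": Intent("where", output_contexts=(("where", DEFAULT_LIFESPAN),)),
    "location": Intent(
        "location",
        input_contexts=("where",),
        output_contexts=(("where", 0), ("location", DEFAULT_LIFESPAN)),
    ),
    "ticket": Intent("ticket", ("location",), (("location", DEFAULT_LIFESPAN),)),
    "map": Intent("map", ("location",), (("location", DEFAULT_LIFESPAN),)),
}


class Session:
    def __init__(self) -> None:
        self.contexts: dict[str, int] = {}  # context name -> remaining lifespan in requests

    def handle(self, intent_name: str) -> bool:
        """Process one request the agent classified as intent_name; return whether it matched."""
        intent = INTENTS[intent_name]
        matched = all(name in self.contexts for name in intent.input_contexts)
        # Every request uses up one unit of lifespan from each active context.
        self.contexts = {n: left - 1 for n, left in self.contexts.items() if left > 1}
        if matched:
            for name, lifespan in intent.output_contexts:
                if lifespan > 0:
                    self.contexts[name] = lifespan
                else:
                    self.contexts.pop(name, None)  # lifespan 0: reset the context
        return matched


# Replay the conversation from the post, one matched intent per user turn.
session = Session()
print(session.handle("location"))  # False: no "where" context yet
print(session.handle("where"))     # True: "Where am I?"
print(session.handle("location"))  # True: photo of the Golden Gate Bridge
print(sorted(session.contexts))    # ['location']: "where" was reset
print(session.handle("ticket"), session.handle("map"))  # True True
print(session.handle("location"))  # False: "where" is no longer active
```

In this model, a where context that is never reset expires on its own after 5 further requests. Setting its lifespan to 0 simply makes that happen immediately.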
In Part 2, we'll cover how to use webhook integrations in API.AI to pass information from a matched intent to a Cloud Functions web service and get a result back. We'll also cover how to integrate the Cloud Vision, Speech, and Translation APIs, including support for Chinese.
In Part 3, we'll cover how to support the Google Assistant via Actions on Google integration.
You can download the source code from GitHub.