How to build a conversational app using Cloud Machine Learning APIs, Part 2
In part 1 of this blog post, we gave you an overview of what a conversational tour guide iOS app might look like when built on Cloud Machine Learning APIs and API.AI. We also demonstrated how to create API.AI intents and contexts. In part 2, we’ll discuss an advanced API.AI topic: webhooks with Cloud Functions. We’ll also show you how to use Cloud Machine Learning APIs (Vision, Speech and Translation) and how to support a second language.
Webhooks via Cloud Functions
In API.AI, webhook integrations allow you to pass information from a matched intent into a web service and get a result from it. Read on to learn how to request parade info from Cloud Functions.
- Go to console.cloud.google.com. Log in with your own account and create a new project.
- Once you’ve created a new project, navigate to that project.
- Enable the Cloud Functions API.
- Create a function. For the purposes of this guide, we’ll call the function “parades”. Select the “HTTP” trigger option, then select “inline” editor.
- Don’t forget to set the function to execute to “parades”.
You’ll also need to create a “stage bucket”. Click on “browse”; you’ll see the browser, but no buckets will exist yet.
- Click on the “+” button to create the bucket.
- Specify a unique name for the bucket (you can use your project name, for instance), select “regional” storage and keep the default region (us-central1).
- Click back on the “select” button in the previous window.
- Click the “create” button to create the function.
The function will be created and deployed:
- Click the “parades” function line. In the “source” tab, you’ll see the sources.
Our package.json file declares a dependency on the actions-on-google NPM module, which eases integration with API.AI and with the Actions on Google platform. That platform lets you extend the Google Assistant with your own extensions (usable from Google Home).
Our function code lives in the index.js file.
In that code:
- We require the actions-on-google NPM module.
- We use the ask() method to let the assistant send a result back to the user.
- We export a function where we’re using the actions-on-google module’s ApiAiApp class to handle the incoming request.
- Then, we call handleRequest() to handle the request.
- Once done, don’t forget to click the “create” function button. It will deploy the function in the cloud.
tell() will end the conversation and close the mic, while ask() will not. This difference doesn’t matter for API.AI projects like the one we demonstrate here in part 1 and part 2 of this blog post. When we integrate Actions on Google in part 3, we’ll explain this difference in more detail.
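To make the shape of the exchange concrete, here is a minimal sketch of a webhook that answers API.AI directly in its v1 webhook JSON format (speech, displayText, source). It deliberately does not use the actions-on-google module described above, so it runs with no dependencies; treat it as an illustration rather than the code from this post. The action name inquiry.parades and the reply text are hypothetical placeholders.

```javascript
// Hypothetical action name; use whatever action your API.AI intent defines.
const PARADES_ACTION = 'inquiry.parades';

// Builds the JSON body that API.AI (v1 webhook format) reads back to the user.
function buildReply(text) {
  return { speech: text, displayText: text, source: 'parades-webhook' };
}

// HTTP-triggered Cloud Function. The exported name must match the
// "function to execute" field set in the console ("parades" in this guide).
function parades(req, res) {
  // API.AI posts the matched intent under body.result.
  const result = (req.body && req.body.result) || {};
  if (result.action === PARADES_ACTION) {
    // Placeholder text, not real event data.
    res.json(buildReply('The parade runs through Chinatown this evening.'));
  } else {
    res.json(buildReply('Sorry, I do not have information about that yet.'));
  }
}

exports.parades = parades;
```

Because the function only reads req.body and calls res.json(), you can exercise it locally with a plain object standing in for the request before deploying it.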
The “testing” tab invokes your function, the “general” tab shows statistics and the “trigger” tab reveals the HTTP URL created for your function.
Your final step is to go to the API.AI console, then click the Fulfillment tab. Enable webhook and paste the URL above into the URL field.
With API.AI, we’ve built a chatbot that can converse with a human by text. Next, let’s give the bot “ears” to listen with Cloud Speech API, “eyes” to see with Cloud Vision API, a “mouth” to talk with the iOS text-to-speech SDK and “brains” for translating languages with Cloud Translation API.
Using Cloud Speech API
Cloud Speech API includes an iOS sample app. It’s quite straightforward to integrate the gRPC non-streaming sample app into our chatbot app. You’ll need to acquire an API key from Google Cloud Console and replace the API key line in SpeechRecognitionService.m with your API key.
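The iOS sample talks to the service over gRPC. As a rough illustration of what a non-streaming recognition call carries, the sketch below builds the equivalent JSON body for the REST method speech:recognize and reads the first transcript out of a response. This is a swapped-in REST view, not the sample app’s code. The function names are made up for this example, and the LINEAR16 encoding and 16000 Hz sample rate are assumptions that must match how your audio was actually recorded.

```javascript
// REST endpoint for non-streaming recognition; the API key goes in the query string.
function recognizeUrl(apiKey) {
  return 'https://speech.googleapis.com/v1/speech:recognize?key=' +
    encodeURIComponent(apiKey);
}

// Builds the request body. Encoding and sample rate are assumed values;
// languageCode is the single field that changes per spoken language.
function buildRecognizeRequest(base64Audio, languageCode = 'en-US') {
  return {
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode },
    audio: { content: base64Audio },
  };
}

// Returns the most likely transcript from a response, or null if none came back.
function firstTranscript(responseBody) {
  const results = responseBody.results || [];
  const alternatives = results.length > 0 ? results[0].alternatives || [] : [];
  return alternatives.length > 0 ? alternatives[0].transcript : null;
}
```

The body would be sent as a JSON POST to the URL returned by recognizeUrl().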
Follow this example to use Cloud Vision API on iOS. You’ll need to replace the label and face detection in that example with landmark detection.
You can use the same API key you used for Cloud Speech API.
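As a sketch of what the landmark request looks like on the wire, the snippet below builds the JSON body for the REST method images:annotate with a single LANDMARK_DETECTION feature and pulls the top landmark name out of a response. It is written in JavaScript for brevity rather than taken from the iOS example, and the function names are hypothetical.

```javascript
// REST endpoint for image annotation; the API key goes in the query string.
function annotateUrl(apiKey) {
  return 'https://vision.googleapis.com/v1/images:annotate?key=' +
    encodeURIComponent(apiKey);
}

// Builds a request that asks only for landmark detection on one image.
function buildLandmarkRequest(base64Image, maxResults = 1) {
  return {
    requests: [{
      image: { content: base64Image },
      features: [{ type: 'LANDMARK_DETECTION', maxResults }],
    }],
  };
}

// Returns the description of the first landmark found, or null if there is none.
function topLandmark(responseBody) {
  const first = (responseBody.responses || [])[0] || {};
  const annotations = first.landmarkAnnotations || [];
  return annotations.length > 0 ? annotations[0].description : null;
}
```

The description returned by topLandmark() is the text the bot would read back to the user.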
Text to speech
iOS 7+ has a built-in text-to-speech SDK, AVSpeechSynthesizer. Converting text to speech takes only a few lines: create an AVSpeechUtterance from the text and pass it to the synthesizer’s speakUtterance: method.
Supporting multiple languages
Supporting additional languages in Cloud Speech API is a one-line change on the iOS client side. (Currently, there’s no support for mixed languages.) For Chinese, change the language code set in SpeechRecognitionService.m to a Chinese one.
To support additional text-to-speech languages, set the utterance’s voice to an AVSpeechSynthesisVoice for the desired language.
Cloud Vision API landmark detection currently only supports English, so you’ll need to use the Cloud Translation API to translate to your desired language after receiving the English-language landmark description. (You would use Cloud Translation API similarly to Cloud Vision and Speech APIs.)
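The translation step can be sketched the same way. The snippet below builds a request body for the Cloud Translation API v2 REST endpoint and reads the translated text from a response. The function names are hypothetical, and zh-CN as the target language is an assumption chosen to match the Chinese example in this post.

```javascript
// REST endpoint for Cloud Translation API v2; the API key goes in the query string.
function translateUrl(apiKey) {
  return 'https://translation.googleapis.com/language/translate/v2?key=' +
    encodeURIComponent(apiKey);
}

// Builds the request body. format 'text' asks for plain text rather than HTML.
function buildTranslateRequest(text, target, source = 'en') {
  return { q: text, source, target, format: 'text' };
}

// Returns the first translated string from a response, or null if none came back.
function firstTranslation(responseBody) {
  const translations =
    (responseBody.data && responseBody.data.translations) || [];
  return translations.length > 0 ? translations[0].translatedText : null;
}
```

In the app’s flow, the English landmark description from Cloud Vision API would be passed as the text argument, with the result handed to text-to-speech.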
On the API.AI side, you’ll need to create a new agent and set its language to Chinese. One agent can support only one language. If you try to use the same agent for a second language, machine learning won’t work for that language.
You’ll also need to create all intents and entities in Chinese.
And you’re done!
You’ve just built a simple “tour guide” chatbot that supports English and Chinese.
Next time
We hope this example has demonstrated how simple it is to build an app powered by machine learning. For more getting-started info, you might also want to try:
- Cloud Speech API Quickstart
- Cloud Vision API Quickstart
- Cloud Translation API Quickstart
- API.AI Quickstart
In part 3, we’ll cover how to build this app on Google Assistant with Actions on Google integration.