Google Cloud Platform

How to build a conversational app using Cloud Machine Learning APIs, Part 2

In part 1 of this blog post, we gave you an overview of what a conversational tour guide iOS app built on Cloud Machine Learning APIs and API.AI might look like. We also demonstrated how to create API.AI intents and contexts. In part 2, we’ll discuss an advanced API.AI topic: webhooks with Cloud Functions. We’ll also show you how to use Cloud Machine Learning APIs (Vision, Speech and Translation) and how to support a second language.

Webhooks via Cloud Functions 

In API.AI, webhook integrations let you pass information from a matched intent to a web service and get a result back. Read on to learn how to request parade info from Cloud Functions.
  1. Go to console.cloud.google.com. Log in with your own account and create a new project.
  2. Once you’ve created the project, navigate to it.
  3. Enable the Cloud Functions API.
     <img height="440" src="https://3.bp.blogspot.com/-WYNJR8zGM0s/WZPB80hH5hI/AAAAAAAAEOo/XsRVwsaSdHYYc938oZgkSWKqVkav-2jTwCLcBGAs/s640/conversational-app-8.png" width="640"/>

  4. Create a function. For the purposes of this guide, we’ll call the function “parades”. Select the “HTTP” trigger option, then select the “inline” editor.
     <img src="https://4.bp.blogspot.com/-atpnwaWeN6s/WZPCew3w7fI/AAAAAAAAEOw/2rUeB7TRjJY83JIMyMCJyfAKSJ3AfODewCLcBGAs/s1600/conversational-app-10.png"/>

  5. Don’t forget to set the function to execute to “parades”.
     You’ll also need to create a “stage bucket”. Click “browse”: a bucket browser opens, but no buckets exist yet.
     <img height="310" src="https://4.bp.blogspot.com/-kLivUAyI_fQ/WZPCthK5GbI/AAAAAAAAEO0/jSMnizgIS9893FWXYwmULZDWtd8WwQbjwCLcBGAs/s640/conversational-app-4.png" width="640"/>

  6. Click the “+” button to create the bucket.
    • Specify a unique name for the bucket (you can use your project name, for instance), select “regional” storage and keep the default region (us-central1).
    • Back in the previous window, click the “select” button.
    • Click the “create” button to create the function.

    The function will be created and deployed:
     <img height="238" src="https://2.bp.blogspot.com/-iMY0j2QqPbI/WZPC29jbZEI/AAAAAAAAEO4/P7Uqy2KCVp4fPQmcS8uh298w4GrGuSzYwCLcBGAs/s640/conversational-app-5.png" width="640"/>

  7. Click the “parades” function’s row. The “source” tab shows the function’s source files.
Now it’s time to code our function! We’ll need two files: the “index.js” file contains the JavaScript (Node.js) logic, and the “package.json” file contains the Node package definition, including the dependencies our function needs.

Here’s our package.json file. It depends on the actions-on-google NPM module, which eases integration with API.AI and with the Actions on Google platform, the platform that lets you extend the Google Assistant with your own actions (usable from Google Home):

  {
    "name": "parades",
    "version": "0.0.1",
    "main": "index.js",
    "dependencies": {
      "actions-on-google": "^1.1.1"
    }
  }

In the index.js file, here’s our code:

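  // ApiAiApp handles API.AI webhook requests (actions-on-google v1.x).
  const ApiAiApp = require('actions-on-google').ApiAiApp;

  // Handler for the "inquiry.parades" action: reply without ending the conversation.
  function parade(app) {
    app.ask('Chinese New Year Parade in Chinatown from 6pm to 9pm.');
  }

  // HTTP-triggered Cloud Function; the export name must match "function to execute".
  exports.parades = function(request, response) {
    const app = new ApiAiApp({request: request, response: response});
    // Map API.AI action names to handler functions.
    const actionMap = new Map();
    actionMap.set('inquiry.parades', parade);
    app.handleRequest(actionMap);
  };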

In the code snippets above:

  1. We require the actions-on-google NPM module.
  2. We use the ask() method to have the assistant send a response back to the user.
  3. We export a function that uses the actions-on-google module’s ApiAiApp class to handle the incoming request.
  4. We create a map from API.AI intents (identified by their action name, here “inquiry.parades”) to JavaScript handler functions.
  5. Finally, we call handleRequest() with that map to dispatch the request to the matching handler.
  6. Once done, don’t forget to click the “create” button. It deploys the function in the cloud.
There’s a subtle difference between the tell() and ask() methods: tell() ends the conversation and closes the mic, while ask() keeps it open. This difference doesn’t matter for API.AI projects like the one we demonstrate in parts 1 and 2 of this blog post; we’ll explain it in more detail when we integrate Actions on Google in part 3.
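To make the intent-to-handler mapping more concrete, here’s a minimal sketch of the same dispatch pattern written against a plain request body instead of the actions-on-google library. It assumes the API.AI v1 webhook format, in which the matched action name arrives in `result.action` and the reply goes in the `speech` and `displayText` fields. The names `handleWebhook` and the fallback message are hypothetical, and this is not how `ApiAiApp.handleRequest()` is implemented internally.

```javascript
// Hypothetical sketch: dispatch an API.AI (v1-format) webhook body to a
// handler by action name, mirroring the actionMap passed to handleRequest().
// Assumption: the action is in body.result.action and the response uses the
// v1 "speech" / "displayText" / "source" fields.

function parade() {
  return "Chinese New Year Parade in Chinatown from 6pm to 9pm.";
}

// Map action names (as configured on the API.AI intent) to handlers.
const actionMap = new Map([["inquiry.parades", parade]]);

function handleWebhook(body) {
  const action = body && body.result ? body.result.action : undefined;
  const handler = actionMap.get(action);
  const text = handler
    ? handler(body.result.parameters || {})
    : "Sorry, I can't help with that yet."; // hypothetical fallback reply
  // Use the same text for the spoken and the displayed response.
  return { speech: text, displayText: text, source: "parades" };
}
```

In an HTTP Cloud Function this could be wired up as `exports.parades = (req, res) => res.json(handleWebhook(req.body));`, although you would give up the Google Assistant-specific handling that actions-on-google provides.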

As shown below, the “testing” tab invokes your function, the “general” tab shows statistics and the “trigger” tab reveals the HTTP URL created for your function:

[Screenshot: the function’s “testing”, “general” and “trigger” tabs in the Cloud Functions console]

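The final step is to open the API.AI console and click the Fulfillment tab. Enable the webhook and paste your function’s trigger URL into the URL field. Also make sure “Use webhook” is checked in the Fulfillment section of the parades intent.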

[Screenshot: the API.AI Fulfillment tab with the webhook enabled]

With API.AI, we’ve built a chatbot that can converse with a human by text. Next, let’s give the bot “ears” to listen with Cloud Speech API, “eyes” to see with Cloud Vision API, a “mouth” to talk with the iOS text-to-speech SDK and “brains” for translating languages with Cloud Translation API.

Using Cloud Speech API 

Cloud Speech API provides an iOS sample app, and it’s straightforward to integrate its gRPC non-streaming sample into our chatbot app. You’ll need to enable Cloud Speech API in your project, create an API key in the Google Cloud Console, and replace YOUR_API_KEY in this line of SpeechRecognitionService.m with that key:

  #define API_KEY @"YOUR_API_KEY"
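The iOS sample talks to Cloud Speech API over gRPC. For reference, here’s a sketch of the equivalent non-streaming request expressed as a JSON body for the v1 REST method `speech:recognize`, plus a helper that reads the top transcript from the response. It assumes 16-bit linear PCM audio at 16 kHz and an API key passed as a query parameter; the helper names are hypothetical.

```javascript
// Hypothetical sketch: build a Cloud Speech API v1 REST request for a
// synchronous (non-streaming) recognize call.
// Assumptions: LINEAR16 audio sampled at 16 kHz, API key auth.

const SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize";

function buildRecognizeRequest(audioBytes, apiKey, languageCode = "en-US") {
  return {
    url: `${SPEECH_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    body: {
      config: {
        encoding: "LINEAR16",
        sampleRateHertz: 16000,
        languageCode, // BCP-47 code of the spoken language
      },
      // In JSON requests, the audio content is base64-encoded.
      audio: { content: Buffer.from(audioBytes).toString("base64") },
    },
  };
}

// Return the first alternative of the first result, or "" if none.
function topTranscript(response) {
  const results = (response && response.results) || [];
  const alternatives = results.length > 0 ? results[0].alternatives || [] : [];
  return alternatives.length > 0 ? alternatives[0].transcript : "";
}
```

Keeping the language code as a parameter means switching languages later only changes one argument.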

Landmark detection 

Follow the Cloud Vision API iOS sample to call Cloud Vision API from the app. You’ll need to replace the sample’s label and face detection request with landmark detection, as shown below:

  // binaryImageData: the base64-encoded image string, as prepared in the sample.
  NSDictionary *paramsDictionary =
    @{@"requests":@[
          @{@"image":
              @{@"content":binaryImageData},
            @"features":@[
                @{@"type":@"LANDMARK_DETECTION", @"maxResults":@1}]}]};

You can use the same API key you used for Cloud Speech API.
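The landmark request shown in Objective-C above is the JSON body of a Cloud Vision API v1 `images:annotate` call. As a sketch, here’s the same body built in JavaScript, with a helper that pulls the top landmark’s description out of the response. It assumes the image is sent inline as base64 and that the API key is passed as a query parameter; the helper names are hypothetical.

```javascript
// Hypothetical sketch: a LANDMARK_DETECTION request for the Cloud Vision
// API v1 images:annotate method, and a parser for the top landmark.
// Assumptions: inline base64 image content, API key auth.

const VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate";

function buildLandmarkRequest(imageBytes, apiKey) {
  return {
    url: `${VISION_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    body: {
      requests: [
        {
          image: { content: Buffer.from(imageBytes).toString("base64") },
          features: [{ type: "LANDMARK_DETECTION", maxResults: 1 }],
        },
      ],
    },
  };
}

// Return the description of the first landmark in the first response,
// or null if no landmark was detected.
function topLandmark(annotateResponse) {
  const responses = (annotateResponse && annotateResponse.responses) || [];
  const landmarks = (responses[0] && responses[0].landmarkAnnotations) || [];
  return landmarks.length > 0 ? landmarks[0].description : null;
}
```

The description comes back in English, which is why the multi-language section below runs it through Cloud Translation API.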

Text to speech

iOS 7 and later include a built-in text-to-speech API, AVSpeechSynthesizer. The code below is all you need to convert text to speech:

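  #import <AVFoundation/AVFoundation.h>

  // message: the NSString reply to be spoken.
  AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:message];
  // In real code, keep the synthesizer in a strong property so it isn't
  // deallocated while it's still speaking.
  AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
  [synthesizer speakUtterance:utterance];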

Supporting multiple languages

Supporting additional languages in Cloud Speech API is a one-line change on the iOS client side. (Recognizing a mix of languages in a single request isn’t currently supported.) For Chinese, replace this line in SpeechRecognitionService.m:

  recognitionConfig.languageCode = @"en-US";

with

  recognitionConfig.languageCode = @"zh-Hans";

To speak in another language, set the utterance’s voice. Compared with the previous snippet, only the line that sets utterance.voice is new:

  #import <AVFoundation/AVFoundation.h>

  AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:message];
  // Select the voice by BCP-47 language code.
  utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-Hans"];
  AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
  [synthesizer speakUtterance:utterance];

Both Cloud Speech API and Apple’s AVSpeechSynthesisVoice accept BCP-47 language codes.
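Because both sides take BCP-47 tags, one option is to keep a single language setting in the app and derive both values from it. The sketch below uses the standard `Intl.getCanonicalLocales` to validate and normalize the tag first; the function and field names are hypothetical stand-ins for `recognitionConfig.languageCode` and the argument to `voiceWithLanguage:`.

```javascript
// Hypothetical sketch: canonicalize one BCP-47 tag and reuse it for both
// speech recognition and text-to-speech.
function languageSettings(tag) {
  // Normalizes casing (e.g. "zh-hans" -> "zh-Hans") and throws a RangeError
  // if the tag isn't a well-formed BCP-47 language tag.
  const [canonical] = Intl.getCanonicalLocales(tag);
  return {
    speechLanguageCode: canonical, // for the Cloud Speech API request
    voiceLanguage: canonical, // for AVSpeechSynthesisVoice voiceWithLanguage:
  };
}
```

Note that a well-formed tag isn’t necessarily one that a given service supports, so each API’s list of supported languages still needs checking.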

Cloud Vision API landmark detection currently returns descriptions only in English, so after receiving the English landmark description, you’ll need Cloud Translation API to translate it into the desired language. (Calling Cloud Translation API works much like calling Cloud Vision and Speech APIs.)
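As a sketch of that translation step, here’s a request for the Cloud Translation API v2 REST endpoint and a helper that reads the result. It assumes API key auth, plain-text input, English as the source language and `zh-CN` (Simplified Chinese) as the default target; the helper names are hypothetical.

```javascript
// Hypothetical sketch: translate an English landmark description with the
// Cloud Translation API v2 REST endpoint.
// Assumptions: API key auth, plain-text input, "zh-CN" as default target.

const TRANSLATE_ENDPOINT =
  "https://translation.googleapis.com/language/translate/v2";

function buildTranslateRequest(text, apiKey, target = "zh-CN") {
  return {
    url: `${TRANSLATE_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    body: { q: text, source: "en", target, format: "text" },
  };
}

// Return the first translated string, or null if there is none.
function firstTranslation(response) {
  const translations =
    (response && response.data && response.data.translations) || [];
  return translations.length > 0 ? translations[0].translatedText : null;
}
```

With Node.js 18 or later, the request could be sent with the built-in fetch: `fetch(req.url, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(req.body) })`.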

On the API.AI side, you’ll need to create a new agent and set its language to Chinese. One agent can support only one language. If you try to use the same agent for a second language, machine learning won’t work for that language.

[Screenshot: creating a new API.AI agent with its language set to Chinese]

You’ll also need to create all intents and entities in Chinese.

[Screenshot: intents and entities created in Chinese in the API.AI console]
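Since each agent handles a single language, the client has to send each query to the agent that matches the user’s language. Here’s a minimal sketch of that routing at the REST level. It assumes the API.AI v1 `/query` endpoint with protocol version `20150910`, placeholder client access tokens, and `en` / `zh-CN` as the agents’ languages; the token strings and helper name are hypothetical, and the iOS app would typically go through the API.AI SDK instead.

```javascript
// Hypothetical sketch: one API.AI agent per language, chosen on the client.
// Assumptions: API.AI v1 /query endpoint, placeholder client access tokens,
// "en" and "zh-CN" as the agents' languages.

const AGENTS = new Map([
  ["en", { token: "ENGLISH_AGENT_CLIENT_TOKEN", lang: "en" }],
  ["zh", { token: "CHINESE_AGENT_CLIENT_TOKEN", lang: "zh-CN" }],
]);

function buildQueryRequest(text, appLanguage, sessionId) {
  // Fall back to the English agent for languages without their own agent.
  const agent = AGENTS.get(appLanguage) || AGENTS.get("en");
  return {
    url: "https://api.api.ai/v1/query?v=20150910",
    headers: {
      Authorization: `Bearer ${agent.token}`,
      "Content-Type": "application/json; charset=utf-8",
    },
    body: { query: text, lang: agent.lang, sessionId },
  };
}
```

Keying the agents by app language keeps the rest of the client unaware of how many agents exist.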

And you’re done!

You’ve just built a simple “tour guide” chatbot that supports English and Chinese.

Next time 

We hope this example has demonstrated how simple it is to build an app powered by machine learning. You can download the source code from GitHub.

In part 3, we’ll cover how to build this app on Google Assistant with Actions on Google integration.