
How to build a conversational app using Cloud Machine Learning APIs, Part 2 of 3

Author(s): @PokerChang, Published: 2018-06-19


Author: Chang Luo

In Part 1 of this series, we gave you an overview of what a conversational tour guide iOS app might look like when built on Cloud Machine Learning APIs and API.AI. We also demonstrated how to create API.AI intents and contexts. In Part 2, we'll discuss a more advanced API.AI topic: webhooks with Cloud Functions. We'll also show you how to use Cloud Machine Learning APIs (Vision, Speech, and Translation) and how to support a second language.

Webhooks via Cloud Functions

In API.AI, webhook integrations allow you to pass information from a matched intent into a web service and get a result from it. Read on to learn how to request parade info from Cloud Functions.

  1. Go to Google Cloud Platform Console. Log in with your own account and create a new project.

  2. Once you've created a new project, navigate to that project.

  3. Enable the Cloud Functions API.

  4. Create a function. For the purposes of this guide, we'll call the function "parades".

  5. Select the "HTTP" trigger option, then select "inline" editor. alt_text Don't forget to specify the function to execute to "parades".

  6. You'll also need to create a "stage bucket". Click "browse" — you'll see the storage browser, but no buckets will exist yet.

    1. Click on the "+" button to create the bucket.
    2. Specify a unique name for the bucket (you can use your project name, for instance), select "regional" storage, and keep the default region (us-central1).
    3. Click back on the "select" button in the previous window.
    4. Click the "create" button to create the function. The function will be created and deployed: alt_text
  7. Click the "parades" function line. In the Source tab, you'll see the sources.

Now it's time to code our function! We'll need two files: index.js will contain the JavaScript / Node.js logic, and package.json will contain the Node package definition, including the dependencies our function needs.

Here's our package.json file. It depends on the actions-on-google NPM module, which eases integration with API.AI and with the Actions on Google platform that lets you extend the Google Assistant with your own extensions (usable from Google Home):

{
  "name": "parades",
  "version": "0.0.1",
  "main": "index.js",
  "dependencies": {
    "actions-on-google": "^1.1.1"
  }
}

In the index.js file, here's our code:

const ApiAiApp = require('actions-on-google').ApiAiApp;
function parade(app) {
  app.ask(`Chinese New Year Parade in Chinatown from 6pm to 9pm.`);
}
exports.parades = function(request, response) {
  var app = new ApiAiApp({request: request, response: response});
  var actionMap = new Map();
  actionMap.set("inquiry.parades", parade);
  app.handleRequest(actionMap);
};

In the code snippet above:

  1. We require the actions-on-google NPM module.
  2. We use the ask() method to let the assistant send a result back to the user.
  3. We export a function where we're using the actions-on-google module's ApiAiApp class to handle the incoming request.
  4. We create a map that maps "intents" from API.AI to a JavaScript function.
  5. Then, we call the handleRequest() to handle the request.
  6. Once done, don't forget to click the "create" button. This deploys the function to the cloud.

There is a subtle difference between the tell() and ask() APIs: tell() ends the conversation and closes the mic, while ask() does not. This difference doesn't matter for API.AI projects like the one we demonstrate in Part 1 and Part 2 of this series. When we integrate Actions on Google in Part 3, we'll explain this difference in more detail.

Back in the Cloud Functions console, the Testing tab lets you invoke your function, the General tab shows statistics, and the Trigger tab reveals the HTTP URL created for your function.


Your final step is to go to the API.AI console, then click the Fulfillment tab. Enable webhook and paste the URL above into the URL field.


With API.AI, we've built a chatbot that can converse with a human by text. Next, let's give the bot "ears" to listen with Cloud Speech API, "eyes" to see with Cloud Vision API, a "mouth" to talk with the iOS text-to-speech SDK, and "brains" for translating languages with Cloud Translation API.

Using Cloud Speech API

Cloud Speech API includes an iOS sample app. It's quite straightforward to integrate the gRPC non-streaming sample app into our chatbot app. You'll need to acquire an API key from Google Cloud Console and replace this line in SpeechRecognitionService.m with your API key.

#define API_KEY @"YOUR_API_KEY"

Landmark Detection

Follow this example to use Cloud Vision API on iOS. You'll need to replace the label and face detection with landmark detection as shown below.

NSDictionary *paramsDictionary =
@{@"requests":@[
      @{@"image":
          @{@"content":binaryImageData},
        @"features":@[
            @{@"type":@"LANDMARK_DETECTION", @"maxResults":@1}]}]};

You can use the same API key you used for Cloud Speech API.
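When the request returns, the landmark name arrives in the landmarkAnnotations field of the JSON response. Below is a minimal sketch of extracting it, assuming the response body has already been deserialized into an NSDictionary we call responseDict (that variable name is ours, not part of the referenced sample):

// Hypothetical snippet: pull the landmark name out of a parsed Vision API response.
// Assumes responseDict holds the deserialized JSON body of the annotate request.
NSDictionary *firstResponse = [responseDict[@"responses"] firstObject];
NSArray *landmarks = firstResponse[@"landmarkAnnotations"];
if (landmarks.count > 0) {
  NSString *landmarkName = landmarks[0][@"description"];  // e.g. "Golden Gate Bridge"
  NSLog(@"Detected landmark: %@", landmarkName);
} else {
  NSLog(@"No landmark detected.");
}

The description string is the English landmark name; for Chinese users, this is the string that gets handed to the Cloud Translation API described below.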

Text to Speech

iOS 7+ has a built-in text-to-speech SDK, AVSpeechSynthesizer. The code below is all you need to convert text to speech.

#import <AVFoundation/AVFoundation.h>
AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:message];
AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
[synthesizer speakUtterance:utterance];

Supporting Multiple Languages

Chinese Demo

Supporting additional languages in Cloud Speech API is a one-line change on the iOS client side. (Currently, there is no support for mixed languages.) For Chinese, replace this line in SpeechRecognitionService.m:

recognitionConfig.languageCode = @"en-US";

with

recognitionConfig.languageCode = @"zh-Hans";

To support additional text-to-speech languages, set the voice on the utterance (the new utterance.voice line below):

#import <AVFoundation/AVFoundation.h>
AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:message];
utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-Hans"];
AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
[synthesizer speakUtterance:utterance];

Both Cloud Speech API and Apple's AVSpeechSynthesisVoice support BCP-47 language codes.

Cloud Vision API landmark detection currently only supports English, so you'll need to use the Cloud Translation API to translate to your desired language after receiving the English-language landmark description. (You would use Cloud Translation API similarly to Cloud Vision and Speech APIs.)
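For illustration, here's a rough sketch of what that translation call could look like on iOS, mirroring the structure of the Vision request above. The translateText: method name, the hardcoded source and target languages, and the reuse of the API_KEY macro are our own assumptions, not code from the original sample:

// Hypothetical helper: translate the English landmark description into Chinese
// using the Cloud Translation API v2 REST endpoint.
- (void)translateText:(NSString *)text {
  NSString *urlString = [NSString stringWithFormat:
      @"https://translation.googleapis.com/language/translate/v2?key=%@", API_KEY];
  NSMutableURLRequest *request =
      [NSMutableURLRequest requestWithURL:[NSURL URLWithString:urlString]];
  request.HTTPMethod = @"POST";
  [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];
  NSDictionary *body = @{@"q": text, @"source": @"en", @"target": @"zh"};
  request.HTTPBody = [NSJSONSerialization dataWithJSONObject:body options:0 error:nil];

  [[[NSURLSession sharedSession] dataTaskWithRequest:request
      completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
        if (error != nil || data == nil) { return; }
        NSDictionary *json = [NSJSONSerialization JSONObjectWithData:data options:0 error:nil];
        // The translated string lives under data.translations[0].translatedText.
        NSString *translated = json[@"data"][@"translations"][0][@"translatedText"];
        NSLog(@"Translated landmark description: %@", translated);
      }] resume];
}

You would then hand the translated string to AVSpeechSynthesizer with the zh-Hans voice shown earlier.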

On the API.AI side, you'll need to create a new agent and set its language to Chinese. One agent can support only one language. If you try to use the same agent for a second language, machine learning won't work for that language.


You'll also need to create all intents and entities in Chinese.



And you're done! You've just built a simple "tour guide" chatbot that supports English and Chinese.

Next time

We hope this example has demonstrated how simple it is to build an app powered by machine learning. For more getting-started info, you can download the source code from GitHub.

In Part 3, we'll cover how to build this app on the Google Assistant with Actions on Google integration.
