Adding speech translation to your Android app

This tutorial shows how to add a speech translation feature to your Android app. The sample in this tutorial uses a microservice that receives an audio message, translates the message into a set of predefined languages, and stores the translated messages in audio files. The client Android app downloads and plays the translated audio files at the user's request.

Solution overview

The solution includes the following components:

Microservice

The microservice is implemented on Cloud Functions for Firebase and uses the following Cloud AI products to translate the messages:

  • Cloud Speech-to-Text API
  • Cloud Translation API
  • Cloud Text-to-Speech API

The microservice stores translated audio messages in a bucket in Cloud Storage for Firebase.

Client app

The client component is an Android app that records audio messages and downloads the translated messages from the Cloud Storage bucket. The sample is a chat app used in the Build an Android App Using Firebase and the App Engine Flexible Environment tutorial. This tutorial explains how to extend the sample app to implement the speech translation feature.

The following diagram shows the interaction between the microservice and the client app:

Solution high-level architecture

The microservice performs the following tasks:

  1. Receives the audio message in Base64-encoded format.
  2. Transcribes the audio message using the Cloud Speech-to-Text API.
  3. Translates the transcribed message using the Translation API.
  4. Synthesizes the translated message using the Text-to-Speech API.
  5. Stores the translated audio message in a Cloud Storage bucket.
  6. Sends the response back to the client. The response includes the Cloud Storage location of each translated audio message.

Microservice architecture

The client app performs the following tasks:

  1. Records the audio message following the Cloud Speech-to-Text API best practices for greater accuracy. The app uses the microphone on the device to capture the audio.
  2. Encodes the audio message in Base64 format.
  3. Sends an HTTP request to the microservice that includes the encoded audio message.
  4. Receives the HTTP response from the microservice, which includes the Cloud Storage location of each translated audio message.
  5. Sends a request to the Cloud Storage bucket to retrieve the file that includes the translated audio message.
  6. Plays the translated audio message.

Objectives

This tutorial demonstrates how to:

  • Use Cloud Functions for Firebase to build a microservice that encapsulates the logic required to translate audio messages using the following Cloud AI products:
    • Cloud Speech-to-Text API
    • Translation API
    • Cloud Text-to-Speech API
  • Use the Android Framework APIs to record audio following the recommendations on how to provide audio data to the Cloud Speech-to-Text API.
  • Use the Cronet Library to upload speech data from the client app to the microservice and to download translated messages from Cloud Storage. For more information about the Cronet Library, see Perform network operations using Cronet in the Android developer documentation.

Costs

This tutorial extends the sample app implemented in Build an Android App Using Firebase and the App Engine Flexible Environment. Review the costs section in that tutorial, and consider the following additional costs:

  • Firebase defines quotas for Cloud Functions usage that specify resource, time, and rate limits. For more information, see Quotas and Limits in the Firebase documentation.
  • Cloud Speech-to-Text API usage is priced monthly based on the length of audio successfully processed. There's a predetermined amount of processing time that you can use free of charge every month. For more information, see Cloud Speech-to-Text API Pricing.
  • Translation API usage is priced monthly based on the number of characters sent to the API for processing. For more information, see Translation API Pricing.
  • Text-to-Speech API usage is priced monthly based on the number of characters to synthesize into audio. A certain number of characters can be synthesized free of charge every month. For more information, see Text-to-Speech API Pricing.
  • Firebase Storage usage fees are processed as Google Cloud Storage fees. For more information, see Cloud Storage Pricing.

Before you begin

Complete the Build an Android App Using Firebase and the App Engine Flexible Environment tutorial and install the following software, which this tutorial uses to clone the sample code and to build and deploy the microservice:

  • Git
  • Node Version Manager (NVM)
  • Firebase CLI

Obtain a hardware device running Android 7.0 (API level 24) or higher to test the speech translation feature.

Cloning the sample code

Use the following command to clone the nodejs-docs-samples repository, which includes the microservice code:

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git

Enabling billing and APIs for the GCP project

This tutorial uses the Playchat project created in Build an Android App Using Firebase and the App Engine Flexible Environment, which requires the App Engine Admin API and Compute Engine API.

The microservice requires the following APIs to process speech translation requests:

  • Cloud Text-to-Speech API
  • Cloud Translation API
  • Cloud Speech-to-Text API

To enable the required APIs:

  1. In the GCP Console, select the Playchat project.

  2. Make sure that billing is enabled for your Google Cloud Platform project.

  3. Enable the App Engine, Cloud Speech-to-Text, Translation, and Cloud Text-to-Speech APIs.

Configuring the default bucket on Cloud Storage for Firebase

The microservice uses the default Cloud Storage bucket in the Firebase project to store the translated audio files. You must grant read access to the user accounts that need to retrieve the audio files.

To grant read access, you need the Firebase user UID of the account. To retrieve the user UID:

  1. From the left menu of the Firebase console, select Authentication in the Develop group.
  2. Take note of the User UID value of the user account that you want to use to test the app. The user UID is a 28-character string.

To grant read access to the user account, you must create a storage security rule. To create the rule:

  1. From the left menu of the Firebase console, select Storage in the Develop group.
  2. Take note of the default bucket URL, which is in the form gs://[FIREBASE_PROJECT_ID].appspot.com and appears next to a link icon. You need this value to deploy the microservice.
  3. On the Storage page, go to the Rules section and add the following rule inside the service firebase.storage section:

     match /b/{bucket}/o {
       match /{allPaths=**} {
         allow read: if request.auth.uid == "[ACCOUNT_USER_UID]";
       }
     }
    

    Replace [ACCOUNT_USER_UID] with the user UID value that you obtained in the previous steps.

For more information, see Get Started with Storage Security Rules in the Firebase documentation.

Building and deploying the microservice

To build the microservice, open a terminal window and go to the functions/speech-to-speech/functions folder in the nodejs-docs-samples repository that you cloned in the previous section.

The microservice code includes an .nvmrc file that declares the version of Node.js that you must use to run the app. Run the following command to install that Node.js version with NVM and to install the microservice dependencies:

nvm install && nvm use && npm install

Sign in to Firebase using the command-line interface by running the following command:

firebase login

The microservice requires the following environment variables:

  • OUTPUT_BUCKET: The default Cloud Storage bucket in the Firebase project.
  • SUPPORTED_LANGUAGE_CODES: A comma-separated list of language codes that the microservice supports.

Use the following commands to declare the required environment data in the command-line interface. Replace the [FIREBASE_PROJECT_ID] placeholder with the Firebase project ID from the default bucket URL that you noted in the previous section.

firebase functions:config:set playchat.output_bucket="gs://[FIREBASE_PROJECT_ID].appspot.com"
firebase functions:config:set playchat.supported_language_codes="en,es,fr"
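
After you set the configuration values, deploy the microservice to Cloud Functions for Firebase by running the following command from the functions folder:

firebase deploy --only functions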

Configuring the Android app

The Playchat sample app requires the microservice URL to enable speech translation features. To retrieve the microservice URL:

  1. From the left menu of the Firebase console, select Functions in the Develop group.
  2. Copy the microservice URL displayed in the Trigger column. The URL has the form https://[REGION_ID]-[FIREBASE_PROJECT_ID].cloudfunctions.net/[FUNCTION_NAME].

To configure the app to work with the microservice, open the app/src/main/res/values/speech_translation.xml file in the firebase-android-client repository and update the speechToSpeechEndpoint field with the microservice URL.
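
At run time, the app can read this value as a standard string resource. The following one-line sketch assumes that the speech_translation.xml entry is a string resource named speechToSpeechEndpoint; the exact resource name in the sample may differ:

// Hypothetical lookup; assumes a <string name="speechToSpeechEndpoint"> entry in speech_translation.xml.
String speechToSpeechEndpoint = context.getString(R.string.speechToSpeechEndpoint);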

Running the Android app

To use the speech translation feature, you must run the app on a device that can record audio with a built-in microphone, such as a physical hardware device.

To use the speech translation feature in the app:

  1. Make sure that the hardware device uses one of the languages configured in the Building and deploying the microservice section. To change the language, open the Settings app on the device and go to System > Languages & input > Languages.
  2. Open the Playchat project in Android Studio and connect the hardware device to your computer using a USB cable. For more information, see Set up a device for development.
  3. Click Run in Android Studio to build and run the app on the device.
  4. On the Playchat app, tap the microphone icon to start recording, record a short message, and tap the microphone icon again to stop recording.
  5. After a few seconds, the Playchat app displays the text of the recorded message on screen. Tap the message to play the audio version.
  6. Configure the device to use a different supported language.
  7. Go back to the Playchat app, which displays the previously recorded message in the new language. Tap the message to play the audio version in the new language.

The following screenshot shows the Playchat app displaying a message translated to French:

Speech translation feature in Android

Exploring the code

The client app performs the following tasks to support the speech translation feature:

  1. Records audio using the recommended parameters described in the Cloud Speech-to-Text API best practices.
  2. Encodes the audio using the Base64 scheme to represent the audio in a string format that can be embedded in an HTTP request.
  3. Sends an HTTP request to the microservice. The request includes the encoded audio message along with metadata that provides additional information about the payload. The app uses the Cronet Library to manage the network requests.
  4. When the user wants to listen to the translated message, the app downloads the corresponding audio file by issuing an authenticated HTTP request to the Cloud Storage bucket that stores the translated messages.

The following code example shows the constants that the sample uses to specify the recording configuration parameters:

private static final int AUDIO_SOURCE = MediaRecorder.AudioSource.UNPROCESSED;
private static final int SAMPLE_RATE_IN_HZ = 16000;
private static final int CHANNEL_CONFIG = AudioFormat.CHANNEL_IN_MONO;
private static final int AUDIO_FORMAT = AudioFormat.ENCODING_PCM_16BIT;
  • AUDIO_SOURCE: MediaRecorder.AudioSource.UNPROCESSED indicates an unprocessed audio source because applying signal processing algorithms such as noise reduction or gain control reduces recognition accuracy.
  • SAMPLE_RATE_IN_HZ: The sample uses a value of 16,000 for the native sample rate of the audio source.
  • CHANNEL_CONFIG: AudioFormat.CHANNEL_IN_MONO indicates only one audio channel in the recording. The sample assumes only one person's voice in the recording.
  • AUDIO_FORMAT: AudioFormat.ENCODING_PCM_16BIT indicates the linear PCM audio data format using 16 bits per sample. Linear PCM is a lossless format, which is preferred for speech recognition.

The client app uses the AudioRecord API to record audio from the built-in microphone and stores the recording in a .wav file on the device. For more information, see the RecordingHelper class of the Playchat sample.
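
The following sketch shows how these constants can be used with the AudioRecord API to capture raw PCM audio. It's a simplified illustration rather than the actual RecordingHelper code: it assumes an isRecording flag that another thread clears to stop the capture, and it omits the WAV header that the sample writes around the raw audio data.

// Simplified capture loop, not the actual RecordingHelper implementation.
private void captureAudio(OutputStream outputStream) throws IOException {
    int minBufferSize = AudioRecord.getMinBufferSize(
            SAMPLE_RATE_IN_HZ, CHANNEL_CONFIG, AUDIO_FORMAT);
    AudioRecord audioRecord = new AudioRecord(
            AUDIO_SOURCE, SAMPLE_RATE_IN_HZ, CHANNEL_CONFIG, AUDIO_FORMAT, minBufferSize);
    byte[] audioBuffer = new byte[minBufferSize];

    audioRecord.startRecording();
    while (isRecording) {
        // Read raw linear PCM samples from the built-in microphone.
        int bytesRead = audioRecord.read(audioBuffer, 0, audioBuffer.length);
        if (bytesRead > 0) {
            outputStream.write(audioBuffer, 0, bytesRead);
        }
    }
    audioRecord.stop();
    audioRecord.release();
}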

To encode the audio using the Base64 scheme, the sample uses the Base64 class of the Android Framework. The encoded audio must not include line terminators, which are omitted by using the NO_WRAP flag. The following example shows how to encode audio using the Base64 class:

public static String encode(File inputFile) throws IOException {
    byte[] data = new byte[(int) inputFile.length()];
    DataInputStream input = new DataInputStream(new FileInputStream(inputFile));
    int readBytes = input.read(data);
    Log.i(TAG, readBytes + " read from input file.");
    input.close();
    return Base64.encodeToString(data, Base64.NO_WRAP);
}

To send the encoded audio to the microservice, the client app issues an HTTP request with the following parameters:

  • Method: POST
  • Content type: application/json
  • Body: JSON object with the following attributes:
    • encoding: The LINEAR16 string
    • sampleRateHertz: The sample rate of the recorded audio. For example, 16000.
    • languageCode: The language code of the recorded message. The client app assumes that the message is recorded in the language configured in the device settings. For example, en-US.
    • audioContent: The audio message encoded in the Base64 scheme.

The following example shows how to build a JSON object that includes the attributes required in the request body:

JSONObject requestBody = new JSONObject();
try {
    requestBody.put("encoding", SPEECH_TRANSLATE_ENCODING);
    requestBody.put("sampleRateHertz", sampleRateInHertz);
    requestBody.put("languageCode", context.getResources().getConfiguration().getLocales().get(0));
    requestBody.put("audioContent", base64EncodedAudioMessage);
} catch(JSONException e) {
    Log.e(TAG, e.getLocalizedMessage());
    translationListener.onTranslationFailed(e);
}

For more details about how to build the HTTP request, see the SpeechTranslationHelper class of the Playchat sample app.
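
As a rough sketch of how the app can send such a request with Cronet (this is an illustration, not the actual SpeechTranslationHelper code), the JSON body is posted to the microservice endpoint. The context, speechToSpeechEndpoint, requestBody, and TAG variables are assumed to be in scope, and the callback simply collects the response body in memory:

// Simplified Cronet POST sketch; the SpeechTranslationHelper class wraps this kind of
// logic with its own listener and error handling.
CronetEngine cronetEngine = new CronetEngine.Builder(context).build();
Executor executor = Executors.newSingleThreadExecutor();

UrlRequest.Callback callback = new UrlRequest.Callback() {
    private final ByteArrayOutputStream responseBytes = new ByteArrayOutputStream();

    @Override
    public void onRedirectReceived(UrlRequest request, UrlResponseInfo info, String newLocationUrl) {
        request.followRedirect();
    }

    @Override
    public void onResponseStarted(UrlRequest request, UrlResponseInfo info) {
        request.read(ByteBuffer.allocateDirect(16 * 1024));
    }

    @Override
    public void onReadCompleted(UrlRequest request, UrlResponseInfo info, ByteBuffer byteBuffer) {
        byteBuffer.flip();
        byte[] chunk = new byte[byteBuffer.remaining()];
        byteBuffer.get(chunk);
        responseBytes.write(chunk, 0, chunk.length);
        byteBuffer.clear();
        request.read(byteBuffer);
    }

    @Override
    public void onSucceeded(UrlRequest request, UrlResponseInfo info) {
        // The response body includes the transcription and the translated file locations.
        String responseJson = new String(responseBytes.toByteArray(), StandardCharsets.UTF_8);
        Log.i(TAG, "Speech translation response: " + responseJson);
    }

    @Override
    public void onFailed(UrlRequest request, UrlResponseInfo info, CronetException error) {
        Log.e(TAG, "Speech translation request failed.", error);
    }
};

UrlRequest request = cronetEngine
        .newUrlRequestBuilder(speechToSpeechEndpoint, callback, executor)
        .setHttpMethod("POST")
        .addHeader("Content-Type", "application/json")
        .setUploadDataProvider(
                UploadDataProviders.create(requestBody.toString().getBytes(StandardCharsets.UTF_8)),
                executor)
        .build();
request.start();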

To retrieve the audio files from the Cloud Storage bucket, the app uses a download URL that includes a token, which you can revoke from the Firebase console if needed. You can get the download URL by calling the getDownloadUrl() method, as shown in the following example:

FirebaseStorage storage = FirebaseStorage.getInstance();
StorageReference gsReference = storage.getReferenceFromUrl(gcsUrl);
gsReference.getDownloadUrl().addOnCompleteListener(getDownloadUriListener);
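
The listener receives a task that contains the download URL. The sample app downloads the file with Cronet before playing it; as a simpler illustration, the following sketch streams the download URL directly with MediaPlayer. The listener name matches the variable in the previous example, and TAG is assumed to be in scope:

// Illustration only; the Playchat sample downloads the file with Cronet and plays the
// downloaded copy instead of streaming the URL directly.
OnCompleteListener<Uri> getDownloadUriListener = task -> {
    if (!task.isSuccessful()) {
        Log.e(TAG, "Unable to get the download URL.", task.getException());
        return;
    }
    try {
        MediaPlayer mediaPlayer = new MediaPlayer();
        mediaPlayer.setDataSource(task.getResult().toString());
        mediaPlayer.setOnPreparedListener(MediaPlayer::start);
        mediaPlayer.setOnCompletionListener(MediaPlayer::release);
        mediaPlayer.prepareAsync();
    } catch (IOException e) {
        Log.e(TAG, "Unable to play the translated message.", e);
    }
};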

The microservice performs the following tasks to support the speech translation feature:

  1. Receives speech translation requests, which include the Base64 encoded audio.
  2. Sends the encoded audio to the Cloud Speech-to-Text API and receives a transcription in the source language.
  3. For each of the supported languages, sends the transcription to the Translation API and receives the translated text.
  4. For each of the supported languages, sends the translated text to the Cloud Text-to-Speech API and receives the translated audio.
  5. Uploads the translated audio files to the Cloud Storage bucket.

The microservice uses the output of a call to a Cloud API as the input of the call to the next API, as shown in the following code example:

const sttResponse = data[0];
// The data object contains one or more recognition
// alternatives ordered by accuracy.
const transcription = sttResponse.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
responseBody.transcription = transcription;
responseBody.gcsBucket = outputBucket;

let translations = [];
supportedLanguageCodes.forEach(languageCode => {
  let translation = {languageCode: languageCode};
  const filenameUUID = uuid();
  const filename = filenameUUID + '.' + outputAudioEncoding.toLowerCase();
  callTextTranslation(languageCode, transcription)
    .then(data => {
      const textTranslation = data[0];
      translation.text = textTranslation;
      return callTextToSpeech(languageCode, textTranslation);
    })
    .then(data => {
      const path = languageCode + '/' + filename;
      return uploadToCloudStorage(path, data[0].audioContent);
    })
    .then(() => {
      console.log(`Successfully translated input to ${languageCode}.`);
      translation.gcsPath = languageCode + '/' + filename;
      translations.push(translation);
      if (translations.length === supportedLanguageCodes.length) {
        responseBody.translations = translations;
        console.log(`Response: ${JSON.stringify(responseBody)}`);
        response.status(200).send(responseBody);
      }
    })
    .catch(error => {
      console.error(
        `Partial error in translation to ${languageCode}: ${error}`
      );
      translation.error = error.message;
      translations.push(translation);
      if (translations.length === supportedLanguageCodes.length) {
        responseBody.translations = translations;
        console.log(`Response: ${JSON.stringify(responseBody)}`);
        response.status(200).send(responseBody);
      }
    });
});
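
On the client side, the response body can be parsed with the same org.json classes used to build the request. The following sketch is an illustration rather than the actual SpeechTranslationHelper code; it assumes the response body has been read into a responseJson string, and the field names come from the responseBody object in the microservice code above:

try {
    // Field names match the responseBody object built by the Cloud Function.
    JSONObject response = new JSONObject(responseJson);
    String transcription = response.getString("transcription");
    String gcsBucket = response.getString("gcsBucket");

    JSONArray translations = response.getJSONArray("translations");
    for (int i = 0; i < translations.length(); i++) {
        JSONObject translation = translations.getJSONObject(i);
        String languageCode = translation.getString("languageCode");
        if (translation.has("gcsPath")) {
            // For example: gs://[FIREBASE_PROJECT_ID].appspot.com/[LANGUAGE_CODE]/[UUID].[ENCODING]
            String gcsUrl = gcsBucket + "/" + translation.getString("gcsPath");
        } else {
            // Partial failures are reported per language in the error attribute.
            Log.w(TAG, "Translation to " + languageCode + " failed: " + translation.optString("error"));
        }
    }
} catch (JSONException e) {
    Log.e(TAG, e.getLocalizedMessage());
}

A gs:// URL built this way can then be passed to getReferenceFromUrl(), as shown in the earlier download example.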

Cleaning up

To avoid incurring charges to your GCP account for the resources used in this tutorial:

Delete the GCP and Firebase project

The simplest way to stop billing charges is to delete the project you created for this tutorial. Although you created the project in the Firebase Console, you can also delete it in the GCP console, since the Firebase and GCP projects are one and the same.

  1. In the GCP Console, go to the Projects page.

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the non-default versions of your App Engine app

If you don't want to delete your GCP and Firebase project, you can reduce costs by deleting the non-default versions of your App Engine flexible environment app.

  1. In the GCP Console, go to the Versions page for App Engine.

  2. Select the checkbox for the non-default app version you want to delete.
  3. Click Delete to delete the app version.
