Base64 Encoding Audio Content

When passing audio to the Speech API, you can either pass the URI of a file located on Google Cloud Storage, or you can embed audio data directly within the request's content field.

Embedding Base64 encoded audio

Audio data is binary data. Within a gRPC request, you can simply write the binary data out directly; however, JSON is used when making a REST request. JSON is a text format that does not directly support binary data, so you will need to convert such binary data into text using Base64 encoding.
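
As a minimal illustration of why this conversion is needed, the Python sketch below shows that raw bytes are rejected by a JSON serializer while their Base64 text form is accepted. The sample bytes and the helper name can_serialize are placeholders chosen for this example; they are not real audio or part of the Speech API.

```python
import base64
import json

# Arbitrary placeholder bytes standing in for binary audio data (assumption).
raw_audio = b"\x00\x01\xfe\xff"


def can_serialize(value):
    """Return True if json.dumps accepts the value as a field, else False."""
    try:
        json.dumps({"content": value})
        return True
    except TypeError:
        return False


# Base64 maps the bytes to ASCII text, which JSON can carry as a string.
encoded_text = base64.b64encode(raw_audio).decode("ascii")

print(can_serialize(raw_audio))     # raw bytes are not JSON serializable
print(can_serialize(encoded_text))  # the Base64 string is
```

Decoding the string with base64.b64decode returns the original bytes, so no audio data is lost in the conversion.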

To base64 encode an audio file:

Linux

  1. Encode the audio file using the base64 command line tool, making sure to prevent line-wrapping by using the -w 0 flag:

    $ base64 source_audio_file -w 0 > dest_audio_file

  2. Create a JSON request file, inlining the base64-encoded audio within the request's content field:

    {
      "config": {
        "encoding":"FLAC",
        "sampleRateHertz":16000,
        "languageCode":"en-US"
      },
      "audio": {
        "content": "ZkxhQwAAACIQABAAAAUJABtAA+gA8AB+W8FZndQvQAyjv..."
      }
    }
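
The -w 0 flag in step 1 matters because the encoded text is pasted into a single JSON string, and a JSON string cannot contain raw line breaks. As a rough Python illustration (placeholder bytes, not real audio), base64.encodebytes wraps its output with newlines, much as GNU base64 does by default, while base64.b64encode produces one unbroken line:

```python
import base64

# Placeholder payload long enough to trigger wrapping (assumption, not audio).
data = bytes(range(256))

wrapped = base64.encodebytes(data)   # inserts a newline after every 76 characters
unwrapped = base64.b64encode(data)   # single line, no newlines

print(b"\n" in wrapped)     # wrapped output contains line breaks
print(b"\n" in unwrapped)   # unwrapped output does not

# Both forms carry the same payload once the line breaks are removed.
print(wrapped.replace(b"\n", b"") == unwrapped)
```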

macOS

  1. Encode the audio file using the base64 command line tool:

    $ base64 source_audio_file > dest_audio_file

  2. Create a JSON request file, inlining the base64-encoded audio within the request's content field:

    {
      "config": {
        "encoding":"FLAC",
        "sampleRateHertz":16000,
        "languageCode":"en-US"
      },
      "audio": {
        "content": "ZkxhQwAAACIQABAAAAUJABtAA+gA8AB+W8FZndQvQAyjv..."
      }
    }

Windows

  1. Encode the audio file using the Base64.exe tool:

    C:> Base64.exe -e source_audio_file > dest_audio_file

  2. Create a JSON request file, inlining the base64-encoded audio within the request's content field:

    {
      "config": {
        "encoding":"FLAC",
        "sampleRateHertz":16000,
        "languageCode":"en-US"
      },
      "audio": {
        "content": "ZkxhQwAAACIQABAAAAUJABtAA+gA8AB+W8FZndQvQAyjv..."
      }
    }

Embedding audio content programmatically

Embedding binary audio data into requests by hand in a text editor is neither desirable nor practical. In practice, you will embed base64-encoded files within client code. All supported programming languages have built-in mechanisms for base64-encoding content:

Python

In Python, base64 encode audio files as follows:

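# Import the base64 encoding library.
import base64

# Pass an open binary file object to the encoding function.
def encode_audio(audio):
    """Return the Base64 text for the contents of a binary file object."""
    audio_content = audio.read()
    # b64encode returns bytes; decode to str so it can be embedded in JSON.
    return base64.b64encode(audio_content).decode("utf-8")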
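
As a usage sketch, the encoded audio can be placed into a request body shaped like the JSON shown earlier. The helper name build_request, its default parameter values, and the in-memory placeholder bytes are assumptions made for illustration. Note that b64encode returns bytes, so the result is decoded to a str before JSON serialization.

```python
import base64
import io
import json


def build_request(audio_file, encoding="FLAC", sample_rate_hertz=16000,
                  language_code="en-US"):
    """Build a request body dict from an open binary file object.

    Hypothetical helper: the field names mirror the JSON request shown earlier.
    """
    # Decode the Base64 bytes to str so json.dumps can serialize them.
    content = base64.b64encode(audio_file.read()).decode("utf-8")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"content": content},
    }


# Placeholder bytes standing in for a real audio file opened in "rb" mode.
fake_audio = io.BytesIO(b"fLaC placeholder bytes")
request_json = json.dumps(build_request(fake_audio), indent=2)
print(request_json)
```

With a real file, the io.BytesIO object would be replaced by a file opened in binary mode, for example with open(path, "rb").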

Node.js

In Node.js, base64 encode audio files as follows, where filename is the local path to the audio file:

// Imports the file system module and the Google Cloud client library
const fs = require('fs');
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};
const audio = {
  content: fs.readFileSync(filename).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
client
  .recognize(request)
  .then(data => {
    const response = data[0];
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    console.log(`Transcription: `, transcription);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Java

In Java, use the encodeBase64 static method within org.apache.commons.codec.binary.Base64 to base64 encode binary files:

// Import the Base64 encoding library.
import org.apache.commons.codec.binary.Base64;

// Encode the speech.
byte[] encodedAudio = Base64.encodeBase64(audio.getBytes());
