Audio data is binary data. A gRPC response carries the binary data directly, but a REST response is formatted as JSON. Because JSON is a text format that does not directly support binary data, Text-to-Speech returns the audio as a base64-encoded string. You must convert the base64-encoded text from the response back to binary before you can play it on a device.
JSON responses from Text-to-Speech include the base64-encoded audio content in the audioContent field. For example:
{ "audioContent": "//NExAARqoIIAAhEuWAAAGNmBGMY4EBcxvABAXBPmPIAF//yAuh9Tn5CEap3/o..." }
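The round trip between binary audio and the JSON text field can be illustrated with a minimal Python sketch. It uses only the standard library; the six bytes are simply what the first eight characters of the sample string above ("//NExAAR") decode to, and are a stand-in for real audio, not a playable file.

```python
import base64
import json

# Stand-in for binary audio: the bytes encoded by the first eight
# characters of the sample audioContent string ("//NExAAR").
raw_audio = bytes([0xFF, 0xF3, 0x44, 0xC4, 0x00, 0x11])

# JSON has no binary type, so serializing raw bytes fails.
try:
    json.dumps({"audioContent": raw_audio})
    serializable = True
except TypeError:
    serializable = False

# Base64 turns the bytes into ASCII text that fits in a JSON string.
encoded = base64.b64encode(raw_audio).decode("ascii")
body = json.dumps({"audioContent": encoded})

# Decoding the field restores the original bytes exactly.
decoded = base64.b64decode(json.loads(body)["audioContent"])
```

Here `encoded` is the string `//NExAAR`, and `decoded` is identical to `raw_audio`.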
To decode base64 into an audio file:
Linux
Copy only the base64-encoded content into a text file.
Decode the source text file using the base64 command-line tool with the -d flag:
$ base64 SOURCE_BASE64_TEXT_FILE -d > DESTINATION_AUDIO_FILE
macOS
Copy only the base64-encoded content into a text file.
Decode the source text file using the base64 command-line tool:
$ base64 --decode -i SOURCE_BASE64_TEXT_FILE > DESTINATION_AUDIO_FILE
Windows
Copy only the base64-encoded content into a text file.
Decode the source text file using the certutil command:
certutil -decode SOURCE_BASE64_TEXT_FILE DESTINATION_AUDIO_FILE
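As a platform-independent alternative to the commands above, the same decoding step can be sketched in Python. This is a minimal sketch, assuming the complete JSON response has been saved to a file. The function name `save_audio_from_response` and both path parameters are hypothetical and are not part of any Text-to-Speech client library. Because the function reads the audioContent field directly, the step of copying the base64 text out by hand is not needed.

```python
import base64
import json
from pathlib import Path


def save_audio_from_response(response_path: str, audio_path: str) -> int:
    """Decode the audioContent field of a saved JSON response into a binary file.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    response = json.loads(Path(response_path).read_text(encoding="utf-8"))
    # validate=True raises an error on characters outside the base64
    # alphabet instead of silently discarding them.
    audio_bytes = base64.b64decode(response["audioContent"], validate=True)
    return Path(audio_path).write_bytes(audio_bytes)
```

A call such as `save_audio_from_response("response.json", "output.mp3")` would write the decoded audio next to the saved response. Both file names are placeholders, and the extension should match the audio encoding that was requested.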