Text-to-Speech allows you to convert words and sentences into base64-encoded audio data of natural human speech. You can then convert the audio data into a playable audio file, such as an MP3, by decoding the base64 data. The Text-to-Speech API accepts input as raw text or Speech Synthesis Markup Language (SSML).
This document describes how to create an audio file from either text or SSML input using Text-to-Speech. You can also review the Text-to-Speech basics article if you are unfamiliar with concepts like speech synthesis or SSML.
These samples require that you have installed and initialized the Google Cloud CLI. For information about setting up the gcloud CLI, see Authenticate to TTS.
Convert text to synthetic voice audio
The following code samples demonstrate how to convert a string into audio data.
You can configure the output of speech synthesis in a variety of ways, including selecting a unique voice or modulating the output in pitch, volume, speaking rate, and sample rate.
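As a rough illustration of these options, the following sketch assembles an audioConfig object for a REST request. The field names (speakingRate, pitch, volumeGainDb, sampleRateHertz) are assumed from the v1 AudioConfig resource and should be verified, together with their accepted ranges, against the API reference. The helper name build_audio_config and the example values are hypothetical.

```python
import json


def build_audio_config(encoding="MP3", speaking_rate=None, pitch=None,
                       volume_gain_db=None, sample_rate_hertz=None):
    """Return an audioConfig dict, omitting any option left unset."""
    options = {
        "audioEncoding": encoding,
        "speakingRate": speaking_rate,
        "pitch": pitch,
        "volumeGainDb": volume_gain_db,
        "sampleRateHertz": sample_rate_hertz,
    }
    # Unset options are dropped so the service can apply its own defaults.
    return {key: value for key, value in options.items() if value is not None}


# Hypothetical values: slightly slower, lower-pitched speech at 24 kHz.
audio_config = build_audio_config(speaking_rate=0.9, pitch=-2.0,
                                  sample_rate_hertz=24000)
print(json.dumps(audio_config, indent=2))
```

The resulting dictionary can be used as the audioConfig section of a synthesis request body.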
Protocol
Refer to the text:synthesize API endpoint for complete details.
To synthesize audio from text, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the text to synthesize in the text field of the input section, and specify the type of audio to create in the audioConfig section.
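The three sections of the request body described above can be sketched in Python as a plain dictionary serialized to JSON. The helper name build_text_request is hypothetical, and the voice values are only examples.

```python
import json


def build_text_request(text, language_code, voice_name, ssml_gender,
                       audio_encoding="MP3"):
    """Assemble the input, voice, and audioConfig sections of the body."""
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            "name": voice_name,
            "ssmlGender": ssml_gender,
        },
        "audioConfig": {"audioEncoding": audio_encoding},
    }


body = build_text_request(
    text="Android is a mobile operating system developed by Google.",
    language_code="en-gb",
    voice_name="en-GB-Standard-A",
    ssml_gender="FEMALE",
)
# Serialize to the JSON string that is sent as the POST body.
payload = json.dumps(body)
print(payload)
```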
The following code snippet sends a synthesis request to the text:synthesize endpoint and saves the results to a file named synthesize-text.txt. Replace PROJECT_ID with your project ID.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'input':{
      'text':'Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets.'
    },
    'voice':{
      'languageCode':'en-gb',
      'name':'en-GB-Standard-A',
      'ssmlGender':'FEMALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
  }" "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-text.txt
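If you prefer not to shell out to curl, the same request could be made with the Python standard library. This is a minimal sketch, not the official client library: the function names are hypothetical, and the access token is assumed to come from a source such as gcloud auth print-access-token.

```python
import json
import urllib.request

ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(access_token, project_id, body):
    """Create (but do not send) a POST request for text:synthesize."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def synthesize(access_token, project_id, body):
    """Send the request and return the parsed JSON response."""
    request = build_synthesize_request(access_token, project_id, body)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the headers and body easy to inspect before any network call is made.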
The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-text.txt file looks similar to the following code snippet.
{ "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.." }
To decode the results from the Text-to-Speech API as an MP3 audio file, run the following command from the same directory as the synthesize-text.txt file.
cat synthesize-text.txt | grep 'audioContent' | \
sed 's|audioContent| |' | tr -d '\n ":{},' > tmp.txt && \
base64 tmp.txt --decode > synthesize-text-audio.mp3 && \
rm tmp.txt
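The same decoding step can be sketched in Python, which parses the JSON directly instead of stripping characters with sed and tr. The function names are hypothetical, and the sample payload is a stand-in rather than real synthesized audio.

```python
import base64
import json


def decode_audio_content(response_json):
    """Return raw audio bytes from a text:synthesize JSON response string."""
    audio_b64 = json.loads(response_json)["audioContent"]
    return base64.b64decode(audio_b64)


def save_audio(json_path, audio_path):
    """Read a saved JSON response and write the decoded audio to a file."""
    with open(json_path, "r", encoding="utf-8") as source:
        audio_bytes = decode_audio_content(source.read())
    with open(audio_path, "wb") as target:
        target.write(audio_bytes)
    return len(audio_bytes)


# Stand-in payload used only to demonstrate the round trip.
sample = json.dumps(
    {"audioContent": base64.b64encode(b"fake-mp3").decode("ascii")}
)
print(decode_audio_content(sample))
```

For the files in this example, a call such as save_audio("synthesize-text.txt", "synthesize-text-audio.mp3") would write the MP3 next to the saved response.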
Go
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Go API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Java API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Node.js API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for Ruby.
Convert SSML to synthetic voice audio
Using SSML in your audio synthesis request can produce audio that is more similar to natural human speech. Specifically, SSML gives you finer-grained control over how the audio output represents pauses in the speech and how it pronounces dates, times, acronyms, and abbreviations.
For more details on the SSML elements supported by the Text-to-Speech API, see the SSML reference.
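To make the kind of control SSML offers more concrete, the following sketch builds a small SSML document in Python. The helper names (say_as, pause, speak) are hypothetical; the break element is assumed to be supported as in standard SSML, so confirm it against the SSML reference. Plain text is XML-escaped so that characters such as & do not produce invalid markup.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as):
    """Wrap text in a <say-as> element, escaping XML special characters."""
    return (
        f"<say-as interpret-as={quoteattr(interpret_as)}>"
        f"{escape(text)}</say-as>"
    )


def pause(milliseconds):
    """Return a <break> element for a pause of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*parts):
    """Join already-built SSML fragments inside a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Tom & Jerry ship to the "),
    say_as("USA", "characters"),  # spell out the acronym letter by letter
    pause(500),                   # half-second pause
    escape(" starting soon."),
)
print(ssml)
```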
Protocol
Refer to the text:synthesize API endpoint for complete details.
To synthesize audio from SSML, make an HTTP POST request to the text:synthesize endpoint. In the body of your POST request, specify the type of voice to synthesize in the voice configuration section, specify the SSML to synthesize in the ssml field of the input section, and specify the type of audio to create in the audioConfig section.
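One practical difference from plain text input is that SSML attributes contain double quotes, which must be escaped inside the JSON body. As a sketch, serializing the body with json.dumps handles that escaping automatically; the variable names here are illustrative.

```python
import json

ssml = (
    '<speak>The <say-as interpret-as="characters">SSML</say-as> standard '
    'is defined by the <sub alias="World Wide Web Consortium">W3C</sub>.'
    '</speak>'
)

body = {
    "input": {"ssml": ssml},  # the ssml field takes the place of text
    "voice": {
        "languageCode": "en-us",
        "name": "en-US-Standard-B",
        "ssmlGender": "MALE",
    },
    "audioConfig": {"audioEncoding": "MP3"},
}

# json.dumps escapes the double quotes inside the SSML attributes.
payload = json.dumps(body)
print(payload)
```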
The following code snippet sends a synthesis request to the text:synthesize endpoint and saves the results to a file named synthesize-ssml.txt. Replace PROJECT_ID with your project ID.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'input':{
      'ssml':'<speak>The <say-as interpret-as=\"characters\">SSML</say-as> standard is defined by the <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>'
    },
    'voice':{
      'languageCode':'en-us',
      'name':'en-US-Standard-B',
      'ssmlGender':'MALE'
    },
    'audioConfig':{
      'audioEncoding':'MP3'
    }
  }" "https://texttospeech.googleapis.com/v1/text:synthesize" > synthesize-ssml.txt
The Text-to-Speech API returns the synthesized audio as base64-encoded data contained in the JSON output. The JSON output in the synthesize-ssml.txt file looks similar to the following code snippet.
{ "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.." }
To decode the results from the Text-to-Speech API as an MP3 audio file, run the following command from the same directory as the synthesize-ssml.txt file.
cat synthesize-ssml.txt | grep 'audioContent' | \
sed 's|audioContent| |' | tr -d '\n ":{},' > tmp.txt && \
base64 tmp.txt --decode > synthesize-ssml-audio.mp3 && \
rm tmp.txt
Go
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Go API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Java API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Node.js API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Text-to-Speech, see Text-to-Speech client libraries. For more information, see the Text-to-Speech Python API reference documentation.
To authenticate to Text-to-Speech, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Text-to-Speech reference documentation for Ruby.