Create audio from text by using the command line

This document walks you through the process of making a request to Text-to-Speech using the command line. To learn more about the fundamental concepts in Text-to-Speech, read Text-to-Speech Basics.

Before you begin

Before you can send a request to the Text-to-Speech API, you must complete the following actions. See the Before you begin page for details.

  • Enable Text-to-Speech on a Google Cloud project.
  • Make sure billing is enabled for the project.
  • Install the Google Cloud CLI, then initialize it by running the following command:

    gcloud init
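If you prefer to do the rest of the setup from the command line as well, the following is a minimal sketch. It assumes the gcloud CLI is installed and billing is already enabled on the project. tts_setup is a hypothetical helper name, and PROJECT_ID stands for your project ID. The final command creates Application Default Credentials, which the gcloud auth application-default print-access-token command used later in this guide reads.

```shell
# Hypothetical helper: prepare a project for Text-to-Speech requests.
# Usage: tts_setup PROJECT_ID
tts_setup() {
  local project_id="$1"
  if [ -z "$project_id" ]; then
    echo "usage: tts_setup PROJECT_ID" >&2
    return 2
  fi
  # Make the project the default for subsequent gcloud commands.
  gcloud config set project "$project_id" || return
  # Enable the Text-to-Speech API on that project.
  gcloud services enable texttospeech.googleapis.com || return
  # Create Application Default Credentials (starts a browser sign-in);
  # `gcloud auth application-default print-access-token` reads these.
  gcloud auth application-default login
}
```

For example, run tts_setup my-project-id after replacing my-project-id with your own project ID.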

Synthesize audio from text

You can convert text to audio by making an HTTP POST request to the https://texttospeech.googleapis.com/v1/text:synthesize endpoint. In the request body, specify the text to synthesize in the text field of the input section, the voice to use in the voice section, and the type of audio to create in the audioConfig section.
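The relationship between those three sections can be sketched as a small shell function that prints a request body. This is a hypothetical helper, not part of the API or the gcloud CLI. Its defaults follow the sample request in step 1, except that it omits the optional ssmlGender field because the voice name already selects a specific voice. It escapes backslashes and double quotes in the text but assumes single-line input; for arbitrary text, a JSON tool such as jq is safer.

```shell
# Hypothetical helper: print a text:synthesize request body.
# Usage: tts_request_body TEXT [LANGUAGE_CODE] [VOICE_NAME] [AUDIO_ENCODING]
tts_request_body() {
  local text="$1"
  local language_code="${2:-en-gb}"
  local voice_name="${3:-en-GB-Standard-A}"
  local encoding="${4:-MP3}"
  # Escape backslashes first, then double quotes, so the text forms a
  # valid JSON string (single-line input only).
  text=$(printf '%s' "$text" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{\n'
  # input: the text to synthesize
  printf '  "input": { "text": "%s" },\n' "$text"
  # voice: which voice to use
  printf '  "voice": { "languageCode": "%s", "name": "%s" },\n' \
    "$language_code" "$voice_name"
  # audioConfig: the type of audio to create
  printf '  "audioConfig": { "audioEncoding": "%s" }\n' "$encoding"
  printf '}\n'
}
```

For example, tts_request_body 'Hello from the command line.' > request.json writes a body that uses the default voice and MP3 output.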

  1. Run the REST request below from the command line to synthesize audio from text with Text-to-Speech. The request uses the gcloud auth application-default print-access-token command to retrieve an authorization token.

    Before using any of the request data, make the following replacements:

    • PROJECT_ID: the alphanumeric ID of your Google Cloud project.

    HTTP method and URL:

    POST https://texttospeech.googleapis.com/v1/text:synthesize

    Request JSON body:

    {
      "input": {
        "text": "Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets."
      },
      "voice": {
        "languageCode": "en-gb",
        "name": "en-GB-Standard-A",
        "ssmlGender": "FEMALE"
      },
      "audioConfig": {
        "audioEncoding": "MP3"
      }
    }
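    One way to send this request is with curl. The following is a minimal sketch, assuming the request body above is saved as request.json in the current directory and that Application Default Credentials are set up. PROJECT_ID is used in the x-goog-user-project header, which names the project used for quota and billing when you authenticate with user credentials. tts_synthesize is a hypothetical helper name.

    ```shell
    # Hypothetical helper: POST a request file to the text:synthesize
    # endpoint and print the JSON response.
    # Usage: tts_synthesize PROJECT_ID [REQUEST_FILE] > response.json
    tts_synthesize() {
      local project_id="$1"
      local request_file="${2:-request.json}"
      if [ -z "$project_id" ] || [ ! -f "$request_file" ]; then
        echo "usage: tts_synthesize PROJECT_ID [REQUEST_FILE]" >&2
        return 2
      fi
      # Fetch an OAuth access token from Application Default Credentials.
      local token
      token=$(gcloud auth application-default print-access-token) || return
      curl -sS -X POST \
        -H "Authorization: Bearer ${token}" \
        -H "x-goog-user-project: ${project_id}" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d @"${request_file}" \
        "https://texttospeech.googleapis.com/v1/text:synthesize"
    }
    ```

    For example, tts_synthesize my-project-id > response.json saves the JSON response, which has the form shown below.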
    

    You should receive a JSON response similar to the following:

    {
      "audioContent": "//NExAASCCIIAAhEAGAAEMW4kAYPnwwIKw/BBTpwTvB+IAxIfghUfW.."
    }
    

  2. The JSON output for the REST command contains the synthesized audio in base64-encoded format. Copy the contents of the audioContent field into a new file named synthesize-output-base64.txt. Your new file will look something like the following:

    //NExAARqoIIAAhEuWAAAGNmBGMY4EBcxvABAXBPmPIAF//yAuh9Tn5CEap3/o
    ...
    VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
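    Rather than copying the field by hand, you can extract it with sed. A minimal sketch, assuming the response was saved to a file (response.json in the usage line is a hypothetical name) and that audioContent sits on a single line, as in the sample response. Because base64 text never contains a double quote, the pattern ends at the closing quote of the value. tts_extract_audio is a hypothetical helper name.

    ```shell
    # Hypothetical helper: copy the audioContent value from a saved
    # text:synthesize response into a separate text file.
    # Usage: tts_extract_audio RESPONSE_JSON OUTPUT_TXT
    tts_extract_audio() {
      local response="$1" output="$2"
      # Print only the quoted value that follows "audioContent":
      sed -n 's/.*"audioContent" *: *"\([^"]*\)".*/\1/p' "$response" > "$output"
      # Fail if nothing was extracted (for example, an error response).
      [ -s "$output" ]
    }
    ```

    For example: tts_extract_audio response.json synthesize-output-base64.txt. If jq is installed, jq -r .audioContent response.json > synthesize-output-base64.txt does the same job.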
    
  3. Decode the contents of the synthesize-output-base64.txt file into a new file named synthesized-audio.mp3. For information on decoding base64, see Decoding Base64-Encoded Audio Content.

    Linux

    1. Copy only the base64-encoded content into a text file.

    2. Decode the source text file with the base64 command-line tool and its -d flag:

        $ base64 -d SOURCE_BASE64_TEXT_FILE > DESTINATION_AUDIO_FILE
    

    macOS

    1. Copy only the base64-encoded content into a text file.

    2. Decode the source text file with the base64 command-line tool:

        $ base64 --decode SOURCE_BASE64_TEXT_FILE > DESTINATION_AUDIO_FILE
    

    Windows

    1. Copy only the base64-encoded content into a text file.

    2. Decode the source text file with the certutil command:

       certutil -decode SOURCE_BASE64_TEXT_FILE DESTINATION_AUDIO_FILE
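    On Linux and macOS, the long option --decode should work with both GNU coreutils base64 and the macOS base64 tool, and reading the input from standard input avoids differences in how the two tools accept a file argument. The sketch below wraps that in a hypothetical helper, tts_decode_audio, that also removes the output file if decoding fails or produces nothing.

    ```shell
    # Hypothetical helper: decode a base64 text file into a binary audio file.
    # Usage: tts_decode_audio SOURCE_BASE64_TEXT_FILE DESTINATION_AUDIO_FILE
    tts_decode_audio() {
      local source="$1" destination="$2"
      # --decode is understood by GNU coreutils and macOS base64;
      # reading from stdin sidesteps differences in file-argument handling.
      if ! base64 --decode < "$source" > "$destination" || [ ! -s "$destination" ]; then
        echo "decoding failed: $source" >&2
        rm -f "$destination"
        return 1
      fi
    }
    ```

    For example: tts_decode_audio synthesize-output-base64.txt synthesized-audio.mp3.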
    
  4. Play synthesized-audio.mp3 in an audio application or on an audio device. You can also play it in the Chrome browser by opening the file's location as a file:// URL, for example file://my_file_path/synthesized-audio.mp3.
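If the file does not play, a quick check is whether decoding produced MP3 data at all. MP3 files usually begin either with an ID3 tag (the bytes "ID3") or with an MPEG audio frame header, whose first 11 bits are all set; the sample audioContent above, which starts with //NExAA, decodes to the bytes FF F3 44, a frame header. The helper name looks_like_mp3 is hypothetical, and this is a heuristic, not a full validation of the file.

```shell
# Hypothetical helper: heuristic check that a file starts like MP3 data.
# Usage: looks_like_mp3 FILE   (exit status 0 if it looks like MP3)
looks_like_mp3() {
  local magic
  # First three bytes as lowercase hex, e.g. "fff344".
  magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    494433*) return 0 ;;  # "ID3" tag at the start of the file
    ff[ef]*) return 0 ;;  # MPEG frame sync: 0xFF, then a byte >= 0xE0
    *) return 1 ;;
  esac
}
```

For example: looks_like_mp3 synthesized-audio.mp3 && echo 'looks like MP3 data'.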

Clean up

To avoid unnecessary Google Cloud charges, use the Google Cloud console to delete your project if you no longer need it.
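Because this guide works from the command line, the project can also be deleted with the gcloud CLI: gcloud projects delete asks for confirmation before it schedules the project for deletion. The sketch below wraps it in a hypothetical helper, tts_cleanup, that refuses to run without an explicit project ID.

```shell
# Hypothetical helper: delete a project you no longer need.
# Usage: tts_cleanup PROJECT_ID
tts_cleanup() {
  if [ -z "$1" ]; then
    echo "usage: tts_cleanup PROJECT_ID" >&2
    return 2
  fi
  # gcloud prompts for confirmation before deleting the project.
  gcloud projects delete "$1"
}
```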

What's next

  • Learn more about Cloud Text-to-Speech by reading the basics.
  • Review the list of available voices you can use for synthetic speech.