Method: text.synthesize

Synthesizes speech synchronously: the client receives the result only after all text input has been processed.

HTTP request

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

The URL uses gRPC Transcoding syntax.
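The Python sketch below assembles this request using only the standard library. It builds the POST request but does not send it. The access token is a placeholder that you are assumed to obtain separately with the OAuth scope listed under Authorization scopes. The helper name `build_synthesize_request` is hypothetical.

```python
import json
import urllib.request

# Endpoint taken from this page (v1beta1).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"


def build_synthesize_request(body: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for text.synthesize.

    `access_token` is assumed to be an OAuth 2.0 access token obtained
    elsewhere with the cloud-platform scope (hypothetical placeholder here).
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=data,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. The JSON it returns has the shape described under Response body.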

Request body

The request body contains data with the following structure:

JSON representation
{
  "input": {
    object (SynthesisInput)
  },
  "voice": {
    object (VoiceSelectionParams)
  },
  "audioConfig": {
    object (AudioConfig)
  },
  "enableTimePointing": [
    enum (TimepointType)
  ]
}
Fields
input

object (SynthesisInput)

Required. The input to synthesize, given as either plain text or SSML.

voice

object (VoiceSelectionParams)

Required. The desired voice of the synthesized audio.

audioConfig

object (AudioConfig)

Required. The configuration of the synthesized audio.

enableTimePointing[]

enum (TimepointType)

Which types of timepoints, if any, are returned in the response.
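The Python sketch below assembles a request body as an illustration. This page does not define the nested field names: `text` or `ssml` in SynthesisInput, `languageCode` in VoiceSelectionParams, and `audioEncoding` in AudioConfig. They are assumed from the wider Text-to-Speech reference. The defaults `en-US` and `LINEAR16` are arbitrary choices, and the helper name is hypothetical.

```python
def build_synthesize_body(
    content: str,
    *,
    ssml: bool = False,
    language_code: str = "en-US",
    audio_encoding: str = "LINEAR16",
    time_pointing: bool = False,
) -> dict:
    """Return a text.synthesize request body as a plain dict (ready for json.dumps).

    Nested field names are assumptions taken from the wider API reference.
    """
    if time_pointing and not ssml:
        # Timepoints are only produced for <mark> tags in SSML input.
        raise ValueError("time_pointing requires SSML input with <mark> tags")
    body = {
        "input": {"ssml" if ssml else "text": content},    # SynthesisInput
        "voice": {"languageCode": language_code},          # VoiceSelectionParams
        "audioConfig": {"audioEncoding": audio_encoding},  # AudioConfig
    }
    if time_pointing:
        body["enableTimePointing"] = ["SSML_MARK"]  # repeated TimepointType
    return body
```

The sketch rejects time pointing for plain-text input, because timepoints are only supported through `<mark>` tags in SSML.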

Response body

The message returned to the client by the text.synthesize method.

If successful, the response body contains data with the following structure:

JSON representation
{
  "audioContent": string,
  "timepoints": [
    {
      object (Timepoint)
    }
  ],
  "audioConfig": {
    object (AudioConfig)
  }
}
Fields
audioContent

string (bytes format)

The audio data bytes, encoded as specified in the request. For encodings that are wrapped in containers (for example, MP3 and OGG_OPUS), the bytes include the container header. For LINEAR16 audio, we include the WAV header. Note: as with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.

A base64-encoded string.
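In the JSON representation, `audioContent` is base64-encoded, so a client must decode it before writing or playing the audio. The Python sketch below assumes the response has already been parsed into a `dict`. The WAV check only tests for the standard RIFF/WAVE header that LINEAR16 output is described as including. Both helper names are hypothetical.

```python
import base64


def decode_audio_content(response: dict) -> bytes:
    """Return the raw audio bytes from a parsed text.synthesize JSON response."""
    # JSON carries bytes fields as base64; decode back to binary.
    return base64.b64decode(response["audioContent"])


def looks_like_wav(audio: bytes) -> bool:
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    return len(audio) >= 12 and audio[:4] == b"RIFF" and audio[8:12] == b"WAVE"
```

For a LINEAR16 request, writing the decoded bytes to a file in binary mode yields a playable WAV file. A name such as `output.wav` is only an example.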

timepoints[]

object (Timepoint)

A link between a position in the original request input and a corresponding time in the output audio. Timepoints are supported only through <mark> tags in SSML input.

audioConfig

object (AudioConfig)

The audio metadata of audioContent.

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

TimepointType

The type of timepoint information that is returned in the response.

Enums
TIMEPOINT_TYPE_UNSPECIFIED Not specified. No timepoint information will be returned.
SSML_MARK Timepoint information of <mark> tags in SSML input will be returned.
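To get SSML_MARK timepoints, the request input must be SSML that contains `<mark>` elements, and `enableTimePointing` must be set to `["SSML_MARK"]`. The Python sketch below builds such SSML from (mark name, text) pairs. The helper and the mark names are hypothetical. Text and names are XML-escaped so that arbitrary strings stay well formed.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape, quoteattr


def ssml_with_marks(segments: Iterable[Tuple[str, str]]) -> str:
    """Wrap (mark_name, text) pairs in <speak>, placing a <mark> before each text."""
    pieces = [
        # quoteattr adds the surrounding quotes and escapes the attribute value;
        # escape handles &, < and > in the spoken text.
        f"<mark name={quoteattr(name)}/>{escape(text)}"
        for name, text in segments
    ]
    return "<speak>" + " ".join(pieces) + "</speak>"
```

Use the result as the `ssml` value of the request input, and set `"enableTimePointing": ["SSML_MARK"]` in the same request.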

Timepoint

Maps a point in the input text to the corresponding time in the output audio.

JSON representation
{
  "markName": string,
  "timeSeconds": number
}
Fields
markName

string

The timepoint name, as supplied by the client in the <mark> tag.

timeSeconds

number

Time offset in seconds from the start of the synthesized audio.
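A client typically uses timepoints to align the audio with the input, for example to highlight text during playback. The Python sketch below assumes the response JSON has been parsed into a `dict` and treats a missing `timepoints` field as no timepoints. Both helper names are hypothetical.

```python
import bisect
from typing import Dict, Optional


def timepoints_by_mark(response: dict) -> Dict[str, float]:
    """Map each markName to its timeSeconds offset (a repeated name keeps the last one)."""
    return {tp["markName"]: float(tp["timeSeconds"]) for tp in response.get("timepoints", [])}


def mark_at(response: dict, t: float) -> Optional[str]:
    """Return the most recent mark reached at playback time t (seconds), or None."""
    tps = sorted(response.get("timepoints", []), key=lambda tp: float(tp["timeSeconds"]))
    times = [float(tp["timeSeconds"]) for tp in tps]
    # Index of the last timepoint whose time is <= t.
    i = bisect.bisect_right(times, t) - 1
    return tps[i]["markName"] if i >= 0 else None
```

Sorting first means `mark_at` does not depend on the order in which the timepoints appear in the response.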