Method: text.synthesize

Synthesizes speech synchronously: the response is returned only after all text input has been processed.

HTTP request

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

Request body

The request body contains data with the following structure:

JSON representation

{
  "input": {
    object (SynthesisInput)
  },
  "voice": {
    object (VoiceSelectionParams)
  },
  "audioConfig": {
    object (AudioConfig)
  },
  "enableTimePointing": [
    enum (TimepointType)
  ],
  "advancedVoiceOptions": {
    object (AdvancedVoiceOptions)
  }
}

Fields
`input`	`object (SynthesisInput)` Required. The Synthesizer requires either plain text or SSML as input.
`voice`	`object (VoiceSelectionParams)` Required. The desired voice of the synthesized audio.
`audioConfig`	`object (AudioConfig)` Required. The configuration of the synthesized audio.
`enableTimePointing[]`	`enum (TimepointType)` Specifies whether timepoints are returned in the response and, if so, which types.
`advancedVoiceOptions`	`object (AdvancedVoiceOptions)` Advanced voice options.
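
As a minimal sketch of the structure above, the following Python builds a request body as a plain dictionary and serializes it to JSON. The helper name `build_synthesize_body` is hypothetical, and the nested field names and values (`text`, `languageCode`, `audioEncoding`, `en-US`, `MP3`) are assumptions taken from the `SynthesisInput`, `VoiceSelectionParams`, and `AudioConfig` types, which are documented separately.

```python
import json


def build_synthesize_body(text, language_code="en-US", audio_encoding="MP3"):
    """Build a text:synthesize request body as a dict (hypothetical helper)."""
    return {
        # Nested field names are assumptions; see SynthesisInput,
        # VoiceSelectionParams, and AudioConfig for the actual schemas.
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


body = build_synthesize_body("Hello, world.")
payload = json.dumps(body)  # JSON string to send as the POST body
```

Only the three required fields are set here; `enableTimePointing` and `advancedVoiceOptions` are optional and can be added to the same dictionary.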

Response body

The message returned to the client by the text.synthesize method.

If successful, the response body contains data with the following structure:

JSON representation
{
  "audioContent": string,
  "timepoints": [
    {
      object (Timepoint)
    }
  ],
  "audioConfig": {
    object (AudioConfig)
  }
}

Fields
`audioContent`	`string (bytes format)` The audio data bytes, encoded as specified in the request, including the header for encodings that are wrapped in containers (e.g. MP3, OGG_OPUS). For LINEAR16 audio, the WAV header is included. Note: as with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64, so in JSON this field is a base64-encoded string.
`timepoints[]`	`object (Timepoint)` A link between a position in the original request input and the corresponding time in the output audio. Supported only for `<mark>` tags in SSML input.
`audioConfig`	`object (AudioConfig)` The audio metadata of `audioContent`.
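
Because `audioContent` arrives base64-encoded in JSON, a client has to decode it before playing or saving the audio. The sketch below shows one way to do that; the function name `decode_audio` and the stand-in response are hypothetical and used only for illustration.

```python
import base64
import json


def decode_audio(response_json):
    """Return the raw audio bytes from a text:synthesize JSON response."""
    response = json.loads(response_json)
    # In JSON, audioContent is base64; decoding yields the encoded audio,
    # including any container header (for example, the WAV header for LINEAR16).
    return base64.b64decode(response["audioContent"])


# Stand-in response for illustration only; a real one comes from the API.
fake_response = json.dumps(
    {"audioContent": base64.b64encode(b"RIFF....WAVE").decode("ascii")}
)
audio_bytes = decode_audio(fake_response)
```

The decoded bytes can then be written to a file opened in binary mode, with an extension that matches the requested encoding.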

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
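
One way to attach credentials is an `Authorization: Bearer` header on the POST request. The sketch below only constructs the request with Python's standard library and does not send it. How the access token is obtained is outside this page, so `YOUR_ACCESS_TOKEN` is a placeholder, and the helper name `make_request` and the nested body fields are assumptions.

```python
import json
import urllib.request

ENDPOINT = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"


def make_request(body, access_token):
    """Build (but do not send) an authorized POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The token is expected to carry the cloud-platform scope.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


body = {
    # Nested field names are assumptions from the referenced types.
    "input": {"text": "Hello"},
    "voice": {"languageCode": "en-US"},
    "audioConfig": {"audioEncoding": "MP3"},
}
request = make_request(body, access_token="YOUR_ACCESS_TOKEN")
# Sending is left out of this sketch: urllib.request.urlopen(request)
```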

TimepointType

The type of timepoint information that is returned in the response.

Enums
`TIMEPOINT_TYPE_UNSPECIFIED`	Not specified. No timepoint information will be returned.
`SSML_MARK`	Timepoint information of `<mark>` tags in SSML input will be returned.
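
To illustrate `SSML_MARK`, the sketch below builds a request whose SSML input contains a `<mark>` tag and which asks for timepoints. The helper name `build_marked_request`, the mark name `midpoint`, and the nested field names (`ssml`, `languageCode`, `audioEncoding`) are assumptions for illustration.

```python
def build_marked_request(ssml):
    """Request body asking for timepoints at <mark> tags (hypothetical helper)."""
    return {
        "input": {"ssml": ssml},  # "ssml" field name assumed from SynthesisInput
        "voice": {"languageCode": "en-US"},
        "audioConfig": {"audioEncoding": "MP3"},
        # Without SSML_MARK here, no timepoint information is requested.
        "enableTimePointing": ["SSML_MARK"],
    }


ssml = '<speak>Hello <mark name="midpoint"/> world.</speak>'
body = build_marked_request(ssml)
```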

AdvancedVoiceOptions

Used for advanced voice options.

JSON representation
{
  "lowLatencyJourneySynthesis": boolean
}

Fields
`lowLatencyJourneySynthesis`	`boolean` Applies only to Journey voices. If false, synthesis is context-aware and has higher latency.
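
A small sketch of setting this option on an existing request body follows. The helper name `with_low_latency` is hypothetical, and the voice name `en-US-Journey-D` is an assumed example of a Journey voice rather than something this page specifies.

```python
def with_low_latency(body, enabled=True):
    """Return a copy of a request body with advancedVoiceOptions set."""
    updated = dict(body)  # shallow copy so the caller's dict is not mutated
    updated["advancedVoiceOptions"] = {"lowLatencyJourneySynthesis": enabled}
    return updated


base = {
    "input": {"text": "Hello"},
    # Voice name is an assumed example of a Journey voice.
    "voice": {"languageCode": "en-US", "name": "en-US-Journey-D"},
    "audioConfig": {"audioEncoding": "MP3"},
}
low_latency = with_low_latency(base)
```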

Timepoint

A mapping between a position in the input text and the corresponding time in the output audio.

JSON representation
{
  "markName": string,
  "timeSeconds": number
}

Fields
`markName`	`string` Timepoint name, as received from the client in the `<mark>` tag.
`timeSeconds`	`number` Time offset in seconds from the start of the synthesized audio.
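
A client will often want to look up the audio offset for a given mark. The sketch below turns the `timepoints` list of a parsed response into a dictionary keyed by `markName`. The function name and the sample values are made up for illustration, and treating a missing `timepoints` key as an empty list is an assumption about responses where no timepoints were requested.

```python
def timepoints_by_mark(response):
    """Map each markName to its timeSeconds offset."""
    # Assumption: "timepoints" may be absent, so default to an empty list.
    return {
        tp["markName"]: tp["timeSeconds"]
        for tp in response.get("timepoints", [])
    }


# Stand-in response for illustration only; values are made up.
fake_response = {
    "audioContent": "",
    "timepoints": [
        {"markName": "intro", "timeSeconds": 0.5},
        {"markName": "midpoint", "timeSeconds": 1.25},
    ],
}
offsets = timepoints_by_mark(fake_response)
```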
