Synthesizes speech synchronously: receive results after all text input has been processed.
HTTP request
POST https://texttospeech.googleapis.com/v1beta1/text:synthesize
The URL uses gRPC Transcoding syntax.
Request body
The request body contains data with the following structure:
JSON representation

    {
      "input": {
        object (SynthesisInput)
      },
      "voice": {
        object (VoiceSelectionParams)
      },
      "audioConfig": {
        object (AudioConfig)
      },
      "enableTimePointing": [
        enum (TimepointType)
      ]
    }
| Fields | |
| --- | --- |
| `input` | `object (SynthesisInput)` Required. The Synthesizer requires either plain text or SSML as input. |
| `voice` | `object (VoiceSelectionParams)` Required. The desired voice of the synthesized audio. |
| `audioConfig` | `object (AudioConfig)` Required. The configuration of the synthesized audio. |
| `enableTimePointing[]` | `enum (TimepointType)` Whether and what timepoints are returned in the response. |
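For illustration, the following is a minimal Python sketch of calling this endpoint over REST. It assumes the `requests` and `google-auth` packages are installed and that Application Default Credentials are configured; the input text, voice selection, and output filename are example values, not API requirements.

```python
import base64

import google.auth
import google.auth.transport.requests
import requests

# Obtain an OAuth 2.0 access token with the cloud-platform scope
# via Application Default Credentials (assumed to be configured).
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Request body with the three required fields: input, voice, audioConfig.
body = {
    "input": {"text": "Hello from the Text-to-Speech API."},
    "voice": {"languageCode": "en-US"},  # a specific voice name may also be set
    "audioConfig": {"audioEncoding": "MP3"},
}

response = requests.post(
    "https://texttospeech.googleapis.com/v1beta1/text:synthesize",
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=body,
)
response.raise_for_status()

# audioContent is base64-encoded in the JSON response; decode before writing.
with open("output.mp3", "wb") as out:
    out.write(base64.b64decode(response.json()["audioContent"]))
```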
Response body
The message returned to the client by the text.synthesize method.
If successful, the response body contains data with the following structure:
JSON representation

    {
      "audioContent": string,
      "timepoints": [
        {
          object (Timepoint)
        }
      ],
      "audioConfig": {
        object (AudioConfig)
      }
    }
| Fields | |
| --- | --- |
| `audioContent` | `string (bytes format)` The audio data bytes encoded as specified in the request, including the header for encodings that are wrapped in containers (e.g. MP3, OGG_OPUS). For LINEAR16 audio, we include the WAV header. Note: as with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64. A base64-encoded string. |
| `timepoints[]` | `object (Timepoint)` A link between a position in the original request input and a corresponding time in the output audio. It's only supported via `<mark>` of SSML input. |
| `audioConfig` | `object (AudioConfig)` The audio metadata of `audioContent`. |
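As a short illustration of the base64 note above, the hedged sketch below decodes a previously saved LINEAR16 response; the `response.json` filename is hypothetical. Because the WAV header is already included for LINEAR16, the decoded bytes can be written directly to a playable file.

```python
import base64
import json

# Load a previously saved text:synthesize response (filename is hypothetical).
with open("response.json") as f:
    synthesize_response = json.load(f)

# JSON bytes fields are base64-encoded, so decode before use.
audio_bytes = base64.b64decode(synthesize_response["audioContent"])

# For LINEAR16 the decoded bytes include the WAV header; for MP3 or OGG_OPUS
# they are a complete container stream, so no extra framing is needed.
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```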
Authorization scopes
Requires the following OAuth scope:
https://www.googleapis.com/auth/cloud-platform
For more information, see the Authentication Overview.
TimepointType
The type of timepoint information that is returned in the response.
| Enums | |
| --- | --- |
| `TIMEPOINT_TYPE_UNSPECIFIED` | Not specified. No timepoint information will be returned. |
| `SSML_MARK` | Timepoint information of `<mark>` tags in SSML input will be returned. |
Timepoint
This contains a mapping between a certain point in the input text and a corresponding time in the output audio.
JSON representation

    {
      "markName": string,
      "timeSeconds": number
    }
| Fields | |
| --- | --- |
| `markName` | `string` Timepoint name as received from the client within the `<mark>` tag. |
| `timeSeconds` | `number` Time offset in seconds from the start of the synthesized audio. |
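As a hedged end-to-end sketch, the example below requests `SSML_MARK` timepoints for SSML input containing `<mark>` tags and prints each returned `markName` with its `timeSeconds`. The credentials setup, voice choice, and mark names are assumptions, as in the earlier example.

```python
import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

body = {
    "input": {
        "ssml": (
            "<speak>"
            "<mark name='first'/>Hello, "
            "<mark name='second'/>world."
            "</speak>"
        )
    },
    "voice": {"languageCode": "en-US"},
    "audioConfig": {"audioEncoding": "MP3"},
    # Ask for timepoints of <mark/> tags in the SSML input.
    "enableTimePointing": ["SSML_MARK"],
}

response = requests.post(
    "https://texttospeech.googleapis.com/v1beta1/text:synthesize",
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=body,
)
response.raise_for_status()

# Each timepoint maps a <mark> name to an offset (in seconds) in the audio.
for tp in response.json().get("timepoints", []):
    print(tp["markName"], tp["timeSeconds"])
```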