- HTTP request
- Request body
- Response body
- Authorization scopes
- TimepointType
- AdvancedVoiceOptions
- Timepoint
Synthesizes speech synchronously: results are returned after all text input has been processed.
HTTP request
POST https://texttospeech.googleapis.com/v1beta1/text:synthesize
The URL uses gRPC Transcoding syntax.
Request body
The request body contains data with the following structure:
JSON representation |
---|
{ "input": { object ( |
Fields | |
---|---|
input | Required. The Synthesizer requires either plain text or SSML as input. |
voice | Required. The desired voice of the synthesized audio. |
audio | Required. The configuration of the synthesized audio. |
enable | Whether and what timepoints are returned in the response. |
advanced | Advanced voice options. |
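As a rough illustration of the required fields, the following minimal sketch assembles a request body as a Python dictionary and serializes it to JSON. The helper name `build_synthesize_body` is hypothetical. The full field name `audioConfig` and the nested keys `text`, `languageCode`, and `audioEncoding` do not appear in this section; they are assumptions based on the public API and should be checked against the referenced object definitions.

```python
import json


def build_synthesize_body(text, language_code="en-US", audio_encoding="MP3"):
    """Return a dict shaped like the text:synthesize request body (sketch)."""
    return {
        # Required: plain-text input. The nested key "text" is an assumption.
        "input": {"text": text},
        # Required: the desired voice. "languageCode" is an assumption.
        "voice": {"languageCode": language_code},
        # Required: audio configuration. "audioConfig" and "audioEncoding"
        # are assumptions; the field table in this section is abbreviated.
        "audioConfig": {"audioEncoding": audio_encoding},
    }


body = build_synthesize_body("Hello, world.")
payload = json.dumps(body).encode("utf-8")  # bytes suitable for an HTTP POST body
```

The optional timepoint and advanced voice fields are left out here, so this body only covers the three required entries.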
Response body
The message returned to the client by the text.synthesize method.
If successful, the response body contains data with the following structure:
JSON representation |
---|
{ "audioContent": string, "timepoints": [ { object ( |
Fields | |
---|---|
audioContent | The audio data bytes encoded as specified in the request, including the header for encodings that are wrapped in containers (e.g. MP3, OGG_OPUS). For LINEAR16 audio, we include the WAV header. Note: as with all bytes fields, protobuffers use a pure binary representation, whereas JSON representations use base64. A base64-encoded string. |
timepoints[] | A link between a position in the original request input and a corresponding time in the output audio. It's only supported via |
audio | The audio metadata of |
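Because the JSON representation carries `audioContent` as a base64-encoded string, a client has to decode it before writing or playing the audio. The sketch below shows one way to do that with the standard library; the helper name `decode_audio_content` and the sample payload are hypothetical, and the fake bytes are not real audio.

```python
import base64


def decode_audio_content(response):
    """Return the raw audio bytes from a parsed text:synthesize JSON response."""
    return base64.b64decode(response["audioContent"])


# Hypothetical response: a tiny fake payload standing in for real audio data.
fake_response = {
    "audioContent": base64.b64encode(b"RIFF fake audio").decode("ascii")
}
audio_bytes = decode_audio_content(fake_response)
```

The decoded bytes already include any container header described above, so they can be written to a file opened in binary mode without further processing.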
Authorization scopes
Requires the following OAuth scope:
https://www.googleapis.com/auth/cloud-platform
For more information, see the Authentication Overview.
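The sketch below shows how a POST to the endpoint might be prepared with an OAuth 2.0 access token, using only the Python standard library. It assumes the token is sent as a standard `Bearer` value in the `Authorization` header; obtaining a token for the scope above is out of scope here, and `PLACEHOLDER_TOKEN`, `make_synthesize_request`, and the sample body are hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"


def make_synthesize_request(body, access_token):
    """Build (but do not send) an authorized POST request for text:synthesize."""
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the OAuth 2.0 access token is passed as a Bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# The token below is a placeholder, not a working credential.
request = make_synthesize_request({"input": {"text": "Hi"}}, "PLACEHOLDER_TOKEN")
```

Passing `request` to `urllib.request.urlopen` would perform the call; with a real token, the response body could then be parsed with `json.load`.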
TimepointType
The type of timepoint information that is returned in the response.
Enums | |
---|---|
TIMEPOINT_TYPE_UNSPECIFIED | Not specified. No timepoint information will be returned. |
SSML_MARK | Timepoint information of <mark> tags in SSML input will be returned. |
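To make the `SSML_MARK` value concrete, the sketch below builds SSML input with one `<mark>` tag before each sentence and requests timepoints for those marks. The helper `ssml_with_marks` and the mark names `s0`, `s1`, and so on are hypothetical. The request keys `ssml` and `enableTimePointing` do not appear in full in this section and are assumptions based on the public API.

```python
from xml.sax.saxutils import escape


def ssml_with_marks(sentences):
    """Wrap sentences in <speak>, placing a named <mark> before each one."""
    parts = [
        f'<mark name="s{i}"/>{escape(sentence)}'  # escape &, <, > in the text
        for i, sentence in enumerate(sentences)
    ]
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = ssml_with_marks(["Hello.", "Bye & thanks."])

# Partial request body: only the fields relevant to timepoints are shown.
body = {
    "input": {"ssml": ssml},               # assumed key for SSML input
    "enableTimePointing": ["SSML_MARK"],   # assumed full field name
}
```

Leaving the timepoint list empty, or using `TIMEPOINT_TYPE_UNSPECIFIED`, would mean no timepoint information is returned, as described in the table above.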
AdvancedVoiceOptions
Used for advanced voice options.
JSON representation |
---|
{ "lowLatencyJourneySynthesis": boolean } |
Fields | |
---|---|
lowLatencyJourneySynthesis | Only for Journey voices. If false, the synthesis is context aware and has a higher latency. |
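As a small illustration, the sketch below attaches the option to an existing request body without mutating the original dictionary. The helper `with_low_latency` is hypothetical, and the top-level key `advancedVoiceOptions` is an assumption, since the request field name is abbreviated in this section. Whether the flag has any effect depends on the selected voice being a Journey voice.

```python
def with_low_latency(body, enabled):
    """Return a copy of body with the Journey low-latency flag set."""
    updated = dict(body)  # shallow copy so the caller's dict is left untouched
    updated["advancedVoiceOptions"] = {  # assumed top-level field name
        "lowLatencyJourneySynthesis": bool(enabled)
    }
    return updated


base_body = {"input": {"text": "Hello."}}
low_latency_body = with_low_latency(base_body, True)
```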
Timepoint
This contains a mapping between a certain point in the input text and a corresponding time in the output audio.
JSON representation |
---|
{ "markName": string, "timeSeconds": number } |
Fields | |
---|---|
markName | Timepoint name as received from the client within |
timeSeconds | Time offset in seconds from the start of the synthesized audio. |
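One way to use these values is to map each mark name to its offset and derive how long each marked segment lasts. The sketch below does this for a hypothetical response; the helper names and the sample timepoints are invented for illustration, and only the `timepoints`, `markName`, and `timeSeconds` keys come from the representations above.

```python
def mark_times(response):
    """Map each mark name to its offset in seconds from the start of the audio."""
    return {tp["markName"]: tp["timeSeconds"] for tp in response.get("timepoints", [])}


def segment_durations(response):
    """Seconds between consecutive marks, keyed by the mark that starts the segment."""
    ordered = sorted(response.get("timepoints", []), key=lambda tp: tp["timeSeconds"])
    return {
        start["markName"]: end["timeSeconds"] - start["timeSeconds"]
        for start, end in zip(ordered, ordered[1:])
    }


# Hypothetical response fragment with three marks.
sample_response = {
    "timepoints": [
        {"markName": "s0", "timeSeconds": 0.0},
        {"markName": "s1", "timeSeconds": 1.5},
        {"markName": "s2", "timeSeconds": 4.0},
    ]
}
times = mark_times(sample_response)
durations = segment_durations(sample_response)
```

The last mark has no entry in `durations` because the response, as described here, gives no end time for the final segment.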