Contains text input to be synthesized. Either text or ssml must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. The input size is limited to 5000 bytes.
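The rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the names check_synthesis_input and MAX_INPUT_BYTES are hypothetical, and it assumes the 5000-byte limit is measured on the UTF-8 encoding of the text or ssml value. The service remains the authority on what it accepts.

```python
MAX_INPUT_BYTES = 5000  # limit stated in the reference text


def check_synthesis_input(synthesis_input: dict) -> None:
    """Raise ValueError if the input breaks the text/ssml rule or size limit.

    Hypothetical helper: it only covers the 'text' and 'ssml' fields and
    assumes the size limit counts UTF-8 bytes of the supplied value.
    """
    has_text = "text" in synthesis_input
    has_ssml = "ssml" in synthesis_input

    # Both present or both absent is the INVALID_ARGUMENT case.
    if has_text == has_ssml:
        raise ValueError("supply exactly one of 'text' or 'ssml'")

    payload = synthesis_input["text"] if has_text else synthesis_input["ssml"]
    size = len(payload.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input is {size} bytes; limit is {MAX_INPUT_BYTES}")


if __name__ == "__main__":
    check_synthesis_input({"text": "Hello, world."})  # passes silently
```

Counting encoded bytes rather than characters matters for non-ASCII input, where one character can occupy several bytes.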
JSON representation |
---|
{ "customPronunciations": { object ( |
Fields | |
---|---|
customPronunciations |
Optional. The pronunciation customizations to apply to the input. If this is set, the input is synthesized using the given pronunciation customizations. Initial support is for en-us only, with plans to expand to other locales in the future. Instant Clone voices aren't supported. To customize the pronunciation of a phrase, the input must contain an exact match of that phrase, whichever input type is used. If using SSML, the phrase must not be inside a phoneme tag. |
Union field input_source. The input source, which is either plain text or SSML. input_source can be only one of the following: |
|
text |
The raw text to be synthesized. |
markup |
Markup for HD voices specifically. This field may not be used with any other voices. |
ssml |
The SSML document to be synthesized. The SSML document must be valid and well-formed. Otherwise the RPC will fail and return |
multiSpeakerMarkup |
The multi-speaker input to be synthesized. Only applicable for multi-speaker synthesis. |
prompt |
This system instruction is supported only for controllable/promptable voice models. If this system instruction is used, the unedited text is passed to Gemini-TTS. Otherwise, a default system instruction is used. AI Studio refers to this system instruction as Style Instructions. |
MultiSpeakerMarkup
A collection of turns for multi-speaker synthesis.
JSON representation |
---|
{
"turns": [
{
object ( |
Fields | |
---|---|
turns[] |
Required. Speaker turns. |
Turn
A multi-speaker turn.
JSON representation |
---|
{ "speaker": string, "text": string } |
Fields | |
---|---|
speaker |
Required. The speaker of the turn, for example, 'O' or 'Q'. Refer to the documentation for available speakers. |
text |
Required. The text to speak. |
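
As an illustration of how the MultiSpeakerMarkup and Turn shapes fit together, the sketch below assembles a multiSpeakerMarkup value from (speaker, text) pairs. The function name make_multi_speaker_markup is hypothetical, and rejecting an empty list of turns is an assumption drawn from turns[] being marked Required; the speaker labels are the examples given above and may not match the speakers available to a given voice.

```python
import json


def make_multi_speaker_markup(turns):
    """Build a {"multiSpeakerMarkup": {"turns": [...]}} dict.

    Hypothetical helper. `turns` is an iterable of (speaker, text) pairs.
    Both fields are Required per the reference, so empty values are rejected.
    """
    built = []
    for speaker, text in turns:
        if not speaker or not text:
            raise ValueError("each turn needs both a speaker and text")
        built.append({"speaker": speaker, "text": text})

    # Assumption: a Required repeated field should hold at least one turn.
    if not built:
        raise ValueError("at least one turn is required")

    return {"multiSpeakerMarkup": {"turns": built}}


if __name__ == "__main__":
    body = make_multi_speaker_markup([
        ("O", "Did you get the report?"),
        ("Q", "Yes, I sent my notes this morning."),
    ])
    print(json.dumps(body, indent=2))
```

The resulting dict would sit in the input portion of a synthesis request in place of text or ssml, since only one input source can be set.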