SynthesisInput

Contains text input to be synthesized. Exactly one of text, ssml, or multiSpeakerMarkup must be supplied. Supplying more than one, or none, returns google.rpc.Code.INVALID_ARGUMENT. The input size is limited to 5000 bytes.
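The two constraints above (one input source, at most 5000 bytes) can be pre-checked on the client before a request is sent. The following is a minimal sketch, assuming the input is built as a Python dict that mirrors the JSON representation. The helper name `validate_synthesis_input` is hypothetical and not part of any Google library. Counting size as the UTF-8 byte length of the text, SSML, or concatenated turn texts is also an assumption about how the service measures the limit; the service itself remains authoritative.

```python
# Union members of input_source, taken from the JSON representation.
INPUT_SOURCE_FIELDS = ("text", "ssml", "multiSpeakerMarkup")
MAX_INPUT_BYTES = 5000


def validate_synthesis_input(synthesis_input: dict) -> None:
    """Raise ValueError if the dict would likely be rejected.

    Hypothetical client-side pre-check, not a Google API.
    """
    present = [f for f in INPUT_SOURCE_FIELDS if f in synthesis_input]
    if len(present) != 1:
        raise ValueError(
            f"exactly one of {INPUT_SOURCE_FIELDS} must be set, got {present}"
        )
    source = present[0]
    if source == "multiSpeakerMarkup":
        # Assumption: the limit applies to the combined text of all turns.
        turns = synthesis_input[source].get("turns", [])
        payload = "".join(turn["text"] for turn in turns)
    else:
        payload = synthesis_input[source]
    # Assumption: the limit counts UTF-8 bytes, not characters.
    size = len(payload.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input is {size} bytes; the limit is {MAX_INPUT_BYTES}")


# customPronunciations is not part of the union, so this passes.
validate_synthesis_input({"text": "Hello, world.", "customPronunciations": {}})
```

Because the limit is stated in bytes, non-ASCII text reaches it sooner: under this sketch's assumption, 2500 copies of "é" (2 bytes each in UTF-8) already use the full 5000 bytes.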

JSON representation
{
  "customPronunciations": {
    object (CustomPronunciations)
  },

  // Union field input_source can be only one of the following:
  "text": string,
  "ssml": string,
  "multiSpeakerMarkup": {
    object (MultiSpeakerMarkup)
  }
  // End of list of possible types for union field input_source.
}
Fields
customPronunciations

object (CustomPronunciations)

Optional. Pronunciation customizations to apply to the input. If set, the input is synthesized using the given pronunciation customizations.

The initial support is for English, French, Italian, German, and Spanish (EFIGS) languages, as provided in VoiceSelectionParams. Journey and Instant Clone voices aren't supported.

To customize the pronunciation of a phrase, the phrase must appear as an exact match in the input. If using SSML, the phrase must not be inside a phoneme tag.
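The exact-match rule can be checked locally before sending a request. This is a minimal sketch under stated assumptions: matching is treated as a case-sensitive substring test, and for SSML a phrase counts only if it lies within a single run of text outside any `<phoneme>` element (a phrase split across tags is reported as unmatched). The function names `unmatched_phrases` and `_segments_outside_phoneme` are hypothetical; only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    # Drop any "{namespace}" prefix so <phoneme> matches with or without xmlns.
    return tag.rsplit("}", 1)[-1]


def _segments_outside_phoneme(elem: ET.Element, out: list) -> None:
    """Collect the text runs of elem's subtree, skipping <phoneme> contents."""
    if _local_name(elem.tag) == "phoneme":
        return
    if elem.text:
        out.append(elem.text)
    for child in elem:
        _segments_outside_phoneme(child, out)
        # A child's tail text follows its closing tag, so it is outside it.
        if child.tail:
            out.append(child.tail)


def unmatched_phrases(pronunciations: list, text=None, ssml=None) -> list:
    """Return the phrases that have no exact (case-sensitive) match."""
    if ssml is not None:
        segments = []
        _segments_outside_phoneme(ET.fromstring(ssml), segments)
    else:
        segments = [text or ""]
    return [
        p["phrase"]
        for p in pronunciations
        if not any(p["phrase"] in segment for segment in segments)
    ]
```

A phrase wrapped in a `<phoneme>` tag is reported as unmatched, which mirrors the restriction stated above.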

Union field input_source. The input source, which is plain text, SSML, or multi-speaker markup. input_source can be only one of the following:
text

string

The raw text to be synthesized.

ssml

string

The SSML document to be synthesized. The SSML document must be valid and well-formed. Otherwise, the RPC fails and returns google.rpc.Code.INVALID_ARGUMENT. For more information, see SSML.
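A cheap way to catch malformed SSML before the RPC rejects it is to parse it locally. The sketch below, using only Python's standard `xml.etree.ElementTree`, checks well-formedness and that the root element is `<speak>`, as in the W3C SSML specification. It does not check validity against the SSML schema or which tags the service supports, so passing it is necessary but not sufficient. The name `check_ssml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ssml(ssml: str) -> None:
    """Raise ValueError if ssml is not well-formed XML with a <speak> root."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed: {exc}") from exc
    # Assumption: the root element must be <speak>, as in the W3C SSML spec.
    # Strip any "{namespace}" prefix before comparing the tag name.
    if root.tag.rsplit("}", 1)[-1] != "speak":
        raise ValueError(f"expected a <speak> root element, got <{root.tag}>")


check_ssml('<speak>Hello <break time="500ms"/> world.</speak>')  # passes
```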

multiSpeakerMarkup

object (MultiSpeakerMarkup)

The multi-speaker input to be synthesized. Only applicable for multi-speaker synthesis.

MultiSpeakerMarkup

A collection of turns for multi-speaker synthesis.

JSON representation
{
  "turns": [
    {
      object (Turn)
    }
  ]
}
Fields
turns[]

object (Turn)

Required. Speaker turns.

Turn

A multi-speaker turn.

JSON representation
{
  "speaker": string,
  "text": string
}
Fields
speaker

string

Required. The speaker of the turn, for example, 'O' or 'Q'. Refer to the documentation for the available speakers.

text

string

Required. The text to speak.
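MultiSpeakerMarkup is a list of Turn objects, each with a required speaker and text. The sketch below builds the multiSpeakerMarkup member of a SynthesisInput from (speaker, text) pairs. The helper name `multi_speaker_input` is hypothetical. Rejecting an empty turn list or empty fields is an assumption about what "Required" implies. The speaker labels 'O' and 'Q' come from the field description above, but which speakers are actually available depends on the voice and is not checked here.

```python
def multi_speaker_input(turns: list) -> dict:
    """Build a SynthesisInput dict using multiSpeakerMarkup.

    `turns` is a list of (speaker, text) pairs. Hypothetical helper.
    """
    out = []
    for speaker, text in turns:
        # Both Turn fields are marked Required.
        if not speaker or not text:
            raise ValueError("each turn needs a non-empty speaker and text")
        out.append({"speaker": speaker, "text": text})
    # Assumption: "Required" for turns[] means at least one turn.
    if not out:
        raise ValueError("at least one turn is required")
    return {"multiSpeakerMarkup": {"turns": out}}


conversation = multi_speaker_input([("O", "Hi there."), ("Q", "Hello!")])
```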

CustomPronunciations

A collection of pronunciation customizations.

JSON representation
{
  "pronunciations": [
    {
      object (CustomPronunciationParams)
    }
  ]
}
Fields
pronunciations[]

object (CustomPronunciationParams)

The pronunciation customizations to apply.

CustomPronunciationParams

Pronunciation customization for a phrase.

JSON representation
{
  "phrase": string,
  "phoneticEncoding": enum (PhoneticEncoding),
  "pronunciation": string
}
Fields
phrase

string

The phrase to which the customization is applied. The phrase can be multiple words, such as proper nouns, but shouldn't span an entire sentence.

phoneticEncoding

enum (PhoneticEncoding)

The phonetic encoding of the phrase.

pronunciation

string

The pronunciation of the phrase. It must be written in the encoding given by phoneticEncoding.

PhoneticEncoding

The phonetic encoding of the phrase.

Enums
PHONETIC_ENCODING_UNSPECIFIED Not specified.
PHONETIC_ENCODING_IPA IPA, such as apple -> ˈæpəl. https://en.wikipedia.org/wiki/International_Phonetic_Alphabet
PHONETIC_ENCODING_X_SAMPA X-SAMPA, such as apple -> "{p@l". https://en.wikipedia.org/wiki/X-SAMPA
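Putting the pieces together, the sketch below builds a SynthesisInput with customPronunciations, reusing the "apple" examples from the enum descriptions above. The `PhoneticEncoding` class simply mirrors the listed enum values, and the helper name `pronunciation` is hypothetical. The IPA and X-SAMPA entries are shown as alternatives rather than combined in one request, since how the service resolves two customizations for the same phrase is not specified here.

```python
import json
from enum import Enum


class PhoneticEncoding(str, Enum):
    # Mirrors the enum values listed above.
    PHONETIC_ENCODING_UNSPECIFIED = "PHONETIC_ENCODING_UNSPECIFIED"
    PHONETIC_ENCODING_IPA = "PHONETIC_ENCODING_IPA"
    PHONETIC_ENCODING_X_SAMPA = "PHONETIC_ENCODING_X_SAMPA"


def pronunciation(phrase: str, encoding: PhoneticEncoding, value: str) -> dict:
    """Build one CustomPronunciationParams entry (hypothetical helper)."""
    return {
        "phrase": phrase,
        "phoneticEncoding": encoding.value,
        "pronunciation": value,
    }


# IPA form, from the PHONETIC_ENCODING_IPA example.
ipa_entry = pronunciation("apple", PhoneticEncoding.PHONETIC_ENCODING_IPA, "ˈæpəl")
# Equivalent X-SAMPA form, from the PHONETIC_ENCODING_X_SAMPA example.
x_sampa_entry = pronunciation("apple", PhoneticEncoding.PHONETIC_ENCODING_X_SAMPA, '"{p@l')

synthesis_input = {
    "text": "An apple a day.",
    "customPronunciations": {"pronunciations": [ipa_entry]},
}

# ensure_ascii=False keeps the IPA characters readable in the printed JSON.
print(json.dumps(synthesis_input, ensure_ascii=False, indent=2))
```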