Text-to-Speech generates speech with natural, human-like quality. To get started, specify a voice when you send a synthesis request.
Text-to-Speech offers a variety of voices that differ by language, gender, and accent. Some languages have multiple voices. For the full list, see the Supported Voices page. To select a voice, set the VoiceSelectionParams field in your API request. For instructions on making a synthesize request, see the Quickstarts.
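As a minimal sketch of voice selection, the snippet below builds a JSON body for a synthesize request in which the `voice` object carries the voice selection parameters. The helper name `build_synthesize_request`, the endpoint constant, and the MP3 default are illustrative assumptions, and the body follows the v1 REST field naming. Check the names against the API reference before relying on them.

```python
import json

# Assumed REST endpoint for the v1 synthesize method; verify against the API reference.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, voice_name, language_code, audio_encoding="MP3"):
    """Return a JSON-serializable body for a synthesize request (hypothetical helper)."""
    return {
        "input": {"text": text},
        # The "voice" object holds the voice selection parameters.
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


# en-US-Studio-O is one of the voices named on this page.
body = build_synthesize_request("Hello, world!", "en-US-Studio-O", "en-US")
print(json.dumps(body, indent=2))
```

Sending the body is left out on purpose: authentication and transport depend on your environment, and the Quickstarts cover them.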
Overview
Voice Type | Intended for | Launch stage | Controllability | Streaming
---|---|---|---|---
Journey | Conversational Agents | Preview | - | Yes
Studio (two speakers group) | Media - Discussions and Interviews | Experimental | - | -
Studio (one speaker person) | Media - Narration | GA | SSML | -
Neural2 | General purpose | GA | SSML | -
Standard | Cost efficient | GA | SSML | -
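The overview table can also be kept as data so that an application can narrow down voice types by requirement. The sketch below is illustrative only: the `VoiceType` class and the `candidates` function are hypothetical names, and a `-` in the table is assumed to mean that the feature is not listed as supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceType:
    name: str
    intended_for: str
    launch_stage: str
    supports_ssml: bool       # "SSML" in the Controllability column
    supports_streaming: bool  # "Yes" in the Streaming column


# Rows transcribed from the overview table; "-" is read as "not supported".
VOICE_TYPES = [
    VoiceType("Journey", "Conversational agents", "Preview", False, True),
    VoiceType("Studio (two speakers)", "Media: discussions and interviews", "Experimental", False, False),
    VoiceType("Studio (one speaker)", "Media: narration", "GA", True, False),
    VoiceType("Neural2", "General purpose", "GA", True, False),
    VoiceType("Standard", "Cost efficient", "GA", True, False),
]


def candidates(need_ssml=False, need_streaming=False, ga_only=False):
    """Return the names of voice types that satisfy every requested constraint."""
    return [
        v.name
        for v in VOICE_TYPES
        if (v.supports_ssml or not need_ssml)
        and (v.supports_streaming or not need_streaming)
        and (v.launch_stage == "GA" or not ga_only)
    ]


print(candidates(need_ssml=True))
print(candidates(need_streaming=True))
```

According to the table as transcribed here, no voice type offers both SSML control and streaming, so requesting both returns an empty list.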
Journey voices
Journey voices, powered by the AudioLM engine, let you create more engaging and empathetic speech for conversational applications. With text streaming, Journey voices support low-latency, real-time communication. They are available in the languages listed in the table of supported voices.
Example use case | Voice
---|---
Chat experiences | en-US-Journey-F
Virtual assistants | en-US-Journey-D
Customer service chatbots | en-US-Journey-F
Interactive education applications | en-US-Journey-O
Sales and pitches | en-US-Journey-D
Storytime | en-US-Journey-F
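The sample pairings above can be kept in a small lookup that produces the voice portion of a request. This is a sketch under assumptions: `SAMPLE_VOICES` and `voice_params_for` are hypothetical names, and the language code is derived by assuming that a voice name starts with its language and region segments, which holds for the names listed here.

```python
# Use case to sample voice, copied from the examples on this page.
SAMPLE_VOICES = {
    "Chat experiences": "en-US-Journey-F",
    "Virtual assistants": "en-US-Journey-D",
    "Customer service chatbots": "en-US-Journey-F",
    "Interactive education applications": "en-US-Journey-O",
    "Sales and pitches": "en-US-Journey-D",
    "Storytime": "en-US-Journey-F",
}


def voice_params_for(use_case):
    """Return voice selection parameters for a listed use case (hypothetical helper)."""
    name = SAMPLE_VOICES[use_case]  # raises KeyError for an unlisted use case
    # Assumption: the first two hyphen-separated segments form the language code.
    language_code = "-".join(name.split("-")[:2])
    return {"languageCode": language_code, "name": name}


print(voice_params_for("Storytime"))
```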
Studio multispeaker voices
Create discussions and interviews with the new multispeaker Studio voices, which are based on the same technology as Journey voices.
Studio voices
Studio voices are designed for news reading and broadcast content.
Example 1. The en-US-Studio-O voice reading The Great Gatsby.
Neural2 voices
The Text-to-Speech API provides a voice tier called Neural2. Neural2 voices are based on the same technology used to create a Custom Voice, so anyone can use Custom Voice technology without training a custom voice of their own. Neural2 voices are available in global and single-region endpoints.
Example 1. Neural2 voice
Standard voices
The voices offered by Text-to-Speech differ in how they are produced, that is, in the synthetic speech technology used to create the machine model of the voice. One common speech technology, parametric text-to-speech, typically generates audio data by passing outputs through signal processing algorithms known as vocoders. Many of the Standard voices available in Text-to-Speech use a variation of this technology.