AI & Machine Learning

Google Cloud Text-to-Speech API now supports custom voices

March 4, 2022

Calum Barnes

Product Manager, Cloud Speech

With the rise of digital assistants and conversational interfaces, people have grown accustomed to hearing and speaking to synthetic voices. But how do these voices sound and how do they reflect on your brand? It's important for all companies to build a strong identity and brand association with their conversational AI systems and this starts with the synthetic voice.

That’s why we are excited to announce the general availability of Custom Voice in our Cloud Text-to-Speech (TTS) API, a new feature that lets you train custom voice models with your own audio recordings to create unique experiences.

For businesses looking to build a strong brand identity, establishing a unique voice can help turn mobile app interactions or customer service based on interactive voice responses (IVR) into differentiated customer experiences. Our TTS API has included a speech synthesis service with a static list of voices for some time, but now, with Custom Voice, moving beyond these predefined options is easier than ever.

Custom Voice lets you submit your audio recordings and get access to the new voice directly in the TTS API. It includes guidance on the audio requirements to help make sure you generate a high-quality custom TTS voice model. Once the model is trained, all you have to do to start using the new voice is reference the model ID in your calls to the Cloud TTS API.
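As a rough illustration of what referencing the model ID might look like, here is a minimal Python sketch that builds the JSON body for a synthesis request. It is a sketch under stated assumptions, not official sample code. The field names (customVoice, model) and the endpoint URL reflect our understanding of the v1 REST API and should be checked against the current API reference. The helper name build_custom_voice_request and the model path are hypothetical placeholders. The sketch only constructs and prints the request body; it does not call the API.

```python
import json

# Assumed v1 REST endpoint for synthesis; verify against the API reference.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_custom_voice_request(text, language_code, model_id):
    """Build a synthesis request body that references a custom voice model.

    Hypothetical helper for illustration. The "customVoice" / "model"
    field names are assumptions about the v1 REST request shape.
    """
    if not model_id:
        raise ValueError("model_id is required to select a custom voice")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            # The trained model's ID takes the place of a predefined voice name.
            "customVoice": {"model": model_id},
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


# Placeholder model path; substitute the ID you receive after training.
EXAMPLE_MODEL_ID = "projects/my-project/locations/us-central1/models/my-brand-voice"

body = build_custom_voice_request("Welcome back!", "en-US", EXAMPLE_MODEL_ID)
print(json.dumps(body, indent=2))
```

In practice, a body like this would be sent as an authenticated POST to the synthesis endpoint, or the equivalent fields would be set through a client library. Compared with a request for a predefined voice, the main difference is that the voice is selected by model ID.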

At Google, we are committed to building safe and accountable AI products, not only because it’s the right thing to do, but because it is a critical step in ensuring successful use in production. As part of Google Cloud’s Responsible AI governance process, we conducted a deep ethical evaluation of Custom Voice TTS, and its relation to synthetic media, in order to surface and mitigate potential harms that it may create. If you are interested in Custom Voice TTS, there is a review process to help ensure each use case is aligned with our AI Principles and adequate voice actor consent is given.

Additionally, to verify that voice actors are actually the ones producing the audio, you will need to submit an audio file of the voice actor speaking a sentence that Google Cloud chooses (for example: “I agree that my voice will be used to create a synthetic custom Text-to-Speech voice”).

We’re looking forward to seeing this API help businesses solve problems in an easy, fast, and scalable way. TTS Custom Voice is now GA in these languages:

English (US)
English (AU)
English (UK)
Spanish (US)
Spanish (Spain)
French (France)
French (Canada)
Italian (Italy)
German (Germany)
Portuguese (Brazil)
Japanese (Japan)

We plan to continue expanding this lineup in order to meet your needs. Ready to try for yourself? Contact your seller to get started on your use case evaluation today!
