Custom Voice Overview

Cloud Text-to-Speech now offers the Custom Voice feature. Custom Voice lets you train a custom voice model from your own studio-quality audio recordings to create a unique voice. You can then synthesize audio with your custom voice through the Cloud Text-to-Speech API. Currently, only American English (en-US), Australian English (en-AU), and US Spanish (es-US) are supported.
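As a rough illustration of what a synthesis call with a custom voice might look like, the sketch below builds a JSON request body for the API's synthesize method using only the Python standard library. It does not send the request. The field names (input, voice.customVoice.model, audioConfig) are assumptions based on the public REST API shape and should be checked against the current API reference. The helper name build_synthesize_request and the model resource name are hypothetical.

```python
import json

# Locales the overview lists as currently supported for Custom Voice.
SUPPORTED_LOCALES = {"en-US", "en-AU", "es-US"}


def build_synthesize_request(text, language_code, model_name):
    """Return a JSON-serializable body for a synthesize request.

    The field layout is an assumption based on the public REST API shape;
    verify it against the current reference before relying on it.
    """
    if language_code not in SUPPORTED_LOCALES:
        raise ValueError(
            f"Custom Voice does not support locale {language_code!r}"
        )
    if not text:
        raise ValueError("text must be non-empty")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            # Assumed field: points the request at the trained custom model.
            "customVoice": {"model": model_name},
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


if __name__ == "__main__":
    # Hypothetical model resource name; use the one issued for your project.
    body = build_synthesize_request(
        "Hello from my custom voice.",
        "en-US",
        "projects/my-project/locations/us-central1/models/my-custom-voice",
    )
    print(json.dumps(body, indent=2))
```

Rejecting unsupported locales before sending anything keeps the failure local and easy to read, rather than relying on an error response from the service.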

To request access to the Custom Voice feature, please fill out this form.

Sample Custom Voices

The following samples demonstrate custom voices. For each speaker, the first sample is the original voice, followed by two custom voice examples based on it.

Female                     Male
Original voice             Original voice
Custom Voice example #1    Custom Voice example #1
Custom Voice example #2    Custom Voice example #2

User-supplied training audio data

Custom Voice delivers a Text-to-Speech (TTS) model that sounds as similar to your supplied audio data as possible. After your use case is approved, Google sends you a script for the voice recordings. We suggest that you find and work with a voice actor who represents the custom voice you're aiming for. You need to record studio-quality audio with your voice actor to use as training data. If your training data doesn't pass Google's internal verification and validation checks, you might need to fix the identified issues and then re-record or resubmit the data.

Model training

It takes Google several weeks to train and evaluate your custom voice model. Note that Beta features have no SLA support for critical bugs.

Evaluation and user acceptance tests

Google conducts an initial round of evaluation of the trained model. Once the model passes Google's internal quality criteria, Google sends you offline audio samples generated with your custom model. You then follow a user acceptance testing process to evaluate the audio results and officially sign off on the model.