Text-to-Speech documentation

Custom Voice

The Cloud Text-to-Speech API now offers Custom Voice. This feature lets you train a custom voice model on your own studio-quality audio recordings to create a unique voice. You can then synthesize audio with that voice through the Cloud Text-to-Speech API.
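As a rough illustration of what a synthesis request for a custom voice might look like, the following minimal sketch builds the JSON body for the REST `text:synthesize` method. It only constructs and prints the body; it does not send anything. The `customVoice.model` field and the model resource name are assumptions made for illustration, as is the helper name `build_custom_voice_request`. Check the API reference and the details Google gives you after training before relying on them.

```python
import json

# Public REST endpoint for synthesis in the Cloud Text-to-Speech v1 API.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_custom_voice_request(text, language_code, model_name):
    """Build the JSON body for a text:synthesize call that targets a custom voice.

    ASSUMPTION: the custom model is selected through voice.customVoice.model.
    Verify the field name against the API reference for your API version.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            "customVoice": {"model": model_name},
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


# Hypothetical model resource name, used here only as a placeholder.
body = build_custom_voice_request(
    "Hello from my custom voice.",
    "en-US",
    "projects/my-project/locations/us-central1/models/my-custom-voice",
)
print(json.dumps(body, indent=2))
```

To actually synthesize audio, you would POST this body to `SYNTHESIZE_URL` with valid credentials and decode the base64 `audioContent` field of the response.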

To implement Custom Voice, please contact a member of the sales team.

Sample Custom Voices

The following audio examples show what custom voices sound like. For each voice, the first example is the original voice, followed by two custom voice examples based on it.

Female: Original voice, Custom Voice example #1, Custom Voice example #2
Male: Original voice, Custom Voice example #1, Custom Voice example #2

User-supplied training audio data

Custom Voice delivers a Text-to-Speech (TTS) model that sounds as similar as possible to the audio data you supply. After your use case is approved, Google sends you a script for the voice recordings. We suggest that you find and work with a voice actor who represents the custom voice you're aiming for. You need to record studio-quality audio with your voice actor to use as training data. If your training data doesn't pass Google's internal verification and validation checks, you might need to fix the identified issues and then re-record or re-submit the data.
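Before you submit recordings, a quick local check of the file format can catch obvious problems early. The sketch below reads a WAV header with Python's standard `wave` module and compares it against thresholds. The threshold defaults and the function names `describe_wav` and `find_format_issues` are hypothetical placeholders, not Google's actual requirements. Replace them with the specifications you receive alongside the recording script.

```python
import wave


def describe_wav(path):
    """Return basic format facts read from a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def find_format_issues(info, min_sample_rate_hz=44100, min_bit_depth=16, channels=1):
    """List mismatches between a file's format and the given thresholds.

    ASSUMPTION: the default thresholds are illustrative placeholders only.
    They do not represent Google's validation criteria.
    """
    issues = []
    if info["sample_rate_hz"] < min_sample_rate_hz:
        issues.append(
            f"sample rate {info['sample_rate_hz']} Hz is below {min_sample_rate_hz} Hz"
        )
    if info["bit_depth"] < min_bit_depth:
        issues.append(f"bit depth {info['bit_depth']} is below {min_bit_depth}")
    if info["channels"] != channels:
        issues.append(f"expected {channels} channel(s), found {info['channels']}")
    return issues
```

For example, `find_format_issues(describe_wav("take_001.wav"))` returns an empty list when the file meets the chosen thresholds, and a list of readable problem descriptions otherwise. The file name here is hypothetical. A format check like this cannot judge recording quality itself, such as background noise or performance consistency.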

Model training

Training and evaluating your custom voice model takes Google several weeks. Beta features have no SLA support for critical bugs.

Evaluation and user acceptance tests

Google conducts an initial round of evaluation of the trained model. Once the model passes Google's internal quality criteria, Google sends you offline audio samples generated with your custom model. You then follow a user acceptance testing process to evaluate the audio results and officially sign off on the model.