The Cloud Text-to-Speech API now offers Custom Voice (Beta). This feature lets you train a custom voice model on your own studio-quality audio recordings. You can then use that custom voice to synthesize audio through the Cloud Text-to-Speech API. Currently, only American English (en-US) is supported.
To request access to the Custom Voice feature, please fill out this form.
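Once a custom voice model has been trained and approved, a synthesis request would reference that model instead of a stock voice. The following is a minimal sketch of what such a request body might look like, using only the Python standard library. The endpoint URL, the `customVoice.model` field, the `LINEAR16` encoding choice, and the model resource name format are assumptions made for illustration, and `build_custom_voice_request` is a hypothetical helper; check all of them against the current API reference before relying on them.

```python
import json

# Assumed REST endpoint for synthesis; verify against the current API reference.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_custom_voice_request(text, model_name, language_code="en-US"):
    """Build a JSON-serializable request body that selects a custom voice.

    `model_name` is the resource name of the trained custom voice model.
    The field layout below is an assumption, not a confirmed schema.
    """
    if not text:
        raise ValueError("text must be a non-empty string")
    if not model_name:
        raise ValueError("model_name must be a non-empty string")
    return {
        "input": {"text": text},
        "voice": {
            # Only en-US is supported for Custom Voice at the time of writing.
            "languageCode": language_code,
            "customVoice": {"model": model_name},
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


if __name__ == "__main__":
    # Placeholder resource name; substitute the one issued for your model.
    body = build_custom_voice_request(
        "Hello from my custom voice.",
        "projects/my-project/locations/us-central1/models/my-model",
    )
    print(json.dumps(body, indent=2))
```

The sketch only builds the request body and does not send it. Sending it would additionally require an authenticated HTTP POST to the synthesis endpoint with credentials for a project that has been granted Custom Voice access.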
User-supplied training audio data
Custom Voice delivers a Text-to-Speech (TTS) model that sounds as similar to your supplied audio data as possible. Google will send you a script for the voice recordings after your use case is approved. We suggest that you find and work with a voice actor who represents the custom voice you're aiming for. You need to record studio-quality audio with your voice actor to use as training data. If your training data doesn't pass Google's internal verification and validation check, you might need to re-record or re-submit the data after fixing the identified issues.
It takes Google several weeks to train and evaluate your custom voice model. Because Custom Voice is a Beta feature, there is no SLA support for critical bugs.
Evaluation and user acceptance tests
Google conducts an initial round of evaluation of the trained model. Once the model passes Google's internal quality criteria, Google sends you offline audio samples generated with your custom model. You then follow a user acceptance testing process to evaluate the audio results and officially sign off on the model.