Custom Voice basics

Overview

The Cloud Text-to-Speech API now offers Custom Voice. This feature lets you train a custom voice model on your own studio-quality audio recordings to create a unique voice. You can then use your custom voice to synthesize audio with the Cloud Text-to-Speech API.
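To make the synthesis step concrete, the following is a minimal sketch of how a request body that selects a custom voice might be assembled. It is not taken from this page: the helper name build_custom_voice_request, the model resource name, and the audio encoding are illustrative assumptions, and the field layout (voice.customVoice.model) should be confirmed against the API reference before use.

```python
import json


def build_custom_voice_request(text, model_name, language_code="en-US"):
    """Build a JSON-serializable body for a synthesize request.

    Hypothetical helper: the field layout below is an assumption about the
    REST request shape and should be checked against the API reference.
    """
    if not text:
        raise ValueError("text must be non-empty")
    if not model_name:
        raise ValueError("model_name must be non-empty")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            # Assumed field: points the request at the deployed custom model.
            "customVoice": {"model": model_name},
        },
        # Assumed output format; other encodings may suit your use case.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


if __name__ == "__main__":
    # Placeholder resource name, not a real model.
    body = build_custom_voice_request(
        "Hello from my custom voice.",
        "projects/my-project/locations/us-central1/models/my-model",
    )
    print(json.dumps(body, indent=2))
```

The sketch only builds the payload. Sending it requires an authenticated call to the API, which the quickstart covers.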

User-supplied training audio data

Custom Voice delivers a Text-to-Speech (TTS) model that sounds as similar to your supplied audio data as possible. Google will send you a script for the voice recordings after your use case is approved. We suggest that you select and hire a voice actor who represents the custom voice you're aiming for. You need to record studio-quality audio with your voice actor to use as training data (see the training data requirements page for more information). If your training data doesn't pass Google's internal quality checks, you might need to re-record or re-submit the data after fixing the identified issues.
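Before you submit recordings, a quick local sanity check can catch obvious format problems early. The sketch below uses Python's standard wave module to inspect a WAV file. The threshold constants are placeholders chosen for illustration, not Google's actual requirements. Replace them with the values from the training data requirements page. The function name check_wav is likewise hypothetical.

```python
import wave

# Placeholder thresholds for illustration only. These are NOT Google's
# published requirements; take the real values from the requirements page.
EXPECTED_SAMPLE_RATE_HZ = 48000
EXPECTED_CHANNELS = 1
MIN_SAMPLE_WIDTH_BYTES = 2


def check_wav(path):
    """Return a list of human-readable problems found in a WAV file."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, "
                f"expected {EXPECTED_SAMPLE_RATE_HZ} Hz"
            )
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}"
            )
        if wav.getsampwidth() < MIN_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, "
                f"expected at least {MIN_SAMPLE_WIDTH_BYTES}"
            )
        if wav.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

An empty list means the file matched the placeholder format settings. It does not mean the recording will pass Google's internal quality checks, which this sketch does not attempt to reproduce.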

Model training

It takes Google several weeks to train your custom voice model.

Deployment

After training, Google will deploy the custom voice model to projects of your choosing.

What's next

  • Implement the Custom Voice feature using our quickstart.