Create a custom voice model for your speech applications

You can create your own custom voice models within the Text-to-Speech UI.

Prerequisites

1. Enable the Text-to-Speech API.
2. Record your audio following the training data requirements.
3. Create a Cloud Storage bucket.
4. Upload the audio files to your new bucket, named as specified in the training data requirements (0001.wav, 0002.wav, ..., 0200.wav, and so on).
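
Before uploading, it can help to confirm that the recordings follow the sequential naming pattern shown in step 4. The following Python sketch is illustrative only: it assumes four-digit, zero-padded names that start at 0001 with no gaps, which is inferred from the example names above and not taken from the training data requirements themselves. The function names are hypothetical.

```python
import re

# Assumed naming scheme (inferred from the example names, not an official
# rule): four digits, zero-padded, starting at 0001, with a .wav extension.
WAV_NAME = re.compile(r"^(\d{4})\.wav$")


def expected_names(count):
    """Return the names 0001.wav through <count>.wav under the assumed scheme."""
    return [f"{i:04d}.wav" for i in range(1, count + 1)]


def check_recordings(names):
    """Compare file names against the assumed sequential scheme.

    Returns (missing, unexpected):
      missing    - names absent from the sequence up to the highest number found
      unexpected - names that do not fit the NNNN.wav pattern (or are 0000.wav)
    """
    numbered = set()
    unexpected = []
    for name in names:
        match = WAV_NAME.match(name)
        if match and int(match.group(1)) >= 1:
            numbered.add(name)
        else:
            unexpected.append(name)
    highest = max((int(n[:4]) for n in numbered), default=0)
    missing = [n for n in expected_names(highest) if n not in numbered]
    return missing, sorted(unexpected)
```

To use it, pass the file names from your local recordings folder (for example, the result of os.listdir) and resolve anything reported as missing or unexpected before you upload.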

Train a new custom voice model

1. Open the Custom Voice tab in the Text-to-Speech UI.
2. Click Create near the top of the screen.
3. Name your voice model.
4. Select the language from the drop-down list of supported languages.
5. Select the appropriate CSV file from the bucket that you set up in step 4 of the prerequisites.
6. Upload a consent statement from the voice talent. Example: "I am the owner of this voice and I consent to Google using this voice to create a synthetic voice model."
7. Click Create to start model creation, which can take up to 3 days.
8. To check the status of the training job, view your console notifications in the top-right navigation header.

What's next

When model training finishes, sample output audio files become available in the console. Use these files for an initial evaluation of the model's quality. If the model meets your requirements, contact your sales team for help with deployment. Deployment takes two to three weeks, so we recommend evaluating the samples promptly and contacting your sales team early.
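
For planning purposes, the durations mentioned on this page (up to 3 days of training, two to three weeks of deployment) can be combined into a rough availability window. The sketch below is only an estimate under stated assumptions: it counts calendar days, treats the best case as training finishing right away, and uses a hypothetical evaluation_days parameter for the time you spend reviewing the sample audio. It is not an official schedule.

```python
from datetime import date, timedelta

# Durations taken from this page; treating them as calendar days is an assumption.
TRAINING_MAX_DAYS = 3      # "Model creation can take up to 3 days"
DEPLOYMENT_MIN_DAYS = 14   # "two to three weeks" (lower bound)
DEPLOYMENT_MAX_DAYS = 21   # "two to three weeks" (upper bound)


def availability_window(start, evaluation_days=0):
    """Estimate the (earliest, latest) dates a deployed model could be ready.

    start           - date on which you click Create
    evaluation_days - hypothetical number of days spent evaluating the samples
    The earliest date assumes training finishes on the start date.
    """
    earliest = start + timedelta(days=evaluation_days + DEPLOYMENT_MIN_DAYS)
    latest = start + timedelta(
        days=TRAINING_MAX_DAYS + evaluation_days + DEPLOYMENT_MAX_DAYS
    )
    return earliest, latest
```

For example, availability_window(date(2024, 1, 1), evaluation_days=2) gives a window from 2024-01-17 to 2024-01-27, which shows how each extra day of evaluation shifts both ends of the window.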