Create a custom voice model for your speech applications

You can create your own custom voice models within the Text-to-Speech UI.

Prerequisites

  1. Enable the Text-to-Speech API.
  2. Record your audio following the training data requirements.
  3. Create a Cloud Storage bucket.
  4. Upload the audio files to your new bucket, named as specified by the training data requirements (for example, 0001.wav, 0002.wav, ... 0200.wav).
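
Steps 3 and 4 can be partly scripted. The following is a minimal sketch, not an official tool. It assumes the file names follow the four-digit, zero-padded, sequential pattern shown in the example above (the training data requirements remain the authoritative source) and that the files belong at the root of the bucket. The helper names find_naming_problems and upload_wavs are hypothetical. The upload helper accepts any object that behaves like a Bucket from the google-cloud-storage Python client.

```python
import re
from pathlib import Path

# Assumed pattern: four-digit, zero-padded names such as 0001.wav.
# Check the training data requirements for the authoritative rules.
WAV_NAME = re.compile(r"^(\d{4})\.wav$")


def find_naming_problems(filenames):
    """Return a list of problems; an empty list means the names look consistent."""
    problems = []
    numbers = []
    for name in sorted(filenames):
        match = WAV_NAME.match(name)
        if match is None:
            problems.append(f"unexpected file name: {name}")
        else:
            numbers.append(int(match.group(1)))
    # Report gaps in the sequence, assuming numbering starts at 0001.
    expected = set(range(1, max(numbers) + 1)) if numbers else set()
    for missing in sorted(expected - set(numbers)):
        problems.append(f"missing file: {missing:04d}.wav")
    return problems


def upload_wavs(bucket, local_dir):
    """Upload every .wav file in local_dir to the root of the bucket.

    `bucket` is expected to behave like google.cloud.storage.Bucket,
    that is, to support bucket.blob(name).upload_from_filename(path).
    Placing the files at the bucket root is an assumption of this sketch.
    """
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.wav")):
        bucket.blob(path.name).upload_from_filename(str(path))
        uploaded.append(path.name)
    return uploaded
```

With the google-cloud-storage library installed and credentials configured, a bucket object can be obtained with storage.Client().bucket("your-bucket-name"), where the bucket name is a placeholder. Running find_naming_problems on the local file names before uploading helps catch gaps or stray files early.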

Train a new custom voice model

  1. Open the Custom Voice tab in the Text-to-Speech UI.
  2. Click Create near the top of the screen.
  3. Name your voice model.
  4. Specify the language from the drop-down of supported languages.
  5. Select the CSV file from the bucket that you populated in step 4 of the prerequisites.
  6. Upload a consent statement from the voice talent. Example: "I, (name), consent that my voice will be used to create a synthetic custom voice."
  7. Click Create to start model creation, which can take up to 3 days.
  8. To see the status of the training job, view your console notifications in the top-right navigation header.

What's next

When your model training finishes, sample output audio files are available in the console. Use these files for an initial evaluation of the model's quality. If the model meets your requirements, contact your sales team for help with deployment. Deployment takes two to three weeks, so we recommend evaluating the samples promptly and contacting your sales team early.