Text-to-Speech

Convert text into natural-sounding speech using an API powered by Google’s AI technologies.

  • Improve customer interactions with intelligent, lifelike responses

  • Engage users with a voice user interface in your devices and applications

  • Personalize your communication based on each user's preferred voice and language

High fidelity speech

Deploy Google’s groundbreaking technologies to generate speech with humanlike intonation. Built on DeepMind’s speech synthesis expertise, the API delivers voices of near-human quality.

Widest voice selection

Choose from a set of 220+ voices across 40+ languages and variants. Pick the voice that works best for your user and application.

Accelerated innovation

Combine with the best of Google’s technologies in Translation and Speech-to-Text to unlock use cases like multilingual audio content and voice bots.

Key features

WaveNet voices

Take advantage of 90+ WaveNet voices, built on DeepMind’s groundbreaking research, to generate speech that significantly closes the gap with human performance.

Voice tuning

Adjust the pitch of your selected voice by up to 20 semitones above or below the default. Set the speaking rate anywhere from one quarter of the normal rate to 4x the normal rate.
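
As a minimal sketch of these tuning limits, the helper below checks a pitch and speaking rate against the ranges described above before building a settings fragment. The function name `build_audio_config` is hypothetical, and the field names `pitch` and `speakingRate` are an assumption based on the v1 REST API's audio configuration; check the API reference before relying on them.

```python
def build_audio_config(pitch=0.0, speaking_rate=1.0):
    """Return a tuning fragment after checking the ranges described above.

    Field names are assumed to match the REST API's audioConfig object.
    """
    # Pitch is expressed in semitones relative to the voice's default.
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be within 20 semitones of the default")
    # 1.0 is the normal rate; 0.25 is 4x slower and 4.0 is 4x faster.
    if not 0.25 <= speaking_rate <= 4.0:
        raise ValueError("speaking rate must be between 0.25 and 4.0")
    return {"pitch": pitch, "speakingRate": speaking_rate}


# Slightly lower and slightly faster than the default voice.
config = build_audio_config(pitch=-4.0, speaking_rate=1.25)
print(config)
```

Validating on the client side like this is optional, but it surfaces an out-of-range value before a request is sent.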

Text and SSML support

Customize your speech with SSML tags that let you add pauses, control how numbers, dates, and times are read, and give other pronunciation instructions.
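
To illustrate, here is a small sketch that assembles an SSML string with a pause and a date reading. The helper name `build_ssml` and the sample sentence are hypothetical; `<speak>`, `<break>`, and `<say-as>` are standard SSML elements, and the `format="yyyymmdd"` attribute is assumed to be accepted for dates.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(greeting, date_iso, pause_ms=500):
    """Build an SSML document with a pause and a spoken date (illustrative)."""
    # Escape plain text so characters such as & and < keep the markup well-formed.
    return (
        "<speak>"
        f"{escape(greeting)}"
        f'<break time="{int(pause_ms)}ms"/>'
        "Your appointment is on "
        f'<say-as interpret-as="date" format="yyyymmdd">{escape(date_iso)}</say-as>.'
        "</speak>"
    )


ssml = build_ssml("Hello & welcome.", "2024-03-15")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

Parsing the result locally only confirms that the XML is well-formed; it does not confirm that the service supports every tag or attribute used.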

Documentation

Google Cloud Basics
Text-to-Speech basics

A guide to the fundamental concepts of using the Text-to-Speech API.

Quickstart
Quickstart: Using the command line

Set up your Google Cloud project and authorization and make a request for Text-to-Speech to create audio from text.

Google Cloud Basics
Supported voices and languages

See which languages are supported by Text-to-Speech and hear samples of the voices available for each.

Tutorial
WaveNet and other synthetic voices

Learn about the different synthetic voices available for use in Text-to-Speech, including the premium WaveNet voices.

Tutorial
Speaking addresses with SSML

This tutorial demonstrates how to use Speech Synthesis Markup Language (SSML) to speak a text file of addresses.

Use cases

Use case
Voice bots in contact centers

Deliver a better voice experience for customer service by dynamically generating speech, instead of playing static, pre-recorded audio. Engage with high-quality synthesized voices that give callers a sense of familiarity and personalization.

Voice bots in contact centers reference architecture

Use case
Voice generation in devices

Enable natural communication with your users by letting your devices speak with humanlike voices. Build an end-to-end voice user interface together with Speech-to-Text and improve the user experience with easy, engaging interactions.

Voice generation in devices reference architecture

All features

  • Voice and language selection: Choose from an extensive selection of 220+ voices across 40+ languages and variants, with more to come soon.

  • WaveNet voices: Take advantage of 90+ WaveNet voices, built on DeepMind’s groundbreaking research, to generate speech that significantly closes the gap with human performance.

  • Text and SSML support: Customize your speech with SSML tags that let you add pauses, control how numbers, dates, and times are read, and give other pronunciation instructions.

  • Pitch tuning: Adjust the pitch of your selected voice by up to 20 semitones above or below the default.

  • Speaking rate tuning: Set the speaking rate anywhere from one quarter of the normal rate to 4x the normal rate.

  • Volume gain control: Increase the output volume by up to 16 dB or decrease it by up to 96 dB.

  • Integrated REST and gRPC APIs: Easily integrate with any application or device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).

  • Audio format flexibility: Choose from a number of audio formats, including MP3, Linear16, and Ogg Opus.

  • Audio profiles: Optimize for the type of speaker on which your speech is intended to play, such as headphones or phone lines.
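
Several of the features above map onto fields of a single REST request. The sketch below builds such a request body and decodes a response without making any network call. The helper names `build_request` and `decode_audio` are hypothetical, and the endpoint, field names, encoding values, and profile IDs reflect an understanding of the v1 REST API that should be checked against the API reference.

```python
import base64
import json

# Assumed v1 REST endpoint; authentication is omitted from this sketch.
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text, language_code="en-US", encoding="MP3",
                  volume_gain_db=0.0, audio_profile=None):
    """Assemble a synthesis request body as a plain dict (illustrative)."""
    audio_config = {"audioEncoding": encoding, "volumeGainDb": volume_gain_db}
    if audio_profile:
        # Audio profiles are passed as a list of profile IDs.
        audio_config["effectsProfileId"] = [audio_profile]
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": audio_config,
    }


def decode_audio(response_json):
    """Extract raw audio bytes from a JSON response with base64 audioContent."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


body = build_request("Hello, world.", encoding="OGG_OPUS",
                     audio_profile="headphone-class-device")
print(json.dumps(body, indent=2))
```

To send the request, this body would be posted as JSON to `ENDPOINT` with valid credentials, and the decoded bytes could then be written to a file in the chosen format.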

Pricing

Text-to-Speech is priced per 1 million characters of text processed after the free tier.
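
The per-character pricing described above can be estimated with simple arithmetic. In this sketch the function name `estimate_cost` is hypothetical, and the free-tier size and rate in the example call are placeholder values chosen only for illustration, not actual prices.

```python
def estimate_cost(characters, free_tier_chars, usd_per_million):
    """Estimate a monthly charge: only characters beyond the free tier are billed."""
    billable = max(0, characters - free_tier_chars)
    return billable / 1_000_000 * usd_per_million


# Placeholder inputs: 5M characters, a 1M-character free tier, 10.0 USD per 1M.
print(estimate_cost(5_000_000, free_tier_chars=1_000_000, usd_per_million=10.0))  # 40.0
```

Substitute the free-tier allowance and rate from the current pricing table for the voice type you use.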

If you pay in a currency other than USD, the prices listed in your currency on Google Cloud SKUs apply.