Cloud Text-to-Speech

Text-to-speech conversion powered by machine learning.

High-fidelity speech synthesis

Google Cloud Text-to-Speech converts text into human-like speech in more than 180 voices across 30+ languages and variants. It applies groundbreaking research in speech synthesis (WaveNet) and Google's powerful neural networks to deliver high-fidelity audio. With this easy-to-use API, you can create lifelike interactions with your users that transform customer service, device interaction, and other applications.

Powered by Google’s machine learning

Synthesize speech from text in a variety of voices and languages using deep neural networks. These networks are built on Google’s speech synthesis expertise.

Select from 180+ voices

Google Cloud Text-to-Speech offers a selection of 180+ voices across 30+ languages and variants, enabling developers to pick the voice that works best for their application.

Includes exclusive access to WaveNet technology

DeepMind has done groundbreaking research in machine learning models to generate speech that mimics human voices and sounds more natural, reducing the gap with human performance by 70%. Cloud Text-to-Speech offers exclusive access to 90+ WaveNet voices and will continue to add more over time.

Easily integrates with existing applications and devices

Cloud Text-to-Speech supports any application or device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).
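
A REST request to the service can be sketched as follows. This is a minimal sketch, not an official client: the endpoint URL, the JSON field names (`input`, `voice`, `audioConfig`), and the voice name `en-US-Wavenet-D` are assumptions that do not appear on this page and should be checked against the API reference. The helper name `build_synthesize_request` is hypothetical. The code only builds the request; it does not send it, and authentication is omitted.

```python
import json
import urllib.request

# Assumed REST endpoint; verify against the API reference before use.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US",
                             voice_name="en-US-Wavenet-D"):
    """Build (but do not send) an HTTP POST request for speech synthesis."""
    # Field names below are assumptions about the REST request schema.
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_synthesize_request("Hello, world!")
payload = json.loads(request.data)  # decode the body again for inspection
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Because the request is plain HTTP with a JSON body, any device with an HTTP client can issue it; sending it for real would additionally require credentials.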

Supports many common use cases

As an easy-to-use API, Google Cloud Text-to-Speech is a flexible solution for creating natural experiences across a variety of use cases. Common use cases include call center automation, interactive responses from IoT devices, and converting text to audio for listening.

Cloud Text-to-Speech features

Supports 180+ voices across 30+ languages and variants, with more to come soon.
WaveNet Voices
Exclusive multilingual access to DeepMind WaveNet voices that provide the most natural-sounding speech.
Text and SSML Support
Customize your speech with SSML tags that let you add pauses, format numbers, dates, and times, and give other pronunciation instructions.
Speaking Rate Tuning
Customize your speaking rate to be up to 4x faster or 4x slower than the normal rate.
Pitch Tuning
Customize the pitch of your selected voice, up to 20 semitones higher or lower than the default output.
Volume Gain Control
Increase the volume of the output by up to 16 dB or decrease it by up to 96 dB.
Audio Format Flexibility
Choose from a number of audio formats, including MP3, LINEAR16, and Ogg Opus.
Audio Profiles
Optimize the audio for the type of speaker it will play on, such as headphones or phone lines.
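
The tuning features in this list can be combined in a single request body. The sketch below checks the limits stated above (rate between 0.25x and 4x, pitch within 20 semitones either way, gain between -96 dB and +16 dB) and wraps SSML input. The field names `speakingRate`, `pitch`, `volumeGainDb`, and `effectsProfileId`, and the profile ID `headphone-class-device`, are assumptions about the REST schema that do not appear on this page; `build_audio_config` and `build_ssml_request` are hypothetical helper names.

```python
import json

# Limits taken from the feature list above.
RATE_RANGE = (0.25, 4.0)        # 4x slower to 4x faster than normal (1.0)
PITCH_RANGE = (-20.0, 20.0)     # semitones relative to the default
GAIN_RANGE_DB = (-96.0, 16.0)   # decibels relative to the default volume


def _check(name, value, bounds):
    """Return value unchanged, or raise if it falls outside bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def build_audio_config(encoding="MP3", rate=1.0, pitch=0.0, gain_db=0.0,
                       profile=None):
    """Return an audio configuration dict (field names are assumed)."""
    config = {
        "audioEncoding": encoding,
        "speakingRate": _check("rate", rate, RATE_RANGE),
        "pitch": _check("pitch", pitch, PITCH_RANGE),
        "volumeGainDb": _check("gain_db", gain_db, GAIN_RANGE_DB),
    }
    if profile is not None:
        # Assumed: audio profiles are passed as a list of profile IDs.
        config["effectsProfileId"] = [profile]
    return config


def build_ssml_request(ssml, audio_config, language_code="en-US"):
    """Wrap SSML markup and an audio config into a request body."""
    return {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code},
        "audioConfig": audio_config,
    }


# SSML with a date read out as a date, followed by a half-second pause.
ssml = (
    "<speak>Your order ships on "
    '<say-as interpret-as="date" format="mdy">10/12/2024</say-as>.'
    '<break time="500ms"/>Thank you.</speak>'
)
body = build_ssml_request(
    ssml,
    build_audio_config(rate=1.25, pitch=-2.0, gain_db=3.0,
                       profile="headphone-class-device"),
)
print(json.dumps(body, indent=2))
```

Validating the ranges on the client side is a design choice of this sketch: it surfaces an out-of-range value such as `rate=5.0` as a `ValueError` before any request is made.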

Cloud Text-to-Speech pricing

Cloud Text-to-Speech is priced per 1 million characters of text processed after the free tier. For details, please see our pricing guide.

Feature                         Monthly free tier           Paid usage
Standard (non-WaveNet) voices   0 to 4 million characters   $4.00 USD / 1 million characters
WaveNet voices                  0 to 1 million characters   $16.00 USD / 1 million characters
If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.
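
The pricing table can be turned into a rough monthly estimate. The sketch below uses the free tiers and USD rates listed above; it assumes that usage beyond the free tier is billed pro rata per character and that each voice type has its own free tier, neither of which this page states explicitly. The function name `monthly_cost` is hypothetical.

```python
from decimal import Decimal

# Values from the pricing table above:
# voice type -> (free characters per month, USD per 1 million characters)
PRICING = {
    "standard": (4_000_000, Decimal("4.00")),
    "wavenet": (1_000_000, Decimal("16.00")),
}


def monthly_cost(characters, voice_type):
    """Estimate the monthly charge in USD for one voice type."""
    free_chars, rate_per_million = PRICING[voice_type]
    # Only characters beyond the free tier are billable.
    billable = max(0, characters - free_chars)
    # Assumption: billing is pro rata per character, not per full million.
    return Decimal(billable) * rate_per_million / Decimal(1_000_000)


print(monthly_cost(5_000_000, "standard"))
print(monthly_cost(1_500_000, "wavenet"))
print(monthly_cost(3_000_000, "standard"))
```

Under these assumptions, 5 million Standard characters leave 1 million billable characters ($4.00), 1.5 million WaveNet characters leave 500,000 billable characters ($8.00), and 3 million Standard characters stay within the free tier.
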
A product or feature listed on this page is in beta. For more information, see the product launch stages documentation.
Cloud AI products comply with their published SLA policies. They may offer different latency or availability guarantees from other Google Cloud services.