Cloud Text-to-Speech

Text-to-speech conversion powered by machine learning.

High-fidelity speech synthesis

Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 30+ voices, available in multiple languages and variants. It applies DeepMind’s groundbreaking research in WaveNet and Google’s powerful neural networks to deliver high-fidelity audio. With this easy-to-use API, you can create lifelike interactions with your users across many applications and devices.

Powered by Google’s machine learning

Apply advanced deep learning neural network algorithms to synthesize text into speech in a variety of voices and languages. These neural networks build on Google’s speech synthesis expertise.

Includes exclusive access to WaveNet Voices from DeepMind

DeepMind has conducted groundbreaking research into machine learning models that generate speech which mimics human voices and sounds more natural, reducing the gap with human performance by over 50%. Cloud Text-to-Speech offers exclusive access to multiple WaveNet voices, with more to be added over time.

Select from 30+ voices

Google Cloud Text-to-Speech offers a selection of 30+ voices in 14 languages and variants, enabling developers to pick the voice that works best for their application.

Easily integrates with existing applications and devices

Cloud Text-to-Speech supports any application or device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).
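
As a rough illustration of what a REST request might look like, the sketch below builds a JSON body and wraps it in an HTTP POST using only the Python standard library. The endpoint URL, the JSON field names (input, voice, audioConfig, audioContent), and the use of an API key in the query string are assumptions that do not come from this page; check them against the current API reference before relying on them. Nothing is sent over the network until the request is passed to urllib.request.urlopen.

```python
import base64
import json
import urllib.request

# Assumption: the public v1 REST endpoint. Verify against the API reference.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_body(text, language_code="en-US", audio_encoding="MP3"):
    """Return a JSON-serializable body for a synthesize request.

    The field names are assumed, not taken from this page.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


def build_http_request(body, api_key):
    """Wrap the body in a POST request object without sending it."""
    return urllib.request.Request(
        f"{SYNTHESIZE_URL}?key={api_key}",  # assumption: API-key authentication
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def extract_audio(response_json):
    """Decode the base64 audio payload from a parsed JSON response."""
    return base64.b64decode(response_json["audioContent"])


if __name__ == "__main__":
    # Print the body that would be sent; no network call is made here.
    print(json.dumps(build_synthesize_body("Hello, world"), indent=2))
```

To actually call the service, a client would pass the request object to urllib.request.urlopen, parse the response as JSON, and write the bytes returned by extract_audio to a file.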

Supports many common use cases

As an easy-to-use API, Google Cloud Text-to-Speech is a flexible solution for creating natural experiences across a variety of use cases. Common use cases include call center automation, interactive responses from IoT devices, and converting written content into audio that users can listen to.

Cloud Text-to-Speech Features

Supports 30+ voices in 14 languages and variants, with more to come soon.
WaveNet Voices
Exclusive multilingual access to DeepMind WaveNet voices that provide the most natural-sounding speech.
Text and SSML Support
Customize your speech with SSML tags that let you add pauses, control how numbers, dates, and times are read, and give other pronunciation instructions.
Speaking Rate Tuning
Customize your speaking rate to be up to 4x faster or 4x slower than the normal rate.
Pitch Tuning
Customize the pitch of your selected voice, up to 20 semitones above or below the default output.
Volume Gain Control
Increase the volume of the output by up to 16 dB or decrease it by up to 96 dB.
Audio Format Flexibility
Choose from a number of audio formats, including MP3, LINEAR16, and Ogg Opus.
Audio Profiles (Beta)
Optimize for the type of speaker from which your speech is intended to play, such as headphones or phone lines.
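
The tuning limits in the feature list can be sketched as a small validation helper, alongside a minimal SSML builder that inserts pauses. This is an illustrative sketch, not the service's own client: the JSON field names (speakingRate, pitch, volumeGainDb, audioEncoding) are assumptions, and the 0.25 to 4.0 range is an interpretation of "4x faster or slower" with 1.0 as the normal rate. The helper names are hypothetical.

```python
from xml.sax.saxutils import escape

# Limits taken from the feature list; 1.0 is assumed to mean the normal rate.
SPEAKING_RATE_RANGE = (0.25, 4.0)    # 4x slower to 4x faster
PITCH_RANGE = (-20.0, 20.0)          # semitones relative to the default
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)  # decibels relative to the default


def _check(name, value, bounds):
    """Return value unchanged, or raise if it falls outside bounds."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def build_audio_config(speaking_rate=1.0, pitch=0.0, volume_gain_db=0.0,
                       audio_encoding="MP3"):
    """Build an audio configuration dict (field names are assumed)."""
    return {
        "audioEncoding": audio_encoding,
        "speakingRate": _check("speaking_rate", speaking_rate, SPEAKING_RATE_RANGE),
        "pitch": _check("pitch", pitch, PITCH_RANGE),
        "volumeGainDb": _check("volume_gain_db", volume_gain_db, VOLUME_GAIN_DB_RANGE),
    }


def build_ssml(sentences, pause_ms=500):
    """Join sentences into one SSML document with a fixed pause between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() protects characters such as & and < that are special in XML.
    return "<speak>" + pause.join(escape(s) for s in sentences) + "</speak>"


if __name__ == "__main__":
    print(build_audio_config(speaking_rate=1.25, pitch=-2.0))
    print(build_ssml(["Your order has shipped.", "Thank you."]))
```

Validating on the client side is a design choice that surfaces out-of-range values before a request is made; the service may apply its own checks as well.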

Cloud Text-to-Speech Pricing

Cloud Text-to-Speech is priced per 1 million characters of text processed, after a monthly free tier of 4 million characters for standard voices and 1 million characters for WaveNet voices. For details, please see our pricing guide.

Feature                          Monthly free tier            Paid usage
Standard (non-WaveNet) voices    0 to 4 million characters    $4.00 USD / 1 million characters
WaveNet voices                   0 to 1 million characters    $16.00 USD / 1 million characters
If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.
A product or feature listed on this page is in beta. For more information, see the documentation on our product launch stages.
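
To make the pricing concrete, the following sketch estimates a monthly bill in USD from the free tiers and per-million rates in the table above. It assumes that usage beyond the free tier is billed proportionally per character rather than rounded up to whole millions, which this page does not state; the function and constant names are hypothetical.

```python
# Figures from the pricing table above.
FREE_TIER_CHARS = {"standard": 4_000_000, "wavenet": 1_000_000}
USD_PER_MILLION_CHARS = {"standard": 4.00, "wavenet": 16.00}


def estimate_monthly_cost(characters, voice_type="standard"):
    """Estimate the monthly cost in USD for one voice type.

    Assumption: characters beyond the free tier are billed pro rata.
    """
    if characters < 0:
        raise ValueError("characters must be non-negative")
    if voice_type not in FREE_TIER_CHARS:
        raise ValueError(f"unknown voice type: {voice_type!r}")
    billable = max(0, characters - FREE_TIER_CHARS[voice_type])
    return billable / 1_000_000 * USD_PER_MILLION_CHARS[voice_type]


if __name__ == "__main__":
    # 5 million standard characters: 1 million billable at $4.00 per million.
    print(estimate_monthly_cost(5_000_000, "standard"))  # 4.0
    # 2.5 million WaveNet characters: 1.5 million billable at $16.00 per million.
    print(estimate_monthly_cost(2_500_000, "wavenet"))   # 24.0
```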
