Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.
New customers get up to $300 in free credits to try Text-to-Speech and other Google Cloud products.
Improve customer interactions with intelligent, lifelike responses
Engage users with voice user interface in your devices and applications
Personalize your communication based on user preference of voice and language
Benefits
High fidelity speech
Deploy Google’s groundbreaking technologies to generate speech with humanlike intonation. Built based on DeepMind’s speech synthesis expertise, the API delivers voices that are near human quality.
Widest voice selection
Choose from a set of 380+ voices across 75+ languages and variants, including Mandarin, Hindi, Spanish, Arabic, Russian, and more. Pick the voice that works best for your user and application.
One-of-a-kind voice
Create a unique voice to represent your brand across all your customer touchpoints, instead of using a common voice shared with other organizations.
Demo
Type what you want, select a language then click “Speak It” to hear.
Key features
Synthesize single or multispeaker speech from short snippets to full-length narratives, maintaining contextuality. Precisely dictate style, accent, pace, tone, and emotional expression—all steerable through simple natural-language prompts in 75+ locales. Head to Media Studio or check out our documentation to learn more.
Build engaging agents using the latest spontaneous conversational voices based on AudioML. These voices offer high-quality audio, low-latency streaming, and natural-sounding speech, incorporating human disfluencies, emotional range, and accurate intonation. Head to Media Studio or check out our documentation to learn more.
Create personalized voice models with as little as 10 seconds of audio input. Perfect for video games, audiobooks, podcasts, and more. Available in more than 30+ locales. Head to Media Studio or check out our documentation to learn more.
Control number and time formatting, delivery, pronunciation, and emotion using simple plaintext scripting, SSML tags, or even powerful natural-language prompts depending on model support. Head to Media Studio or check out our documentation to learn more.
What's new
Sign up for Google Cloud newsletters to receive product updates, event information, special offers, and more.
Documentation
Use cases
Deliver a better voice experience for customer service with voicebots on Dialogflow that dynamically generate speech, instead of playing static, pre-recorded audio. Engage with high-quality synthesized voices that give callers a sense of familiarity and personalization.
Enable natural communications with your users by empowering your devices to speak humanlike voices as a text reader. Build an end-to-end voice user interface together with Speech-to-Text and Natural Language to improve user experience with easy and engaging interactions.
Easily have the EPGs read text aloud to provide a better user experience to your customers and meet accessibility requirements for your services and applications. Try the EPG demo.
Easily implement text-to-speech functionality in EPGs to provide a better user experience to your customers and meet accessibility requirements for your services and applications.
All features
Streaming audio synthesis | Power your AI agents with ultra-low-latency speech for seamless, real-time conversations with streaming audio synthesis. |
Long audio synthesis | Asynchronously synthesize up to 1 million bytes of input with long audio synthesis. |
Voice and language selection | Choose from an extensive selection of 380+ voices across 75+ languages and variants, with more to come soon. |
Text and SSML support | Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions. |
Pitch tuning | Personalize the pitch of your selected voice, up to 20 semitones more or less than the default. |
Speaking rate tuning | Adjust your speaking rate to be 4x faster or slower than the normal rate. |
Volume gain control | Increase the volume of the output by up to 16 db or decrease the volume up to -96 db. |
Integrated REST and gRPC APIs | Easily integrate with any application or device that can send a REST or gRPC request including phones, PCs, tablets, and IoT devices (for example cars, TVs, speakers). |
Audio format flexibility | Convert text to MP3, Linear16, OGG Opus, and a number of other audio formats. |
Audio profiles | Optimize for the type of speaker from which your speech is intended to play, such as headphones or phone lines. |
Pricing
Text-to-Speech is priced based on the number of characters sent to the service to be synthesized into audio each month. The first 1 million characters for WaveNet voices are free each month. For Standard (non-WaveNet) voices, the first 4 million characters are free each month. After the free tier has been reached, Text-to-Speech is priced per 1 million characters of text processed.
If you pay in a currency other than USD, the prices listed in your currency on Google Cloud SKUs apply.
New customers get $300 in free credits to try Text-to-Speech and other Google Cloud products.