Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.
New customers get $300 in free credits to spend on Text-to-Speech.
Improve customer interactions with intelligent, lifelike responses
Engage users with a voice user interface in your devices and applications
Personalize your communication based on each user's preferred voice and language
Benefits
Deploy Google’s groundbreaking technologies to generate speech with humanlike intonation. Built on DeepMind’s speech synthesis expertise, the API delivers voices that are near human quality.
Choose from a set of 380+ voices across 50+ languages and variants, including Mandarin, Hindi, Spanish, Arabic, Russian, and more. Pick the voice that works best for your user and application.
Create a unique voice to represent your brand across all your customer touchpoints, instead of using a common voice shared with other organizations.
Key features
Internationalize your voice experience with ready-to-use voices powered by the latest research behind Custom Voice.
Dazzle your listeners with professionally narrated content recorded in a studio-quality environment. Make sure to put your headphones on!
Train a custom voice model using your own audio recordings to create a unique and more natural-sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without needing to record new phrases.
Personalize the pitch of your selected voice, up to 20 semitones above or below the default. Adjust the speaking rate to be up to 4x faster or slower than the normal rate.
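As a rough illustration of these limits, the sketch below builds a request body with pitch and speaking-rate settings and rejects values outside the ranges described above. It assumes the JSON field names of the REST `text:synthesize` request (`audioConfig.pitch`, `audioConfig.speakingRate`), and the helper names `build_audio_config` and `build_request` are hypothetical. It only constructs the payload and does not call the API.

```python
import json

# Limits taken from the description above: pitch within 20 semitones of the
# default, speaking rate from 4x slower (0.25) to 4x faster (4.0) than normal.
MAX_PITCH_SEMITONES = 20.0
MIN_SPEAKING_RATE = 0.25
MAX_SPEAKING_RATE = 4.0


def build_audio_config(pitch=0.0, speaking_rate=1.0, encoding="MP3"):
    """Return an audioConfig dict after validating pitch and speaking rate."""
    if not -MAX_PITCH_SEMITONES <= pitch <= MAX_PITCH_SEMITONES:
        raise ValueError(f"pitch must be within +/-{MAX_PITCH_SEMITONES} semitones")
    if not MIN_SPEAKING_RATE <= speaking_rate <= MAX_SPEAKING_RATE:
        raise ValueError(
            f"speaking_rate must be between {MIN_SPEAKING_RATE} and {MAX_SPEAKING_RATE}"
        )
    return {
        "audioEncoding": encoding,
        "pitch": float(pitch),
        "speakingRate": float(speaking_rate),
    }


def build_request(text, language_code="en-US", **tuning):
    """Assemble a synthesis request body (hypothetical helper, assumed field names)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": build_audio_config(**tuning),
    }


if __name__ == "__main__":
    # Slightly lower pitch, 25% faster than the normal rate.
    print(json.dumps(build_request("Hello", pitch=-2.0, speaking_rate=1.25), indent=2))
```

Validating on the client side like this is optional, but it surfaces out-of-range values before a request is sent.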
Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions.
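To make the SSML description concrete, here is a minimal sketch that composes an SSML string with a pause, a spoken number, and a formatted date, then checks that the markup is well-formed. The `<break>` and `<say-as>` elements are standard SSML; the function name `build_ssml` and the sample sentence are hypothetical, and the `{"input": {"ssml": ...}}` wrapper assumes the REST request shape.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(greeting, count, iso_date, pause_ms=500):
    """Compose an SSML document with a pause, a cardinal number, and a date."""
    return (
        "<speak>"
        f"{escape(greeting)}"                      # escape &, <, > in user text
        f'<break time="{int(pause_ms)}ms"/>'       # insert a pause
        'You have <say-as interpret-as="cardinal">'
        f"{int(count)}</say-as> new messages as of "
        '<say-as interpret-as="date" format="yyyymmdd" detail="1">'
        f"{escape(iso_date)}</say-as>."
        "</speak>"
    )


if __name__ == "__main__":
    ssml = build_ssml("Welcome back.", 3, "2024-01-15")
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    request_input = {"input": {"ssml": ssml}}  # assumed request field name
    print(request_input)
```

Escaping user-supplied text before embedding it keeps characters such as `&` from breaking the markup.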
Use cases
Deliver a better voice experience for customer service with voicebots on Dialogflow that dynamically generate speech, instead of playing static, pre-recorded audio. Engage with high-quality synthesized voices that give callers a sense of familiarity and personalization.
Enable natural communication with your users by letting your devices read text aloud in humanlike voices. Build an end-to-end voice user interface together with Speech-to-Text and Natural Language to improve user experience with easy and engaging interactions.
Implement text-to-speech functionality in electronic program guides (EPGs) so they can read text aloud, providing a better user experience for your customers and helping you meet accessibility requirements for your services and applications.
All features
Custom Voice (beta) | Train a custom speech synthesis model using your own audio recordings to create a unique and more natural-sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without needing to record new phrases. |
Voice and language selection | Choose from an extensive selection of 220+ voices across 40+ languages and variants, with more to come soon. |
WaveNet voices | Take advantage of 90+ WaveNet voices built on DeepMind’s groundbreaking research to generate speech that significantly closes the gap with human performance. |
Text and SSML support | Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions. |
Pitch tuning | Personalize the pitch of your selected voice, up to 20 semitones above or below the default. |
Speaking rate tuning |