Speech-to-Text

Accurately convert speech into text with an API powered by the best of Google’s AI research and technology.

New customers get $300 in free credits to spend on Speech-to-Text. All customers get 60 free minutes of audio transcription and analysis per month, which are not charged against credits.

  • Transcribe your content with accurate captions

  • Enable the power of voice to create better user experiences

  • Improve your service with insights from customer interactions

  • Get started quickly with our in-console tutorial

Benefits

State-of-the-art accuracy

Apply Google’s most advanced deep learning neural networks to automatic speech recognition (ASR).

Easy model customization

Experiment with, create, and manage custom resources with the Speech-to-Text UI.

Flexible model deployment

Deploy ASR wherever you need it, whether in the cloud with the API or on-premises with Speech-to-Text On-Prem.

Demo

Put Speech-to-Text into action

As the demo shows, you can add speech transcription to your applications with the Speech-to-Text API.

Key features

Speech adaptation

Provide hints to boost the transcription accuracy of rare and domain-specific words or phrases. Use classes to automatically convert spoken numbers into addresses, years, currencies, and more.
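
As a rough illustration of phrase hints and classes, the sketch below builds a JSON request body of the kind the REST `speech:recognize` method accepts. It assumes the v1 field names `speechContexts`, `phrases`, and `boost`, and the class tokens `$ADDRESSNUM` and `$YEAR`; the helper name `build_adaptation_request`, the bucket URI, and the boost value are hypothetical and chosen only for the example. Nothing is sent over the network.

```python
import json


def build_adaptation_request(audio_uri, phrases, boost=10.0, language_code="en-US"):
    """Build a JSON-serializable recognize request that carries phrase hints.

    Assumption: field names follow the v1 REST API (speechContexts, phrases, boost).
    """
    if not phrases:
        raise ValueError("at least one phrase hint is required")
    return {
        "config": {
            "languageCode": language_code,
            # One speech context holding every hint, all with the same boost.
            "speechContexts": [{"phrases": list(phrases), "boost": boost}],
        },
        "audio": {"uri": audio_uri},
    }


# Hypothetical audio location and hints. The $-prefixed tokens are class
# tokens that stand in for spoken numbers such as street numbers or years.
request_body = build_adaptation_request(
    "gs://example-bucket/order-call.wav",
    ["Chirp", "my address is $ADDRESSNUM", "born in $YEAR"],
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary could be posted to the API with any HTTP client, or its values passed to a client library; which boost value works best depends on the audio and is worth testing.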

Domain-specific models

Choose from trained models for voice control, phone call, and video transcription, each optimized for the quality requirements of its domain.
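
Model selection can be sketched as a single field in the recognition config. The example below assumes the v1 model identifiers `command_and_search`, `phone_call`, `video`, and `default`; the use-case keys and the helper `build_recognition_config` are hypothetical names introduced for illustration.

```python
# Assumed mapping from a use case to a v1 model identifier.
MODEL_BY_USE_CASE = {
    "voice_control": "command_and_search",
    "phone_call": "phone_call",
    "video": "video",
}


def build_recognition_config(use_case, language_code="en-US"):
    """Return a recognition config dict that selects a domain-specific model.

    Unknown use cases fall back to the general-purpose "default" model.
    """
    model = MODEL_BY_USE_CASE.get(use_case, "default")
    return {"languageCode": language_code, "model": model}


phone_config = build_recognition_config("phone_call")
fallback_config = build_recognition_config("podcast")
print(phone_config)
print(fallback_config)
```

Falling back to `default` keeps the request valid when no specialized model matches, at the cost of possibly lower accuracy for that kind of audio.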

Easily compare quality

Experiment on your speech audio with our easy-to-use user interface. Try different configurations to optimize quality and accuracy.

Speech On-Device

Run Google Cloud's speech algorithms locally on any device, regardless of internet connectivity. Promise users that their voice data will never leave their device.

Foundation model for Speech-to-Text

Build voice-enabled applications for global audiences with speech models that are powered by Chirp, Google Cloud’s foundation model for speech trained on millions of hours of audio data and billions of text sentences. 
