Cloud Speech API

Speech to text conversion powered by machine learning


Powerful Speech Recognition

Google Cloud Speech API enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes over 110 languages and variants to support your global user base. You can transcribe the speech of users dictating into an application’s microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. The API recognizes audio uploaded in the request and integrates with your audio storage on Google Cloud Storage, using the same technology Google uses to power its own products.
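As a rough illustration of sending audio in the request itself, the sketch below builds the JSON body for the v1 REST `speech:recognize` method using only the Python standard library. The field names (`config`, `encoding`, `sampleRateHertz`, `languageCode`, `audio`, `content`) follow the public v1 REST reference; the helper name `build_recognize_body` and the 16 kHz LINEAR16 defaults are assumptions for illustration, and nothing is sent over the network.

```python
import base64
import json


def build_recognize_body(audio_bytes: bytes,
                         language_code: str = "en-US",
                         sample_rate_hertz: int = 16000) -> dict:
    """Build a JSON-serializable body for a synchronous recognize call.

    Hypothetical helper: assumes raw 16-bit signed little-endian PCM (LINEAR16).
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {
            # Audio sent inline must be base64-encoded inside the JSON body.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # One second of silence at 16 kHz, 16-bit mono is 32,000 zero bytes.
    body = build_recognize_body(b"\x00\x00" * 16000)
    print(json.dumps(body)[:80])
```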


Powered by Machine Learning

Apply the most advanced deep learning neural network algorithms to your users' audio for speech recognition with unparalleled accuracy. Speech API accuracy improves over time as Google improves the internal speech recognition technology used by its own products.

Over 110 Languages

Speech API recognizes over 110 languages and variants to support your global user base. For some languages, you can also filter inappropriate content from text results.
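A minimal sketch of how the language and the inappropriate-content filter might be set per request. In the v1 REST reference these are the `languageCode` field (a BCP-47 tag such as `en-US` or `fr-FR`) and the `profanityFilter` field of `RecognitionConfig`; the helper `language_config` and its loose tag check are assumptions, not part of the API, and the authoritative list of supported languages lives in the service documentation.

```python
import re

# Loose BCP-47 shape check (e.g. "en-US", "fr-FR", "cmn-Hans-CN").
# This only validates the shape of the tag, not whether the service supports it.
_TAG = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def language_config(language_code: str, filter_profanity: bool = False) -> dict:
    """Return the RecognitionConfig fields for language and content filtering."""
    if not _TAG.match(language_code):
        raise ValueError(f"not a BCP-47-style tag: {language_code!r}")
    return {
        "languageCode": language_code,
        # When true, the service attempts to mask profanity in the transcript.
        "profanityFilter": filter_profanity,
    }


print(language_config("fr-FR", filter_profanity=True))
```

The returned dictionary is meant to be merged into the `config` object of a recognize request.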

Return Text Results in Real-Time

Speech API can stream text results, returning partial recognition results as they become available, so the recognized text appears while the user is still speaking. Alternatively, Speech API can return recognized text from audio stored in a file.
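Streaming recognition sends audio in small consecutive chunks instead of one complete file. The sketch below covers only the client-side chunking step for LINEAR16 audio. The 100 ms chunk length is an assumption (a common choice for low latency), and actually streaming the chunks goes through the gRPC `StreamingRecognize` method, which is not shown here.

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate_hertz: int = 16000,
               chunk_ms: int = 100, channels: int = 1) -> Iterator[bytes]:
    """Yield LINEAR16 audio in chunks of roughly `chunk_ms` milliseconds.

    LINEAR16 uses 2 bytes per sample per channel, so one chunk is
    sample_rate * chunk_ms / 1000 * 2 * channels bytes.
    """
    bytes_per_chunk = sample_rate_hertz * chunk_ms // 1000 * 2 * channels
    for start in range(0, len(pcm), bytes_per_chunk):
        # The final chunk may be shorter than the others.
        yield pcm[start:start + bytes_per_chunk]


# One second of 16 kHz mono audio (32,000 bytes) splits into ten 3,200-byte chunks.
chunks = list(pcm_chunks(b"\x00" * 32000))
print(len(chunks), len(chunks[0]))
```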

Accurate in Noisy Environments

You don’t need advanced signal processing or noise cancellation before sending audio to Speech API. The service can successfully handle noisy audio from a variety of environments.

Context-Aware Recognition

Speech recognition can be tailored to context by providing a separate set of word hints with each API call. This is especially useful for device and app control use cases.
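In the v1 REST reference, word hints are passed as `speechContexts`, a list of objects that each hold a `phrases` array. A minimal sketch of attaching hints to a request config follows; the helper name `with_word_hints`, the blank/duplicate cleanup, and the example phrases are hypothetical, and the sketch does not check the phrase limits listed in the service's quota documentation.

```python
def with_word_hints(config: dict, phrases: list[str]) -> dict:
    """Return a copy of `config` with the given phrases attached as word hints."""
    # Drop blank entries and duplicates while keeping the caller's order.
    seen = set()
    cleaned = []
    for phrase in phrases:
        p = phrase.strip()
        if p and p not in seen:
            seen.add(p)
            cleaned.append(p)
    updated = dict(config)  # shallow copy so the caller's config is unchanged
    updated["speechContexts"] = [{"phrases": cleaned}]
    return updated


base = {"encoding": "LINEAR16", "sampleRateHertz": 16000, "languageCode": "en-US"}
# Hypothetical voice-control vocabulary for a media app.
cfg = with_word_hints(base, ["play", "pause", "next track", "play", " "])
print(cfg["speechContexts"])  # [{'phrases': ['play', 'pause', 'next track']}]
```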

Works With Apps Across Any Device

Speech API supports any device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, speakers).
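Because any client that can make an HTTPS request can call the service, a plain REST call needs nothing beyond a standard HTTP library. The sketch below prepares, but does not send, a POST to the v1 `speech:recognize` endpoint with Python's `urllib.request`. `YOUR_API_KEY` is a placeholder, and authenticating with an API key in the `key` query parameter is only one of the supported options, assumed here for brevity.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def prepare_recognize_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (without sending) an HTTP POST for the v1 recognize method."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = prepare_recognize_request({"config": {}, "audio": {}}, "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# Passing `req` to urllib.request.urlopen() would send it; the response body is JSON.
```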

Speech API Features


Automatic Speech Recognition
Automatic Speech Recognition (ASR) powered by deep learning neural networks for applications such as voice search or speech transcription.
Global Vocabulary
Recognizes over 110 languages and variants with an extensive vocabulary.
Streaming Recognition
Returns recognition results while the user is still speaking.
Word Hints
Speech recognition can be customized to a specific context by providing a set of words and phrases that are likely to be spoken. Especially useful for adding custom words and names to the vocabulary and in voice-control use cases.
Real-time or Pre-recorded Audio Support
Audio input can be captured by an application’s microphone or sent from a pre-recorded audio file. Multiple audio encodings are supported, including FLAC, AMR, PCMU and Linear-16.
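The encodings named above correspond to values of the `encoding` field in the v1 REST reference: PCMU (G.711 µ-law) appears there as `MULAW`, and Linear-16 as `LINEAR16`. A small lookup like the following, a hypothetical helper that is not part of any client library, can turn a user-facing format name into the value to send.

```python
# Format names used on this page mapped to v1 `encoding` enum values.
ENCODINGS = {
    "flac": "FLAC",
    "amr": "AMR",               # narrowband AMR
    "pcmu": "MULAW",            # G.711 mu-law, 8-bit companded samples
    "linear-16": "LINEAR16",    # uncompressed 16-bit signed little-endian PCM
    "linear16": "LINEAR16",
}


def encoding_for(format_name: str) -> str:
    """Return the `encoding` value for a format name (case-insensitive)."""
    try:
        return ENCODINGS[format_name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {format_name!r}") from None


print(encoding_for("PCMU"))
```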
Noise Robustness
Handles noisy audio from many environments without requiring additional noise cancellation.
Inappropriate Content Filtering
Filter inappropriate content in text results for some languages.
Integrated API
Audio can be uploaded in the request or read directly from Google Cloud Storage.
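In the v1 REST reference, the `audio` object carries either inline `content` (base64-encoded bytes) or a `uri` pointing at a Cloud Storage object in the form `gs://bucket/object`. A minimal sketch of choosing between the two follows; the helper name `audio_source` and the bucket and object names are hypothetical.

```python
import base64


def audio_source(source) -> dict:
    """Return a RecognitionAudio-style dict from raw bytes or a gs:// URI."""
    if isinstance(source, (bytes, bytearray)):
        # Audio uploaded in the request: base64-encode it inline.
        return {"content": base64.b64encode(bytes(source)).decode("ascii")}
    if isinstance(source, str) and source.startswith("gs://"):
        # Audio already stored in Google Cloud Storage: reference it by URI.
        return {"uri": source}
    raise TypeError("expected audio bytes or a gs://bucket/object URI")


print(audio_source("gs://example-bucket/meeting.flac"))  # hypothetical object
```

Referencing a stored object avoids re-uploading large files with every request.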


Speech API Pricing

Cloud Speech API is priced per 15 seconds of audio processed, after a 60-minute free tier. For details, please see our pricing guide.

Monthly usage                 Price per 15 seconds*
0 - 60 minutes                Free
61 - 1,000,000 minutes**      $0.006
If you pay in a currency other than USD, the prices listed in your currency on Cloud Platform SKUs apply.

* This pricing is for applications on personal systems (e.g., phones, tablets, laptops, desktops). Please contact us for approval and pricing to use Speech API on embedded devices (e.g., cars, TVs, appliances, or speakers).

** Usage is capped at 1,000,000 minutes per month.
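To make the arithmetic concrete: at $0.006 per 15 seconds, each billable minute costs 4 × $0.006 = $0.024. The sketch below estimates a monthly bill from the total seconds of audio processed, applying the 60-minute free tier and the 1,000,000-minute cap from the table above. Treating usage as a single monthly total and rounding any remainder up to a whole 15-second increment are assumptions; per-request rounding, currency conversion, and taxes are not modeled, and the pricing guide remains authoritative.

```python
from decimal import Decimal

FREE_SECONDS = 60 * 60                  # 60-minute free tier
CAP_SECONDS = 1_000_000 * 60            # 1,000,000-minute monthly cap
PRICE_PER_INCREMENT = Decimal("0.006")  # USD per 15 seconds of audio


def estimate_monthly_cost(audio_seconds: int) -> Decimal:
    """Estimate the monthly USD cost for `audio_seconds` of processed audio."""
    if not 0 <= audio_seconds <= CAP_SECONDS:
        raise ValueError("usage must be between 0 and 1,000,000 minutes")
    billable = max(0, audio_seconds - FREE_SECONDS)
    # Assumption: round the billable total up to whole 15-second increments.
    increments = -(-billable // 15)  # integer ceiling division
    return increments * PRICE_PER_INCREMENT


print(estimate_monthly_cost(1000 * 60))  # 1,000 minutes of audio in one month
```

For example, 1,000 minutes in a month leaves 940 billable minutes, or 3,760 increments, for an estimated $22.56.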