- Automatic Speech Recognition
- Automatic Speech Recognition (ASR), powered by deep learning neural networks, for applications such as voice search or speech transcription.
- Global Vocabulary
- Recognizes 120 languages and variants with an extensive vocabulary.
- Phrase Hints
- Speech recognition can be customized to a specific context by providing a set of words and phrases that are likely to be spoken. This is especially useful for adding custom words and names to the vocabulary and in voice-control use cases.
- Real-time Streaming or Prerecorded Audio Support
- Audio input can be streamed from an application’s microphone or sent from a prerecorded audio file (inline or through Google Cloud Storage). Multiple audio encodings are supported, including FLAC, AMR, PCMU, and Linear-16.
- Auto-Detect Language BETA
- When you need to support multilingual scenarios, you can now specify two to four language codes and Cloud Speech-to-Text will identify the correct language spoken and provide the transcript.
- Noise Robustness
- Handles noisy audio from many environments without requiring additional noise cancellation.
- Inappropriate Content Filtering
- Filter inappropriate content in text results for some languages.
- Automatic Punctuation BETA
- Accurately punctuates transcriptions (e.g., commas, question marks, and periods) with machine learning.
- Model Selection
- Choose from a selection of four pre-built models: default, voice commands and search, phone calls, and video transcription.
- Speaker Diarization BETA
- Know who said what: get automatic predictions about which speaker in a conversation spoke each utterance.
- Multichannel Recognition
- In multiparticipant recordings where each participant is recorded in a separate channel (e.g., phone call with two channels or video conference with four channels), Cloud Speech-to-Text will recognize each channel separately and then annotate the transcripts so that they follow the same order as in real life.
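Several of the features above (phrase hints, inline or Cloud Storage audio, language auto-detection, model selection, and multichannel recognition) are chosen per request. The sketch below builds a request body as a plain dictionary, using standard-library Python only. It is a minimal illustration, not the official client: the helper name `build_recognize_request` is hypothetical, the camelCase field names and model identifiers follow the public REST reference as best recalled and should be checked against the current API documentation, and the Linear-16 encoding at 16 kHz is an assumed input format.

```python
import base64
import json


def build_recognize_request(audio_bytes=None, gcs_uri=None, *,
                            language_code="en-US",
                            alternative_language_codes=(),
                            phrases=(),
                            model="default",
                            channels=1):
    """Build a speech recognition request body (hypothetical helper)."""
    # Audio is either sent inline or referenced in Cloud Storage, not both.
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of audio_bytes or gcs_uri")
    # Auto-detect accepts two to four codes in total: one primary code
    # plus up to three alternatives.
    if len(alternative_language_codes) > 3:
        raise ValueError("at most three alternative language codes")

    config = {
        "encoding": "LINEAR16",      # assumed input encoding
        "sampleRateHertz": 16000,    # assumed sample rate
        "languageCode": language_code,
        "model": model,              # e.g. "phone_call" or "video"
    }
    if alternative_language_codes:
        config["alternativeLanguageCodes"] = list(alternative_language_codes)
    if phrases:
        # Phrase hints: words and names likely to be spoken.
        config["speechContexts"] = [{"phrases": list(phrases)}]
    if channels > 1:
        # Recognize each channel separately.
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True

    if audio_bytes is not None:
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    else:
        audio = {"uri": gcs_uri}
    return {"config": config, "audio": audio}


if __name__ == "__main__":
    request = build_recognize_request(
        gcs_uri="gs://example-bucket/call.raw",  # placeholder URI
        alternative_language_codes=["es-ES"],
        phrases=["Cloud Speech-to-Text"],
        model="phone_call",
        channels=2,
    )
    print(json.dumps(request, indent=2))
```

Keeping the request as a plain dictionary makes it easy to inspect before it is sent; the same options are also exposed by the official client libraries.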
|Feature||Standard models (all models except enhanced phone and video), 0-60 minutes||Standard models, over 60 minutes up to 1 million minutes||Premium models* (enhanced phone, video), 0-60 minutes||Premium models*, over 60 minutes up to 1 million minutes|
|Speech Recognition (without Data Logging - default)||Free||$0.006 / 15 seconds **||Free||$0.009 / 15 seconds **|
|Speech Recognition (with Data Logging opt-in)||Free||$0.004 / 15 seconds **||Free||$0.006 / 15 seconds **|
This pricing is for applications on personal systems (e.g., phones, tablets, laptops, desktops). Please contact us for approval and pricing to use the Cloud Speech-to-Text API on embedded devices (e.g., cars, TVs, appliances, or speakers).
* Currently available for US English only
** Each request is rounded up to the next multiple of 15 seconds. For example, three separate requests (Standard model, without data logging), each containing 7 seconds of audio, are billed as 45 seconds (3 × 15 seconds) of audio, or $0.018 USD. Fractions of a second count when rounding up: 15.14 seconds is billed as 30 seconds.
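The rounding rule in the footnote above can be sketched in a few lines of Python. This is an illustration of the arithmetic only: the function names are hypothetical, the rate is passed in by the caller (for example $0.006 per 15 seconds for standard models without data logging), and the free first 60 minutes are deliberately not modeled.

```python
import math
from decimal import Decimal

INCREMENT_SECONDS = 15  # billing increment stated in the pricing footnote


def billed_seconds(duration_seconds):
    """Round one request's audio duration up to a multiple of 15 seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS


def cost_usd(durations_seconds, rate_per_increment):
    """Sum the cost of several requests, each rounded up separately.

    The free tier (first 60 minutes) is ignored in this sketch.
    """
    increments = sum(billed_seconds(d) // INCREMENT_SECONDS
                     for d in durations_seconds)
    return increments * rate_per_increment


if __name__ == "__main__":
    print(billed_seconds(7))       # 15
    print(billed_seconds(15.14))   # 30
    # Three 7-second requests at the standard rate without data logging.
    print(cost_usd([7, 7, 7], Decimal("0.006")))  # 0.018
```

Because each request is rounded up on its own, three 7-second requests cost three increments, whereas a single 21-second request would cost two.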
Features marked BETA on this page are in beta. For more information on our product launch stages, see here.
Cloud AI products comply with the SLA policies listed here. They may offer different latency or availability guarantees from other Google Cloud services.