Convert audio into text transcriptions and integrate speech recognition into applications with easy-to-use APIs.
New customers also get up to $300 in free credits to try Speech-to-Text and other Google Cloud products.
Features
Speech-to-Text can use Chirp, Google Cloud’s foundation model for speech, trained on millions of hours of audio data and billions of text sentences. This contrasts with traditional speech recognition techniques, which rely on large amounts of language-specific supervised data. Chirp's approach gives users improved recognition and transcription for more spoken languages and accents.
Build for a global user base with extensive language support. Transcribe short, long, and even streaming audio data. Speech-to-Text also offers users more accurate and globe-spanning translation and recognition with Chirp, the next generation of universal speech models. Chirp was built using self-supervised training on millions of hours of audio and 28 billion sentences of text spanning 100+ languages.
Choose from a selection of trained models for voice control, phone call, and video transcription optimized for domain-specific quality requirements. Easily customize, experiment with, create, and manage custom resources with the Speech-to-Text UI.
Speech-to-Text API v2 gives enterprise and business customers support for added security and regulatory requirements out of the box. Data residency enables the invocation of transcription models through a fully regionalized service that taps into Google Cloud regions like Singapore and Belgium. Recognizer resourcefulness eliminates the need for dedicated service accounts for authentication and authorization. Logs for resource generation and transcription are readily available in the Google Cloud console. Speech-to-Text API v2 also offers enterprise-grade encryption with customer-managed encryption keys for all resources as well as batch transcription.
Speech-to-Text uses model adaptation to improve the accuracy of frequently used words, expand the vocabulary available for transcription, and improve transcription from noisy audio. Model adaptation lets users customize Speech-to-Text to recognize specific words or phrases more frequently than other options that might otherwise be suggested. For example, you could bias Speech-to-Text towards transcribing "weather" over "whether."
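As a minimal sketch of the "weather" over "whether" example, the snippet below builds the JSON body for a v1 `speech:recognize` request with a phrase hint. It only constructs the payload and does not call the API. The `speechContexts`, `phrases`, and `boost` field names follow the v1 REST reference as we understand it; the bucket path, the boost value, and the helper name `build_adaptation_request` are hypothetical and chosen for illustration.

```python
import json


def build_adaptation_request(audio_uri, phrases, boost=10.0, language_code="en-US"):
    """Return a JSON-serializable request body that biases recognition
    toward the given phrases. Nothing is sent over the network."""
    return {
        "config": {
            "languageCode": language_code,
            # Phrase hints: a higher boost makes these phrases more likely
            # to be chosen over similar-sounding alternatives.
            "speechContexts": [{"phrases": list(phrases), "boost": boost}],
        },
        # Hypothetical Cloud Storage path; replace with your own audio.
        "audio": {"uri": audio_uri},
    }


request = build_adaptation_request("gs://example-bucket/forecast.wav", ["weather"])
print(json.dumps(request, indent=2))
```

Suitable boost values depend on your audio and vocabulary, so treat the default here as a starting point to tune rather than a recommendation.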
Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.
Speech-to-Text can recognize distinct channels in multichannel situations (for example, a video conference) and annotate the transcripts to preserve the order.
Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.
Choose from a selection of trained models for voice control, phone call, and video transcription optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio that originated from telephony, such as phone calls recorded at an 8 kHz sampling rate.
Profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results.
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on your configuration.
Speech-to-Text accurately punctuates transcriptions, adding commas, question marks, and periods.
Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.
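Several of the features above are switched on through the recognition configuration. The following is a hedged sketch that assembles a v1-style `RecognitionConfig` as a plain dictionary. The field names (`model`, `useEnhanced`, `enableAutomaticPunctuation`, `profanityFilter`, `audioChannelCount`, `enableSeparateRecognitionPerChannel`, `diarizationConfig`) reflect the v1 REST reference as we understand it and should be checked against the current documentation; the helper `build_recognition_config` and its parameters are hypothetical.

```python
import json


def build_recognition_config(
    language_code="en-US",
    phone_call=False,
    punctuation=True,
    profanity_filter=False,
    channels=1,
    max_speakers=None,
):
    """Assemble a recognition config dictionary from simple feature flags."""
    config = {"languageCode": language_code}
    if phone_call:
        # Enhanced phone call model, for telephony audio sampled at 8 kHz.
        config["model"] = "phone_call"
        config["useEnhanced"] = True
        config["sampleRateHertz"] = 8000
    if punctuation:
        config["enableAutomaticPunctuation"] = True
    if profanity_filter:
        config["profanityFilter"] = True
    if channels > 1:
        # Transcribe each channel separately so the transcript keeps them apart.
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True
    if max_speakers:
        # Speaker diarization: predict which speaker spoke each utterance.
        config["diarizationConfig"] = {
            "enableSpeakerDiarization": True,
            "maxSpeakerCount": max_speakers,
        }
    return config


call_center_config = build_recognition_config(
    phone_call=True, profanity_filter=True, channels=2, max_speakers=2
)
print(json.dumps(call_center_config, indent=2))
```

Keeping the flags in one helper makes it easy to iterate on a configuration and compare transcription quality between runs.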
How It Works
Speech-to-Text has three main methods to perform speech recognition: synchronous, asynchronous, and streaming. Each method returns text results based on whether transcription is needed in post-processing, periodically, or in real time. Simply put, you input audio data and receive a text-based response.
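To make the choice between the three methods concrete, here is a small sketch of a selection helper. The 60-second cutoff for synchronous recognition is an assumption based on commonly documented limits, not something stated on this page, so check current quotas before relying on it. The names `Method`, `SYNC_MAX_SECONDS`, and `choose_method` are hypothetical.

```python
from enum import Enum


class Method(Enum):
    SYNCHRONOUS = "synchronous"    # short audio, result returned in one response
    ASYNCHRONOUS = "asynchronous"  # long audio, result retrieved when the job finishes
    STREAMING = "streaming"        # live audio, results arrive in real time


# Assumed upper bound for synchronous requests; verify against current quotas.
SYNC_MAX_SECONDS = 60


def choose_method(duration_seconds=None, live=False):
    """Pick a recognition method from the audio source and its length."""
    if live:
        return Method.STREAMING
    if duration_seconds is not None and duration_seconds <= SYNC_MAX_SECONDS:
        return Method.SYNCHRONOUS
    # Unknown or long durations fall back to the asynchronous method.
    return Method.ASYNCHRONOUS


print(choose_method(duration_seconds=20).value)
print(choose_method(duration_seconds=45 * 60).value)
print(choose_method(live=True).value)
```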
Demo
Quickly create an audio transcription from a file upload or by speaking directly into a microphone.
Common Uses
Create an audio transcription
Learn how to use the Speech-to-Text API from within the Cloud Console by creating an audio transcription in just a few steps. You can also transcribe short, long, and streaming audio.
How to add Speech-to-Text to apps
Learn how you can quickly and easily enable Speech-to-Text for your application with Google Cloud. This video covers how to add AI to your application without extensive machine learning model experience, using the pretrained Speech-to-Text API.
Language, speech, text, and translation with Google Cloud APIs
In this course, you'll use the Speech-to-Text API to transcribe an audio file into a text file, translate with the Google Cloud Translation API, and create synthetic speech with Natural Language AI.
Pricing
How Speech-to-Text pricing works
Speech-to-Text pricing is based on the API version, channels, batch methods, and any additional Google Cloud service costs like storage.

API version | Service and capability | Pricing |
---|---|---|
Speech-to-Text V1 API | V1 offers data residency for multi-region only. Models include short, long, phone call, and video. V1 does not include audit logging. New customers get $300 in free credits and 60 minutes for transcribing and analyzing audio free per month, not charged against your credits. | $0.024 per min |
Speech-to-Text V2 API | V2 offers data residency for multi-region and single region. Models include short, long, telephony, video, and Chirp. V2 includes audit logging and support for customer-managed encryption keys. | $0.016 per min |
View pricing details for Speech-to-Text.
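For a rough sense of how the per-minute rates add up, the sketch below estimates a monthly bill from the list prices above. It is a simplification under stated assumptions: the 60 free minutes are applied to V1 only (the table mentions them only there), and channels, batch discounts, volume tiers, billing increments, and storage costs are ignored. The function name `estimate_monthly_cost` is hypothetical.

```python
# List prices per minute, taken from the pricing table above.
RATE_PER_MINUTE = {"v1": 0.024, "v2": 0.016}

# Assumption: the monthly free allowance applies to V1 only.
FREE_MINUTES = {"v1": 60, "v2": 0}


def estimate_monthly_cost(minutes, api_version):
    """Estimate the monthly charge in USD for the given audio minutes."""
    version = api_version.lower()
    if version not in RATE_PER_MINUTE:
        raise ValueError(f"unknown API version: {api_version!r}")
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    billable = max(0.0, minutes - FREE_MINUTES[version])
    return round(billable * RATE_PER_MINUTE[version], 2)


print(estimate_monthly_cost(1000, "v1"))  # 940 billable minutes -> 22.56
print(estimate_monthly_cost(1000, "v2"))  # 1000 billable minutes -> 16.0
```

Use the linked pricing details for exact figures; this estimate is only meant to show how the API version changes the total.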