This page documents production updates to Speech-to-Text. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.
To get the latest product updates delivered to you, add the URL of this page to your feed reader.
February 20, 2019
Data logging is now available for general use. When you enable data logging, you can reduce the cost of using Speech-to-Text by allowing Google to log your data in order to improve the service.
Enhanced models are now available for general use.
Using enhanced models can improve audio transcription results.
Selecting a transcription model is now available for general use.
You can select different speech recognition models when you send a request to Cloud Speech-to-Text, including a model optimized for transcribing audio data from video files.
Cloud Speech-to-Text can transcribe audio data that includes multiple channels. This feature is now available for general use.
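As a rough illustration of the model selection announced in this release, the sketch below builds a recognition configuration as a plain Python dictionary. The field names `languageCode` and `model`, the model names `"default"` and `"video"`, and the helper `build_config` are assumptions made for illustration; check the API reference for the exact names before relying on them.

```python
import json


def build_config(language_code: str, model: str = "default") -> dict:
    """Return a recognition config that names a speech recognition model.

    Field names and model values here are assumptions, not confirmed by
    this page.
    """
    return {
        "languageCode": language_code,  # assumed field name
        "model": model,                 # assumed field name
    }


# Hypothetical example: request the model aimed at audio from video files.
video_config = build_config("en-US", model="video")
print(json.dumps(video_config, indent=2))
```

Keeping the model name as a parameter makes it easy to compare transcription results across models for the same audio.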
July 24, 2018
Cloud Speech-to-Text provides word-level confidence. Developers can use this feature to get the degree of confidence on a word-by-word level. This feature is in Beta.
Cloud Speech-to-Text can automatically detect the language used in an audio file. To use this feature, developers must specify alternative languages in their transcription request. This feature is in Beta.
Cloud Speech-to-Text can identify different speakers present in an audio file. This feature is in Beta.
Cloud Speech-to-Text can transcribe audio data that includes multiple channels. This feature is in Beta.
April 9, 2018
Cloud Speech-to-Text now provides data logging and enhanced models. Developers who want to take advantage of the enhanced speech recognition models can opt in to data logging. This feature is in Beta.
Cloud Speech-to-Text can insert punctuation into transcription results, including commas, periods, and question marks. This feature is in Beta.
You can now select different speech recognition models when you send a request to Cloud Speech-to-Text, including a model optimized for transcribing audio from video files. This feature is in Beta.
You can now include more details about your audio source files in transcription requests to Cloud Speech-to-Text in the form of recognition metadata, which can improve the results of the speech recognition. This feature is in Beta.
January 16, 2018
Support for the OGG_OPUS audio encoding has been expanded to support 8000 Hz, 12000 Hz, 16000 Hz, 24000 Hz, or 48000 Hz.
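The supported rates listed above lend themselves to a simple client-side check before audio is sent. The following is a minimal sketch; the constant and function names are hypothetical and not part of any Speech-to-Text client library.

```python
# Sample rates (in Hz) listed in this release note for OGG_OPUS audio.
SUPPORTED_OGG_OPUS_RATES_HZ = frozenset({8000, 12000, 16000, 24000, 48000})


def is_supported_ogg_opus_rate(sample_rate_hz: int) -> bool:
    """Return True if the rate is one of the listed OGG_OPUS sample rates."""
    return sample_rate_hz in SUPPORTED_OGG_OPUS_RATES_HZ


for rate in (16000, 44100):
    status = "supported" if is_supported_ogg_opus_rate(rate) else "not listed"
    print(f"{rate} Hz: {status}")
```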
August 10, 2017
Time offsets (timestamps) are now available. Set the enableWordTimeOffsets parameter to true in your request configuration and Speech-to-Text will include time offset values for the beginning and end of each spoken word that is recognized in the audio for your request. For more information, see Time offsets (timestamps).
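A request configuration with time offsets enabled might look like the sketch below, which builds the request body as a plain dictionary and prints it as JSON. Only `enableWordTimeOffsets` comes from this page; the surrounding field names (`config`, `audio`, `languageCode`, `uri`), the example bucket path, and the helper name are assumptions for illustration.

```python
import json


def build_request_with_time_offsets(audio_uri: str, language_code: str) -> dict:
    """Return a request body that asks for per-word time offsets."""
    return {
        "config": {
            "languageCode": language_code,  # assumed field name
            "enableWordTimeOffsets": True,  # parameter named in this release note
        },
        "audio": {"uri": audio_uri},        # assumed: audio referenced by URI
    }


# Hypothetical audio location, used only to show the shape of the body.
request_body = build_request_with_time_offsets(
    "gs://example-bucket/audio.flac", "en-US"
)
print(json.dumps(request_body, indent=2))
```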
Speech-to-Text has added recognition support for 30 new languages. For a complete list of all supported languages, see Language Support.
The limit on the length of audio that you can send with an asynchronous recognition request has been increased from ~80 to ~180 minutes. For information on Speech-to-Text limits, see Quotas & Limits. For information on asynchronous recognition requests, see
April 18, 2017
Release of Speech-to-Text v1.
The v1beta1 release of Speech-to-Text has been deprecated. v1beta1 continues to be available for a period of time as defined in the terms of service. To avoid being impacted when v1beta1 is discontinued, replace references to v1beta1 in your code with v1 and update your code with valid v1 API names and values.
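One way to start the migration is to locate every remaining reference to v1beta1 in your source files. The sketch below only reports matching lines; it does not rewrite them, because API names and values also change between versions and need manual review. The function name and the sample text are hypothetical.

```python
import re
from typing import List, Tuple

# Match "v1beta1" as a whole token, for example inside a URL path or a string.
V1BETA1_PATTERN = re.compile(r"\bv1beta1\b")


def find_v1beta1_references(source_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines that mention v1beta1."""
    hits = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        if V1BETA1_PATTERN.search(line):
            hits.append((line_number, line.strip()))
    return hits


# Hypothetical source snippet, used only to demonstrate the scan.
sample_source = 'API_VERSION = "v1beta1"\nTIMEOUT_SECONDS = 30\nBASE_PATH = "/v1/speech"\n'

for line_number, line in find_v1beta1_references(sample_source):
    print(f"line {line_number}: {line}")
```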
A language_code is now required with requests to Speech-to-Text. Requests with a missing or invalid language_code will return an error. (Pre-release versions of the API defaulted to en-US if the language_code was omitted from the request.)
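Because the service no longer falls back to en-US, it can help to catch a missing language_code on the client before a request is sent. The following is a minimal sketch that treats the configuration as a plain dictionary; the helper name is hypothetical, and it only checks that a non-empty value is present, not that the value is a supported language.

```python
def require_language_code(config: dict) -> dict:
    """Raise ValueError if the config has no usable language_code."""
    language_code = config.get("language_code")
    if not isinstance(language_code, str) or not language_code.strip():
        raise ValueError("language_code is required, for example 'en-US'")
    return config


checked = require_language_code({"language_code": "en-US"})
print(checked["language_code"])

try:
    require_language_code({})  # no language_code supplied
except ValueError as error:
    print(f"rejected: {error}")
```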
SyncRecognize is renamed to
is renamed to
v1/speech:recognize. The behavior is unchanged.
AsyncRecognize is renamed to LongRunningRecognize.
is renamed to
The behavior is unchanged except that the
LongRunningRecognize method now
supports all of the
enum values. (Pre-release versions only supported the
sample_rate field has been renamed to
The behavior is unchanged.
The EndpointerType enum has been renamed to SpeechEventType.
SpeechEventType enums have been removed.