This page documents production updates to Speech-to-Text. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.
You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.
To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly.
November 11, 2022
Speech-to-Text has updated its pricing policy. Enhanced models are no longer priced differently than standard models. Usage of all models will be reported and priced the same way as standard models. Also, all Cloud Speech-to-Text requests will now be rounded up to the nearest 1 second, with no minimum audio length (requests were previously rounded up to the nearest 15 seconds). See the Pricing page for details.
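For illustration only, the new rounding rule is a simple ceiling to whole seconds; the helper below is a hypothetical sketch, not part of any Speech-to-Text API.

```python
import math

# Hypothetical helper illustrating the new billing granularity: audio length
# is rounded up to the nearest whole second, with no minimum.
def billed_seconds(audio_seconds: float) -> int:
    return math.ceil(audio_seconds)

print(billed_seconds(0.4))   # 1  (previously billed as 15 seconds)
print(billed_seconds(61.2))  # 62 (previously billed as 75 seconds)
```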
July 21, 2021
Speech-to-Text has launched a GA version of the Spoken Emoji and Spoken Punctuation features. See the documentation for details.
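As a minimal sketch, assuming the current Python client library and a placeholder Cloud Storage URI, a request enabling both features might look like this:

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="en-US",
    # Render spoken punctuation ("period") and spoken emoji ("smiley face")
    # as symbols in the transcript.
    enable_spoken_punctuation=True,
    enable_spoken_emojis=True,
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/dictation.flac")  # placeholder
response = client.recognize(config=config, audio=audio)
```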
May 07, 2021
The Speech-to-Text model adaptation feature is now a GA feature. See the model adaptation concepts page for more information about using this feature.
March 23, 2021
Speech-to-Text now allows you to write your long-running transcription results directly to a Cloud Storage bucket. See the asynchronous speech recognition documentation for more details.
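A minimal sketch, assuming the Python client library; the bucket URIs are placeholders:

```python
from google.cloud import speech

client = speech.SpeechClient()

request = speech.LongRunningRecognizeRequest(
    config=speech.RecognitionConfig(language_code="en-US"),
    audio=speech.RecognitionAudio(uri="gs://my-bucket/long-audio.flac"),
    # Write the finished transcript to a Cloud Storage object instead of
    # returning it only in the operation response.
    output_config=speech.TranscriptOutputConfig(
        gcs_uri="gs://my-bucket/results/transcript.json"
    ),
)
operation = client.long_running_recognize(request=request)
operation.result(timeout=3600)  # transcript is also written to the gcs_uri above
```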
March 15, 2021
Speech-to-Text has launched the Model Adaptation feature. You can now create custom classes and build phrase sets to improve your transcription results.
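A minimal sketch of an inline phrase set with a boosted phrase, assuming the v1p1beta1 Python client library; the phrase, boost value, and URI are placeholders:

```python
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

# Inline phrase set that boosts recognition of a domain-specific term.
phrase_set = speech.PhraseSet(
    phrases=[speech.PhraseSet.Phrase(value="ExampleWidget", boost=10.0)]
)
adaptation = speech.SpeechAdaptation(phrase_sets=[phrase_set])

config = speech.RecognitionConfig(
    language_code="en-US",
    adaptation=adaptation,
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/audio.flac")  # placeholder
response = client.recognize(config=config, audio=audio)
```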
August 25, 2020
Speech-to-Text has launched the new On-Prem API. Speech-to-Text On-Prem enables easy integration of Google speech recognition technologies into your on-premises solution.
March 05, 2020
Cloud Speech-to-Text now supports seven new languages: Burmese, Estonian, Uzbek, Punjabi, Albanian, Macedonian, and Mongolian.
April 04, 2019
The v1beta1 version of the service is no longer available for use. You must migrate your solutions to either the v1 or v1p1beta1 version of the API.
February 20, 2019
Data logging is now available for general use. When you enable data logging, you can reduce the cost of using Cloud Speech-to-Text by allowing Google to log your data in order to improve the service.
Selecting a transcription model is now available for general use. You can select different speech recognition models when you send a request to Cloud Speech-to-Text, including a model optimized for transcribing audio data from video files.
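For example, selecting the video-optimized model is a one-field change in the request config (a Python client library sketch; the URI is a placeholder):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="en-US",
    model="video",  # model optimized for audio originating from video files
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/clip-audio.flac")  # placeholder
response = client.recognize(config=config, audio=audio)
```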
July 24, 2018
Cloud Speech-to-Text can automatically detect the language used in an audio file. To use this feature, developers must specify alternative languages in their transcription request. This feature is in Beta.
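A sketch, assuming the v1p1beta1 Python client library; the language codes and URI are illustrative:

```python
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    language_code="en-US",  # primary language
    # Candidate alternatives; the service picks the best match per request.
    alternative_language_codes=["es-ES", "fr-FR"],
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/multilingual.flac")  # placeholder
response = client.recognize(config=config, audio=audio)
```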
April 09, 2018
You can now select different speech recognition models when you send a request to Cloud Speech-to-Text, including a model optimized for transcribing audio from video files. This feature is in Beta.
August 10, 2017
Cloud Speech-to-Text has added recognition support for 30 new languages. For a complete list of all supported languages, see the language support reference.
The limit on the length of audio that you can send with an asynchronous recognition request has been increased from ~80 minutes to ~180 minutes. For information on Cloud Speech-to-Text limits, see the quotas & limits page. For more information on asynchronous recognition requests, see the transcribing long audio files guide.
April 18, 2017
The sample_rate field has been renamed to sample_rate_hertz. The behavior is unchanged.
You are no longer required to specify the encoding and sample rate for WAV or FLAC files. If omitted, Cloud Speech-to-Text automatically determines the encoding and sample rate for WAV or FLAC files based on the file header. If you specify an encoding or sample rate value that does not match the value in the file header, then Cloud Speech-to-Text will return an error. This change is backwards-compatible and will not invalidate any currently valid requests.
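A sketch, assuming the Python client library: for a FLAC file in Cloud Storage, the config can omit the encoding and sample rate entirely (the URI is a placeholder):

```python
from google.cloud import speech

client = speech.SpeechClient()

# No encoding or sample_rate_hertz: both are read from the FLAC header.
config = speech.RecognitionConfig(language_code="en-US")
audio = speech.RecognitionAudio(uri="gs://my-bucket/audio.flac")  # placeholder
response = client.recognize(config=config, audio=audio)
```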
The speech_context field has been replaced by the speech_contexts field, which is a repeated field. However, you can specify at most one speech context. The behavior is unchanged.
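A sketch of the renamed field, assuming the current Python client library; the phrases are placeholders:

```python
from google.cloud import speech

config = speech.RecognitionConfig(
    language_code="en-US",
    # speech_contexts is a repeated field, but at most one entry is accepted.
    speech_contexts=[speech.SpeechContext(phrases=["weather", "forecast"])],
)
```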