Release notes

This page documents production updates to Cloud Speech-to-Text. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/cloud-speech-to-text-release-notes.xml

July 23, 2019

Cloud Speech-to-Text has several endless streaming tutorials that demonstrate how to transcribe an infinite audio stream.

You can now use speech adaptation to provide 'hints' to Cloud Speech-to-Text when it performs speech recognition. This feature is now in beta.
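As a sketch, a recognition request that supplies hints does so through the `speechContexts` field of the request configuration. The example below shows a REST-style request body as a Python dict; the phrase values and the Cloud Storage URI are placeholders for illustration.

```python
# Illustrative recognition request body supplying speech adaptation
# "hints" via speechContexts phrases. Phrase values and the gs:// URI
# are made up for this example.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        # Hints bias recognition toward these words and phrases.
        "speechContexts": [
            {"phrases": ["weather", "whether", "Tuesday the 24th"]}
        ],
    },
    "audio": {"uri": "gs://my-bucket/my-audio.flac"},  # hypothetical URI
}
```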

June 18, 2019

Cloud Speech-to-Text has expanded the limit for streaming recognition to 5 minutes. To use streaming recognition with the 5-minute limit, you must use the v1p1beta1 API version.

Cloud Speech-to-Text now supports transcription of MP3 encoded audio data. As this feature is in beta, you must use the v1p1beta1 API version.
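A minimal request for MP3 audio might look like the following (shown as a Python dict; the `MP3` encoding value is accepted only by the v1p1beta1 endpoint while the feature is in beta, and the URI is a placeholder):

```python
# Illustrative v1p1beta1 request body for MP3-encoded audio.
request_body = {
    "config": {
        "encoding": "MP3",
        "sampleRateHertz": 44100,
        "languageCode": "en-US",
    },
    "audio": {"uri": "gs://my-bucket/audio.mp3"},  # hypothetical URI
}
```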

April 04, 2019

The v1beta version of the service is no longer available for use. You must migrate your solutions to either the v1 or v1p1beta1 version of the API.

February 20, 2019

Data logging is now available for general use. When you enable data logging, you can reduce the cost of using Cloud Speech-to-Text by allowing Google to log your data in order to improve the service.

Enhanced models are now available for general use. Using enhanced models can improve audio transcription results.

Using enhanced models no longer requires you to opt in to data logging. Enhanced models are available for any transcription request at a different price than standard models.

Selecting a transcription model is now available for general use. You can select different speech recognition models when you send a request to Cloud Speech-to-Text, including a model optimized for transcribing audio data from video files.

Cloud Speech-to-Text can transcribe audio data that includes multiple channels. This feature is now available for general use.

You can now include more details about your audio source files in transcription requests to Cloud Speech-to-Text in the form of recognition metadata, which can improve the results of the speech recognition. This feature is now available for general use.
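Recognition metadata travels in the `metadata` field of the request configuration. The sketch below uses enum names from the API for a phone-call recording; the specific values were chosen for illustration, and the URI is a placeholder.

```python
# Illustrative request body attaching recognition metadata that
# describes the audio source. Values are illustrative enum names.
request_body = {
    "config": {
        "encoding": "FLAC",
        "languageCode": "en-US",
        "metadata": {
            "interactionType": "PHONE_CALL",
            "recordingDeviceType": "PHONE_LINE",
            "originalMediaType": "AUDIO",
        },
    },
    "audio": {"uri": "gs://my-bucket/call.flac"},  # hypothetical URI
}
```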

July 24, 2018

Cloud Speech-to-Text provides word-level confidence. Developers can use this feature to get the degree of confidence on a word-by-word level. This feature is in Beta.
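Word-level confidence is requested with the `enableWordConfidence` flag, and each recognized word in the response then carries its own confidence score. The response fragment below is shaped like the API's JSON output, with invented values:

```python
# Illustrative request body enabling per-word confidence scores.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "enableWordConfidence": True,
    },
    "audio": {"uri": "gs://my-bucket/audio.raw"},  # hypothetical URI
}

# A response fragment shaped like the API's output (values invented):
response = {
    "results": [{
        "alternatives": [{
            "transcript": "hello world",
            "words": [
                {"word": "hello", "confidence": 0.98},
                {"word": "world", "confidence": 0.95},
            ],
        }],
    }],
}
scores = [w["confidence"] for w in response["results"][0]["alternatives"][0]["words"]]
```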

Cloud Speech-to-Text can automatically detect the language used in an audio file. To use this feature, developers must specify alternative languages in their transcription request. This feature is in Beta.
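The alternative languages go in the `alternativeLanguageCodes` field alongside the primary `languageCode` (a v1p1beta1 field while the feature is in Beta). The language choices and URI below are illustrative:

```python
# Illustrative request body for automatic language detection: the
# service picks among the primary language and the listed alternatives.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",  # primary language
        "alternativeLanguageCodes": ["es-ES", "fr-FR", "de-DE"],
    },
    "audio": {"uri": "gs://my-bucket/audio.raw"},  # hypothetical URI
}
```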

Cloud Speech-to-Text can identify different speakers present in an audio file. This feature is in Beta.
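Speaker identification (diarization) is switched on with `enableSpeakerDiarization`, optionally with an expected speaker count. A sketch, with placeholder values:

```python
# Illustrative request body enabling speaker diarization for a
# two-person phone call. Values are placeholders.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        "enableSpeakerDiarization": True,
        "diarizationSpeakerCount": 2,  # expected number of speakers
    },
    "audio": {"uri": "gs://my-bucket/call.raw"},  # hypothetical URI
}
```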

Cloud Speech-to-Text can transcribe audio data that includes multiple channels. This feature is in Beta.
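For multichannel audio, the request states the channel count and asks for each channel to be recognized separately. A sketch for stereo audio, with placeholder values:

```python
# Illustrative request body for transcribing each channel of a
# stereo recording separately.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 44100,
        "languageCode": "en-US",
        "audioChannelCount": 2,
        # Without this flag only the first channel is transcribed.
        "enableSeparateRecognitionPerChannel": True,
    },
    "audio": {"uri": "gs://my-bucket/stereo.wav"},  # hypothetical URI
}
```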

April 09, 2018

Cloud Speech-to-Text now provides data logging and enhanced models. Developers who want to take advantage of the enhanced speech recognition models can opt in to data logging. This feature is in Beta.

Cloud Speech-to-Text can insert punctuation into transcription results, including commas, periods, and question marks. This feature is in Beta.
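Automatic punctuation is a single flag on the request configuration. A sketch, with a placeholder URI:

```python
# Illustrative request body enabling automatic punctuation in the
# transcription results.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "enableAutomaticPunctuation": True,
    },
    "audio": {"uri": "gs://my-bucket/audio.raw"},  # hypothetical URI
}
```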

You can now select different speech recognition models when you send a request to Cloud Speech-to-Text, including a model optimized for transcribing audio from video files. This feature is in Beta.
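Model selection is done through the `model` field of the request configuration. The sketch below picks the video-optimized model; the URI is a placeholder:

```python
# Illustrative request body selecting the model optimized for audio
# from video files. Other model values include "phone_call" and
# "default".
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "model": "video",
    },
    "audio": {"uri": "gs://my-bucket/lecture.wav"},  # hypothetical URI
}
```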

You can now include more details about your audio source files in transcription requests to Cloud Speech-to-Text in the form of recognition metadata, which can improve the results of the speech recognition. This feature is in Beta.

January 16, 2018

Support for the OGG_OPUS audio encoding has been expanded to support 8000 Hz, 12000 Hz, 16000 Hz, 24000 Hz, or 48000 Hz.
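An OGG_OPUS request must state one of the supported sample rates. A sketch, with a placeholder URI:

```python
# Illustrative OGG_OPUS request body; the stated sample rate must be
# one of the rates listed in this release note.
SUPPORTED_OPUS_RATES = {8000, 12000, 16000, 24000, 48000}

request_body = {
    "config": {
        "encoding": "OGG_OPUS",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    },
    "audio": {"uri": "gs://my-bucket/audio.ogg"},  # hypothetical URI
}
```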

August 10, 2017

Time offsets (timestamps) are now available. Set the enableWordTimeOffsets parameter to true in your request configuration and Cloud Speech-to-Text will include time offset values for the beginning and end of each spoken word that is recognized in the audio for your request. For more information, see Time offsets (timestamps).
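In the JSON response, each recognized word then carries `startTime` and `endTime` values formatted as duration strings such as `"1.400s"`. The sketch below enables the feature and converts such a string to seconds; the URI is a placeholder:

```python
# Illustrative request body enabling word time offsets.
request_body = {
    "config": {
        "encoding": "FLAC",
        "languageCode": "en-US",
        "enableWordTimeOffsets": True,
    },
    "audio": {"uri": "gs://my-bucket/audio.flac"},  # hypothetical URI
}

def offset_seconds(value: str) -> float:
    """Convert a JSON duration string like "1.400s" to seconds."""
    return float(value.rstrip("s"))
```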

Cloud Speech-to-Text has added recognition support for 30 new languages. For a complete list of all supported languages, see the language support reference.

The limit on the length of audio that you can send with an asynchronous recognition request has been increased from ~80 to ~180 minutes. For information on Cloud Speech-to-Text limits, see the quotas & limits page. For more information on asynchronous recognition requests, see the transcribing long audio files guide.

April 18, 2017

Release of Cloud Speech-to-Text v1.

The v1beta1 release of Cloud Speech-to-Text has been deprecated. The v1beta1 endpoint continues to be available for a period of time as defined in the terms of service. To avoid being impacted when the v1beta1 is discontinued, replace references to v1beta1 in your code with v1 and update your code with valid v1 API names and values.

A language_code is now required with requests to Cloud Speech-to-Text. Requests with a missing or invalid language_code will return an error. (Pre-release versions of the API used en-US if the language_code was omitted from the request.)

SyncRecognize is renamed to Recognize. v1beta1/speech:syncrecognize is renamed to v1/speech:recognize. The behavior is unchanged.

The sample_rate field has been renamed to sample_rate_hertz. The behavior is unchanged.

The EndpointerType enum has been renamed to SpeechEventType.

The following SpeechEventType enums have been removed.

  • START_OF_SPEECH
  • END_OF_SPEECH
  • END_OF_AUDIO

The END_OF_UTTERANCE enum has been renamed to END_OF_SINGLE_UTTERANCE. The behavior is unchanged.

The result_index field has been removed.

The speech_context field has been replaced by the speech_contexts field, which is a repeated field. However, you can specify, at most, one speech context. The behavior is unchanged.

The SPEEX_WITH_HEADER_BYTE and OGG_OPUS codecs have been added to support audio encoder implementations for legacy applications. We do not recommend using lossy codecs, as they result in a lower-quality speech transcription. If you must use a low-bitrate encoder, OGG_OPUS is preferred.

You are no longer required to specify the encoding and sample rate for WAV or FLAC files. If omitted, Cloud Speech-to-Text automatically determines the encoding and sample rate for WAV or FLAC files based on the file header. If you specify an encoding or sample rate value that does not match the value in the file header, then Cloud Speech-to-Text will return an error. This change is backwards-compatible and will not invalidate any currently valid requests.
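For WAV or FLAC input this means the configuration can be as small as the language code, since the encoding and sample rate are read from the file header. A sketch, with a placeholder URI:

```python
# Illustrative minimal request body for a FLAC file: encoding and
# sampleRateHertz are omitted and taken from the file header.
request_body = {
    "config": {"languageCode": "en-US"},
    "audio": {"uri": "gs://my-bucket/audio.flac"},  # hypothetical URI
}
```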
