The Speech-to-Text API provides two "latest" model tags that you can specify in the model field. These models give you access to the latest speech technology and machine learning research from Google, and can provide higher speech recognition accuracy than other available models. However, some features that other available models support are not yet supported by the "latest" models.
The latest models are based on the Conformer Speech Model technology from Google. To find out more, see Google Research Publications.
Using the latest models requires a general understanding of the Speech-to-Text API or UI. If this is your first time using it, see our Quickstarts.
Model Identifiers
The latest models are available in two different versions:
- The latest_short model is for short utterances that are a few seconds in length. It is useful for trying to capture commands or other single-shot directed speech use cases. Consider using latest_short instead of the command_and_search model.
- The latest_long model is for any kind of long-form content such as media or spontaneous speech and conversations. Consider using latest_long in place of video, especially if video is not available in your target language. You can also use latest_long in place of the default model.
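As a rough illustration of the substitutions suggested above, the following sketch builds the JSON body for a v1 recognize request and swaps an older model tag for its "latest" counterpart. The helper name build_recognize_request, the LATEST_REPLACEMENT table, and the gs://my-bucket/audio.flac URI are hypothetical names chosen for this example; only the config.model, config.languageCode, and audio.uri fields come from the API. The sketch only constructs the request body and does not send it.

```python
import json

# Assumption for illustration: a lookup table that encodes the
# substitutions suggested in the list above (older tag -> "latest" tag).
LATEST_REPLACEMENT = {
    "command_and_search": "latest_short",
    "video": "latest_long",
    "default": "latest_long",
}


def build_recognize_request(gcs_uri, language_code, model):
    """Build a JSON-serializable body for a v1 recognize request.

    Hypothetical helper: older model tags are swapped for their "latest"
    counterpart, and any other tag is passed through unchanged.
    """
    model = LATEST_REPLACEMENT.get(model, model)
    return {
        "config": {"languageCode": language_code, "model": model},
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Placeholder bucket and file name; replace with your own audio.
    body = build_recognize_request("gs://my-bucket/audio.flac", "en-US", "video")
    print(json.dumps(body, indent=2))
```

Whether a given language supports the latest models still needs to be checked against the Languages page before switching tags.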
Model Technology
The goal of the latest models is to bring the latest in speech technology directly to Google Cloud users. The current latest models are based on the Conformer Speech Model technology from Google, but this may change in the future. To find out more, see the Google Research Publications list.
Pricing
The latest_long and latest_short models are billed as "Standard" and subject to the same usage and costs as the command_and_search or default models. For more information, see Pricing.
Model Updates
Latest models are based on rapidly advancing machine learning technology. For this reason, we might perform model updates or refreshes more frequently than on our other models. These updates can add features or make slight changes to accuracy or latency.
Languages
Latest models are available in more than 20 languages and more than 50 variants. We are always adding languages, so refer to Languages for the most up-to-date list.
Feature Support and Limitations
Feature support varies by language. See Languages for a full list of supported features.
The latest models do not currently support the following feature:
- Confidence Scores - The API will return a value, but it is not truly a confidence score.
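Because the returned value is not a true confidence score, client code may want to avoid filtering or ranking on it. The sketch below shows one way to do that, assuming the response has already been parsed into a dictionary with the v1 JSON shape (results, alternatives, transcript, confidence). The function name transcripts_only and the sample response are hypothetical and exist only for illustration.

```python
def transcripts_only(response):
    """Return the top transcript of each result, ignoring `confidence`.

    Hypothetical helper: `response` is assumed to be a parsed v1
    recognize response (a dict with a "results" list).
    """
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Take the first alternative as ranked by the recognizer,
            # and deliberately do not read the "confidence" field.
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


if __name__ == "__main__":
    # Made-up response for illustration only; not real API output.
    sample = {
        "results": [
            {"alternatives": [{"transcript": "turn on the lights", "confidence": 0.87}]},
        ]
    }
    print(transcripts_only(sample))
```

The design choice here is simply to never read the confidence field, so that a later model update that changes how the value behaves cannot alter the output.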
Model Service Level Agreement
The latest models are considered a Generally Available part of the Speech-to-Text API. As such, the functionality they support is available in the v1 API and is eligible for the same Service Level Agreement and other protections afforded to Generally Available products and features.