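# Introduction to Latest Models

The "latest" model tags in the Speech-to-Text API give you access to two new model tags that you can use when you specify the `model` field. These models give you access to the latest speech technology and machine learning research from Google, and can provide higher speech recognition accuracy than other available models. However, the "latest" models do not yet support some features that other available models support.

The latest models are based on Google's Conformer Speech Model technology. To find out more, see [Google Research Publications](https://research.google/pubs/).

Using the latest models requires a general understanding of the Speech-to-Text API or UI. If this is your first time using Speech-to-Text, see our [Quickstarts](/speech-to-text/docs/quickstart).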
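The following sketch shows what specifying the `model` field can look like. It builds the JSON body for a v1 `speech:recognize` REST request with `model` set to one of the two latest tags, `latest_short` or `latest_long`. It only constructs the payload and does not call the API. The `gs://` URI, the `LINEAR16` encoding, the 16 kHz sample rate, and the `en-US` language code are placeholder assumptions; replace them with values that match your audio.

```python
import json

# The two "latest" model tags described in this document.
LATEST_MODELS = {"latest_short", "latest_long"}


def build_recognize_request(
    gcs_uri: str,
    model: str = "latest_long",
    language_code: str = "en-US",
    sample_rate_hertz: int = 16000,
) -> dict:
    """Build a v1 speech:recognize request body that selects a latest model.

    The encoding, sample rate, language code, and URI are placeholder
    assumptions; set them to match your own audio file.
    """
    if model not in LATEST_MODELS:
        raise ValueError(f"expected one of {sorted(LATEST_MODELS)}, got {model!r}")
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            "model": model,  # the field that takes the latest model tag
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # A short, command-style utterance is a fit for latest_short.
    body = build_recognize_request("gs://your-bucket/command.wav", model="latest_short")
    # This JSON would be POSTed (with authentication) to
    # https://speech.googleapis.com/v1/speech:recognize
    print(json.dumps(body, indent=2))
```

Sending this body requires an authenticated request, for example one that carries an OAuth 2.0 access token. The client libraries expose the same `model` field on `RecognitionConfig`.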
### Model Identifiers

The latest models are available in two versions:

- The `latest_short` model is for short utterances that are a few seconds long. It is useful for capturing commands or other single-shot, directed speech. Consider using `latest_short` instead of the `command_and_search` model.

- The `latest_long` model is for any kind of long-form content, such as media or spontaneous speech and conversations. Consider using `latest_long` in place of `video`, especially if `video` is not available in your target language. You can also use `latest_long` in place of the `default` model.

### Model Technology

The goal of the latest models is to bring the newest speech technology directly to Google Cloud users. The current latest models are based on Google's Conformer Speech Model technology, but this may change in the future. To find out more, see the [Google Research Publications](https://research.google/pubs/) list.

### Pricing

The `latest_long` and `latest_short` models are billed as "Standard" models. They are subject to the same usage and costs as the `command_and_search` and `default` models. For more information, see [Pricing](/speech-to-text/pricing).

### Model Updates

The latest models are based on rapidly advancing machine learning technology. For this reason, we might update or refresh them more frequently than our other models. These updates can add features or slightly change accuracy or latency.

### Languages

The latest models are available in more than 20 languages and more than 50 language variants. We are always adding languages, so refer to [Languages](/speech-to-text/docs/speech-to-text-supported-languages) for the most up-to-date list.

### Feature Support and Limitations

Feature support varies by language. See [Languages](/speech-to-text/docs/speech-to-text-supported-languages) for a full list of supported features.

The latest models do not currently support the following feature:

- **Confidence scores**: The API returns a `confidence` value, but it is not a true confidence score.

### Model Service Level Agreement

The latest models are a Generally Available part of the Speech-to-Text API. As such, the functionality they support is available in the v1 API and is eligible for the same Service Level Agreement and other protections afforded to Generally Available products and features.

Last updated (UTC): 2025-09-04.
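The latest models return a `confidence` value that is not a true confidence score (see Feature Support and Limitations). For that reason, code that consumes v1 `speech:recognize` responses from these models may want to avoid filtering or thresholding on that value. The sketch below takes the first alternative of each result, which the recognizer ranks as the most probable, and deliberately never reads `confidence`. The sample response is hand-written for illustration and is not real API output. The helper name `top_transcripts` is hypothetical.

```python
from typing import Any


def top_transcripts(response: dict[str, Any]) -> list[str]:
    """Return the first (most probable) alternative's transcript per result.

    The `confidence` field is intentionally not read: for the latest models
    it is not a true confidence score.
    """
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


# Hand-written example shaped like a v1 speech:recognize response
# (illustrative only, not captured from the API).
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "turn on the lights", "confidence": 0.5}]},
        {"alternatives": []},
    ]
}

print(top_transcripts(sample_response))
```

Results with no alternatives are skipped rather than reported as empty strings. This keeps the output limited to transcripts the recognizer actually produced.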