You can improve the accuracy of the transcription results you get from Speech-to-Text by using model adaptation. The model adaptation feature lets you specify words and phrases that Speech-to-Text should recognize more frequently in your audio data than the alternatives it might otherwise suggest. Model adaptation is particularly useful for improving transcription accuracy in the following use cases:
- Your audio contains words or phrases that are likely to occur frequently.
- Your audio is likely to contain words that are rare (such as proper names) or words that do not exist in general use.
- Your audio contains noise or is otherwise not very clear.
For more information about using this feature, see Improve transcription results with model adaptation. For information about phrase and character limits per model adaptation request, see Quotas and limits. Not all models support model adaptation. To check which models do, see Language Support.
Code sample
Speech Adaptation is an optional Speech-to-Text configuration that you can use to customize your transcription results according to your needs. See the RecognitionConfig documentation for more information about configuring the recognition request body.
The following code sample shows how to improve transcription accuracy using a SpeechAdaptation resource: PhraseSet, CustomClass, and model adaptation boost.
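As a rough illustration of how a phrase set, a custom class, and a boost value fit together, the following sketch assembles the adaptation part of a recognition request body as a plain Python dictionary, using only the standard library. The field names (phraseSets, customClasses, customClassId, items, phrases, value, boost) and the ${class_id} reference token follow the JSON form of the SpeechAdaptation message as we understand it, so verify them against the RecognitionConfig reference before relying on them. The helper build_adaptation, the {class} placeholder, the class ID, the phrases, and the boost value are all hypothetical.

```python
import json


def build_adaptation(class_id, class_items, phrase_templates, boost):
    """Return a dict shaped like the adaptation field of a recognition config.

    Hypothetical helper: it only assembles data and never calls the API.
    """
    # A custom class groups related items (for example, restaurant names).
    custom_class = {
        "customClassId": class_id,
        "items": [{"value": item} for item in class_items],
    }
    # Assumption: phrases refer to an inline class with a ${class_id} token.
    token = "${" + class_id + "}"
    phrase_set = {
        "phrases": [
            # "{class}" is a placeholder convention local to this sketch.
            {"value": template.replace("{class}", token)}
            for template in phrase_templates
        ],
        # A set-level boost applies to every phrase in the set.
        "boost": boost,
    }
    return {"customClasses": [custom_class], "phraseSets": [phrase_set]}


adaptation = build_adaptation(
    class_id="restaurants",
    class_items=["sushido", "altura", "taneda"],
    phrase_templates=["Visit restaurants like {class}"],
    boost=10,
)

# Attach the adaptation block to an otherwise minimal config.
config = {"languageCode": "en-US", "adaptation": adaptation}
print(json.dumps(config, indent=2))
```

Keeping the structure in a small builder like this makes it easy to inspect the request body before sending it with whichever client you use.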
To use a PhraseSet or CustomClass in future requests, make a note of its resource name, returned in the response when you create the resource.
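Resource names for these objects generally take the form projects/{project}/locations/{location}/phraseSets/{id}, with customClasses in place of phraseSets for a custom class. The sketch below is an illustration under that assumption, not library code: it formats such names and places a stored phrase set name into a phraseSetReferences list. The helper resource_name, the project ID my-project, and the resource IDs are hypothetical. In practice, prefer the name returned in the creation response over one built by hand.

```python
def resource_name(project, location, collection, resource_id):
    """Format an adaptation resource name (hypothetical helper, no API calls)."""
    if collection not in ("phraseSets", "customClasses"):
        raise ValueError(f"unexpected collection: {collection}")
    return f"projects/{project}/locations/{location}/{collection}/{resource_id}"


phrase_set_name = resource_name("my-project", "global", "phraseSets", "my-phrases")
custom_class_name = resource_name("my-project", "global", "customClasses", "restaurants")

# Assumption: a stored custom class is referenced inside a phrase by
# wrapping its full resource name in ${...}.
phrase = {"value": "Visit restaurants like ${" + custom_class_name + "}"}

# A later request can point at the stored phrase set by name
# instead of redefining it inline.
adaptation = {"phraseSetReferences": [phrase_set_name]}
print(adaptation)
```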
For a list of the pre-built classes available for your language, see Supported class tokens.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.