You can improve the accuracy of the transcription results you get from Cloud Speech-to-Text by using model adaptation. Model adaptation lets you specify words and phrases that Cloud STT will recognize more frequently in your audio data than other alternatives that might otherwise be suggested. Model adaptation is particularly useful for improving transcription accuracy in the following use cases:
- Your audio contains words or phrases that are likely to occur frequently.
- Your audio is likely to contain words that are rare (such as proper names) or words that don't exist in general use.
- Your audio contains noise or is otherwise not very clear.
Before reading this document, read Introduction to model adaptation for a high-level overview of how this feature works. For information about phrase and character limits per model adaptation request, see Quotas and limits.
Code sample
Model adaptation is an optional Cloud STT configuration that you can use to customize your transcription results according to your needs. See the RecognitionConfig documentation for more information about configuring the recognition request body.
The following code sample shows how to improve transcription accuracy using a SpeechAdaptation resource: PhraseSet, CustomClass, and model adaptation boost.
To use a PhraseSet or CustomClass in future requests, make a note of its resource name, returned in the response when you create the resource.
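For example, a saved resource name can be referenced in a later recognition request through the adaptation configuration. The following is a minimal sketch, assuming the speech_v1p1beta1 Python client; the project and phrase set IDs in the resource name are placeholders:

```python
from google.cloud import speech_v1p1beta1 as speech

# Resource name noted from the response when the PhraseSet was created
# (placeholder project and phrase set IDs).
phrase_set_name = "projects/my-project-id/locations/global/phraseSets/my-phrase-set"

# Reference the existing PhraseSet by name instead of redefining its phrases.
config = speech.RecognitionConfig(
    language_code="en-US",
    adaptation=speech.SpeechAdaptation(phrase_set_references=[phrase_set_name]),
)
```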
For a list of the prebuilt classes available for your language, see Supported class tokens.
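Prebuilt class tokens can be included directly in the phrases of a PhraseSet. The snippet below is an illustrative sketch, assuming the speech_v1p1beta1 client and using $OOV_CLASS_DIGIT_SEQUENCE as an example token; check the Supported class tokens page for the tokens available in your language:

```python
from google.cloud import speech_v1p1beta1 as speech

# A phrase set whose phrase uses a prebuilt class token; the token stands in
# for any spoken sequence of digits in the audio.
phrase_set = speech.PhraseSet(
    phrases=[
        speech.PhraseSet.Phrase(
            value="my account number is $OOV_CLASS_DIGIT_SEQUENCE",
            boost=10.0,
        )
    ]
)

config = speech.RecognitionConfig(
    language_code="en-US",
    adaptation=speech.SpeechAdaptation(phrase_sets=[phrase_set]),
)
```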
Python
To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Python API reference documentation.
To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
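The following is a minimal sketch of the end-to-end flow described above, assuming the speech_v1p1beta1 client library and Application Default Credentials; the project ID, location, Cloud Storage URI, and resource IDs are placeholders, and the custom class items and phrase text are illustrative only:

```python
from google.cloud import speech_v1p1beta1 as speech


def transcribe_with_model_adaptation(
    project_id: str,
    location: str,
    storage_uri: str,
    custom_class_id: str,
    phrase_set_id: str,
) -> None:
    """Create a CustomClass and a PhraseSet, then transcribe with adaptation."""
    adaptation_client = speech.AdaptationClient()

    # Parent resource under which the custom class and phrase set are created.
    parent = f"projects/{project_id}/locations/{location}"

    # Create a custom class listing similar items that are likely to occur
    # in the input audio (illustrative restaurant names).
    custom_class_response = adaptation_client.create_custom_class(
        parent=parent,
        custom_class_id=custom_class_id,
        custom_class=speech.CustomClass(
            items=[
                speech.CustomClass.ClassItem(value="sushido"),
                speech.CustomClass.ClassItem(value="altura"),
                speech.CustomClass.ClassItem(value="taneda"),
            ]
        ),
    )

    # Create a phrase set whose phrase references the custom class by its
    # resource name (prefixed with "$") and applies a boost value.
    phrase_set_response = adaptation_client.create_phrase_set(
        parent=parent,
        phrase_set_id=phrase_set_id,
        phrase_set=speech.PhraseSet(
            boost=10.0,
            phrases=[
                speech.PhraseSet.Phrase(
                    value=f"Visit restaurants like ${custom_class_response.name}"
                )
            ],
        ),
    )

    # Note the resource name so the phrase set can be reused in later requests.
    phrase_set_name = phrase_set_response.name

    # Reference the phrase set by name in the adaptation configuration.
    speech_adaptation = speech.SpeechAdaptation(
        phrase_set_references=[phrase_set_name]
    )

    # The encoding and sample rate must match the audio file being transcribed.
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    audio = speech.RecognitionAudio(uri=storage_uri)

    speech_client = speech.SpeechClient()
    response = speech_client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
```

As a usage sketch, calling transcribe_with_model_adaptation("my-project-id", "global", "gs://my-bucket/audio.wav", "my-custom-class", "my-phrase-set") would create both resources and print the transcript of the referenced audio file.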