You can improve the accuracy of the transcription results you get from Speech-to-Text by using speech adaptation. Speech adaptation lets you specify words and phrases that Speech-to-Text should recognize more often in your audio data than the alternatives it might otherwise suggest. It is particularly useful for improving transcription accuracy in the following cases:
- Your audio contains words or phrases that are likely to occur very frequently.
- Your audio is likely to contain words that are rare (such as proper names) or words that do not exist in general use.
- Your audio contains noise or is otherwise not very clear.
See the speech adaptation concepts page for best practices on speech adaptation and speech adaptation boost.
The following code sample demonstrates how to improve transcription accuracy by setting speech contexts in a request sent to the Speech-to-Text API. See the class tokens page for a list of the classes available for your language.
REST & CMD LINE
Refer to the speech:recognize API endpoint for complete details.
Before using any of the request data below, make the following replacements:
- language-code: the BCP-47 code of the language spoken in your audio clip.
- phrases-to-boost: the phrase or phrases that you want Speech-to-Text to boost, as an array of strings.
- storage-bucket: the Cloud Storage bucket that holds your audio file.
- input-file: the audio file in that bucket that you want to transcribe.
HTTP method and URL:
POST https://speech.googleapis.com/v1p1beta1/speech:recognize
Request JSON body:
    {
      "config": {
        "languageCode": "language-code",
        "speechContexts": [{
          "phrases": [phrases-to-boost],
          "boost": 2
        }]
      },
      "audio": {
        "uri": "gs://storage-bucket/input-file"
      }
    }
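The request body above can also be assembled in code before it is sent. The following is a minimal Python sketch, not an official client sample: it uses only the standard library, the helper name `build_recognize_request` is hypothetical, and the language code, phrases, and bucket path are placeholder values chosen for illustration.

```python
import json


def build_recognize_request(language_code, phrases, gcs_uri, boost=2):
    """Return a speech:recognize request body as a dict.

    Hypothetical helper: it mirrors the JSON structure shown above and
    does not call the API.
    """
    if not phrases:
        raise ValueError("phrases must contain at least one string")
    # Cloud Storage URIs take the form gs://bucket/object.
    if not gcs_uri.startswith("gs://"):
        raise ValueError("gcs_uri must start with 'gs://'")
    return {
        "config": {
            "languageCode": language_code,
            "speechContexts": [
                {"phrases": list(phrases), "boost": boost}
            ],
        },
        "audio": {"uri": gcs_uri},
    }


# Placeholder values (assumptions, replace with your own).
body = build_recognize_request(
    "en-US",
    ["weather", "umbrella"],
    "gs://my-bucket/audio.flac",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and posted to the endpoint shown above with the HTTP client of your choice. Validating the `gs://` prefix up front is a design choice of this sketch; it catches a malformed URI before the request is sent.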
You should receive a JSON response similar to the following:
    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "When deciding whether to bring an umbrella, I consider the weather",
              "confidence": 0.9463943
            }
          ],
          "languageCode": "en-us"
        }
      ]
    }
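A response of this shape can be read with a few lines of code. The sketch below is illustrative only: it parses the sample response shown above with Python's standard `json` module, the helper name `best_transcripts` is hypothetical, and it assumes the first entry in each `alternatives` list is the one you want.

```python
import json

# The sample response from above, embedded as a string for illustration.
response_text = """
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "When deciding whether to bring an umbrella, I consider the weather",
          "confidence": 0.9463943
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
"""


def best_transcripts(response):
    """Return (transcript, confidence) for the first alternative of each result.

    Hypothetical helper; missing keys are tolerated so an empty response
    yields an empty list.
    """
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


response = json.loads(response_text)
for transcript, confidence in best_transcripts(response):
    print(confidence, transcript)
```

Comparing the returned transcripts with and without `speechContexts` in the request is one simple way to check whether the boosted phrases are being recognized.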