Send a recognition request with model adaptation

You can improve the accuracy of the transcription results you get from Speech-to-Text by using model adaptation. The model adaptation feature lets you specify words and phrases that Speech-to-Text should recognize more frequently in your audio data than other candidates that might otherwise be suggested. Model adaptation is particularly useful for improving transcription accuracy in the following use cases:

  1. Your audio contains words or phrases that are likely to occur very frequently.
  2. Your audio is likely to contain words that are rare (such as proper names) or words that do not exist in general use.
  3. Your audio contains noise or is otherwise not very clear.

See the model adaptation concepts page for more information on using this feature.

The following code sample demonstrates how to improve transcription accuracy using a SpeechAdaptation resource built from a PhraseSet and a CustomClass, together with model adaptation boost. See the class tokens page for a list of the pre-built classes available for your language.

Python


from google.cloud import speech_v1p1beta1 as speech


def transcribe_with_model_adaptation(
    project_id, location, storage_uri, custom_class_id, phrase_set_id
):
    """
    Create `PhraseSet` and `CustomClass` resources to build custom lists of
    similar items that are likely to occur in your input data.
    """

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{project_id}/locations/{location}"

    # Create the custom class
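    # The items below are example restaurant names grouped into a single
    # custom class that can then be referenced from phrases.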
    custom_class_response = adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "sushido"},
                    {"value": "altura"},
                    {"value": "taneda"},
                ]
            },
        }
    )

    # Create the phrase set
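    # A higher boost value weights the phrases in this set more heavily
    # relative to other hypotheses; tune the value empirically. Phrases may
    # also reference pre-built class tokens such as $OOV_CLASS_DIGIT_SEQUENCE
    # (see the class tokens page for the classes available in your language).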
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 10,
                "phrases": [{"value": f"Visit restaurants like ${custom_class_id}"}],
            },
        }
    )

    # The next steps show how to use the newly created custom class and
    # phrase set to send a transcription request with speech adaptation.

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(
        phrase_sets=[phrase_set_response], custom_classes=[custom_class_response]
    )

    # Speech recognition configuration
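    # The settings below assume LINEAR16 (uncompressed 16-bit PCM) audio
    # sampled at 24 kHz; adjust encoding and sample_rate_hertz to match
    # your audio file.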
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    # The audio file to transcribe, referenced by its Cloud Storage URI
    # (storage_uri), e.g. gs://[BUCKET]/[FILE]

    audio = speech.RecognitionAudio(uri=storage_uri)

    # Create the speech client
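    # Note: the synchronous recognize call used below is limited to short
    # audio (roughly one minute or less); longer files would require
    # long_running_recognize instead.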
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))
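
To run this sample end to end, call the function with your own project, Cloud Storage, and resource identifiers. The values below are placeholders for illustration only, and the cleanup calls are a minimal sketch, assuming you want to remove the adaptation resources once you are finished with them; the custom class and phrase set otherwise persist after the request.

transcribe_with_model_adaptation(
    project_id="your-project-id",
    location="global",
    storage_uri="gs://your-bucket/your-audio-file.wav",
    custom_class_id="your-custom-class-id",
    phrase_set_id="your-phrase-set-id",
)

# Optional cleanup (sketch): delete the adaptation resources when they are
# no longer needed.
adaptation_client = speech.AdaptationClient()
adaptation_client.delete_phrase_set(
    name="projects/your-project-id/locations/global/phraseSets/your-phrase-set-id"
)
adaptation_client.delete_custom_class(
    name="projects/your-project-id/locations/global/customClasses/your-custom-class-id"
)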