Send a recognition request with model adaptation

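Model adaptation lets you improve the accuracy of the transcription results you get from Speech-to-Text. With it, you can specify words and phrases that Speech-to-Text should recognize in your audio data more often than the alternatives it might otherwise suggest. Model adaptation is particularly useful for improving transcription accuracy in the following use cases: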

Your audio contains words or phrases that are likely to occur very frequently.
Your audio is likely to contain words that are rare in general speech, such as proper names.
Your audio contains noise or is otherwise not very clear.

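For more information about using this feature, see Improve transcription results with model adaptation. For information about phrase and character limits per model adaptation request, see Quotas and limits. Not all models support speech adaptation. To find out which models support adaptation, see Language support.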

Code sample

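Speech adaptation is an optional Speech-to-Text configuration that you can use to customize transcription results to your needs. For more information about configuring the recognition request body, see the RecognitionConfig documentation.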

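The following code sample shows how to improve transcription accuracy using the SpeechAdaptation resources PhraseSet and CustomClass together with model adaptation boost. To use a PhraseSet or CustomClass in later requests, note the resource name returned in the response when you create the resource.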

For a list of the pre-built classes available for your language, see Supported class tokens.
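As a rough illustration of how class references appear inside phrases, the sketch below builds a phrase set body as a plain dictionary. The `${...}` form for a custom class matches the sample on this page; the `$MONTH` token and the helper names `prebuilt_class_token`, `custom_class_reference`, and `make_phrase_set` are assumptions made for illustration, so check the supported class tokens list for your language before relying on any particular token.

```python
from typing import Dict, List


def prebuilt_class_token(token: str) -> str:
    """Format a pre-built class token, for example 'MONTH' -> '$MONTH'."""
    return f"${token}"


def custom_class_reference(resource_name: str) -> str:
    """Wrap a CustomClass resource name in the ${...} form used inside phrases."""
    return "${" + resource_name + "}"


def make_phrase_set(phrases: List[str], boost: float) -> Dict:
    """Build a phrase set body with the same shape as the one in the sample."""
    return {"boost": boost, "phrases": [{"value": p} for p in phrases]}


# Hypothetical resource name; a real one is returned when the class is created.
restaurants = "projects/my-project/locations/global/customClasses/restaurants"

phrase_set = make_phrase_set(
    [
        f"My appointment is in {prebuilt_class_token('MONTH')}",
        f"Visit restaurants like {custom_class_reference(restaurants)}",
    ],
    boost=10,
)

for phrase in phrase_set["phrases"]:
    print(phrase["value"])
```

Keeping the formatting in small helpers avoids hand-escaping braces in f-strings, which is easy to get wrong.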

Python

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import os

from google.cloud import speech_v1p1beta1 as speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_with_model_adaptation(
    audio_uri: str,
    custom_class_id: str,
    phrase_set_id: str,
) -> str:
    """Create a `CustomClass` and a `PhraseSet`, then transcribe audio with them.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio. e.g. gs://[BUCKET]/[FILE]
        custom_class_id (str): The unique ID of the CustomClass to create.
        phrase_set_id (str): The unique ID of the PhraseSet to create.
    Returns:
        The transcript of the input audio.
    """
    # Specifies the location where the Speech API will be accessed.
    location = "global"

    # Audio object
    audio = speech.RecognitionAudio(uri=audio_uri)

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{PROJECT_ID}/locations/{location}"

    # Create the custom class resource
    adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "sushido"},
                    {"value": "altura"},
                    {"value": "taneda"},
                ]
            },
        }
    )
    custom_class_name = (
        f"projects/{PROJECT_ID}/locations/{location}/customClasses/{custom_class_id}"
    )
    # Create the phrase set resource
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 10,
                "phrases": [
                    {"value": f"Visit restaurants like ${{{custom_class_name}}}"}
                ],
            },
        }
    )
    phrase_set_name = phrase_set_response.name
    # The next section shows how to use the newly created custom
    # class and phrase set to send a transcription request with speech adaptation

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(phrase_set_references=[phrase_set_name])

    # speech configuration object
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    # Create the speech client
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

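    # Print the top alternative of each result and collect it, so the function
    # returns the transcript promised by its signature and docstring.
    transcripts = []
    for result in response.results:
        transcript = result.alternatives[0].transcript
        print(f"Transcript: {transcript}")
        transcripts.append(transcript)

    return " ".join(t.strip() for t in transcripts)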
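Once a PhraseSet exists, later requests can refer to it by resource name instead of creating it again. The following is a minimal sketch of that bookkeeping, not part of the official sample. It assumes PhraseSet names follow the same pattern as the CustomClass name built above, with `phraseSets` as the collection segment. The helper names and the project and ID values are hypothetical.

```python
from typing import Dict, List


def phrase_set_resource_name(project_id: str, location: str, phrase_set_id: str) -> str:
    """Rebuild a PhraseSet resource name from its parts (assumed name pattern)."""
    return f"projects/{project_id}/locations/{location}/phraseSets/{phrase_set_id}"


def adaptation_for_existing(phrase_set_names: List[str]) -> Dict:
    """Build the adaptation settings that reference already-created PhraseSets."""
    if not phrase_set_names:
        raise ValueError("At least one PhraseSet resource name is required.")
    # `phrase_set_references` is the field the sample passes to SpeechAdaptation.
    return {"phrase_set_references": list(phrase_set_names)}


# Hypothetical values; in practice, store the `name` returned at creation time.
name = phrase_set_resource_name("my-project", "global", "restaurant-phrases")
adaptation = adaptation_for_existing([name])
print(adaptation)
```

The stored name can be supplied wherever the sample uses `phrase_set_name`, for example `speech.SpeechAdaptation(phrase_set_references=[name])`, which skips the `create_phrase_set` call on later requests.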