Send a recognition request with model adaptation

You can use model adaptation to improve the accuracy of the transcription results you get from Speech-to-Text. With model adaptation, you specify words and/or phrases that Speech-to-Text must recognize more frequently in your audio data than other alternatives that might otherwise be suggested. Model adaptation is particularly useful for improving transcription accuracy in the following use cases:

  1. Your audio contains words or phrases that are likely to occur frequently.
  2. Your audio is likely to contain words that are rare (such as proper names) or words that are not in general use.
  3. Your audio contains noise or is otherwise not very clear.

For more information about using this feature, see Improve transcription results with model adaptation. For the phrase and character limits per model adaptation request, see Quotas and limits. Not all models support speech adaptation; see Language support to find out which models support adaptation.

Code sample

Speech adaptation is an optional Speech-to-Text configuration that you can use to customize your transcription results as needed. See the RecognitionConfig documentation for more information about configuring the recognition request body.
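If you do not need reusable adaptation resources, adaptation hints can also be passed inline in the request. The following is a minimal sketch, not part of the sample below; the phrases and boost value are illustrative placeholders:

from google.cloud import speech

# Minimal sketch: inline adaptation hints supplied directly in the request,
# without creating PhraseSet or CustomClass resources.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    speech_contexts=[
        speech.SpeechContext(
            phrases=["sushido", "altura", "taneda"],  # terms to favor
            boost=10.0,  # optional weight; higher values bias recognition more strongly
        )
    ],
)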

The following code sample demonstrates how to improve transcription accuracy using a SpeechAdaptation resource: PhraseSet, CustomClass, and model adaptation boost. To use a PhraseSet or CustomClass in future requests, make a note of its resource name, which is returned in the response when you create the resource.
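For example, once a PhraseSet has been created, a later request can reference it by that saved name instead of recreating it. This is a minimal sketch; the project, location, and phrase set ID below are placeholder values:

from google.cloud import speech_v1p1beta1 as speech

# Minimal sketch: reuse a previously created phrase set by its resource name.
# The name below is a placeholder; use the `name` returned when the resource
# was created.
phrase_set_name = "projects/my-project/locations/global/phraseSets/my-phrase-set"

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    adaptation=speech.SpeechAdaptation(phrase_set_references=[phrase_set_name]),
)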

For a list of the pre-built classes available for your language, see Supported class tokens.
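Class tokens can be referenced inside the phrases of a phrase set. The sketch below assumes the $OOV_CLASS_DIGIT_SEQUENCE token is available for your language (check the supported class tokens list); the parent path and IDs are placeholders:

from google.cloud import speech_v1p1beta1 as speech

# Minimal sketch: reference a pre-built class token inside a phrase.
adaptation_client = speech.AdaptationClient()

phrase_set_response = adaptation_client.create_phrase_set(
    {
        "parent": "projects/my-project/locations/global",  # placeholder parent
        "phrase_set_id": "my-digits-phrase-set",  # placeholder ID
        "phrase_set": {
            "phrases": [{"value": "My account number is $OOV_CLASS_DIGIT_SEQUENCE"}]
        },
    }
)
print(phrase_set_response.name)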

Python

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


from google.cloud import speech_v1p1beta1 as speech

def transcribe_with_model_adaptation(
    project_id: str,
    location: str,
    storage_uri: str,
    custom_class_id: str,
    phrase_set_id: str,
) -> str:
    """Create`PhraseSet` and `CustomClasses` to create custom lists of similar
    items that are likely to occur in your input data.

    Args:
        project_id: The GCP project ID.
        location: The GCS location of the input audio.
        storage_uri: The Cloud Storage URI of the input audio.
        custom_class_id: The ID of the custom class to create

    Returns:
        The transcript of the input audio.
    """

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{project_id}/locations/{location}"

    # Create the custom class resource
    adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "sushido"},
                    {"value": "altura"},
                    {"value": "taneda"},
                ]
            },
        }
    )
    custom_class_name = (
        f"projects/{project_id}/locations/{location}/customClasses/{custom_class_id}"
    )
    # Create the phrase set resource
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 10,
                "phrases": [
                    {"value": f"Visit restaurants like ${{{custom_class_name}}}"}
                ],
            },
        }
    )
    phrase_set_name = phrase_set_response.name
    # The next section shows how to use the newly created custom
    # class and phrase set to send a transcription request with speech adaptation

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(phrase_set_references=[phrase_set_name])

    # speech configuration object
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    # The name of the audio file to transcribe
    # storage_uri URI for audio file in Cloud Storage, e.g. gs://[BUCKET]/[FILE]

    audio = speech.RecognitionAudio(uri=storage_uri)

    # Create the speech client
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    # Return the transcript so the function matches its declared return type.
    return response.results[0].alternatives[0].transcript