自动检测语言

本页面介绍如何设置识别器,以便自动根据可能的语言列表识别音频文件中所用的语言。

有些时候,您并不确定音频录音中会包含哪些语言。例如,如果您在具有多种官方语言的国家/地区发布服务、应用或产品,则可能会接收用户以多种语言提供的音频输入。这种情况下,为转录请求指定单独一种语言代码的难度很大。

多语言识别

Speech-to-Text 为您提供了一种方法,让您可以指定音频数据可能包含的一组语言。创建 Recognizer 或者发送识别请求时,可以在 language_codes 字段中提供音频数据可能包含的一种或多种语言。在具有多种语言的请求中,Speech-to-Text 会尝试使用您提供的备用语言列表中最适合的语言转写音频。随后,Speech-to-Text 会使用预测的语言代码标记转写结果。

此功能非常适合需要转写语音指令或搜索等简短语句的应用。您最多可以列出三种语言以进行自动语言识别。

准备工作

  1. 登录您的 Google Cloud 账号。如果您是 Google Cloud 新手,请创建一个账号来评估我们的产品在实际场景中的表现。新客户还可获享 $300 赠金,用于运行、测试和部署工作负载。
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. 确保您的 Google Cloud 项目已启用结算功能

  4. Enable the Speech-to-Text APIs.

    Enable the APIs

  5. Make sure that you have the following role or roles on the project: Cloud Speech Administrator

    Check for the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

    4. For all rows that specify or include you, check the Role colunn to see whether the list of roles includes the required roles.

    Grant the roles

    1. In the Google Cloud console, go to the IAM page.

      前往 IAM
    2. 选择项目。
    3. 点击 授予访问权限
    4. 新的主账号字段中,输入您的用户标识符。 这通常是 Google 账号的电子邮件地址。

    5. 选择角色列表中,选择一个角色。
    6. 如需授予其他角色,请点击 添加其他角色,然后添加其他各个角色。
    7. 点击保存
    8. Install the Google Cloud CLI.
    9. To initialize the gcloud CLI, run the following command:

      gcloud init
    10. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

      Go to project selector

    11. 确保您的 Google Cloud 项目已启用结算功能

    12. Enable the Speech-to-Text APIs.

      Enable the APIs

    13. Make sure that you have the following role or roles on the project: Cloud Speech Administrator

      Check for the roles

      1. In the Google Cloud console, go to the IAM page.

        Go to IAM
      2. Select the project.
      3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

      4. For all rows that specify or include you, check the Role colunn to see whether the list of roles includes the required roles.

      Grant the roles

      1. In the Google Cloud console, go to the IAM page.

        前往 IAM
      2. 选择项目。
      3. 点击 授予访问权限
      4. 新的主账号字段中,输入您的用户标识符。 这通常是 Google 账号的电子邮件地址。

      5. 选择角色列表中,选择一个角色。
      6. 如需授予其他角色,请点击 添加其他角色,然后添加其他各个角色。
      7. 点击保存
      8. Install the Google Cloud CLI.
      9. To initialize the gcloud CLI, run the following command:

        gcloud init
      10. 客户端库可以使用应用默认凭据轻松进行 Google API 身份验证,并向这些 API 发送请求。借助应用默认凭据,您可以在本地测试应用并部署它,无需更改底层代码。有关详情,请参阅使用客户端库进行身份验证

      11. If you're using a local shell, then create local authentication credentials for your user account:

        gcloud auth application-default login

        You don't need to do this if you're using Cloud Shell.

      此外,请确保您已安装客户端库

      在音频转录请求中启用语言识别

      下面的示例展示了如何对采用多种语言的本地音频文件执行同步语音识别。

      Python

      import os
      
      from typing import List
      
      from google.cloud.speech_v2 import SpeechClient
      from google.cloud.speech_v2.types import cloud_speech
      
      PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
      
      
      def transcribe_multiple_languages_v2(
          audio_file: str,
          language_codes: List[str],
      ) -> cloud_speech.RecognizeResponse:
          """Transcribe an audio file using Google Cloud Speech-to-Text API with support for multiple languages.
          Args:
              audio_file (str): Path to the local audio file to be transcribed.
                  Example: "resources/audio.wav"
              language_codes (List[str]): A list of BCP-47 language codes to be used for transcription.
                  Example: ["en-US", "fr-FR"]
          Returns:
              cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing the
                  transcription results.
          """
          client = SpeechClient()
      
          # Reads a file as bytes
          with open(audio_file, "rb") as f:
              audio_content = f.read()
      
          config = cloud_speech.RecognitionConfig(
              auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
              language_codes=language_codes,
              model="latest_long",
          )
      
          request = cloud_speech.RecognizeRequest(
              recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
              config=config,
              content=audio_content,
          )
      
          # Transcribes the audio into text
          response = client.recognize(request=request)
          # Prints the transcription results
          for result in response.results:
              print(f"Transcript: {result.alternatives[0].transcript}")
      
          return response
      
      

      清理

      为避免因本页中使用的资源导致您的 Google Cloud 账号产生费用,请按照以下步骤操作。

      1. Optional: Revoke the authentication credentials that you created, and delete the local credential file.

        gcloud auth application-default revoke
      2. Optional: Revoke credentials from the gcloud CLI.

        gcloud auth revoke

      控制台

    14. In the Google Cloud console, go to the Manage resources page.

      Go to Manage resources

    15. In the project list, select the project that you want to delete, and then click Delete.
    16. In the dialog, type the project ID, and then click Shut down to delete the project.
    17. gcloud

      Delete a Google Cloud project:

      gcloud projects delete PROJECT_ID

      后续步骤