Recognizers

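Speech-to-Text V2 supports a Google Cloud resource called a recognizer. A recognizer represents a stored, reusable recognition configuration. You can use recognizers to logically group the transcriptions or traffic for your application.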

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Speech-to-Text APIs.

    Enable the APIs

  5. Make sure that you have the following role or roles on the project: Cloud Speech Administrator

    Check for the roles

    1. In the Google Cloud console, go to the IAM page.

      Go to IAM
    2. Select the project.
    3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.

    4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

    Grant the roles

    1. In the Google Cloud console, go to the IAM page.

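      Go to IAM
    2. Select the project.
    3. Click Grant access.
    4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.

    5. In the Select a role list, select a role.
    6. To grant additional roles, click Add another role and add each additional role.
    7. Click Save.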
  6. Install the Google Cloud CLI.
  7. To initialize the gcloud CLI, run the following command:

    gcloud init

  8. Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see Authenticate for using client libraries.

  9. If you're using a local shell, then create local authentication credentials for your user account:

    gcloud auth application-default login

    You don't need to do this if you're using Cloud Shell.

Also, make sure that you have installed the client library.
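
Application Default Credentials search for credentials in a fixed order. The following sketch models only the two file-based steps. It is a simplified illustration, not the google-auth implementation. First, a file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable is used. Otherwise, ADC looks for the file that gcloud auth application-default login writes to the gcloud configuration directory. The helper name find_adc_file is hypothetical. Other sources, such as the metadata server on Google Cloud, are not modeled.

```python
import os
from typing import Callable, Mapping, Optional

ADC_FILE_NAME = "application_default_credentials.json"


def find_adc_file(
    env: Mapping[str, str],
    home: str,
    is_windows: bool,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Hypothetical, simplified model of the file-based ADC lookup steps."""
    # Step 1: a credentials file named by the environment variable takes priority.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return explicit

    # Step 2: user credentials written by `gcloud auth application-default login`.
    if is_windows:
        config_dir = os.path.join(env.get("APPDATA", ""), "gcloud")
    else:
        config_dir = os.path.join(home, ".config", "gcloud")
    candidate = os.path.join(config_dir, ADC_FILE_NAME)
    if exists(candidate):
        return candidate

    # Other sources, such as the metadata server on Google Cloud, are not modeled.
    return None
```

For example, find_adc_file(os.environ, os.path.expanduser("~"), os.name == "nt") returns the path that applies on the current machine. It returns None if neither file-based source is available.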

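Understand recognizers

A recognizer is a configurable, reusable recognition configuration. Creating recognizers for commonly used recognition configurations simplifies recognition requests and reduces their size.

The core element of a recognizer is its default configuration. This configuration applies to every recognition request that uses the recognizer. You can override the default in individual requests. Keep the features that all requests to a given recognizer need in its default configuration, and override specific features only in the requests that need something different.

Reuse recognizers as much as possible. Creating a recognizer for every request significantly increases your application's latency and consumes resource quota. Instead, create recognizers infrequently, during integration and setup, and then reuse them for recognition requests.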
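
One way to follow this advice is to resolve each recognizer once per process and keep its resource name for later requests. The class below is a hypothetical sketch, not part of the client library. RecognizerCache and its create callback are illustrative names. The callback stands in for whatever code creates or looks up the recognizer in your application.

```python
from typing import Callable, Dict


class RecognizerCache:
    """Hypothetical helper that creates each recognizer at most once per process."""

    def __init__(self, create: Callable[[str], str]) -> None:
        # `create` receives a recognizer ID and returns the recognizer's resource name.
        self._create = create
        self._names: Dict[str, str] = {}

    def get(self, recognizer_id: str) -> str:
        """Return the resource name for recognizer_id, creating it only on first use."""
        if recognizer_id not in self._names:
            self._names[recognizer_id] = self._create(recognizer_id)
        # Later calls reuse the stored name, so no further create requests are sent.
        return self._names[recognizer_id]
```

For example, RecognizerCache(lambda rid: create_recognizer(rid).name) would call the create_recognizer sample on this page only the first time an ID is requested. The cache lives only in memory and is not thread-safe. A recognizer created by an earlier run still has to be looked up (for example, with client.get_recognizer) rather than created again.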

Create a recognizer

The following example shows how to create a recognizer that you can use to send recognition requests:

Python

      import os
      
      from google.cloud.speech_v2 import SpeechClient
      from google.cloud.speech_v2.types import cloud_speech
      
      PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
      
      
      def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
          """Creates a recognizer with a unique ID and a default recognition configuration.
          Args:
              recognizer_id (str): The unique identifier for the recognizer to be created.
          Returns:
              cloud_speech.Recognizer: The created recognizer object with configuration.
          """
          # Instantiates a client
          client = SpeechClient()
      
          request = cloud_speech.CreateRecognizerRequest(
              parent=f"projects/{PROJECT_ID}/locations/global",
              recognizer_id=recognizer_id,
              recognizer=cloud_speech.Recognizer(
                  default_recognition_config=cloud_speech.RecognitionConfig(
                      language_codes=["en-US"], model="long"
                  ),
              ),
          )
          # Sends the request to create a recognizer and waits for the operation to complete
          operation = client.create_recognizer(request=request)
          recognizer = operation.result()
      
          print("Created Recognizer:", recognizer.name)
          return recognizer
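
The create request takes a parent of the form projects/PROJECT_ID/locations/global plus a recognizer_id. The recognize samples on this page instead pass the recognizer's full resource name. The hypothetical helpers below build and split these names with plain string handling. They assume only the projects/{project}/locations/{location}/recognizers/{recognizer} format that the samples use.

```python
from typing import Dict


def recognizer_parent(project: str, location: str = "global") -> str:
    """Parent value for a create-recognizer request (hypothetical helper)."""
    return f"projects/{project}/locations/{location}"


def recognizer_name(project: str, recognizer_id: str, location: str = "global") -> str:
    """Full resource name used as the recognizer of a recognize request."""
    return f"{recognizer_parent(project, location)}/recognizers/{recognizer_id}"


def parse_recognizer_name(name: str) -> Dict[str, str]:
    """Split a recognizer resource name into its project, location, and ID."""
    parts = name.split("/")
    # Expect exactly: projects/<p>/locations/<l>/recognizers/<r>
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "recognizers"]:
        raise ValueError(f"not a recognizer resource name: {name!r}")
    return {"project": parts[1], "location": parts[3], "recognizer": parts[5]}
```

For example, parse_recognizer_name(recognizer.name)["recognizer"] extracts the ID from the name returned by create_recognizer. You can store that ID and reuse it in later requests.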
      
      

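Send requests with an existing recognizer

The following example sends a recognition request through an existing recognizer. The request carries only the recognizer name and the audio, so the recognizer's default configuration applies. You can call the function repeatedly to send multiple requests with the same recognizer:

Python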

      import os
      
      from google.cloud.speech_v2 import SpeechClient
      from google.cloud.speech_v2.types import cloud_speech
      
      PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
      
      
      def transcribe_reuse_recognizer(
          audio_file: str,
          recognizer_id: str,
      ) -> cloud_speech.RecognizeResponse:
          """Transcribe an audio file using an existing recognizer.
          Args:
              audio_file (str): Path to the local audio file to be transcribed.
                  Example: "resources/audio.wav"
              recognizer_id (str): The ID of the existing recognizer to be used for transcription.
          Returns:
              cloud_speech.RecognizeResponse: The response containing the transcription results.
          """
          # Instantiates a client
          client = SpeechClient()
      
          # Reads a file as bytes
          with open(audio_file, "rb") as f:
              audio_content = f.read()
      
          request = cloud_speech.RecognizeRequest(
              recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/{recognizer_id}",
              content=audio_content,
          )
      
          # Transcribes the audio into text
          response = client.recognize(request=request)
      
          for result in response.results:
              print(f"Transcript: {result.alternatives[0].transcript}")
      
          return response
      
      

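Enable features in a recognizer

You can use recognizers to enable various features during recognition, such as automatic punctuation or profanity filtering.

The following example enables automatic punctuation in a recognizer's default configuration. Every recognition request that uses this recognizer then has automatic punctuation enabled:

Python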

      
      import os
      
      from google.api_core.exceptions import NotFound
      from google.cloud.speech_v2 import SpeechClient
      from google.cloud.speech_v2.types import cloud_speech
      
      PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
      
      
      def transcribe_feature_in_recognizer(
          audio_file: str,
          recognizer_id: str,
      ) -> cloud_speech.RecognizeResponse:
          """Transcribe an audio file with a recognizer that enables automatic punctuation.
          Args:
              audio_file (str): Path to the local audio file to be transcribed.
                  Example: "resources/audio.wav"
              recognizer_id (str): The ID of the recognizer to reuse or create.
          Returns:
              cloud_speech.RecognizeResponse: The response containing the transcription results.
          """
          # Instantiates a client
          client = SpeechClient()
      
          recognizer_name = (
              f"projects/{PROJECT_ID}/locations/global/recognizers/{recognizer_id}"
          )
          try:
              # Reuses the recognizer if it already exists. Its existing default
              # configuration is used as-is and is not updated here.
              recognizer = client.get_recognizer(name=recognizer_name)
              print("Using existing Recognizer:", recognizer.name)
          except NotFound:
              # Otherwise, creates a recognizer whose default configuration
              # enables automatic punctuation
              request = cloud_speech.CreateRecognizerRequest(
                  parent=f"projects/{PROJECT_ID}/locations/global",
                  recognizer_id=recognizer_id,
                  recognizer=cloud_speech.Recognizer(
                      default_recognition_config=cloud_speech.RecognitionConfig(
                          auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                          language_codes=["en-US"],
                          model="latest_long",
                          features=cloud_speech.RecognitionFeatures(
                              enable_automatic_punctuation=True,
                          ),
                      ),
                  ),
              )
              # Waits for the create operation to complete
              operation = client.create_recognizer(request=request)
              recognizer = operation.result()
              print("Created Recognizer:", recognizer.name)
      
          # Reads a file as bytes
          with open(audio_file, "rb") as f:
              audio_content = f.read()
      
          # No config is sent, so the recognizer's default configuration applies
          request = cloud_speech.RecognizeRequest(
              recognizer=recognizer.name,
              content=audio_content,
          )
      
          # Transcribes the audio into text
          response = client.recognize(request=request)
      
          for result in response.results:
              print(f"Transcript: {result.alternatives[0].transcript}")
      
          return response
      

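
If you keep recognizer settings in configuration files or manage recognizers over REST, you can write the same default configuration as JSON. The sketch below assumes the standard protobuf JSON mapping, in which field names become lowerCamelCase (for example, default_recognition_config becomes defaultRecognitionConfig). It also exposes profanityFilter, the field behind the profanity filtering mentioned above. The function recognizer_body is a hypothetical helper, and its output is a Recognizer resource body, not a complete API request.

```python
import json
from typing import Any, Dict, List


def recognizer_body(
    language_codes: List[str],
    model: str = "latest_long",
    punctuation: bool = True,
    profanity_filter: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-ready Recognizer resource with a default recognition config."""
    return {
        "defaultRecognitionConfig": {
            # Empty object: let the service detect the audio encoding.
            "autoDecodingConfig": {},
            "languageCodes": language_codes,
            "model": model,
            "features": {
                "enableAutomaticPunctuation": punctuation,
                "profanityFilter": profanity_filter,
            },
        }
    }


print(json.dumps(recognizer_body(["en-US"], profanity_filter=True), indent=2))
```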
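Override recognizer features in a recognition request

The following example creates a recognizer with several features enabled by default (automatic punctuation and word time offsets). It then disables word time offsets for a single recognition request:

Python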

      import os
      
      from google.cloud.speech_v2 import SpeechClient
      from google.cloud.speech_v2.types import cloud_speech
      from google.protobuf.field_mask_pb2 import FieldMask
      
      PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
      
      
      def transcribe_override_recognizer(
          audio_file: str,
          recognizer_id: str,
      ) -> cloud_speech.RecognizeResponse:
          """Create a recognizer, then transcribe an audio file with it while overriding one of its default settings for this request.
          Args:
              audio_file (str): Path to the local audio file to be transcribed.
                  Example: "resources/audio.wav"
              recognizer_id (str): The unique ID of the new recognizer to create and use for transcription.
          Returns:
              cloud_speech.RecognizeResponse: The response containing the transcription results.
          """
          # Instantiates a client
          client = SpeechClient()
      
          request = cloud_speech.CreateRecognizerRequest(
              parent=f"projects/{PROJECT_ID}/locations/global",
              recognizer_id=recognizer_id,
              recognizer=cloud_speech.Recognizer(
                  default_recognition_config=cloud_speech.RecognitionConfig(
                      auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                      language_codes=["en-US"],
                      model="latest_long",
                      features=cloud_speech.RecognitionFeatures(
                          enable_automatic_punctuation=True,
                          enable_word_time_offsets=True,
                      ),
                  ),
              ),
          )
      
          operation = client.create_recognizer(request=request)
          recognizer = operation.result()
      
          print("Created Recognizer:", recognizer.name)
      
          # Reads a file as bytes
          with open(audio_file, "rb") as f:
              audio_content = f.read()
      
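          # Disables word time offsets for this request only. The mask is needed:
          # without it, False (a default value) would not override the recognizer.
          request = cloud_speech.RecognizeRequest(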
              recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/{recognizer_id}",
              config=cloud_speech.RecognitionConfig(
                  features=cloud_speech.RecognitionFeatures(
                      enable_word_time_offsets=False,
                  ),
              ),
              config_mask=FieldMask(paths=["features.enable_word_time_offsets"]),
              content=audio_content,
          )
      
          # Transcribes the audio into text
          response = client.recognize(request=request)
      
          for result in response.results:
              print(f"Transcript: {result.alternatives[0].transcript}")
      
          return response
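
The config_mask in this sample matters because of how the request config is combined with the recognizer's default configuration. The RecognizeRequest reference describes three cases:

  1. Without a mask, only fields in config that have non-default values override the recognizer.
  2. With a mask, only the listed fields override it.
  3. A wildcard (*) replaces the recognizer's configuration completely.

The function below is a hypothetical model of those rules on plain nested dicts, for illustration only; the actual merge happens in the service. How a masked field that is missing from the request behaves is an assumption of this sketch.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple

Config = Dict[str, Any]


def _overlay_non_default(base: Config, override: Config) -> None:
    # Copy only values that differ from proto3 defaults (False, 0, "", empty).
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _overlay_non_default(base[key], value)
        elif value:
            base[key] = copy.deepcopy(value)


def _lookup(cfg: Config, path: str) -> Tuple[bool, Any]:
    # Follow a dotted path such as "features.enable_word_time_offsets".
    node: Any = cfg
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return False, None
        node = node[part]
    return True, node


def _set_path(cfg: Config, path: str, value: Any) -> None:
    *parents, leaf = path.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = copy.deepcopy(value)


def _clear_path(cfg: Config, path: str) -> None:
    *parents, leaf = path.split(".")
    node: Any = cfg
    for part in parents:
        node = node.get(part) if isinstance(node, dict) else None
    if isinstance(node, dict):
        node.pop(leaf, None)


def effective_config(
    default: Config, request: Config, mask: Optional[List[str]] = None
) -> Config:
    """Hypothetical model of how a request config overrides a recognizer default."""
    if mask == ["*"]:
        # Wildcard: the request config replaces the default configuration entirely.
        return copy.deepcopy(request)
    result = copy.deepcopy(default)
    if not mask:
        # No mask: only non-default request values override the recognizer.
        _overlay_non_default(result, request)
        return result
    for path in mask:
        found, value = _lookup(request, path)
        if found:
            # Masked fields override even when the value is a default such as False.
            _set_path(result, path, value)
        else:
            # Assumption of this sketch: a masked field absent from the request
            # falls back to its default, modeled here as removing it.
            _clear_path(result, path)
    return result
```

Because False is the default value of a boolean field, effective_config(default, request) leaves word time offsets enabled. Passing ["features.enable_word_time_offsets"] as the mask turns them off, which mirrors what the sample does with FieldMask.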
      
      

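Send requests without a recognizer

Recognizers are optional in recognition requests. To send a request without one of your own, use an underscore (_) as the recognizer ID in the recognizer resource name, and include the recognition configuration in the request itself. For example:

Python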

      import os
      
      from google.cloud.speech_v2 import SpeechClient
      from google.cloud.speech_v2.types import cloud_speech
      
      PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
      
      
      def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
          """Transcribe an audio file.
          Args:
              audio_file (str): Path to the local audio file to be transcribed.
          Returns:
              cloud_speech.RecognizeResponse: The response from the recognize request, containing
              the transcription results
          """
          # Reads a file as bytes
          with open(audio_file, "rb") as f:
              audio_content = f.read()
      
          # Instantiates a client
          client = SpeechClient()
      
          config = cloud_speech.RecognitionConfig(
              auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
              language_codes=["en-US"],
              model="long",
          )
      
          request = cloud_speech.RecognizeRequest(
              recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
              config=config,
              content=audio_content,
          )
      
          # Transcribes the audio into text
          response = client.recognize(request=request)
      
          for result in response.results:
              print(f"Transcript: {result.alternatives[0].transcript}")
      
          return response
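
Because the implicit recognizer (_) has no stored default configuration, the request has to carry everything the recognition needs. The sample does this with auto_decoding_config, language_codes, and model. The hypothetical helper below chooses between a named recognizer and the implicit one. As an assumption of this sketch, it refuses to build an implicit-recognizer request without a config. It returns a plain dict so that it runs without the client library.

```python
from typing import Any, Dict, Optional


def build_recognize_request(
    project: str,
    content: bytes,
    recognizer_id: Optional[str] = None,
    config: Optional[Any] = None,
    location: str = "global",
) -> Dict[str, Any]:
    """Hypothetical helper: build the fields of a recognize request as a dict."""
    if recognizer_id is None:
        # The implicit recognizer has no stored defaults, so the request must
        # supply the configuration (an assumption enforced by this sketch).
        if config is None:
            raise ValueError("config is required when no recognizer is used")
        recognizer_id = "_"
    request: Dict[str, Any] = {
        "recognizer": f"projects/{project}/locations/{location}/recognizers/{recognizer_id}",
        "content": content,
    }
    if config is not None:
        # With a named recognizer, a config here overrides its default configuration.
        request["config"] = config
    return request
```

With the client library installed, you could unpack the result into the request message, for example cloud_speech.RecognizeRequest(**build_recognize_request(PROJECT_ID, audio_content, config=config)), where config is a cloud_speech.RecognitionConfig.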
      
      

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. Optional: Revoke the authentication credentials that you created, and delete the local credential file.

    gcloud auth application-default revoke

  2. Optional: Revoke credentials from the gcloud CLI.

    gcloud auth revoke

Console

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

  Delete a Google Cloud project:

    gcloud projects delete PROJECT_ID

What's next