Speech-to-Text client libraries

This page shows you how to get started with the Cloud Client Libraries for the Speech-to-Text API. Client libraries make it easier to access Google Cloud APIs from a supported language. Although you can use Google Cloud APIs directly by making raw requests to the server, client libraries provide simplifications that significantly reduce the amount of code you need to write.

Read more about the Cloud Client Libraries and the older Google API Client Libraries in the client libraries explanation.
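
To make that difference concrete, the following sketch performs the same kind of v2 recognize request by hand over raw HTTP. It is illustrative only: it assumes ADC is already configured (see the authentication section below), that the requests and google-auth packages are installed, and that the v2 REST path shown is the right one for your setup; the audio path is a placeholder.

# Illustrative raw REST call; with the client library, all of the token
# handling and JSON plumbing below is done for you.
import base64

import google.auth
import google.auth.transport.requests
import requests

# Resolve Application Default Credentials and mint an access token by hand.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Read and base64-encode the audio (bytes fields are base64 strings in JSON).
with open("path/to/audioFile.raw", "rb") as f:  # placeholder path
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# Call the default recognizer ("_") in the global location.
url = (
    f"https://speech.googleapis.com/v2/projects/{project_id}"
    "/locations/global/recognizers/_:recognize"
)
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json={
        "config": {
            "autoDecodingConfig": {},
            "languageCodes": ["en-US"],
            "model": "long",
        },
        "content": audio_b64,
    },
)
print(response.json())

With a client library, the credential refresh, URL construction, and request serialization above are all handled for you, as the quickstart samples later on this page show.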

Install the client library

C#

Install-Package Google.Cloud.Speech.V2

For more information, see Setting up a C# development environment.

Go

go get cloud.google.com/go/speech/apiv2

For more information, see Setting up a Go development environment.

Java

If you are using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.37.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
  </dependency>
</dependencies>

If you are using Gradle, add the following to your dependencies:

implementation 'com.google.cloud:google-cloud-speech:4.36.0'

If you are using sbt, add the following to your dependencies:

libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "4.36.0"

If you are using Visual Studio Code, IntelliJ, or Eclipse, you can add the client library to your project using your IDE's plugin for Google Cloud.

The plugins also provide additional functionality, such as key management for service accounts. Refer to each plugin's documentation for details.

For more information, see Setting up a Java development environment.

Node.js

npm install --save @google-cloud/speech

For more information, see Setting up a Node.js development environment.

PHP

composer require google/cloud-speech

For more information, see Using PHP on Google Cloud.

Python

pip install --upgrade google-cloud-speech

For more information, see Setting up a Python development environment.

Ruby

gem install google-cloud-speech

For more information, see Setting up a Ruby development environment.

Set up authentication

To authenticate calls to Google Cloud APIs, client libraries support Application Default Credentials (ADC); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API. With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code.

For production environments, the way you set up ADC depends on the service and context. For more information, see Set up Application Default Credentials.

For a local development environment, you can set up ADC with the credentials that are associated with your Google Account:

  1. Install and initialize the gcloud CLI.

    When you initialize the gcloud CLI, be sure to specify a Google Cloud project in which you have permission to access the resources that your application needs.

  2. Create your credential file:

    gcloud auth application-default login

    A sign-in screen appears. After you sign in, your credentials are stored in the local credential file used by ADC. (A quick verification sketch follows these steps.)
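
Once ADC is set up, the client libraries discover and use the credentials automatically; you do not pass any credentials in code. As a quick sanity check, this sketch uses the google-auth package (installed as a dependency of the client libraries) to confirm that ADC can resolve credentials and a project:

import google.auth

# Looks for credentials in the standard ADC locations: the
# GOOGLE_APPLICATION_CREDENTIALS environment variable, the local credential
# file written by `gcloud auth application-default login`, or an attached
# service account. Raises DefaultCredentialsError if none are found.
credentials, project_id = google.auth.default()
print(f"ADC resolved credentials for project: {project_id}")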

Use the client library

The following example shows how to use the client library.

Java

// Imports the Google Cloud client library
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.speech.v2.AutoDetectDecodingConfig;
import com.google.cloud.speech.v2.CreateRecognizerRequest;
import com.google.cloud.speech.v2.OperationMetadata;
import com.google.cloud.speech.v2.RecognitionConfig;
import com.google.cloud.speech.v2.RecognizeRequest;
import com.google.cloud.speech.v2.RecognizeResponse;
import com.google.cloud.speech.v2.Recognizer;
import com.google.cloud.speech.v2.SpeechClient;
import com.google.cloud.speech.v2.SpeechRecognitionAlternative;
import com.google.cloud.speech.v2.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class QuickstartSampleV2 {

  public static void main(String[] args) throws IOException, ExecutionException,
      InterruptedException {
    String projectId = "my-project-id";
    String filePath = "path/to/audioFile.raw";
    String recognizerId = "my-recognizer-id";
    quickstartSampleV2(projectId, filePath, recognizerId);
  }

  public static void quickstartSampleV2(String projectId, String filePath, String recognizerId)
      throws IOException, ExecutionException, InterruptedException {

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (SpeechClient speechClient = SpeechClient.create()) {
      Path path = Paths.get(filePath);
      byte[] data = Files.readAllBytes(path);
      ByteString audioBytes = ByteString.copyFrom(data);

      String parent = String.format("projects/%s/locations/global", projectId);

      // First, create a recognizer
      Recognizer recognizer = Recognizer.newBuilder()
          .setModel("latest_long")
          .addLanguageCodes("en-US")
          .build();

      CreateRecognizerRequest createRecognizerRequest = CreateRecognizerRequest.newBuilder()
          .setParent(parent)
          .setRecognizerId(recognizerId)
          .setRecognizer(recognizer)
          .build();

      OperationFuture<Recognizer, OperationMetadata> operationFuture =
          speechClient.createRecognizerAsync(createRecognizerRequest);
      recognizer = operationFuture.get();

      // Next, create the transcription request
      RecognitionConfig recognitionConfig = RecognitionConfig.newBuilder()
          .setAutoDecodingConfig(AutoDetectDecodingConfig.newBuilder().build())
          .build();

      RecognizeRequest request = RecognizeRequest.newBuilder()
          .setConfig(recognitionConfig)
          .setRecognizer(recognizer.getName())
          .setContent(audioBytes)
          .build();

      RecognizeResponse response = speechClient.recognize(request);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        if (result.getAlternativesCount() > 0) {
          SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
          System.out.printf("Transcription: %s%n", alternative.getTranscript());
        }
      }
    }
  }
}

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

def quickstart_v2(
    project_id: str,
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file."""
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        config=config,
        content=content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
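
For example, you might call the sample like this (the project ID and audio path are placeholders to replace with your own):

response = quickstart_v2("my-project-id", "path/to/audioFile.raw")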

Additional resources

C#

The following list contains links to more resources related to the client library for C#:

Go

The following list contains links to more resources related to the client library for Go:

Java

The following list contains links to more resources related to the client library for Java:

Node.js

The following list contains links to more resources related to the client library for Node.js:

PHP

The following list contains links to more resources related to the client library for PHP:

Python

The following list contains links to more resources related to the client library for Python:

Ruby

The following list contains links to more resources related to the client library for Ruby: