Cloud Speech-to-Text Client Libraries

This page shows how to get started with the Cloud Client Libraries for the Speech-to-Text API. Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.

Installing the client library

C#

For more information, see Setting Up a C# Development Environment.
Install-Package Google.Cloud.Speech.V1 -Pre

Go

go get -u cloud.google.com/go/speech/apiv1

Java

For more information, see Setting Up a Java Development Environment. If you are using Maven, add the following to your pom.xml file:
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-speech</artifactId>
  <version>1.5.0</version>
</dependency>
If you are using Gradle, add the following to your dependencies:
compile 'com.google.cloud:google-cloud-speech:1.5.0'
If you are using SBT, add the following to your dependencies:
libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "1.5.0"

If you are using IntelliJ or Eclipse, you can add the client library to your project using IDE plugins such as Cloud Tools for IntelliJ and Cloud Tools for Eclipse.

The plugins provide additional functionality, such as key management for service accounts. See each plugin's documentation for details.

Node.js

For more information, see Setting Up a Node.js Development Environment.
npm install --save @google-cloud/speech

PHP

composer require google/cloud-speech

Python

For more information, see Setting Up a Python Development Environment.
pip install --upgrade google-cloud-speech

Ruby

For more information, see Setting Up a Ruby Development Environment.
gem install google-cloud-speech
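
Whichever language you installed, you can confirm the installation by importing the library in a short script. A minimal check in Python, for the pip package installed above:
# Minimal sanity check: if the install succeeded, this import works
# and prints the client class.
from google.cloud import speech
print(speech.SpeechClient)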

Setting up authentication

To run the client library, you must first set up authentication by creating a service account and setting an environment variable. The following steps walk you through both. For other ways to authenticate, see the GCP authentication documentation.

GCP Console

  1. In the GCP Console, go to the Create service account key page.

    Go to the Create Service Account Key page
  2. From the Service account list, select New service account.
  3. In the Service account name field, enter a name.
  4. From the Role list, select Project > Owner.

    Note: The Role field authorizes your service account to access resources. You can view and change this field later by using the GCP Console. If you are developing a production app, specify more granular permissions than Project > Owner. For more information, see granting roles to service accounts.
  5. Click Create. A JSON file that contains your key downloads to your computer.

Command line

You can run the following commands using the Cloud SDK on your local machine, or in Cloud Shell.

  1. Create the service account. Replace [NAME] with a name for the service account.

    gcloud iam service-accounts create [NAME]
  2. Grant permissions to the service account. Replace [PROJECT_ID] with your project ID.

    gcloud projects add-iam-policy-binding [PROJECT_ID] --member "serviceAccount:[NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role "roles/owner"
    Note: The Role field authorizes your service account to access resources. You can view and change this field later by using the GCP Console. If you are developing a production app, specify more granular permissions than Project > Owner. For more information, see granting roles to service accounts.
  3. Generate the key file. Replace [FILE_NAME] with a name for the key file.

    gcloud iam service-accounts keys create [FILE_NAME].json --iam-account [NAME]@[PROJECT_ID].iam.gserviceaccount.com

Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. Replace [PATH] with the file path of the JSON file that contains your service account key, and [FILE_NAME] with the filename. This variable only applies to your current shell session, so if you open a new session, set the variable again.

Linux or macOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"

Windows

With PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

For example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"

With command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
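
If setting an environment variable is not convenient, the client libraries can also be handed credentials explicitly in code. The following is a minimal sketch in Python; the path key.json is a placeholder for the service account key file you downloaded:
# A minimal sketch: load the service account key explicitly instead of
# relying on the GOOGLE_APPLICATION_CREDENTIALS environment variable.
# "key.json" is a placeholder path for your downloaded key file.
from google.cloud import speech
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")
client = speech.SpeechClient(credentials=credentials)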

Using the client library

The following example shows how to use the client library.

C#


using Google.Cloud.Speech.V1;
using System;

namespace GoogleCloudSamples
{
    public class QuickStart
    {
        // The name of the local audio file to transcribe
        public static string DEMO_FILE = "audio.raw";
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var response = speech.Recognize(new RecognitionConfig()
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                SampleRateHertz = 16000,
                LanguageCode = "en",
            }, RecognitionAudio.FromFile(DEMO_FILE));
            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }
}

Go


// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"log"

	speech "cloud.google.com/go/speech/apiv1"
	speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1"
)

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	// Sets the name of the audio file to transcribe.
	filename := "/path/to/audio.raw"

	// Reads the audio file into memory.
	data, err := ioutil.ReadFile(filename)
	if err != nil {
		log.Fatalf("Failed to read file: %v", err)
	}

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Content{Content: data},
		},
	})
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)
	}

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
}

Java

// Imports the Google Cloud client library
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class QuickstartSample {

  /**
   * Demonstrates using the Speech API to transcribe an audio file.
   */
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String fileName = "./resources/audio.raw";

      // Reads the audio file into memory
      Path path = Paths.get(fileName);
      byte[] data = Files.readAllBytes(path);
      ByteString audioBytes = ByteString.copyFrom(data);

      // Builds the sync recognize request
      RecognitionConfig config = RecognitionConfig.newBuilder()
          .setEncoding(AudioEncoding.LINEAR16)
          .setSampleRateHertz(16000)
          .setLanguageCode("en-US")
          .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder()
          .setContent(audioBytes)
          .build();

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }
}

Node.js

async function main() {
  // Imports the Google Cloud client library
  const speech = require('@google-cloud/speech');
  const fs = require('fs');

  // Creates a client
  const client = new speech.SpeechClient();

  // The name of the audio file to transcribe
  const fileName = './resources/audio.raw';

  // Reads a local audio file and converts it to base64
  const file = fs.readFileSync(fileName);
  const audioBytes = file.toString('base64');

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const audio = {
    content: audioBytes,
  };
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };
  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}
main().catch(console.error);

PHP

# Includes the autoloader for libraries installed with composer
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionAudio;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;

# The name of the audio file to transcribe
$audioFile = __DIR__ . '/test/data/audio32KHz.raw';

# get contents of a file into a string
$content = file_get_contents($audioFile);

# set string as audio content
$audio = (new RecognitionAudio())
    ->setContent($content);

# The audio file's encoding, sample rate and language
$config = new RecognitionConfig([
    'encoding' => AudioEncoding::LINEAR16,
    'sample_rate_hertz' => 32000,
    'language_code' => 'en-US'
]);

# Instantiates a client
$client = new SpeechClient();

# Detects speech in the audio file
$response = $client->recognize($config, $audio);

# Print most likely transcription
foreach ($response->getResults() as $result) {
    $alternatives = $result->getAlternatives();
    $mostLikely = $alternatives[0];
    $transcript = $mostLikely->getTranscript();
    printf('Transcript: %s' . PHP_EOL, $transcript);
}

$client->close();

Python

import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'resources',
    'audio.raw')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
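
Each result can contain several alternative transcripts, and each alternative carries a confidence score (the Go sample above prints it). A short extension of the loop above to show the score as well:
# Each alternative also reports a confidence score between 0 and 1.
for result in response.results:
    top = result.alternatives[0]
    print('Transcript: {} (confidence: {:.3f})'.format(
        top.transcript, top.confidence))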

Ruby

# Imports the Google Cloud client library
require "google/cloud/speech"

# Instantiates a client
speech = Google::Cloud::Speech.new

# The name of the audio file to transcribe
file_name = "./resources/audio.raw"

# The raw audio
audio_file = File.binread file_name

# The audio file's encoding and sample rate
config = { encoding:          :LINEAR16,
           sample_rate_hertz: 16_000,
           language_code:     "en-US" }
audio  = { content: audio_file }

# Detects speech in the audio file
response = speech.recognize config, audio

results = response.results

# Get first result because we only processed a single audio file
# Each result represents a consecutive portion of the audio
results.first.alternatives.each do |alternative|
  puts "Transcription: #{alternative.transcript}"
end
