Speech-to-Text Client Libraries

This page shows how to get started with the new Cloud Client Libraries for the Speech-to-Text API. Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.

Installing the client library

C#

Install-Package Google.Cloud.Speech.V1 -Pre

Go

go get -u cloud.google.com/go/speech/apiv1

Java

If you are using Maven, add this to your pom.xml file:
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-speech</artifactId>
  <version>0.49.0-alpha</version>
</dependency>
If you are using Gradle, add this to your dependencies:
compile 'com.google.cloud:google-cloud-speech:0.49.0-alpha'
If you are using SBT, add this to your dependencies:
libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "0.49.0-alpha"

Node.js

npm install --save @google-cloud/speech

PHP

composer require google/cloud-speech

Python

For more on setting up your Python development environment, refer to Python Development Environment Setup Guide.
pip install --upgrade google-cloud-speech

Ruby

gem install google-cloud-speech

Setting up authentication

To run the client library, you must first set up authentication by creating a service account and setting an environment variable.

GCP Console

  1. Go to the Create service account key page in the GCP Console.

    Go to the Create Service Account Key page
  2. From the Service account drop-down list, select New service account.
  3. Enter a name into the Service account name field.
  4. From the Role drop-down list, select Project > Owner.

    Note: The Role field authorizes your service account to access resources. You can view and change this field later using GCP Console. If you are developing a production application, specify more granular permissions than Project > Owner. For more information, see granting roles to service accounts.
  5. Click Create. A JSON file that contains your key downloads to your computer.

Command line

You can run the following commands using the Cloud SDK on your local machine, or within Cloud Shell.

  1. Create the service account. Replace [NAME] with your desired service account name.

    gcloud iam service-accounts create [NAME]
  2. Grant permissions to the service account. Replace [PROJECT_ID] with your project ID.

    gcloud projects add-iam-policy-binding [PROJECT_ID] --member "serviceAccount:[NAME]@[PROJECT_ID].iam.gserviceaccount.com" --role "roles/owner"
    Note: The Role field authorizes your service account to access resources. You can view and change this field later using GCP Console. If you are developing a production application, specify more granular permissions than Project > Owner. For more information, see granting roles to service accounts.
  3. Generate the key file. Replace [FILE_NAME] with a name for the key file.

    gcloud iam service-accounts keys create [FILE_NAME].json --iam-account [NAME]@[PROJECT_ID].iam.gserviceaccount.com

Provide authentication credentials to your application code by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS. Replace [PATH] with the file path of the JSON file that contains your service account key, and [FILE_NAME] with the filename.

Linux or macOS

export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"

Windows

With PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

For example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"

With command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=[PATH]

Using the client library

The following example shows how to use the client library.

C#

Read more in the C# API Reference Documentation for the Speech-to-Text API Client Library.

using Google.Cloud.Speech.V1;
using System;

namespace GoogleCloudSamples
{
    public class QuickStart
    {
        public static void Main(string[] args)
        {
            var speech = SpeechClient.Create();
            var response = speech.Recognize(new RecognitionConfig()
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                SampleRateHertz = 16000,
                LanguageCode = "en",
            }, RecognitionAudio.FromFile("audio.raw"));
            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }
}

Go

Read more in the Go API Reference Documentation for the Speech-to-Text API Client Library.

// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (
	"fmt"
	"io/ioutil"
	"log"

	// Imports the Google Cloud Speech API client package.
	"golang.org/x/net/context"

	speech "cloud.google.com/go/speech/apiv1"
	speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1"
)

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	// Sets the name of the audio file to transcribe.
	filename := "/path/to/audio.raw"

	// Reads the audio file into memory.
	data, err := ioutil.ReadFile(filename)
	if err != nil {
		log.Fatalf("Failed to read file: %v", err)
	}

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Content{Content: data},
		},
	})
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)
	}

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
}

Java

Read more in the Java API Reference Documentation for the Speech-to-Text API Client Library.

// Imports the Google Cloud client library
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class QuickstartSample {

  /**
   * Demonstrates using the Speech API to transcribe an audio file.
   */
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String fileName = "./resources/audio.raw";

      // Reads the audio file into memory
      Path path = Paths.get(fileName);
      byte[] data = Files.readAllBytes(path);
      ByteString audioBytes = ByteString.copyFrom(data);

      // Builds the sync recognize request
      RecognitionConfig config = RecognitionConfig.newBuilder()
          .setEncoding(AudioEncoding.LINEAR16)
          .setSampleRateHertz(16000)
          .setLanguageCode("en-US")
          .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder()
          .setContent(audioBytes)
          .build();

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }
}

Node.js

Read more in the Node.js API Reference Documentation for the Speech-to-Text API Client Library.

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');
const fs = require('fs');

// Creates a client
const client = new speech.SpeechClient();

// The name of the audio file to transcribe
const fileName = './resources/audio.raw';

// Reads a local audio file and converts it to base64
const file = fs.readFileSync(fileName);
const audioBytes = file.toString('base64');

// The audio file's encoding, sample rate in hertz, and BCP-47 language code
const audio = {
  content: audioBytes,
};
const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 16000,
  languageCode: 'en-US',
};
const request = {
  audio: audio,
  config: config,
};

// Detects speech in the audio file
client
  .recognize(request)
  .then(data => {
    const response = data[0];
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    console.log(`Transcription: ${transcription}`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

Read more in the PHP API Reference Documentation for the Speech-to-Text API Client Library.

# Includes the autoloader for libraries installed with composer
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\SpeechClient;

# Your Google Cloud Platform project ID
$projectId = 'YOUR_PROJECT_ID';

# Instantiates a client
$speech = new SpeechClient([
    'projectId' => $projectId,
    'languageCode' => 'en-US',
]);

# The name of the audio file to transcribe
$fileName = __DIR__ . '/resources/audio.raw';

# The audio file's encoding and sample rate
$options = [
    'encoding' => 'LINEAR16',
    'sampleRateHertz' => 16000,
];

# Detects speech in the audio file
$results = $speech->recognize(fopen($fileName, 'r'), $options);

foreach ($results as $result) {
    echo 'Transcription: ' . $result->alternatives()[0]['transcript'] . PHP_EOL;
}

Python

Read more in the Python API Reference Documentation for the Speech-to-Text API Client Library.

import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'resources',
    'audio.raw')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))

Ruby

Read more in the Ruby API Reference Documentation for the Speech-to-Text API Client Library.

# Imports the Google Cloud client library
require "google/cloud/speech"

# Your Google Cloud Platform project ID
project_id = "YOUR_PROJECT_ID"

# Instantiates a client
speech = Google::Cloud::Speech.new project: project_id

# The name of the audio file to transcribe
file_name = "./audio_files/audio.raw"

# The audio file's encoding and sample rate
audio = speech.audio file_name, encoding:    :linear16,
                                sample_rate: 16000,
                                language:    "en-US"

# Detects speech in the audio file
results = audio.recognize

# Each result represents a consecutive portion of the audio
results.each do |result|
  puts "Transcription: #{result.transcript}"
end

Additional Resources

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Speech-to-Text API