{% setvar launch_type %}api{% endsetvar %} {% setvar launch_name %}Cloud Speech-to-Text API{% endsetvar %} {% setvar info_params %}realtime_warning{% endsetvar %}

Quickstart: Using Client Libraries

This page shows you how to send a speech recognition request to Cloud Speech-to-Text in your favorite programming language using the Google Cloud Client Libraries.

Cloud Speech-to-Text enables easy integration of Google speech recognition technologies into developer applications. You can send audio data to the Speech-to-Text API, which then returns a text transcription of that audio file. For more information about the service, see Cloud Speech-to-Text basics.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Set up a GCP Console project.

    Set up a project

    Click to:

    • Create or select a project.
    • Enable the Google Speech-to-Text API for that project.
    • Create a service account.
    • Download a private key as JSON.

    You can view and manage these resources at any time in the GCP Console.

  3. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.

  4. Install and initialize the Cloud SDK.

Install the client library


Install-Package Google.Cloud.Speech.V1 -Pre


go get -u cloud.google.com/go/speech/apiv1




Before installing the library, make sure you've prepared your environment for Node.js development.

npm install --save @google-cloud/speech


composer require google/cloud-speech


Before installing the library, make sure you've prepared your environment for Python development.

pip install --upgrade google-cloud-speech


gem install google-cloud-speech

Make an audio transcription request

Now you can use Speech-to-Text to transcribe an audio file to text. Use the following code to send a recognize request to the Speech-to-Text API.


using Google.Cloud.Speech.V1;
using System;

namespace GoogleCloudSamples
    public class QuickStart
        public static void Main(string[] args)
            var speech = SpeechClient.Create();
            var response = speech.Recognize(new RecognitionConfig()
                Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                SampleRateHertz = 16000,
                LanguageCode = "en",
            }, RecognitionAudio.FromFile("audio.raw"));
            foreach (var result in response.Results)
                foreach (var alternative in result.Alternatives)


// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (

	speech "cloud.google.com/go/speech/apiv1"
	speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1"

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)

	// Sets the name of the audio file to transcribe.
	filename := "/path/to/audio.raw"

	// Reads the audio file into memory.
	data, err := ioutil.ReadFile(filename)
	if err != nil {
		log.Fatalf("Failed to read file: %v", err)

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Content{Content: data},
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)


// Imports the Google Cloud client library
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class QuickstartSample {

   * Demonstrates using the Speech API to transcribe an audio file.
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String fileName = "./resources/audio.raw";

      // Reads the audio file into memory
      Path path = Paths.get(fileName);
      byte[] data = Files.readAllBytes(path);
      ByteString audioBytes = ByteString.copyFrom(data);

      // Builds the sync recognize request
      RecognitionConfig config = RecognitionConfig.newBuilder()
      RecognitionAudio audio = RecognitionAudio.newBuilder()

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());


Before running the example, make sure you've prepared your environment for Node.js development.

async function main() {
  // Imports the Google Cloud client library
  const speech = require('@google-cloud/speech');
  const fs = require('fs');

  // Creates a client
  const client = new speech.SpeechClient();

  // The name of the audio file to transcribe
  const fileName = './resources/audio.raw';

  // Reads a local audio file and converts it to base64
  const file = fs.readFileSync(fileName);
  const audioBytes = file.toString('base64');

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const audio = {
    content: audioBytes,
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  const request = {
    audio: audio,
    config: config,

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
  console.log(`Transcription: ${transcription}`);


# Includes the autoloader for libraries installed with composer
require __DIR__ . '/vendor/autoload.php';

# Imports the Google Cloud client library
use Google\Cloud\Speech\SpeechClient;

# Your Google Cloud Platform project ID
$projectId = 'YOUR_PROJECT_ID';

# Instantiates a client
$speech = new SpeechClient([
    'projectId' => $projectId,
    'languageCode' => 'en-US',

# The name of the audio file to transcribe
$fileName = __DIR__ . '/resources/audio.raw';

# The audio file's encoding and sample rate
$options = [
    'encoding' => 'LINEAR16',
    'sampleRateHertz' => 16000,

# Detects speech in the audio file
$results = $speech->recognize(fopen($fileName, 'r'), $options);

foreach ($results as $result) {
    echo 'Transcription: ' . $result->alternatives()[0]['transcript'] . PHP_EOL;


Before running the example, make sure you've prepared your environment for Python development.

import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))


# Imports the Google Cloud client library
require "google/cloud/speech"

# Instantiates a client
speech = Google::Cloud::Speech.new

# The name of the audio file to transcribe
file_name = "./resources/audio.raw"

# The raw audio
audio_file = File.binread file_name

# The audio file's encoding and sample rate
config = { encoding:          :LINEAR16,
           sample_rate_hertz: 16000,
           language_code:     "en-US"   }
audio  = { content: audio_file }

# Detects speech in the audio file
response = speech.recognize config, audio

results = response.results

# Get first result because we only processed a single audio file
# Each result represents a consecutive portion of the audio
results.first.alternatives.each do |alternatives|
  puts "Transcription: #{alternatives.transcript}"

Congratulations! You've sent your first request to Speech-to-Text .

Clean up

To avoid incurring charges to your GCP account for the resources used in this quickstart:

  • Use the GCP Console to delete your project if you do not need it.

What's next

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Speech-to-Text API