Performing Asynchronous Speech Recognition

Asynchronous speech recognition starts a long-running audio processing operation.

Use asynchronous speech recognition to recognize audio that is longer than a minute and stored in Google Cloud Storage. For shorter audio, including audio stored locally (inline), Synchronous Speech Recognition is faster and simpler.

You can retrieve the results of the operation through the google.longrunning.Operations interface. Audio content can be sent directly to the Cloud Speech API, or the API can process audio content that already resides in Google Cloud Storage. See also the audio limits for asynchronous speech recognition requests.

The Cloud Speech API v1 is officially released and generally available from the https://speech.googleapis.com/v1/speech endpoint. The client libraries are released as Alpha and will likely change in backward-incompatible ways; they are currently not recommended for production use.

These samples require that you have set up gcloud and have created and activated a service account. For information about setting up gcloud, and also creating and activating a service account, see Quickstart.

Protocol

Refer to the speech:longrunningrecognize API endpoint for complete details.

To perform asynchronous speech recognition, make a POST request and provide the appropriate request body.

POST https://speech.googleapis.com/v1/speech:longrunningrecognize?key=YOUR_API_KEY
{
  "config": {
    "language_code": "en-US"
  },
  "audio": {
    "uri": "gs://gcs-test-data/vr.flac"
  }
}

See the RecognitionConfig and RecognitionAudio reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "name": "7612202767953098924"
}

where name is the name of the long-running operation created for the request.
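The request body above is plain JSON, so it can be assembled with any language's standard library before being sent. The following Python sketch shows one way to do this; build_longrunningrecognize_body is a hypothetical helper for illustration, not part of any Google client library.

```python
import json


def build_longrunningrecognize_body(gcs_uri, language_code="en-US"):
    """Assemble the JSON body for a speech:longrunningrecognize request.

    Hypothetical helper: returns the serialized request body shown in
    the protocol example above.
    """
    return json.dumps({
        "config": {"language_code": language_code},
        "audio": {"uri": gcs_uri},
    })


body = build_longrunningrecognize_body("gs://gcs-test-data/vr.flac")
```

You could then POST this body to the speech:longrunningrecognize endpoint (with your API key appended) using any HTTP client.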

Wait approximately 30 seconds for processing to complete. To retrieve the result of the operation, make a GET request:

GET https://speech.googleapis.com/v1/operations/YOUR_OPERATION_NAME?key=YOUR_API_KEY

replacing YOUR_OPERATION_NAME with the name received from your longrunningrecognize request.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "name": "7612202767953098924",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2017-07-20T16:36:55.033650Z",
    "lastUpdateTime": "2017-07-20T16:37:17.158630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "okay so what am I doing here...(etc)...",
            "confidence": 0.96096134,
          }
        ]
      },
      {
        "alternatives": [
          {
            ...
          }
        ]
      }
    ]
  }
}

If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.
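The polling loop described above can be sketched in a few lines of Python. Here fetch_operation stands in for whatever function issues the GET request and returns the parsed JSON; both helpers below are illustrative, not part of any Google client library.

```python
import time


def poll_until_done(fetch_operation, interval_seconds=5, max_attempts=60):
    """Poll a long-running operation until its `done` field is true.

    `fetch_operation` is any zero-argument callable returning the
    operation's JSON as a dict (e.g. the parsed body of the GET request).
    """
    for _ in range(max_attempts):
        operation = fetch_operation()
        if operation.get("done"):
            return operation
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not complete in time")


def transcripts(operation):
    """Collect the top alternative's transcript from each result."""
    results = operation["response"]["results"]
    return [r["alternatives"][0]["transcript"] for r in results]
```

In practice you may want exponential backoff between attempts rather than a fixed interval, as the PHP sample below does.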

C#

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

static object AsyncRecognizeGcs(string storageUri)
{
    var speech = SpeechClient.Create();
    var longOperation = speech.LongRunningRecognize(new RecognitionConfig()
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,
        LanguageCode = "en",
    }, RecognitionAudio.FromStorageUri(storageUri));
    longOperation = longOperation.PollUntilCompleted();
    var response = longOperation.Result;
    foreach (var result in response.Results)
    {
        foreach (var alternative in result.Alternatives)
        {
            Console.WriteLine($"Transcript: { alternative.Transcript}");
        }
    }
    return 0;
}

Go

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

  1. Make the request:

    func sendGCS(client *speech.Client, gcsURI string) (*speechpb.LongRunningRecognizeResponse, error) {
    	ctx := context.Background()
    
    	// Send the contents of the audio file with the encoding and
    	// sample rate information to be transcribed.
    	req := &speechpb.LongRunningRecognizeRequest{
    		Config: &speechpb.RecognitionConfig{
    			Encoding:        speechpb.RecognitionConfig_LINEAR16,
    			SampleRateHertz: 16000,
    			LanguageCode:    "en-US",
    		},
    		Audio: &speechpb.RecognitionAudio{
    			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: gcsURI},
    		},
    	}
    
    	op, err := client.LongRunningRecognize(ctx, req)
    	if err != nil {
    		return nil, err
    	}
    	return op.Wait(ctx)
    }

  2. Print the result:

    // Print the results.
    for _, result := range resp.Results {
    	for _, alt := range result.Alternatives {
    		fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
    	}
    }

Java

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

public static void asyncRecognizeGcs(String gcsUri) throws Exception {
  // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
  SpeechClient speech = SpeechClient.create();

  // Configure remote file request for FLAC
  RecognitionConfig config = RecognitionConfig.newBuilder()
      .setEncoding(AudioEncoding.FLAC)
      .setLanguageCode("en-US")
      .setSampleRateHertz(16000)
      .build();
  RecognitionAudio audio = RecognitionAudio.newBuilder()
      .setUri(gcsUri)
      .build();

  // Use non-blocking call for getting file transcription
  OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata,
          Operation> response =
      speech.longRunningRecognizeAsync(config, audio);
  while (!response.isDone()) {
    System.out.println("Waiting for response...");
    Thread.sleep(10000);
  }

  List<SpeechRecognitionResult> results = response.get().getResultsList();

  for (SpeechRecognitionResult result: results) {
    // There can be several alternative transcripts for a given chunk of speech. Just use the
    // first (most likely) one here.
    SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
    System.out.printf("Transcription: %s\n", alternative.getTranscript());
  }
  speech.close();
}

Node.js

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};

const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
client
  .longRunningRecognize(request)
  .then(data => {
    const operation = data[0];
    // Get a Promise representation of the final result of the job
    return operation.promise();
  })
  .then(data => {
    const response = data[0];
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    console.log(`Transcription: ${transcription}`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

use Google\Cloud\Speech\SpeechClient;
use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Core\ExponentialBackoff;

/**
 * Transcribe an audio file using Google Cloud Speech API
 * Example:
 * ```
 * transcribe_async_gcs('your-bucket-name', 'audiofile.wav');
 * ```.
 *
 * @param string $bucketName The Cloud Storage bucket name.
 * @param string $objectName The Cloud Storage object name.
 * @param string $languageCode The language of the speech to
 *     be recognized. Accepts BCP-47 (e.g., `"en-US"`, `"es-ES"`).
 * @param array $options configuration options.
 *
 * @return string the text transcription
 */
function transcribe_async_gcs($bucketName, $objectName, $languageCode = 'en-US', $options = [])
{
    // Create the speech client
    $speech = new SpeechClient([
        'languageCode' => $languageCode,
    ]);

    // Fetch the storage object
    $storage = new StorageClient();
    $object = $storage->bucket($bucketName)->object($objectName);

    // Create the asynchronous recognize operation
    $operation = $speech->beginRecognizeOperation(
        $object,
        $options
    );

    // Wait for the operation to complete
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($operation) {
        print('Waiting for operation to complete' . PHP_EOL);
        $operation->reload();
        if (!$operation->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });

    // Print the results
    if ($operation->isComplete()) {
        $results = $operation->results();
        foreach ($results as $result) {
            $alternative = $result->alternatives()[0];
            printf('Transcript: %s' . PHP_EOL, $alternative['transcript']);
            printf('Confidence: %s' . PHP_EOL, $alternative['confidence']);
        }
    }
}

Python

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

Ruby

For more on installing and creating a Cloud Speech API client, refer to Cloud Speech API Client Libraries.

# project_id   = "Your Google Cloud project ID"
# storage_path = "Path to file in Cloud Storage, e.g. gs://bucket/audio.raw"

require "google/cloud/speech"

speech = Google::Cloud::Speech.new project: project_id
audio  = speech.audio storage_path, encoding:    :linear16,
                                    sample_rate: 16000,
                                    language:    "en-US"

operation = audio.process

puts "Operation started"

operation.wait_until_done!

results = operation.results

results.each do |result|
  puts "Transcription: #{result.transcript}"
end
