This page demonstrates how to transcribe long audio files (longer than 1 minute) to text using asynchronous speech recognition.
Asynchronous speech recognition starts a long-running audio processing operation. Use asynchronous speech recognition to transcribe audio that is longer than 1 minute. For shorter audio, synchronous speech recognition is faster and simpler.
You can retrieve the results of the operation using the google.longrunning.Operations method. Results remain available for retrieval for 5 days (120 hours). Audio content can be sent directly to Speech-to-Text from a local file, or the API can process audio content stored in Cloud Storage. Audio files longer than 1 minute must be stored in a Cloud Storage bucket in order to be transcribed by Speech-to-Text. Performing asynchronous speech recognition on a local file longer than 1 minute results in either an error or an incomplete transcription.
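The 1-minute limit described above can be checked before choosing how to send the audio. The following is a minimal sketch, assuming the input is a PCM WAV file readable by Python's standard wave module; the helper names and the MAX_INLINE_SECONDS constant are hypothetical and not part of the Speech-to-Text API.

```python
import wave

# Threshold taken from the text above: audio longer than 1 minute
# must be read from a Cloud Storage bucket rather than sent inline.
MAX_INLINE_SECONDS = 60.0


def wav_duration_seconds(source):
    """Return the duration in seconds of a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def needs_cloud_storage(source):
    """Hypothetical helper: True if the audio exceeds the inline limit."""
    return wav_duration_seconds(source) > MAX_INLINE_SECONDS
```

Other formats such as FLAC would need a different way to measure duration; the comparison against the limit stays the same.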
Transcribing long audio files using a Google Cloud Storage file
These samples use a Cloud Storage bucket to store the raw audio input for the long-running transcription process. For an example of a typical longrunningrecognize operation response, see the Speech-to-Text basics documentation.
Protocol
Refer to the speech:longrunningrecognize API endpoint for complete details.
To perform asynchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.
curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'config': {
    'language_code': 'en-US'
  },
  'audio': {
    'uri': 'gs://gcs-test-data/vr.flac'
  }
}" "https://speech.googleapis.com/v1/speech:longrunningrecognize"
See the RecognitionConfig and RecognitionAudio reference documentation for more information on configuring the request body.
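The request body used in the curl example can also be assembled programmatically. The following is a minimal sketch using only Python's standard library; build_longrunning_request is a hypothetical helper, and the body contains only the two fields shown in the example above (the language code under config and the Cloud Storage URI under audio).

```python
import json


def build_longrunning_request(gcs_uri, language_code="en-US"):
    """Build a JSON-serializable body for speech:longrunningrecognize.

    Hypothetical helper: mirrors the fields from the curl example only.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI such as gs://bucket/object")
    return {
        "config": {"language_code": language_code},
        "audio": {"uri": gcs_uri},
    }


body = build_longrunning_request("gs://gcs-test-data/vr.flac")
# Serialize to the string that would be sent as the POST payload.
print(json.dumps(body, indent=2))
```

Any additional RecognitionConfig fields your audio needs would be added to the config dictionary in the same way.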
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:
{ "name": "7612202767953098924" }
where name is the name of the long-running operation created for the request.
Wait for processing to complete. Processing time depends on your source audio. In most cases, results are ready in about half the duration of the source audio.
You can get the status of your long-running operation by making a GET request to the https://speech.googleapis.com/v1/operations/ endpoint. Replace your-operation-name with the name returned from your longrunningrecognize request. You can get the estimated progress of the request from the progressPercent field.
curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     "https://speech.googleapis.com/v1/operations/your-operation-name"
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:
{
  "name": "7612202767953098924",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2017-07-20T16:36:55.033650Z",
    "lastUpdateTime": "2017-07-20T16:37:17.158630Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "transcript": "okay so what am I doing here...(etc)...",
            "confidence": 0.96096134
          }
        ]
      },
      {
        "alternatives": [
          {
            ...
          }
        ]
      }
    ]
  }
}
If the operation has not completed, you can poll the endpoint by repeatedly making the GET request until the done property of the response is true.
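The polling described above can be sketched as a small loop. This is an illustrative sketch, not an official client: fetch_operation is a caller-supplied function (an assumption of this example) that performs the GET request and returns the decoded JSON as a dictionary, and the interval and poll limit are arbitrary defaults. The check for an error field assumes the standard long-running operation shape, where a failed operation is marked done and carries an error instead of a response.

```python
import time


def wait_for_operation(fetch_operation, name, interval_seconds=5.0,
                       max_polls=120, sleep=time.sleep):
    """Poll an operation until its done property is true, then return it.

    fetch_operation(name) must return the operation as a dict; it is
    injected so the HTTP layer stays outside this sketch.
    """
    for _ in range(max_polls):
        operation = fetch_operation(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"operation {name} failed: {operation['error']}")
            return operation
        # progressPercent is an estimate reported in the operation metadata.
        progress = operation.get("metadata", {}).get("progressPercent", 0)
        print(f"{name}: {progress}% complete")
        sleep(interval_seconds)
    raise TimeoutError(f"operation {name} not done after {max_polls} polls")
```

Passing the fetch and sleep functions as parameters keeps the loop easy to exercise without network access; in real use, fetch_operation would wrap the authenticated GET request shown in the curl example.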
gcloud
Refer to the recognize-long-running command for complete details.
To perform asynchronous speech recognition, use the gcloud command-line tool, providing the path of a local file or a Google Cloud Storage URL.
gcloud ml speech recognize-long-running \
    'gs://cloud-samples-tests/speech/brooklyn.flac' \
    --language-code='en-US' --async
If the request is successful, the server returns the ID of the long-running operation in JSON format.
{ "name": OPERATION_ID }
You can then get information about the operation by running the following command.
gcloud ml speech operations describe OPERATION_ID
You can also poll the operation until it completes by running the following command.
gcloud ml speech operations wait OPERATION_ID
After the operation completes, it returns a transcript of the audio in JSON format.
{
  "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
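A response of this shape can be reduced to plain text by taking the first alternative of each result. The following is a minimal sketch that works on the decoded JSON; extract_transcript is a hypothetical helper, and it assumes the first alternative in each result is the most probable one.

```python
def extract_transcript(response):
    """Join the first alternative of every result into one string."""
    pieces = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        # Skip results that carry no alternatives or no transcript text.
        if alternatives and "transcript" in alternatives[0]:
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


sample_response = {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.9840146,
                    "transcript": "how old is the Brooklyn Bridge",
                }
            ]
        }
    ],
}

print(extract_transcript(sample_response))  # how old is the Brooklyn Bridge
```

When the input is a full operation object rather than the bare response, pass its response field to the helper.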
Client library samples are available for C#, Go, Java, Node.js, PHP, Python, and Ruby.
Transcribing long audio files using a local file
These samples use a local file to store the raw audio input for the long-running transcription process. For an example of a typical longrunningrecognize operation response, see the Speech-to-Text basics documentation.
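When the audio comes from a local file, the JSON request carries the raw bytes in the content field of the audio object, base64-encoded, instead of a uri. The following is a minimal sketch using only the standard library; both helper names are hypothetical, and the config contains only a language code, which assumes a format such as FLAC or WAV whose header describes the encoding.

```python
import base64


def build_inline_request(audio_bytes, language_code="en-US"):
    """Build a request body that embeds the audio as base64 text."""
    return {
        "config": {"language_code": language_code},
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def build_inline_request_from_file(path, language_code="en-US"):
    """Read a local audio file and build the inline request body for it."""
    with open(path, "rb") as audio_file:
        return build_inline_request(audio_file.read(), language_code)
```

This approach is only suitable for audio of 1 minute or less; longer audio needs to be read from a Cloud Storage bucket.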
Client library samples are available for C#, Go, Java, Node.js, PHP, Python, and Ruby.