This page demonstrates how to transcribe a short audio file to text using synchronous speech recognition.
Synchronous speech recognition returns the recognized text for short audio (less than 60 seconds). To process a speech recognition request for audio longer than 60 seconds, use Asynchronous Speech Recognition.
Audio content can be sent directly to Speech-to-Text from a local file, or Speech-to-Text can process audio content stored in a Google Cloud Storage bucket. See the quotas & limits page for limits on synchronous speech recognition requests.
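The 60-second threshold described above can be checked before you choose between synchronous and asynchronous recognition. The following is a minimal sketch, assuming the audio is an uncompressed WAV file that Python's standard wave module can read. The helper names (wav_duration_seconds, fits_synchronous_limit, make_silent_wav) are illustrative and are not part of any Speech-to-Text library.

```python
import io
import wave

SYNC_LIMIT_SECONDS = 60  # limit stated above for synchronous recognition


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_synchronous_limit(source) -> bool:
    """Return True if the clip is shorter than the synchronous limit."""
    return wav_duration_seconds(source) < SYNC_LIMIT_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    clip = make_silent_wav(2)
    print("use synchronous recognition:", fits_synchronous_limit(clip))
```

Compressed formats such as FLAC or MP3 would need a different way to read the duration; this sketch does not cover them.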
Perform synchronous speech recognition on a local file
Here is an example of performing synchronous speech recognition on a local audio file:
REST
Refer to the speech:recognize API endpoint for complete details. See the RecognitionConfig reference documentation for more information on configuring the request body.
The audio content supplied in the request body must be base64-encoded. For more information on how to base64-encode audio, see Base64 Encoding Audio Content. For more information on the content field, see RecognitionAudio.
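As an illustration of the base64 requirement, the following minimal sketch uses only Python's standard library to turn raw audio bytes into the text form that the content field carries. The function names encode_audio and encode_audio_file are hypothetical helpers, not part of the Speech-to-Text API.

```python
import base64


def encode_audio(raw_audio: bytes) -> str:
    """Return the base64 text form of raw audio bytes."""
    return base64.b64encode(raw_audio).decode("ascii")


def encode_audio_file(path: str) -> str:
    """Read a local audio file and return its base64-encoded contents."""
    with open(path, "rb") as audio_file:
        return encode_audio(audio_file.read())


if __name__ == "__main__":
    # The first four bytes of a WAV file are the ASCII characters "RIFF".
    print(encode_audio(b"RIFF"))
```

The resulting string is what you would substitute for INPUT_AUDIO in the request body.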
Before using any of the request data, make the following replacements:
- LANGUAGE_CODE: the BCP-47 code of the language spoken in your audio clip.
- ENCODING: the encoding of the audio you want to transcribe.
- SAMPLE_RATE_HERTZ: sample rate in hertz of the audio you want to transcribe.
- ENABLE_WORD_TIME_OFFSETS: enable this field if you want word start and end time offsets (timestamps) returned.
- INPUT_AUDIO: a base64-encoded string of the audio data that you want to transcribe.
- PROJECT_ID: the alphanumeric ID of your Google Cloud project.
HTTP method and URL:
POST https://speech.googleapis.com/v1/speech:recognize
Request JSON body:
{
  "config": {
    "languageCode": "LANGUAGE_CODE",
    "encoding": "ENCODING",
    "sampleRateHertz": SAMPLE_RATE_HERTZ,
    "enableWordTimeOffsets": ENABLE_WORD_TIME_OFFSETS
  },
  "audio": {
    "content": "INPUT_AUDIO"
  }
}
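The method, URL, and body above could be assembled in Python as follows. This is a sketch under stated assumptions: it assumes the request is authorized with an OAuth 2.0 bearer token in the Authorization header and that the project is passed in an x-goog-user-project header. The values LINEAR16 and 16000, the placeholder token, and the function name build_recognize_request are illustrative. The request is only constructed here, not sent.

```python
import json
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(body: dict, access_token: str, project_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for speech:recognize."""
    headers = {
        # Assumption: bearer-token auth plus a user-project header.
        "Authorization": f"Bearer {access_token}",
        "x-goog-user-project": project_id,
        "Content-Type": "application/json; charset=utf-8",
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(RECOGNIZE_URL, data=data, headers=headers, method="POST")


# Illustrative values; replace them with the settings that match your audio.
body = {
    "config": {
        "languageCode": "en-US",
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "enableWordTimeOffsets": False,
    },
    "audio": {"content": "UklGRg=="},  # placeholder base64 string, not real audio
}

request = build_recognize_request(body, access_token="ACCESS_TOKEN", project_id="PROJECT_ID")
# To send the request: urllib.request.urlopen(request)
```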
You should receive a JSON response similar to the following:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
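A response of this shape can be reduced to its transcripts with a few lines of Python. The sketch below assumes that the first entry in each alternatives list is the one you want and that a response with no recognized speech may omit results entirely. The function name best_transcripts is illustrative.

```python
import json

response_text = """
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
"""


def best_transcripts(response: dict) -> list:
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            pairs.append((top.get("transcript", ""), top.get("confidence", 0.0)))
    return pairs


if __name__ == "__main__":
    for transcript, confidence in best_transcripts(json.loads(response_text)):
        print(f"{transcript} ({confidence:.2f})")
```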
gcloud
Refer to the recognize command for complete details.
To perform speech recognition on a local file, use the Google Cloud CLI and pass in the local path of the audio file.
gcloud ml speech recognize PATH-TO-LOCAL-FILE --language-code='en-US'
If the request is successful, the server returns a response in JSON format:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
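As a rough illustration of what a client-library call can look like, the following sketch uses the google-cloud-speech package for Python. It is not the official sample: the function name transcribe_local_file is hypothetical, and the LINEAR16 encoding and 16000 Hz sample rate are assumptions that you would replace with values that match your audio. Running it requires the library to be installed and Application Default Credentials to be configured.

```python
def transcribe_local_file(path: str, language_code: str = "en-US") -> list:
    """Transcribe a short local audio file and return one transcript per result."""
    # Read the audio first so that a bad path fails before any client setup.
    with open(path, "rb") as audio_file:
        content = audio_file.read()

    # Imported here so the module loads even if the library is not installed.
    from google.cloud import speech

    client = speech.SpeechClient()  # picks up Application Default Credentials

    # The client library accepts the raw bytes of the audio file.
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # assumed encoding
        sample_rate_hertz=16000,  # assumed sample rate
        language_code=language_code,
    )

    response = client.recognize(config=config, audio=audio)
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]


if __name__ == "__main__":
    import sys

    for line in transcribe_local_file(sys.argv[1]):
        print(line)
```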
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.
Perform synchronous speech recognition on a remote file
For your convenience, the Speech-to-Text API can perform synchronous speech recognition directly on an audio file located in Cloud Storage, so you do not need to send the contents of the audio file in the body of your request.
Here is an example of performing synchronous speech recognition on a file located in Cloud Storage:
REST
Refer to the speech:recognize API endpoint for complete details. See the RecognitionConfig reference documentation for more information on configuring the request body.
Because this request references the audio by its Cloud Storage URI in the uri field, you do not need to base64-encode the audio or include it in the request body. For more information on the uri field, see RecognitionAudio.
Before using any of the request data, make the following replacements:
- LANGUAGE_CODE: the BCP-47 code of the language spoken in your audio clip.
- ENCODING: the encoding of the audio you want to transcribe.
- SAMPLE_RATE_HERTZ: sample rate in hertz of the audio you want to transcribe.
- ENABLE_WORD_TIME_OFFSETS: enable this field if you want word start and end time offsets (timestamps) returned.
- STORAGE_BUCKET: the name of the Cloud Storage bucket that contains the audio file.
- INPUT_AUDIO: the name of the audio file in that bucket that you want to transcribe.
- PROJECT_ID: the alphanumeric ID of your Google Cloud project.
HTTP method and URL:
POST https://speech.googleapis.com/v1/speech:recognize
Request JSON body:
{
  "config": {
    "languageCode": "LANGUAGE_CODE",
    "encoding": "ENCODING",
    "sampleRateHertz": SAMPLE_RATE_HERTZ,
    "enableWordTimeOffsets": ENABLE_WORD_TIME_OFFSETS
  },
  "audio": {
    "uri": "gs://STORAGE_BUCKET/INPUT_AUDIO"
  }
}
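The request body above differs from the local-file case only in its audio object, which carries a gs:// URI instead of inline content. The following minimal sketch builds such a body in Python. The function name build_gcs_recognize_body, the bucket my-bucket, and the object audio/clip.flac are hypothetical, and the encoding and sample rate are illustrative values.

```python
import json


def build_gcs_recognize_body(
    bucket: str,
    object_name: str,
    language_code: str,
    encoding: str,
    sample_rate_hertz: int,
    enable_word_time_offsets: bool = False,
) -> dict:
    """Build a speech:recognize body that points at a Cloud Storage object."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name, without slashes")
    if not object_name:
        raise ValueError("object_name must not be empty")
    return {
        "config": {
            "languageCode": language_code,
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "enableWordTimeOffsets": enable_word_time_offsets,
        },
        # No base64 content here: the service reads the file from Cloud Storage.
        "audio": {"uri": f"gs://{bucket}/{object_name.lstrip('/')}"},
    }


if __name__ == "__main__":
    body = build_gcs_recognize_body("my-bucket", "audio/clip.flac", "en-US", "FLAC", 16000)
    print(json.dumps(body, indent=2))
```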
To send your request, expand one of these options:
You should receive a JSON response similar to the following:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
gcloud
Refer to the recognize command for complete details.
To perform speech recognition on a file in Cloud Storage, use the Google Cloud CLI and pass in the Cloud Storage URI (gs://...) of the audio file.
gcloud ml speech recognize 'gs://cloud-samples-tests/speech/brooklyn.flac' \
    --language-code='en-US'
If the request is successful, the server returns a response in JSON format:
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
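For a file in Cloud Storage, a client-library call might look like the following sketch, which uses the google-cloud-speech package for Python. It is not the official sample: the function name transcribe_gcs_file is hypothetical, and the FLAC encoding and 16000 Hz sample rate are assumptions that must match your own audio. Running it requires the library to be installed and Application Default Credentials to be configured.

```python
def transcribe_gcs_file(gcs_uri: str, language_code: str = "en-US") -> list:
    """Transcribe a short audio file stored in Cloud Storage."""
    # Validate the URI up front so that mistakes fail before any client setup.
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI of the form gs://bucket/object")

    # Imported here so the module loads even if the library is not installed.
    from google.cloud import speech

    client = speech.SpeechClient()  # picks up Application Default Credentials

    # The audio is referenced by URI, so no file contents are uploaded here.
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,  # assumed encoding
        sample_rate_hertz=16000,  # assumed sample rate
        language_code=language_code,
    )

    response = client.recognize(config=config, audio=audio)
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]


if __name__ == "__main__":
    import sys

    for line in transcribe_gcs_file(sys.argv[1]):
        print(line)
```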
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.