This tutorial shows how to transcribe the audio track from a video file using Speech-to-Text.
Audio files can come from many different sources. Audio data can come from a phone (like voicemail) or the soundtrack included in a video file.
Speech-to-Text can use one of several machine learning models to transcribe your audio file, to best match the original source of the audio. You can get better results from your speech transcription by specifying the source of the original audio. This allows Speech-to-Text to process your audio files using a machine learning model trained for data similar to your audio file.
Objectives
- Send a audio transcription request for a video file to Speech-to-Text.
Costs
In this document, you use the following billable components of Google Cloud:
- Speech-to-Text
To generate a cost estimate based on your projected usage,
use the pricing calculator.
Before you begin
This tutorial has several prerequisites:
- You've set up a Speech-to-Text project in the Google Cloud console.
- You've set up your environment using Application Default Credentials in the Google Cloud console.
- You have set up the development environment for your chosen programming language.
- You've installed the Google Cloud Client Library for your chosen programming language.
Prepare the audio data
Before you can transcribe audio from a video, you must extract the data from the video file. After you've extracted the audio data, you must store it in a Cloud Storage bucket or convert it to base64-encoding.
Extract the audio data
You can use any file conversion tool that handles audio and video files, such as FFmpeg.
Use the code snippet below to convert a video file to an audio file
using ffmpeg
.
ffmpeg -i video-input-file audio-output-file
Store or convert the audio data
You can transcribe an audio file stored on your local machine or in a Cloud Storage bucket.
Use the following command to upload your audio file to an existing Cloud Storage bucket using the Google Cloud CLI.
gcloud storage cp audio-output-file storage-bucket-uri
If you use a local file and plan to send a request using the curl
tool from the command line, you must convert the audio file to
base64-encoded data first.
Use the following command to convert an audio file to a text file.
base64 audio-output-file -w 0 > audio-data-text
Send a transcription request
Use the following code to send a transcription request to Speech-to-Text.
Local file request
Protocol
Refer to the speech:recognize
API endpoint for complete details.
To perform synchronous speech recognition, make a POST
request and provide the
appropriate request body. The following shows an example of a POST
request using
curl
. The example uses the Google Cloud CLI to generate an access
token. For instructions on installing the gcloud CLI,
see the quickstart.
curl -s -H "Content-Type: application/json" \ -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ https://speech.googleapis.com/v1/speech:recognize \ --data '{ "config": { "encoding": "LINEAR16", "sampleRateHertz": 16000, "languageCode": "en-US", "model": "video" }, "audio": { "uri": "gs://cloud-samples-tests/speech/Google_Gnome.wav" } }'
See the RecognitionConfig
reference documentation for
more information on configuring the request body.
If the request is successful, the server returns a 200 OK
HTTP
status code and the response in JSON format:
{ "results": [ { "alternatives": [ { "transcript": "OK Google stream stranger things from Netflix to my TV okay stranger things from Netflix playing on TV from the people that brought you Google home comes the next evolution of the smart home and it's just outside your window me Google know hi how can I help okay no what's the weather like outside the weather outside is sunny and 76 degrees he's right okay no turn on the hose I'm holding sure okay no I'm can I eat this lemon tree leaf yes what about this Daisy yes but I wouldn't recommend it but I could eat it okay Nomad milk to my shopping list I'm sorry that sounds like an indoor request I keep doing that sorry you do keep doing that okay no is this compost really we're all compost if you think about it pretty much everything is made up of organic matter and will return", "confidence": 0.9251011 } ] } ] }
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.
Remote file request
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete instances
To delete a Compute Engine instance:
- In the Google Cloud console, go to the VM instances page.
- Select the checkbox for the instance that you want to delete.
- To delete the instance, click More actions, click Delete, and then follow the instructions.
Delete firewall rules for the default network
To delete a firewall rule:
- In the Google Cloud console, go to the Firewall page.
- Select the checkbox for the firewall rule that you want to delete.
- To delete the firewall rule, click Delete.
What's next
- Learn how to get timestamps for audio.
- Identify different speakers in an audio file.
Try it for yourself
If you're new to Google Cloud, create an account to evaluate how Speech-to-Text performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
Try Speech-to-Text free