This tutorial shows how to transcribe the audio track from a video file using Speech-to-Text.
Audio data can come from many different sources, such as a phone (for example, voicemail) or the soundtrack included in a video file.
Speech-to-Text offers several machine learning models for transcription, each trained on audio from a particular kind of source. You can get better results from your speech transcription by specifying the source of the original audio, which allows Speech-to-Text to process your audio file using a model trained on similar data.
Objectives
- Send an audio transcription request for a video file to Speech-to-Text.
Costs
This tutorial uses billable components of Cloud Platform, including:
- Speech-to-Text
Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.
Before you begin
This tutorial has several prerequisites:
- You've set up a Speech-to-Text project in the Google Cloud Console.
- You've set up your environment using Application Default Credentials in the Google Cloud Console.
- You've set up the development environment for your chosen programming language.
- You've installed the Google Cloud Client Library for your chosen programming language.
Preparing the audio data
Before you can transcribe audio from a video, you must extract the audio data from the video file. After you've extracted the audio data, you must either store it in a Cloud Storage bucket or encode it as base64.
Extract the audio data
You can use any file conversion tool that handles audio and video files, such as FFmpeg.
Use the code snippet below to convert a video file to an audio file using ffmpeg.
ffmpeg -i video-input-file audio-output-file
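The extraction step could also be scripted. The following is a minimal Python sketch, not part of the original tutorial: the helper names `build_ffmpeg_command` and `extract_audio` are hypothetical, and the extra ffmpeg flags (`-vn`, `-acodec pcm_s16le`, `-ar`, `-ac`) are an assumption chosen to produce mono 16-bit PCM at 16000 Hz, matching the LINEAR16 configuration used in the request later in this tutorial.

```python
import subprocess


def build_ffmpeg_command(video_path, audio_path, sample_rate=16000):
    """Return an ffmpeg argument list that extracts mono 16-bit PCM audio.

    The flag choices are an assumption, picked to match a LINEAR16 request.
    """
    return [
        "ffmpeg",
        "-i", video_path,         # input video file
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM (LINEAR16)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # downmix to a single channel
        audio_path,               # output audio file, for example a .wav file
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.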
Store or convert the audio data
You can transcribe an audio file stored on your local machine or in a Cloud Storage bucket.
Use the following command to upload your audio file to an existing Cloud Storage bucket using the gsutil tool.
gsutil cp audio-output-file storage-bucket-uri
If you use a local file and plan to send a request using the curl tool from the command line, you must convert the audio file to base64-encoded data first.
Use the following command to convert an audio file to a base64-encoded text file.
base64 audio-output-file -w 0 > audio-data-text
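If the base64 command is not available on your platform, the same encoding can be sketched with the Python standard library. The function name `encode_audio_file` is hypothetical. Like the `-w 0` option above, `base64.b64encode` produces a single string with no line breaks.

```python
import base64


def encode_audio_file(audio_path):
    """Read an audio file and return its contents as a base64 ASCII string."""
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("ascii")
```

The returned string can be written to a text file or placed directly in a request body.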
Sending a request
Use the following code to send a transcription request to Speech-to-Text.
Local file request
Protocol
Refer to the speech:recognize API endpoint for complete details.
To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1/speech:recognize \
    --data '{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US",
    "model": "video"
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/Google_Gnome.wav"
  }
}'
See the RecognitionConfig reference documentation for more information on configuring the request body.
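The curl example above points at a Cloud Storage URI. For a file on your local machine, the request body carries the base64-encoded audio in the `audio.content` field instead of `audio.uri`. The following Python sketch shows one way to assemble such a body; the function name `build_local_request` and its default values are hypothetical, and the config values simply mirror the curl example.

```python
import base64
import json


def build_local_request(audio_bytes, sample_rate_hertz=16000, language_code="en-US"):
    """Build a speech:recognize JSON body that carries the audio inline."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            "model": "video",  # model trained on audio from video sources
        },
        "audio": {
            # Inline audio must be base64-encoded text in a JSON request.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)
```

The resulting string could be saved to a file and passed to curl with `--data @filename`.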
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "OK Google stream stranger things from Netflix to my TV okay stranger things from Netflix playing on TV from the people that brought you Google home comes the next evolution of the smart home and it's just outside your window me Google know hi how can I help okay no what's the weather like outside the weather outside is sunny and 76 degrees he's right okay no turn on the hose I'm holding sure okay no I'm can I eat this lemon tree leaf yes what about this Daisy yes but I wouldn't recommend it but I could eat it okay Nomad milk to my shopping list I'm sorry that sounds like an indoor request I keep doing that sorry you do keep doing that okay no is this compost really we're all compost if you think about it pretty much everything is made up of organic matter and will return",
          "confidence": 0.9251011
        }
      ]
    }
  ]
}
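A response of this shape can be unpacked with a few lines of Python. This is a minimal sketch that assumes only the fields visible in the response above (`results`, `alternatives`, `transcript`, `confidence`); the function name `extract_transcripts` is hypothetical.

```python
import json


def extract_transcripts(response_text):
    """Return (transcript, confidence) for the first alternative of each result."""
    response = json.loads(response_text)
    pairs = []
    # A response without recognized speech may omit "results" entirely.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs
```

Each result covers a consecutive portion of the audio, so joining the transcripts in order gives the full text.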
Remote file request
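For audio stored in Cloud Storage, the request references the file by its `gs://` URI rather than embedding the data. The following Python sketch builds such a request with the standard library only. The function name `build_remote_request` is hypothetical, and the access token is assumed to have been obtained separately, for example with the gcloud command shown in the curl example.

```python
import json
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_remote_request(gcs_uri, access_token):
    """Build (but do not send) a POST request for audio in Cloud Storage."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI of the form gs://bucket/object")
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            "model": "video",
        },
        "audio": {"uri": gcs_uri},
    }
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can be inspected without network access.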
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Deleting the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting instances
To delete a Compute Engine instance:
- In the Cloud Console, go to the VM instances page.
- Select the checkbox for the instance that you want to delete.
- To delete the instance, click Delete.
Deleting firewall rules for the default network
To delete a firewall rule:
- In the Cloud Console, go to the Firewall page.
- Select the checkbox for the firewall rule that you want to delete.
- To delete the firewall rule, click Delete.
What's next
- Learn how to get timestamps for audio.
- Identify different speakers in an audio file.