This tutorial shows how to transcribe the audio track from a video file using Cloud Speech-to-Text.
Audio can come from many different sources, such as a phone recording (like voicemail) or the soundtrack included in a video file.
Cloud Speech-to-Text offers several machine learning models for transcription, each trained on audio from a particular kind of source. Specifying the source of your original audio lets Cloud Speech-to-Text process your file with the model trained on similar data, which can give you better transcription results.
Objectives
- Send an audio transcription request for a video file to Cloud Speech-to-Text.
Costs
This tutorial uses billable components of Cloud Platform, including:
- Cloud Speech-to-Text
Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.
Before you begin
This tutorial has several prerequisites:
- You've set up a Cloud Speech-to-Text project in the Google Cloud Console.
- You've set up your environment using Application Default Credentials in the Google Cloud Console.
- You've set up the development environment for your chosen programming language.
- You've installed the Google Cloud Client Library for your chosen programming language.
Preparing the audio data
Before you can transcribe audio from a video, you must extract the audio data from the video file. After you've extracted the audio data, you must either store it in a Cloud Storage bucket or encode it as base64 text.
Extract the audio data
You can use any file conversion tool that handles audio and video files, such as FFmpeg.
Use the following command to convert a video file to an audio file using ffmpeg.
ffmpeg -i video-input-file audio-output-file
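The bare command above leaves the codec, sample rate, and channel count to FFmpeg's defaults for the output file type. The following is a minimal sketch, assuming you want 16-bit PCM mono audio at 16 kHz so that the file matches the LINEAR16 encoding and 16000 Hz sample rate used in the request example in this tutorial. The function names and file paths are hypothetical; the flags (-vn, -acodec, -ar, -ac) are standard FFmpeg options.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str,
                         sample_rate_hz: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that extracts mono 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-i", video_path,             # input video file
        "-vn",                        # drop the video stream
        "-acodec", "pcm_s16le",       # 16-bit little-endian PCM (LINEAR16)
        "-ar", str(sample_rate_hz),   # output sample rate in Hz
        "-ac", "1",                   # downmix to a single channel
        audio_path,                   # output audio file, for example a .wav
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

Calling extract_audio("video-input-file", "audio-output-file.wav") would run the equivalent of the command above with the extra flags, provided ffmpeg is installed and on your PATH.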
Store or convert the audio data
You can transcribe an audio file stored on your local machine or in a Cloud Storage bucket.
Use the following command to upload your audio file to an existing Cloud Storage bucket using the gsutil tool.
gsutil cp audio-output-file storage-bucket-uri
If you use a local file and plan to send a request using the curl tool from the command line, you must convert the audio file to base64-encoded data first.
Use the following command to convert an audio file to a text file that contains the base64-encoded data.
base64 audio-output-file -w 0 > audio-data-text
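If you build the request in code instead of on the command line, you can do the same encoding with a standard library. The following is a minimal sketch, assuming the audio is LINEAR16 at 16 kHz as in the request example in this tutorial. The function name build_request_body is hypothetical; the content field is the place where the API accepts inline base64 audio in place of a Cloud Storage uri.

```python
import base64
import json


def build_request_body(audio_bytes: bytes,
                       sample_rate_hz: int = 16000,
                       language_code: str = "en-US") -> str:
    """Return a JSON request body with the audio inlined as base64 text."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            "model": "video",  # model trained on audio from video files
        },
        "audio": {
            # Inline audio must be base64 text, not raw bytes.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)
```

To use it with a local file, read the file in binary mode, for example open("audio-output-file", "rb").read(), and pass the bytes to build_request_body.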
Sending a request
Use the following code to send a transcription request to Cloud Speech-to-Text.
Protocol
Refer to the speech:recognize API endpoint for complete details.
To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:recognize \
    --data "{
  'config': {
    'encoding': 'LINEAR16',
    'sampleRateHertz': 16000,
    'languageCode': 'en-US',
    'model': 'video'
  },
  'audio': {
    'uri': 'gs://cloud-samples-tests/speech/Google_Gnome.wav'
  }
}"
See the RecognitionConfig reference documentation for more information on configuring the request body.
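The same POST request can be assembled in code. The following is a minimal sketch using only the Python standard library, not the Google Cloud Client Library. It assumes you already have an access token (for example, the output of gcloud auth application-default print-access-token) and that the audio is in a Cloud Storage bucket. The function name build_recognize_request is hypothetical.

```python
import json
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(access_token: str,
                            gcs_uri: str) -> urllib.request.Request:
    """Build (but do not send) a synchronous recognition request."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            "model": "video",
        },
        "audio": {"uri": gcs_uri},
    }
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body, which holds the JSON result.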
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "OK Google stream stranger things from Netflix to my TV okay stranger things from Netflix playing on TV from the people that brought you Google home comes the next evolution of the smart home and it's just outside your window me Google know hi how can I help okay no what's the weather like outside the weather outside is sunny and 76 degrees he's right okay no turn on the hose I'm holding sure okay no I'm can I eat this lemon tree leaf yes what about this Daisy yes but I wouldn't recommend it but I could eat it okay Nomad milk to my shopping list I'm sorry that sounds like an indoor request I keep doing that sorry you do keep doing that okay no is this compost really we're all compost if you think about it pretty much everything is made up of organic matter and will return",
          "confidence": 0.9251011
        }
      ]
    }
  ]
}
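The response nests each transcript inside results and alternatives. The following is a minimal sketch of pulling out the text, assuming the response has the shape shown above and that the first alternative in each result is the one you want. The function name best_transcripts is hypothetical, and the defaults for missing fields are a choice made for this sketch.

```python
import json
from typing import List, Tuple


def best_transcripts(response_json: str) -> List[Tuple[str, float]]:
    """Return (transcript, confidence) for the first alternative of each result."""
    response = json.loads(response_json)
    pairs = []
    # A response with no recognized speech may have no "results" key at all.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            pairs.append((top.get("transcript", ""),
                          top.get("confidence", 0.0)))
    return pairs
```

Joining the transcript strings from the returned list gives the full text of the audio in order.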
Cleaning up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:
Deleting the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project you want to delete and click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting instances
To delete a Compute Engine instance:
- In the Cloud Console, go to the VM Instances page.
- Click the checkbox for the instance you want to delete.
- Click Delete to delete the instance.
Deleting firewall rules for the default network
To delete a firewall rule:
- In the Cloud Console, go to the Firewall Rules page.
- Click the checkbox for the firewall rule you want to delete.
- Click Delete to delete the firewall rule.
What's next
- Learn how to get timestamps for audio.
- Identify different speakers in an audio file.