Transcribing videos

This tutorial shows how to transcribe the audio track from a video file using Speech-to-Text.

Audio files can come from many different sources. Audio data can come from a phone (like voicemail) or the soundtrack included in a video file.

Cloud Speech-to-Text can use one of several machine learning models to transcribe your audio file, to best match the original source of the audio. You can get better results from your speech transcription by specifying the source of the original audio. This allows Cloud Speech-to-Text to process your audio files using a machine learning model trained for data similar to your audio file.


  • Send a audio transcription request for a video file to Cloud Speech-to-Text.


This tutorial uses billable components of Cloud Platform, including:

  • Speech-to-Text

Use the Pricing Calculator to generate a cost estimate based on your projected usage. New Cloud Platform users might be eligible for a free trial.

Before you begin

This tutorial has several prerequisites:

Preparing the audio data

Before you can transcribe audio from a video, you must extract the data from the video file. After you've extracted the audio data, you must store it in a Cloud Storage bucket or convert it to base64-encoding.

Extract the audio data

You can use any file conversion tool that handles audio and video files, such as FFmpeg.

Use the code snippet below to convert a video file to an audio file using ffmpeg.

ffmpeg -i video-input-file audio-output-file

Store or convert the audio data

You can transcribe an audio file stored on your local machine or in a Cloud Storage bucket.

Use the following command to upload your audio file to an existing Cloud Storage bucket using the gsutil tool.

gsutil cp audio-output-file storage-bucket-uri

If you use a local file and plan to send a request using the curl tool from the command line, you must convert the audio file to base64-encoded data first.

Use the following command to convert an audio file to a text file.

base64 audio-output-file -w 0 > audio-data-text

Sending a request

Use the following code to send a transcription request to Cloud Speech-to-Text.


Refer to the <a href="/speech-to-text/docs/reference/rest/v1/speech/recognize"><code translate="no" dir="ltr">speech:recognize</code></a> API endpoint for complete details.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \ \
    --data "{
  'config': {
    'encoding': 'LINEAR16',
    'sampleRateHertz': 16000,
    'languageCode': 'en-US',
    'model': 'video'
  'audio': {

See the <a href="/speech-to-text/docs/reference/rest/v1/RecognitionConfig"><code translate="no" dir="ltr">RecognitionConfig</code></a> reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

  "results": [
      "alternatives": [
          "transcript": "OK Google stream stranger things from
            Netflix to my TV okay stranger things from
            Netflix playing on TV from the people that brought you
            Google home comes the next evolution of the smart home
            and it's just outside your window me Google know hi
            how can I help okay no what's the weather like outside
            the weather outside is sunny and 76 degrees he's right
            okay no turn on the hose I'm holding sure okay no I'm can
            I eat this lemon tree leaf yes what about this Daisy yes
            but I wouldn't recommend it but I could eat it okay
            Nomad milk to my shopping list I'm sorry that sounds like
            an indoor request I keep doing that sorry you do keep
            doing that okay no is this compost really we're all
            compost if you think about it pretty much everything is
            made up of organic matter and will return",
          "confidence": 0.9251011


// Imports the Google Cloud client library for Beta API
 * TODO(developer): Update client library import to use new
 * version of API when desired features become available
const speech = require('@google-cloud/speech').v1p1beta1;
const fs = require('fs');

// Creates a client
const client = new speech.SpeechClient();

 * TODO(developer): Uncomment the following lines before running the sample.
// const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
// const model = 'Model to use, e.g. phone_call, video, default';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
  model: model,
const audio = {
  content: fs.readFileSync(filename).toString('base64'),

const request = {
  config: config,
  audio: audio,

// Detects speech in the audio file
const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
console.log(`Transcription: `, transcription);


def transcribe_model_selection(speech_file, model):
    """Transcribe the given audio file synchronously with
    the selected model."""
    from import speech
    client = speech.SpeechClient()

    with open(speech_file, 'rb') as audio_file:
        content =

    audio = speech.types.RecognitionAudio(content=content)

    config = speech.types.RecognitionConfig(

    response = client.recognize(config, audio)

    for i, result in enumerate(response.results):
        alternative = result.alternatives[0]
        print('-' * 20)
        print('First alternative of result {}'.format(i))
        print(u'Transcript: {}'.format(alternative.transcript))


 * Performs transcription of the given audio file synchronously with the selected model.
 * @param fileName the path to a audio file to transcribe
public static void transcribeModelSelection(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speech = SpeechClient.create()) {
    // Configure request with video media type
    RecognitionConfig recConfig =
            // encoding may either be omitted or must match the value in the file header
            // sample rate hertz may be either be omitted or must match the value in the file
            // header

    RecognitionAudio recognitionAudio =

    RecognizeResponse recognizeResponse = speech.recognize(recConfig, recognitionAudio);
    // Just print the first result here.
    SpeechRecognitionResult result = recognizeResponse.getResultsList().get(0);
    // There can be several alternative transcripts for a given chunk of speech. Just use the
    // first (most likely) one here.
    SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
    System.out.printf("Transcript : %s\n", alternative.getTranscript());

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project you want to delete and click Delete .
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting instances

To delete a Compute Engine instance:

  1. In the Cloud Console, go to the VM Instances page.

    Go to the VM Instances page

  2. Click the checkbox for the instance you want to delete.
  3. Click Delete to delete the instance.

Deleting firewall rules for the default network

To delete a firewall rule:

  1. In the Cloud Console, go to the Firewall Rules page.

    Go to the Firewall Rules page

  2. Click the checkbox for the firewall rule you want to delete.
  3. Click Delete to delete the firewall rule.

What's next

What's next