Transcribe short audio files


This page demonstrates how to transcribe a short audio file to text using synchronous speech recognition.

Synchronous speech recognition returns the recognized text for short audio (less than 60 seconds).
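
Before you send a file for synchronous recognition, you may want to confirm that it is under the 60-second limit. The following is a minimal sketch that assumes uncompressed WAV (LINEAR16) input and uses only Python's standard `wave` module. `wav_duration_seconds` and `fits_sync_limit` are hypothetical helper names, not part of the Speech-to-Text client library. Other encodings, such as FLAC or MP3, need a different way to read the duration.

```python
import wave

# Hypothetical constant: synchronous recognition is for audio shorter than 60 seconds.
SYNC_LIMIT_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = total frames / frames per second.
        return wav.getnframes() / float(wav.getframerate())


def fits_sync_limit(path, limit=SYNC_LIMIT_SECONDS):
    """Return True if the WAV file is short enough for synchronous recognition."""
    return wav_duration_seconds(path) < limit
```

For longer files, use a recognition method intended for long audio instead of splitting the request yourself.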

Audio content can be sent directly to Speech-to-Text from a local file, or Speech-to-Text can process audio content stored in a Google Cloud Storage bucket. See the quotas & limits page for limits on synchronous speech recognition requests.
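
In the samples on this page, the audio source is set by one of two `RecognizeRequest` fields: `content` carries the bytes of a local file, and `uri` points to a Cloud Storage object. The following is a minimal sketch of choosing between the two. It assumes that any string starting with `gs://` is a Cloud Storage URI and that anything else is a local path. `audio_source_fields` is a hypothetical helper.

```python
def audio_source_fields(audio):
    """Return the RecognizeRequest audio field for a local path or a gs:// URI.

    A Cloud Storage URI is passed by reference in `uri`; anything else is
    treated as a local path whose bytes are sent inline in `content`.
    """
    if audio.startswith("gs://"):
        # Speech-to-Text reads the object from Cloud Storage itself.
        return {"uri": audio}
    # Local file: read the raw bytes and send them in the request body.
    with open(audio, "rb") as f:
        return {"content": f.read()}
```

You could unpack the returned dict into the request, for example `cloud_speech.RecognizeRequest(recognizer=recognizer.name, config=config, **audio_source_fields(path))`.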

For more information about recognizers and sending recognition requests, see the reference documentation.
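
A recognizer is a named resource. Its full name has the form `projects/PROJECT_ID/locations/LOCATION/recognizers/RECOGNIZER_ID`, and the samples on this page use the `global` location. The samples create a new recognizer on every call. If a recognizer already exists, you could instead refer to it by name, for example with `SpeechClient.get_recognizer(name=...)`. The following is a minimal sketch of building and parsing such names. `recognizer_name` and `parse_recognizer_name` are hypothetical helpers.

```python
def recognizer_name(project_id, recognizer_id, location="global"):
    """Build the full resource name of a Speech-to-Text v2 recognizer."""
    # Same parent path as in CreateRecognizerRequest, plus the recognizer ID.
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


def parse_recognizer_name(name):
    """Split a recognizer resource name into (project_id, location, recognizer_id)."""
    parts = name.split("/")
    if (
        len(parts) != 6
        or parts[0] != "projects"
        or parts[2] != "locations"
        or parts[4] != "recognizers"
    ):
        raise ValueError(f"not a recognizer resource name: {name!r}")
    return parts[1], parts[3], parts[5]
```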

Perform synchronous speech recognition on a local file

Here is an example of performing synchronous speech recognition on a local audio file:

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def transcribe_file_v2(project_id, recognizer_id, audio_file):
    # Instantiates a client
    client = SpeechClient()

    # Describes a new recognizer for US English that uses the "latest_long" model
    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{project_id}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            language_codes=["en-US"], model="latest_long"
        ),
    )

    # Creates the recognizer and waits for the long-running operation to finish.
    # Creating a recognizer whose ID is already in use returns an error.
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    # Reads the local audio file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    # Asks the service to detect the audio encoding automatically
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig()
    )

    # Sends the audio bytes inline in the request body
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer.name, config=config, content=content
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    # Prints the most likely transcript for each result
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
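
The samples print the first alternative of each result. Alternatives are ordered from most to least likely. The following is a minimal sketch of collecting the top transcripts into a list while skipping any result that has no alternatives. It only reads `results`, `alternatives`, and `transcript` attributes, so it should work on the `response` returned by `client.recognize`. `collect_transcripts` is a hypothetical helper.

```python
def collect_transcripts(response):
    """Return the top-ranked transcript of each result, skipping empty results."""
    transcripts = []
    for result in response.results:
        # The first alternative is the most likely; guard against an empty list.
        if result.alternatives:
            transcripts.append(result.alternatives[0].transcript)
    return transcripts
```

For example, `" ".join(collect_transcripts(response))` gives the whole transcript as a single string.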

Perform synchronous speech recognition on a remote file

The Speech-to-Text API can also perform synchronous speech recognition directly on an audio file stored in Google Cloud Storage, so you don't need to send the file's contents in the body of your request.
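
The `uri` field expects a Cloud Storage URI of the form `gs://BUCKET_NAME/OBJECT_NAME`. Knowing the bucket name also helps when you grant the service account access. The following is a minimal sketch of validating such a URI and splitting it into its parts. `split_gcs_uri` is a hypothetical helper, and it checks only the URI's shape, not whether the object exists.

```python
def split_gcs_uri(uri):
    """Split a gs://BUCKET/OBJECT URI into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket, sep, object_name = uri[len(prefix):].partition("/")
    if not bucket or not sep or not object_name:
        raise ValueError(f"expected gs://BUCKET/OBJECT, got {uri!r}")
    return bucket, object_name
```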

For Speech-to-Text to access your files in Google Cloud Storage, you must create a service account for Speech-to-Text and give that account read access to the relevant storage objects. To do so, run the following command in Cloud Shell. It creates the service account if it doesn't already exist and then displays it.

gcloud beta services identity create --service=speech.googleapis.com \
    --project=PROJECT_ID

If you're prompted to install the gcloud Beta Commands component, type Y. After installation, the command is automatically restarted.

The service account ID is formatted like an email address:

Service identity created: service-xxx@gcp-sa-speech.iam.gserviceaccount.com
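
If you script this step, you might capture the command's output and turn the printed ID into an IAM member string, which uses the `serviceAccount:` prefix. The following is a minimal sketch that assumes the output line has exactly the form shown above. `service_identity_member` is a hypothetical helper, and the wording of `gcloud` output may change between releases.

```python
import re

# Assumes the line format shown above: "Service identity created: <email>".
_IDENTITY_RE = re.compile(r"Service identity created:\s*(\S+@\S+)")


def service_identity_member(gcloud_output):
    """Return an IAM member string such as 'serviceAccount:service-xxx@...'."""
    match = _IDENTITY_RE.search(gcloud_output)
    if match is None:
        raise ValueError("no service identity found in gcloud output")
    # IAM policies refer to service accounts with the "serviceAccount:" prefix.
    return "serviceAccount:" + match.group(1)
```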

Grant this account read access to the storage objects that you want to transcribe.
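
One way to grant read access is a bucket-level IAM binding for the `roles/storage.objectViewer` role. This covers every object in the bucket, so it is broader than a grant on a single object. The following is a minimal sketch of the policy edit as a pure function over a list of `{"role": ..., "members": ...}` binding dicts. With the `google-cloud-storage` library, you could fetch the policy with `bucket.get_iam_policy(requested_policy_version=3)`, apply a function like this to `policy.bindings`, and save it with `bucket.set_iam_policy(policy)`. `add_object_viewer` is a hypothetical helper, and the sketch does not call the library.

```python
def add_object_viewer(bindings, member, role="roles/storage.objectViewer"):
    """Return IAM bindings in which `member` holds `role`.

    `bindings` is a list of {"role": str, "members": iterable} dicts.
    The input list and its dicts are not modified.
    """
    updated = []
    granted = False
    for binding in bindings:
        members = set(binding["members"])
        if binding["role"] == role:
            # Reuse the existing binding for this role instead of adding a duplicate.
            members.add(member)
            granted = True
        updated.append({**binding, "members": members})
    if not granted:
        updated.append({"role": role, "members": {member}})
    return updated
```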

More information about managing access to Google Cloud Storage is available at Creating and Managing Access Control Lists in the Google Cloud Storage documentation.

Here is an example of performing synchronous speech recognition on a file located in Cloud Storage:

Python

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech


def transcribe_gcs_v2(project_id, recognizer_id, gcs_uri):
    # Instantiates a client
    client = SpeechClient()

    # Describes a new recognizer for US English that uses the "latest_long" model
    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{project_id}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            language_codes=["en-US"], model="latest_long"
        ),
    )

    # Creates the recognizer and waits for the long-running operation to finish
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    # Asks the service to detect the audio encoding automatically
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig()
    )

    # Points the request at the Cloud Storage object (gs://BUCKET/OBJECT)
    # instead of sending the audio bytes
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer.name, config=config, uri=gcs_uri
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    # Prints the most likely transcript for each result
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response