Transcribe audio with voice activity timeouts

This sample transcribes audio from a file using the Speech-to-Text streaming API with voice activity timeouts. The timeouts end the stream if no speech begins within the start timeout, or once speech has stopped for the end timeout. The sample prints the transcript to the console, along with speech activity events that mark when speech starts and ends.

Code sample


To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from time import sleep

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.protobuf import duration_pb2  # type: ignore

def transcribe_streaming_voice_activity_timeouts(
    project_id: str,
    speech_start_timeout: int,
    speech_end_timeout: int,
    audio_file: str,
) -> list[cloud_speech.StreamingRecognizeResponse]:
    """Transcribes audio from audio file to text.

    Args:
        project_id: The GCP project ID to use.
        speech_start_timeout: The timeout in seconds for speech start.
        speech_end_timeout: The timeout in seconds for speech end.
        audio_file: The audio file to transcribe.

    Returns:
        The list of streaming responses received, which contain the transcript.
    """
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    # In practice, stream should be a generator yielding chunks of audio data
    chunk_length = max(1, len(content) // 20)  # Avoids a zero step for very small files
    stream = [
        content[start : start + chunk_length]
        for start in range(0, len(content), chunk_length)
    ]
    audio_requests = (
        cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
    )

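    # Assumed settings: auto-detected audio encoding, US English, and the
    # "long" model. Adjust these to match your audio.
    recognition_config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )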

    # Sets the flag to enable voice activity events and timeout
    speech_start_timeout = duration_pb2.Duration(seconds=speech_start_timeout)
    speech_end_timeout = duration_pb2.Duration(seconds=speech_end_timeout)
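    voice_activity_timeout = (
        cloud_speech.StreamingRecognitionFeatures.VoiceActivityTimeout(
            speech_start_timeout=speech_start_timeout,
            speech_end_timeout=speech_end_timeout,
        )
    )
    streaming_features = cloud_speech.StreamingRecognitionFeatures(
        enable_voice_activity_events=True, voice_activity_timeout=voice_activity_timeout
    )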

    streaming_config = cloud_speech.StreamingRecognitionConfig(
        config=recognition_config, streaming_features=streaming_features
    )

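    # Assumes the default recognizer ("_") in the global location
    config_request = cloud_speech.StreamingRecognizeRequest(
        recognizer=f"projects/{project_id}/locations/global/recognizers/_",
        streaming_config=streaming_config,
    )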

    def requests(config: cloud_speech.RecognitionConfig, audio: list) -> list:
        yield config
        for message in audio:
            sleep(0.5)  # Paces the chunks to mimic real-time streaming (assumed interval)
            yield message

    # Transcribes the audio into text
    responses_iterator = client.streaming_recognize(
        requests=requests(config_request, audio_requests)
    )

    responses = []
    for response in responses_iterator:
        responses.append(response)
        if (
            response.speech_event_type
            == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_BEGIN
        ):
            print("Speech started.")
        if (
            response.speech_event_type
            == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_END
        ):
            print("Speech ended.")
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")

    return responses
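
The sample reads the whole file into memory and slices it into a list, while its comment notes that in practice the stream should be a generator yielding chunks of audio data. The following is a minimal sketch of such a generator, using only the Python standard library. The names `iter_audio_chunks` and `chunk_size` and the default of 4096 bytes are hypothetical choices for illustration and are not part of the Speech-to-Text client library.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(audio: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yields successive chunks of at most chunk_size bytes from a binary stream.

    Hypothetical helper for illustration; the chunk size is an assumed value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = audio.read(chunk_size)
        if not chunk:  # An empty read means the stream is exhausted
            break
        yield chunk


# Usage with an in-memory stand-in for an opened audio file
fake_audio = io.BytesIO(b"\x00" * 10_000)
chunks = list(iter_audio_chunks(fake_audio, chunk_size=4096))
print([len(c) for c in chunks])  # [4096, 4096, 1808]
```

Each yielded chunk could then be wrapped in cloud_speech.StreamingRecognizeRequest(audio=chunk), in the same way the sample builds audio_requests from its list of slices.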

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.