Transcribe audio with voice activity timeouts
This sample demonstrates how to transcribe audio from a file with voice activity timeouts. It uses the Speech-to-Text API to transcribe the audio and prints the transcript to the console. The sample also prints out speech activity events, such as when speech starts and ends.
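To make the two timeouts concrete, here is a toy, client-side model of how they might decide when a stream closes. It is only a sketch under stated assumptions: the function name `stream_close_time` and the `(timestamp, is_speech)` frame format are hypothetical, and the real decision is made by the Speech-to-Text service, not by code like this. The model assumes that the speech start timeout applies while no speech has been heard yet, and that the speech end timeout applies to silence that follows detected speech.

```python
from typing import Iterable, Optional, Tuple


def stream_close_time(
    frames: Iterable[Tuple[float, bool]],
    speech_start_timeout: float,
    speech_end_timeout: float,
) -> Optional[float]:
    """Toy model (hypothetical, not the service's algorithm).

    frames: (timestamp_seconds, is_speech) pairs in increasing time order.
    Returns the timestamp at which this model would time out, or None.
    """
    speech_seen = False
    last_speech_time = 0.0
    for timestamp, is_speech in frames:
        if is_speech:
            # Any speech frame resets the trailing-silence measurement.
            speech_seen = True
            last_speech_time = timestamp
            continue
        if not speech_seen and timestamp >= speech_start_timeout:
            # No speech began within the start timeout.
            return timestamp
        if speech_seen and timestamp - last_speech_time >= speech_end_timeout:
            # Silence after speech lasted at least the end timeout.
            return timestamp
    return None


# Three seconds of leading silence, two seconds of speech, then silence.
demo_frames = [
    (0.0, False), (1.0, False), (2.0, False),
    (3.0, True), (4.0, True),
    (5.0, False), (6.0, False), (7.0, False), (8.0, False),
]
print(stream_close_time(demo_frames, speech_start_timeout=5, speech_end_timeout=3))
print(stream_close_time(demo_frames, speech_start_timeout=2, speech_end_timeout=3))
```

With a start timeout of 5 seconds, the model lets the speech begin and then closes after three seconds of trailing silence. With a start timeout of 2 seconds, it closes before any speech is heard.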
Code sample
Python

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Python API reference documentation](/python/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    import os
    from time import sleep

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech
    from google.protobuf import duration_pb2  # type: ignore

    PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


    def transcribe_streaming_voice_activity_timeouts(
        speech_start_timeout: int,
        speech_end_timeout: int,
        audio_file: str,
    ) -> cloud_speech.StreamingRecognizeResponse:
        """Transcribes audio from audio file to text.
        Args:
            speech_start_timeout: The timeout in seconds for speech start.
            speech_end_timeout: The timeout in seconds for speech end.
            audio_file: Path to the local audio file to be transcribed.
                Example: "resources/audio_silence_padding.wav"
        Returns:
            The streaming response containing the transcript.
        """
        # Instantiates a client
        client = SpeechClient()

        # Reads a file as bytes
        with open(audio_file, "rb") as file:
            audio_content = file.read()

        # In practice, stream should be a generator yielding chunks of audio data
        chunk_length = len(audio_content) // 20
        stream = [
            audio_content[start : start + chunk_length]
            for start in range(0, len(audio_content), chunk_length)
        ]
        audio_requests = (
            cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
        )

        recognition_config = cloud_speech.RecognitionConfig(
            auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
            language_codes=["en-US"],
            model="long",
        )

        # Sets the flag to enable voice activity events and timeout
        speech_start_timeout = duration_pb2.Duration(seconds=speech_start_timeout)
        speech_end_timeout = duration_pb2.Duration(seconds=speech_end_timeout)
        voice_activity_timeout = (
            cloud_speech.StreamingRecognitionFeatures.VoiceActivityTimeout(
                speech_start_timeout=speech_start_timeout,
                speech_end_timeout=speech_end_timeout,
            )
        )
        streaming_features = cloud_speech.StreamingRecognitionFeatures(
            enable_voice_activity_events=True, voice_activity_timeout=voice_activity_timeout
        )

        streaming_config = cloud_speech.StreamingRecognitionConfig(
            config=recognition_config, streaming_features=streaming_features
        )

        config_request = cloud_speech.StreamingRecognizeRequest(
            recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
            streaming_config=streaming_config,
        )

        def requests(config: cloud_speech.RecognitionConfig, audio: list) -> list:
            yield config
            for message in audio:
                sleep(0.5)
                yield message

        # Transcribes the audio into text
        responses_iterator = client.streaming_recognize(
            requests=requests(config_request, audio_requests)
        )

        responses = []
        for response in responses_iterator:
            responses.append(response)
            if (
                response.speech_event_type
                == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_BEGIN
            ):
                print("Speech started.")
            if (
                response.speech_event_type
                == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_END
            ):
                print("Speech ended.")
            for result in response.results:
                print(f"Transcript: {result.alternatives[0].transcript}")

        return responses

What's next

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=speech).
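The sample's own comment notes that, in practice, the stream should be a generator yielding chunks of audio data rather than a list built from a fully loaded file. One way this might look is sketched below using only the standard library. The helper name `iter_audio_chunks` and the default `chunk_size` of 4096 bytes are illustrative assumptions, not part of the Speech-to-Text client library.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Lazily yield chunks of at most chunk_size bytes until end of stream.

    Hypothetical helper: the name and default chunk size are illustrative.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read means the stream is exhausted.
            break
        yield chunk


# Stand-in for an opened audio file: 10,000 bytes of silence.
fake_audio = io.BytesIO(b"\x00" * 10_000)
print([len(chunk) for chunk in iter_audio_chunks(fake_audio)])  # [4096, 4096, 1808]
```

In the sample, each yielded chunk would take the place of an element of `stream`, so it could be wrapped as `cloud_speech.StreamingRecognizeRequest(audio=chunk)` without reading the whole file into memory first.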