Transcribe audio with voice activity events
This sample demonstrates how to transcribe audio from a file into text, and detect speech activity events such as when someone starts or stops speaking.
Code sample
-----------

### Python

To learn how to install and use the client library for Speech-to-Text, see
[Speech-to-Text client libraries](/speech-to-text/docs/client-libraries).

For more information, see the
[Speech-to-Text Python API reference documentation](/python/docs/reference/speech/latest).

To authenticate to Speech-to-Text, set up Application Default Credentials.
For more information, see
[Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    import os
    from collections.abc import Iterable, Iterator

    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


    def transcribe_streaming_voice_activity_events(
        audio_file: str,
    ) -> list[cloud_speech.StreamingRecognizeResponse]:
        """Transcribes audio from a file into text and detects voice activity
        events using Google Cloud Speech-to-Text API.
        Args:
            audio_file (str): Path to the local audio file to be transcribed.
                Example: "resources/audio.wav"
        Returns:
            list[cloud_speech.StreamingRecognizeResponse]: A list of `StreamingRecognizeResponse` objects.
        """
        # Instantiates a client
        client = SpeechClient()

        # Reads a file as bytes
        with open(audio_file, "rb") as file:
            audio_content = file.read()

        # In practice, stream should be a generator yielding chunks of audio data.
        # max(1, ...) keeps the range step positive for very short inputs.
        chunk_length = max(1, len(audio_content) // 5)
        stream = [
            audio_content[start : start + chunk_length]
            for start in range(0, len(audio_content), chunk_length)
        ]
        audio_requests = (
            cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
        )

        recognition_config = cloud_speech.RecognitionConfig(
            auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
            language_codes=["en-US"],
            model="long",
        )

        # Sets the flag to enable voice activity events
        streaming_features = cloud_speech.StreamingRecognitionFeatures(
            enable_voice_activity_events=True
        )
        streaming_config = cloud_speech.StreamingRecognitionConfig(
            config=recognition_config, streaming_features=streaming_features
        )

        config_request = cloud_speech.StreamingRecognizeRequest(
            recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
            streaming_config=streaming_config,
        )

        def requests(
            config: cloud_speech.StreamingRecognizeRequest,
            audio: Iterable[cloud_speech.StreamingRecognizeRequest],
        ) -> Iterator[cloud_speech.StreamingRecognizeRequest]:
            # The first request carries the configuration; the rest carry audio
            yield config
            yield from audio

        # Transcribes the audio into text
        responses_iterator = client.streaming_recognize(
            requests=requests(config_request, audio_requests)
        )
        responses = []
        for response in responses_iterator:
            responses.append(response)
            if (
                response.speech_event_type
                == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_BEGIN
            ):
                print("Speech started.")
            elif (
                response.speech_event_type
                == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_END
            ):
                print("Speech ended.")
            for result in response.results:
                print(f"Transcript: {result.alternatives[0].transcript}")

        return responses

What's next
-----------

To search and filter code samples for other Google Cloud products, see the
[Google Cloud sample browser](/docs/samples?product=speech).

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
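The sample's own comment notes that, in practice, the stream should be a generator yielding chunks of audio data rather than a list built from a fully loaded file. The following is a minimal sketch of one way to do that with the standard library only. The helper name `iter_audio_chunks` and the default `chunk_size` of 8192 bytes are assumptions made for illustration; they are not part of the Speech-to-Text client library, and a suitable chunk size should be checked against the service's documented limits.

```python
import io
from collections.abc import Iterator
from typing import BinaryIO


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield successive chunks of at most `chunk_size` bytes until end of input.

    Hypothetical helper: both the name and the default chunk size are
    illustrative assumptions, not values taken from the API documentation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read signals end of file for binary streams
            break
        yield chunk


if __name__ == "__main__":
    # Stand-in for an opened audio file: 20,000 bytes of silence
    demo = io.BytesIO(b"\x00" * 20000)
    sizes = [len(chunk) for chunk in iter_audio_chunks(demo, chunk_size=8192)]
    print(sizes)  # [8192, 8192, 3616]
```

In the sample, a generator like this could stand in for the `stream` list, with each yielded chunk wrapped in `cloud_speech.StreamingRecognizeRequest(audio=chunk)` as before. Because chunks are read lazily, the file has to stay open until the request iterator has been fully consumed.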