Transcribe audio with voice activity timeouts

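This sample shows how to transcribe audio from a file with streaming recognition while voice activity timeouts are configured. It uses the Speech-to-Text API to transcribe the audio and prints the transcript to the console. The sample also prints voice activity events, such as when speech starts and ends.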

Code sample

Python

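To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.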

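To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".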

import os
from time import sleep

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.protobuf import duration_pb2  # type: ignore

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_streaming_voice_activity_timeouts(
    speech_start_timeout: int,
    speech_end_timeout: int,
    audio_file: str,
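) -> list[cloud_speech.StreamingRecognizeResponse]:
    """Streams a local audio file to Speech-to-Text with voice activity timeouts.

    Prints speech start/end events and transcripts as responses arrive.

    Args:
        speech_start_timeout: Seconds to wait for speech to begin before the
            server closes the stream.
        speech_end_timeout: Seconds of silence after speech before the server
            closes the stream.
        audio_file: Path to the local audio file to be transcribed.
            Example: "resources/audio_silence_padding.wav"

    Returns:
        A list of every streaming response received, including transcripts
        and voice activity events.
    """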
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as file:
        audio_content = file.read()

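    # Splits the file into about 20 chunks to simulate a stream. In practice,
    # the stream should be a generator yielding audio chunks as they arrive.
    # max() keeps the range step positive for files shorter than 20 bytes.
    chunk_length = max(1, len(audio_content) // 20)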
    stream = [
        audio_content[start : start + chunk_length]
        for start in range(0, len(audio_content), chunk_length)
    ]
    audio_requests = (
        cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
    )

    recognition_config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    # Enables voice activity events and sets the timeouts. If no speech is
    # detected within speech_start_timeout at the start of the stream, or
    # within speech_end_timeout after speech ends, the server closes the stream.
    start_timeout = duration_pb2.Duration(seconds=speech_start_timeout)
    end_timeout = duration_pb2.Duration(seconds=speech_end_timeout)
    voice_activity_timeout = (
        cloud_speech.StreamingRecognitionFeatures.VoiceActivityTimeout(
            speech_start_timeout=start_timeout,
            speech_end_timeout=end_timeout,
        )
    )
    streaming_features = cloud_speech.StreamingRecognitionFeatures(
        enable_voice_activity_events=True,
        voice_activity_timeout=voice_activity_timeout,
    )

    streaming_config = cloud_speech.StreamingRecognitionConfig(
        config=recognition_config, streaming_features=streaming_features
    )

    config_request = cloud_speech.StreamingRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        streaming_config=streaming_config,
    )

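    def requests(config: cloud_speech.StreamingRecognizeRequest, audio):
        """Yields the config request first, then each audio request."""
        yield config
        for message in audio:
            # Pauses between chunks to roughly imitate a live audio stream.
            sleep(0.5)
            yield message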

    # Transcribes the audio into text
    responses_iterator = client.streaming_recognize(
        requests=requests(config_request, audio_requests)
    )

    responses = []
    for response in responses_iterator:
        responses.append(response)
        if (
            response.speech_event_type
            == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_BEGIN
        ):
            print("Speech started.")
        if (
            response.speech_event_type
            == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_END
        ):
            print("Speech ended.")
        for result in response.results:
            # Skips results that carry no recognition alternatives.
            if result.alternatives:
                print(f"Transcript: {result.alternatives[0].transcript}")

    return responses
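The sample reads the whole file into memory and splits it into 20 equal pieces, so each chunk's size depends on the file length rather than on how much audio it holds. A minimal sketch of an alternative, assuming a PCM WAV input that Python's standard `wave` module can parse, sizes chunks by audio duration and reads the file incrementally. The helper names `chunk_size_for_duration` and `iter_audio_chunks`, and the 100 ms default, are hypothetical and not part of the Speech-to-Text client library.

```python
import time
import wave
from typing import Iterator


def chunk_size_for_duration(wav_path: str, chunk_ms: int = 100) -> int:
    """Hypothetical helper: bytes that hold about `chunk_ms` of PCM audio."""
    with wave.open(wav_path, "rb") as wav:
        # Bytes per second = sample rate * sample width (bytes) * channels.
        bytes_per_second = (
            wav.getframerate() * wav.getsampwidth() * wav.getnchannels()
        )
    # Never return 0, so a read loop using this size always makes progress.
    return max(1, bytes_per_second * chunk_ms // 1000)


def iter_audio_chunks(
    path: str, chunk_size: int, pause_seconds: float = 0.0
) -> Iterator[bytes]:
    """Hypothetical helper: yield the raw file bytes in chunk_size pieces.

    The WAV header is included in the first chunk, unchanged. An optional
    pause between chunks imitates a live stream.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # reads incrementally, not all at once
            if not chunk:
                break
            yield chunk
            if pause_seconds:
                time.sleep(pause_seconds)
```

Each yielded chunk could then be wrapped in `cloud_speech.StreamingRecognizeRequest(audio=chunk)` in place of the `stream` list. Because the chunks are the file's bytes in order, header included, the sample's `AutoDetectDecodingConfig` is assumed to still apply; sending only raw frames without the header would instead call for an explicit decoding configuration.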

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.