音声アクティビティイベントによる音声文字変換

このサンプルでは、ファイルから音声をテキストに文字変換する方法と、誰かが話し始めたときや話すのをやめたときなどの音声アクティビティイベントを検出する方法を示します。

コードサンプル

Python

Speech-to-Text 用のクライアントライブラリをインストールして使用する方法については、Speech-to-Text クライアントライブラリをご覧ください。詳細については、Speech-to-Text の Python API リファレンスドキュメントをご覧ください。

Speech-to-Text に対する認証を行うには、アプリケーションのデフォルト認証情報を設定します。詳細については、ローカル開発環境の認証を設定するをご覧ください。

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_streaming_voice_activity_events(
    audio_file: str,
) -> cloud_speech.StreamingRecognizeResponse:
    """Transcribes audio from a file into text and detects voice activity
        events using Google Cloud Speech-to-Text API.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        list[cloud_speech.StreamingRecognizeResponse]: A list of `StreamingRecognizeResponse` objects.
    """
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as file:
        audio_content = file.read()

    # In practice, stream should be a generator yielding chunks of audio data
    chunk_length = len(audio_content) // 5
    stream = [
        audio_content[start : start + chunk_length]
        for start in range(0, len(audio_content), chunk_length)
    ]
    audio_requests = (
        cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
    )

    recognition_config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    # Sets the flag to enable voice activity events
    streaming_features = cloud_speech.StreamingRecognitionFeatures(
        enable_voice_activity_events=True
    )
    streaming_config = cloud_speech.StreamingRecognitionConfig(
        config=recognition_config, streaming_features=streaming_features
    )

    config_request = cloud_speech.StreamingRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        streaming_config=streaming_config,
    )

    def requests(config: cloud_speech.RecognitionConfig, audio: list) -> list:
        yield config
        yield from audio

    # Transcribes the audio into text
    responses_iterator = client.streaming_recognize(
        requests=requests(config_request, audio_requests)
    )
    responses = []
    for response in responses_iterator:
        responses.append(response)
        if (
            response.speech_event_type
            == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_BEGIN
        ):
            print("Speech started.")
        if (
            response.speech_event_type
            == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_ACTIVITY_END
        ):
            print("Speech ended.")
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")

    return responses

次のステップ

他の Google Cloud プロダクトのコードサンプルを検索およびフィルタするには、Google Cloud サンプルブラウザをご覧ください。

音声アクティビティ イベントによる音声文字変換 コレクションでコンテンツを整理 必要に応じて、コンテンツの保存と分類を行います。

コードサンプル

Python

次のステップ

音声アクティビティイベントによる音声文字変換