長い音声ファイルをテキストに変換する

このページでは、Speech-to-Text API と非同期音声認識を使用して、長い音声ファイル（1 分を超える）をテキストに変換する方法について説明します。

非同期音声認識について

一括音声認識では、音声処理オペレーションの長時間実行が開始されます。60 秒を超える音声を文字に変換するには、非同期音声認識を使用します。短い音声の場合は、同期音声認識を使用したほうが早くて簡単です。非同期音声認識の上限は 480 分（8 時間）です。

一括音声認識では、Cloud Storage に保存されている音声のみ文字変換できます。音声文字変換の出力は、レスポンスにインラインで送信することも（単一ファイルの一括認識リクエストの場合）、Cloud Storage に書き込むこともできます。

一括認識リクエストは、リクエストの継続的な認識処理に関する情報を含む Operation を返します。オペレーションをポーリングすると、オペレーションの完了と音声文字変換テキストのタイミングを確認できます。

始める前に

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Enable the APIs

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
IAM に移動
プロジェクトを選択します。
[ アクセスを許可] をクリックします。
[新しいプリンシパル] フィールドに、ユーザー ID を入力します。これは通常、Google アカウントのメールアドレスです。
[ロールを選択] リストでロールを選択します。
追加のロールを付与するには、 [別のロールを追加] をクリックして各ロールを追加します。
[保存] をクリックします。

Install the Google Cloud CLI.

注: すでに gcloud CLI をインストールしている場合は、gcloud components update を実行して、最新バージョンがインストールされていることを確認してください。

外部 ID プロバイダ（IdP）を使用している場合は、まずフェデレーション ID を使用して gcloud CLI にログインする必要があります。

gcloud CLI を初期化するには、次のコマンドを実行します。

gcloud init

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Enable the APIs

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
IAM に移動
プロジェクトを選択します。
[ アクセスを許可] をクリックします。
[新しいプリンシパル] フィールドに、ユーザー ID を入力します。これは通常、Google アカウントのメールアドレスです。
[ロールを選択] リストでロールを選択します。
追加のロールを付与するには、 [別のロールを追加] をクリックして各ロールを追加します。
[保存] をクリックします。

Install the Google Cloud CLI.

外部 ID プロバイダ（IdP）を使用している場合は、まずフェデレーション ID を使用して gcloud CLI にログインする必要があります。

gcloud CLI を初期化するには、次のコマンドを実行します。

gcloud init

クライアントライブラリは、アプリケーションのデフォルト認証情報を使用することによって、Google API で簡単に認証を行い、これらの API にリクエストを送信できます。アプリケーションのデフォルト認証情報を使用すると、ベースとなるコードを変更することなく、ローカルでアプリケーションのテストを行ったり、アプリケーションをデプロイしたりできます。詳細については、クライアントライブラリを使用するための認証をご覧ください。

If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

また、クライアントライブラリがインストールされていることを確認してください。

Cloud Storage へのアクセスを有効にする

Speech-to-Text は、サービスアカウントを使用して Cloud Storage 内のファイルにアクセスします。デフォルトでは、サービスアカウントは同じプロジェクト内の Cloud Storage ファイルにアクセスできます。

サービスアカウントのメールアドレスは次のとおりです。

service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com

別のプロジェクトの Cloud Storage ファイルを音声文字変換するには、このサービスアカウントにもう一方のプロジェクトの Speech-to-Text サービスエージェントロールを付与します。

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/speech.serviceAgent

プロジェクトの IAM ポリシーの詳細については、プロジェクト、フォルダ、組織へのアクセス権の管理をご覧ください。

サービスアカウントにさらにきめ細かくアクセス権を付与するには、特定の Cloud Storage バケットへの権限を付与します。

gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/storage.admin

Cloud Storage へのアクセスの管理の詳細については、Cloud Storage ドキュメントのアクセス制御リストの作成と管理をご覧ください。

一括認識を実行してインラインで結果を読み取る

Cloud Storage の音声ファイルに対して一括音声認識を実行し、レスポンスから音声文字変換結果をインラインで読み取る例を次に示します。

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_gcs_input_inline_output_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
        The transcription results are returned inline in the response.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript

一括認識を実行して結果を Cloud Storage に書き込む

次に、Cloud Storage の音声ファイルに対して一括音声認識を実行し、Cloud Storage の出力ファイルから音声文字変換結果を読み取る例を示します。Cloud Storage に書き込まれるファイルは JSON 形式の BatchRecognizeResults メッセージです。

Python

import os

import re

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_gcs_input_gcs_output_v2(
    audio_uri: str,
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript will be stored.
            E.g., gs://[BUCKET]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the URI of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    file_results = response.results[audio_uri]

    print(f"Operation finished. Fetching results from {file_results.uri}...")
    output_bucket, output_object = re.match(
        r"gs://([^/]+)/(.*)", file_results.uri
    ).group(1, 2)

    # Instantiates a Cloud Storage client
    storage_client = storage.Client()

    # Fetch results from Cloud Storage
    bucket = storage_client.bucket(output_bucket)
    blob = bucket.blob(output_object)
    results_bytes = blob.download_as_bytes()
    batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
        results_bytes, ignore_unknown_fields=True
    )

    for result in batch_recognize_results.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return batch_recognize_results

複数のファイルに対して一括認識を実行する

次に、Cloud Storage の複数の音声ファイルに対して一括音声認識を実行し、Cloud Storage の出力ファイルから音声文字変換結果を読み取る例を示します。

Python

import os
import re
from typing import List

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_multiple_files_v2(
    audio_uris: List[str],
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResponse:
    """Transcribes audio from multiple Google Cloud Storage URIs using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.
    Args:
        audio_uris (List[str]): The list of Google Cloud Storage URIs of the input audio files.
            E.g., ["gs://[BUCKET]/[FILE]", "gs://[BUCKET]/[FILE]"]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript will be stored.
            E.g., gs://[BUCKET]
    Returns:
        cloud_speech.BatchRecognizeResponse: The response containing the URIs of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    files = [cloud_speech.BatchRecognizeFileMetadata(uri=uri) for uri in audio_uris]

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=files,
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    print("Operation finished. Fetching results from:")
    for uri in audio_uris:
        file_results = response.results[uri]
        print(f"  {file_results.uri}...")
        output_bucket, output_object = re.match(
            r"gs://([^/]+)/(.*)", file_results.uri
        ).group(1, 2)

        # Instantiates a Cloud Storage client
        storage_client = storage.Client()

        # Fetch results from Cloud Storage
        bucket = storage_client.bucket(output_bucket)
        blob = bucket.blob(output_object)
        results_bytes = blob.download_as_bytes()
        batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
            results_bytes, ignore_unknown_fields=True
        )

        for result in batch_recognize_results.results:
            print(f"     Transcript: {result.alternatives[0].transcript}")

    return response

一括認識で動的一括処理を有効にする

動的一括処理を使用すると、より高いレイテンシに対して低コストで音声文字変換を行うことができます。この機能はバッチ認識でのみ使用できます。

次に、動的一括処理を有効にして Cloud Storage の音声ファイルに対して一括認識を実行する例を示します。

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_dynamic_batching_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using dynamic batching.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio.
        E.g., gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
        processing_strategy=cloud_speech.BatchRecognizeRequest.ProcessingStrategy.DYNAMIC_BATCHING,
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript

ファイルごとに認識機能をオーバーライドする

一括認識のデフォルトでは、バッチ認識リクエストのファイルごとに同じ認識構成が使用されます。さまざまなファイルで異なる構成や機能が必要な場合は、[BatchRecognizeFileMetadata][batch-file-metadata-grpc] メッセージの config フィールドを使用して、ファイルごとに構成をオーバーライドできます。認識機能をオーバーライドする例については、認識機能のドキュメントをご覧ください。

クリーンアップ

このページで使用したリソースについて、 Google Cloud アカウントに課金されないようにするには、次の操作を行います。

Optional: Revoke the authentication credentials that you created, and delete the local credential file.
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

コンソール

注意: プロジェクトを削除すると、次のような影響があります。

プロジェクト内のすべてのものが削除されます。このドキュメントのタスクで既存のプロジェクトを使用した場合、それを削除すると、そのプロジェクトで行った他の作業もすべて削除されます。
カスタムプロジェクト ID が失われます。このプロジェクトを作成したときに、将来使用するカスタムプロジェクト ID を作成した可能性があります。そのプロジェクト ID を使用した URL（たとえば、appspot.com）を保持するには、プロジェクト全体ではなくプロジェクト内の選択したリソースだけを削除します。

複数のアーキテクチャ、チュートリアル、クイックスタートを実施する予定がある場合は、プロジェクトを再利用すると、プロジェクトの割り当て上限を超えないようにできます。

In the Google Cloud console, go to the Manage resources page.

Go to Manage resources

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

注意: プロジェクトを削除すると、次のような影響があります。

プロジェクト内のすべてのものが削除されます。このドキュメントのタスクで既存のプロジェクトを使用した場合、それを削除すると、そのプロジェクトで行った他の作業もすべて削除されます。
カスタムプロジェクト ID が失われます。このプロジェクトを作成したときに、将来使用するカスタムプロジェクト ID を作成した可能性があります。そのプロジェクト ID を使用した URL（たとえば、appspot.com）を保持するには、プロジェクト全体ではなくプロジェクト内の選択したリソースだけを削除します。

複数のアーキテクチャ、チュートリアル、クイックスタートを実施する予定がある場合は、プロジェクトを再利用すると、プロジェクトの割り当て上限の超過を回避できます。

In the Google Cloud console, go to the Manage resources page.

Go to Manage resources

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

次のステップ

一括認識については、リファレンスドキュメントをご覧ください。
ストリーミング音声を文字に変換する方法を学習する。
短い音声ファイルの文字変換を行う。
Chirp を使用して、音声ファイルの音声文字変換を行う。
ベストプラクティスのドキュメントで、最高のパフォーマンスと精度を実現するための方法やヒントを確認する。

長い音声ファイルをテキストに変換する コレクションでコンテンツを整理 必要に応じて、コンテンツの保存と分類を行います。

非同期音声認識について

始める前に

Check for the roles

Grant the roles

Check for the roles

Grant the roles

Cloud Storage へのアクセスを有効にする

一括認識を実行してインラインで結果を読み取る

Python

一括認識を実行して結果を Cloud Storage に書き込む

Python

複数のファイルに対して一括認識を実行する

Python

一括認識で動的一括処理を有効にする

Python

ファイルごとに認識機能をオーバーライドする

クリーンアップ

コンソール

gcloud

次のステップ

長い音声ファイルをテキストに変換する