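Chirp: Universal speech model

Chirp is the next generation of Google's Speech-to-Text models. The result of years of research, the first version of Chirp is now available in Speech-to-Text. Google plans to improve Chirp and expand it to more languages and domains. For details, see the Google USM paper.

Chirp models are trained with a different architecture from the current speech models. A single model unifies data from multiple languages. However, you still specify the language in which the model should recognize speech. Chirp doesn't support some of the Google speech recognition features that other models offer. For the full list, see Feature support and limitations.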

Model ID

Chirp is available in the Speech-to-Text API v2. You can use it like any other model.

The model ID for Chirp is chirp.

You can specify this model in synchronous or batch recognition requests.
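
To illustrate where the model ID goes, here is a minimal sketch that builds the JSON body of a synchronous v2 recognize request using only the standard library. The helper name build_recognize_body is hypothetical, and the camelCase field names are an assumption: they follow the usual JSON mapping of the auto_decoding_config, language_codes, and model fields used in the Python samples on this page.

```python
import base64
import json


def build_recognize_body(audio_bytes: bytes, language_code: str = "en-US") -> dict:
    """Build a JSON-serializable request body (hypothetical helper)."""
    return {
        "config": {
            # Assumed JSON form of AutoDetectDecodingConfig(): an empty object.
            "autoDecodingConfig": {},
            "languageCodes": [language_code],
            # The Chirp model ID from this page.
            "model": "chirp",
        },
        # Binary audio has to be base64-encoded to travel inside JSON.
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }


if __name__ == "__main__":
    body = build_recognize_body(b"\x00\x01placeholder-audio")
    print(json.dumps(body, indent=2))
```

The only Chirp-specific part is the "model" entry; the rest of the body has the same shape as for any other model.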

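Available API methods

Chirp processes speech in much larger chunks than other models do. As a result, it might not be suitable for real-time use. Chirp is available through the following API methods:

v2 Speech.Recognize (suited to short audio of less than 1 minute)
v2 Speech.BatchRecognize (suited to long audio of 1 minute to 8 hours)

Chirp is not available through the following API methods: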

v2 Speech.StreamingRecognize
v1 Speech.StreamingRecognize
v1 Speech.Recognize
v1 Speech.LongRunningRecognize
v1p1beta1 Speech.StreamingRecognize
v1p1beta1 Speech.Recognize
v1p1beta1 Speech.LongRunningRecognize
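
The two lists above can be encoded as a small lookup plus a duration rule. This is only a sketch: the function names are hypothetical, and the thresholds simply restate the "less than 1 minute" and "1 minute to 8 hours" guidance from this page rather than limits enforced by the API.

```python
# (API version, method) pairs through which this page says Chirp is available.
CHIRP_METHODS = {
    ("v2", "Recognize"),
    ("v2", "BatchRecognize"),
}

ONE_MINUTE = 60
EIGHT_HOURS = 8 * 60 * 60


def supports_chirp(api_version: str, method: str) -> bool:
    """Return True if the given API version and method are listed for Chirp."""
    return (api_version, method) in CHIRP_METHODS


def choose_chirp_method(duration_seconds: float) -> str:
    """Pick the v2 method suggested on this page for a given audio length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds < ONE_MINUTE:
        return "Recognize"
    if duration_seconds <= EIGHT_HOURS:
        return "BatchRecognize"
    raise ValueError("audio longer than 8 hours is outside the documented range")
```

For example, choose_chirp_method(45) selects Recognize, while a 20-minute recording (1200 seconds) selects BatchRecognize.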

Regions

Chirp is available in the following regions:

us-central1
europe-west4
asia-southeast1

For more information, see the languages page.
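
Because Chirp requests go to a regional endpoint, it can help to derive the endpoint and the recognizer path from one validated region value. The sketch below assumes the naming patterns used in the Python samples on this page; the helper names are hypothetical, and the region tuple is a snapshot of the list above that may go out of date.

```python
# Snapshot of the regions listed on this page.
CHIRP_REGIONS = ("us-central1", "europe-west4", "asia-southeast1")


def chirp_endpoint(region: str) -> str:
    """Return the regional API endpoint, rejecting regions not listed for Chirp."""
    if region not in CHIRP_REGIONS:
        raise ValueError(f"Chirp is not listed for region {region!r}")
    return f"{region}-speech.googleapis.com"


def default_recognizer(project_id: str, region: str) -> str:
    """Return the resource name of the default recognizer ("_") in a region."""
    chirp_endpoint(region)  # Reuse the region check; the return value is not needed here.
    return f"projects/{project_id}/locations/{region}/recognizers/_"
```

Using the same region for both values matters: the endpoint and the location segment of the recognizer path are expected to agree.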

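Languages

For the supported languages, see the language list.

Feature support and limitations

Chirp doesn't support some of the Speech-to-Text (STT) API features:

Confidence scores: The API returns a value, but it isn't truly a confidence score.
Speech adaptation: No adaptation features are supported.
Diarization: Automatic diarization isn't supported.
Forced normalization: Not supported.
Word-level confidence: Not supported.
Language detection: Not supported.

Chirp supports the following features:

Automatic punctuation: Punctuation is predicted by the model. It can be disabled.
Word timings: Optionally returned.
Language-agnostic audio transcription: The model automatically infers the spoken language in your audio file and adds it to the results.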
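
One way to apply the lists above is to check a request configuration before sending it. The sketch below works on a plain dictionary. The function name is hypothetical, and the mapping from features to field names (adaptation, transcript_normalization, diarization_config, enable_word_confidence) is an assumption based on the v2 RecognitionConfig and RecognitionFeatures messages, so verify it against the API reference before relying on it.

```python
from typing import List

# Assumed mapping: request field -> feature this page lists as unsupported by Chirp.
UNSUPPORTED_WITH_CHIRP = {
    "adaptation": "speech adaptation",
    "transcript_normalization": "forced normalization",
    "features.diarization_config": "diarization",
    "features.enable_word_confidence": "word-level confidence",
}


def find_unsupported(config: dict) -> List[str]:
    """Return the unsupported features that a config dictionary tries to use."""
    problems = []
    features = config.get("features", {})
    for key, label in UNSUPPORTED_WITH_CHIRP.items():
        if key.startswith("features."):
            value = features.get(key.split(".", 1)[1])
        else:
            value = config.get(key)
        if value:  # Unset, False, and empty values are treated as "not requested".
            problems.append(label)
    return problems
```

A configuration that only enables word timings and automatic punctuation yields an empty list, while one that sets adaptation or enable_word_confidence is reported.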

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Make sure that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
In the Select a role list, select a role.
To grant additional roles, click Add another role and add each additional role.
Click Save.

Install the Google Cloud CLI.

To initialize the gcloud CLI, run the following command:

gcloud init

Note: If you installed the gcloud CLI previously, make sure you have the latest version by running

gcloud components update

Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see Authenticate for using client libraries.

If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.
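
To confirm that the login step left a credential file behind, a quick standard-library check could look like the sketch below. It only covers file-based credentials: the GOOGLE_APPLICATION_CREDENTIALS environment variable and the well-known gcloud location. It does not cover environments such as Cloud Shell, where credentials come from elsewhere. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the path of a file-based ADC credential, or None if none is found."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        # An explicitly configured file takes priority over the gcloud default.
        path = Path(explicit)
        return path if path.is_file() else None
    if os.name == "nt":
        base = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        base = (home or Path.home()) / ".config" / "gcloud"
    candidate = base / "application_default_credentials.json"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    print(f"ADC file: {found}" if found else "No file-based ADC credentials found")
```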

Also make sure that the client library is installed.
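
A simple way to verify the installation from Python is to look up the installed distribution. The sketch below assumes the Python client library is distributed as google-cloud-speech; the helper name is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "google-cloud-speech") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("google-cloud-speech is not installed; try: pip install google-cloud-speech")
    else:
        print(f"google-cloud-speech {version} is installed")
```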

Perform synchronous speech recognition with Chirp

The following example performs synchronous speech recognition on a local audio file using Chirp.

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_chirp(
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp model of Google Cloud Speech-to-Text API.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.

    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
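
The sample above covers short audio. For longer audio stored in Cloud Storage, a batch request is the documented alternative, and a sketch of one follows. The function names and the gcs_uri and timeout parameters are hypothetical, and the request shape (BatchRecognizeFileMetadata, RecognitionOutputConfig with InlineOutputConfig) is an assumption based on the v2 Python client library, so check it against the library version you have installed.

```python
import os


def recognizer_name(project_id: str, region: str = "us-central1") -> str:
    """Return the resource name of the default recognizer ("_") in a region."""
    return f"projects/{project_id}/locations/{region}/recognizers/_"


def transcribe_batch_chirp(gcs_uri: str, region: str = "us-central1", timeout: float = 300.0):
    """Run a batch recognition request with Chirp on a Cloud Storage file."""
    # Imported here so this module can be loaded without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient(
        client_options=ClientOptions(api_endpoint=f"{region}-speech.googleapis.com")
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp",
    )
    request = cloud_speech.BatchRecognizeRequest(
        recognizer=recognizer_name(os.getenv("GOOGLE_CLOUD_PROJECT", ""), region),
        config=config,
        files=[cloud_speech.BatchRecognizeFileMetadata(uri=gcs_uri)],
        # Ask for results in the response instead of writing them to Cloud Storage.
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )
    # Batch recognition is a long-running operation; block until it finishes.
    operation = client.batch_recognize(request=request)
    response = operation.result(timeout=timeout)
    # response.results is keyed by the input URI.
    return response
```

A call such as transcribe_batch_chirp("gs://your-bucket/audio.wav") would need a real bucket and object; the URI shown here is a placeholder.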

Make a request with language-agnostic transcription enabled

The following code sample shows how to make a request with language-agnostic transcription enabled.

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_chirp_auto_detect_language(
    audio_file: str,
    region: str = "us-central1",
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file and auto-detect spoken language using Chirp.
    Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
    information on which audio encodings are supported.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        region (str): The region for the API endpoint.
    Returns:
        cloud_speech.RecognizeResponse: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{region}-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["auto"],  # Set language code to auto to detect language.
        model="chirp",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{region}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")

    return response
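
When a file yields several results, it can be handy to summarize which languages were detected. The sketch below relies only on the language_code attribute printed in the sample above; the helper name is hypothetical, and the stand-in objects in the demo are not real API responses.

```python
from collections import Counter
from types import SimpleNamespace


def detected_languages(results) -> Counter:
    """Count how many results were attributed to each detected language code."""
    return Counter(r.language_code for r in results if r.language_code)


if __name__ == "__main__":
    # Stand-in objects that mimic the language_code attribute of real results.
    fake_results = [
        SimpleNamespace(language_code="en-US"),
        SimpleNamespace(language_code="es-ES"),
        SimpleNamespace(language_code="en-US"),
    ]
    print(detected_languages(fake_results).most_common())
```

With a real response, the call would be detected_languages(response.results).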

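Try Chirp in the Google Cloud console

Make sure that you have signed up for a Google Cloud account and created a project.
Go to Speech in the Google Cloud console.
Enable the API if it isn't already enabled.
Go to the Transcriptions subpage.
Click New Transcription.
Make sure that you have an STT workspace. If you don't have one, create one.
1. Open the Workspace drop-down and click New Workspace.
2. In the Create a new workspace navigation sidebar, click Browse.
3. Click to create a bucket.
4. Enter a name for your bucket and click Continue.
5. Click Create.
6. After the bucket is created, click Select to select your bucket.
7. Click Create to finish creating your workspace for Speech-to-Text.
Perform a transcription on your audio.
1. On the New Transcription page, choose an option for selecting your audio file:
  - Click Local upload to upload a file.
  - Click Cloud Storage to specify an existing Cloud Storage file.
Note: Speech-to-Text attempts to automatically assess your audio file parameters.
2. Click Continue.
3. In the Transcription options section, select the spoken language that you plan to use for recognition with Chirp from your previously created recognizer.
4. In the Model drop-down, select Chirp.
5. In the Region drop-down, select a region, such as us-central1.
6. Click Continue.
7. To run your first recognition request using Chirp, click Submit in the main section.
View your Chirp transcription result.
1. On the Transcriptions page, click the name of the transcription.
2. On the Transcription details page, view your transcription result and, optionally, play back the audio in the browser.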

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

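Console

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, deleting it also deletes any other work you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.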

In the Google Cloud console, go to the Manage resources page.

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

What's next

Practice transcribing short audio files.
Learn how to transcribe streaming audio.
Learn how to transcribe long audio files.
For best performance, accuracy, and other tips, see the best practices documentation.