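Chirp 2: Improved multilingual accuracy

Chirp 2 is the latest generation of Google's multilingual ASR-specific models, designed to meet user needs based on feedback and experience. It improves on the original Chirp model in accuracy and speed, and adds key new capabilities such as word-level timestamps, model adaptation, and speech translation.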

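Model details

Chirp 2 is available only in the Speech-to-Text API V2.

Model identifiers

Chirp 2 can be used like any other model. When using the API, specify the appropriate model identifier in the recognition request. In the Google Cloud console, specify the model name.

Model	Model identifier
Chirp 2	`chirp_2`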

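API methods

Because Chirp 2 is available only in the Speech-to-Text API V2, it supports the following recognition methods:

API version	API method	Support
`V2`	`Speech.StreamingRecognize` (suited to streaming and real-time audio)	Limited*
`V2`	`Speech.Recognize` (suited to short audio under 1 minute)	On par with Chirp
`V2`	`Speech.BatchRecognize` (suited to long audio from 1 minute to 8 hours)	On par with Chirp

*You can always find the latest list of languages and features supported by each transcription model by using the Locations API.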
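
The duration guidance in the table above can be turned into a small dispatch helper. This is only a sketch: the function name `choose_recognition_method` and its parameters are hypothetical, and the thresholds simply restate the table (under 1 minute, 1 minute to 8 hours).

```python
def choose_recognition_method(duration_seconds: float, streaming: bool = False) -> str:
    """Return the V2 method name suggested by the table for a given audio length.

    Hypothetical helper: the thresholds mirror the table, not an API contract.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    if streaming:
        # Real-time or streamed audio goes through the streaming method.
        return "Speech.StreamingRecognize"
    if duration_seconds < 60:
        # Short audio under 1 minute.
        return "Speech.Recognize"
    if duration_seconds <= 8 * 60 * 60:
        # Long audio from 1 minute up to 8 hours.
        return "Speech.BatchRecognize"
    raise ValueError("audio longer than 8 hours is outside the range listed in the table")


print(choose_recognition_method(42))
print(choose_recognition_method(3600))
print(choose_recognition_method(10, streaming=True))
```

A helper like this only picks a method name. The actual calls are shown in the transcription samples later on this page.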

Regional availability

Chirp 2 is supported in the following regions:

Google Cloud region	Launch readiness
`us-central1`	GA
`europe-west4`	GA
`asia-southeast1`	Private preview

You can always find the latest list of Google Cloud regions, languages, and features supported by each transcription model by using the Locations API.
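
The samples on this page hard-code `us-central1` in both the API endpoint and the recognizer path. The sketch below derives both strings from a region name. It assumes that the other listed regions follow the same `REGION-speech.googleapis.com` pattern as the `us-central1` endpoint used in the samples. The names `SUPPORTED_REGIONS`, `regional_endpoint`, and `default_recognizer_path` are hypothetical.

```python
# Snapshot of the table above; the Locations API remains the source of truth.
SUPPORTED_REGIONS = {
    "us-central1": "GA",
    "europe-west4": "GA",
    "asia-southeast1": "Private preview",
}


def regional_endpoint(region: str) -> str:
    """Build the regional endpoint host (assumed pattern, see lead-in)."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Chirp 2 is not listed for region {region!r}")
    return f"{region}-speech.googleapis.com"


def default_recognizer_path(project_id: str, region: str) -> str:
    """Build the path of the default recognizer ("_") used in the samples."""
    regional_endpoint(region)  # Reuse the region validation.
    return f"projects/{project_id}/locations/{region}/recognizers/_"


print(regional_endpoint("europe-west4"))
print(default_recognizer_path("my-project", "europe-west4"))
```

The endpoint and the recognizer path should name the same region, which is why both helpers validate against the same table.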

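Language availability for transcription

See the list of languages supported for transcription.

Language availability for translation

Chirp 2 supports speech translation for the language pairs listed below. Translation support is not symmetric: translation from language A to language B may be available while translation from language B to language A is not.

Translation to English:

Source -> target language	Source -> target language code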
Arabic (Egypt) -> English	`ar-EG` -> `en-US`
Arabic (Gulf) -> English	`ar-x-gulf` -> `en-US`
Arabic (Levant) -> English	`ar-x-levant` -> `en-US`
Arabic (Maghrebi) -> English	`ar-x-maghrebi` -> `en-US`
Catalan (Spain) -> English	`ca-ES` -> `en-US`
Welsh (United Kingdom) -> English	`cy-GB` -> `en-US`
German (Germany) -> English	`de-DE` -> `en-US`
Spanish (Latin America) -> English	`es-419` -> `en-US`
Spanish (Spain) -> English	`es-ES` -> `en-US`
Spanish (United States) -> English	`es-US` -> `en-US`
Estonian (Estonia) -> English	`et-EE` -> `en-US`
French (Canada) -> English	`fr-CA` -> `en-US`
French (France) -> English	`fr-FR` -> `en-US`
Persian (Iran) -> English	`fa-IR` -> `en-US`
Indonesian (Indonesia) -> English	`id-ID` -> `en-US`
Italian (Italy) -> English	`it-IT` -> `en-US`
Japanese (Japan) -> English	`ja-JP` -> `en-US`
Latvian (Latvia) -> English	`lv-LV` -> `en-US`
Mongolian (Mongolia) -> English	`mn-MN` -> `en-US`
Dutch (Netherlands) -> English	`nl-NL` -> `en-US`
Portuguese (Brazil) -> English	`pt-BR` -> `en-US`
Russian (Russia) -> English	`ru-RU` -> `en-US`
Slovenian (Slovenia) -> English	`sl-SI` -> `en-US`
Swedish (Sweden) -> English	`sv-SE` -> `en-US`
Tamil (India) -> English	`ta-IN` -> `en-US`
Turkish (Turkey) -> English	`tr-TR` -> `en-US`
Chinese (Simplified, China) -> English	`zh-Hans-CN` -> `en-US`

Translation from English:

Source -> target language	Source -> target language code
English -> Arabic (Egypt)	`en-US` -> `ar-EG`
English -> Arabic (Gulf)	`en-US` -> `ar-x-gulf`
English -> Arabic (Levant)	`en-US` -> `ar-x-levant`
English -> Arabic (Maghrebi)	`en-US` -> `ar-x-maghrebi`
English -> Catalan (Spain)	`en-US` -> `ca-ES`
English -> Welsh (United Kingdom)	`en-US` -> `cy-GB`
English -> German (Germany)	`en-US` -> `de-DE`
English -> Estonian (Estonia)	`en-US` -> `et-EE`
English -> Persian (Iran)	`en-US` -> `fa-IR`
English -> Indonesian (Indonesia)	`en-US` -> `id-ID`
English -> Japanese (Japan)	`en-US` -> `ja-JP`
English -> Latvian (Latvia)	`en-US` -> `lv-LV`
English -> Mongolian (Mongolia)	`en-US` -> `mn-MN`
English -> Slovenian (Slovenia)	`en-US` -> `sl-SI`
English -> Swedish (Sweden)	`en-US` -> `sv-SE`
English -> Tamil (India)	`en-US` -> `ta-IN`
English -> Turkish (Turkey)	`en-US` -> `tr-TR`
English -> Chinese (Simplified, China)	`en-US` -> `zh-Hans-CN`
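
Because translation support is not symmetric, it can help to check a language pair before sending a request. The following sketch encodes the two tables above as sets. The names `TO_ENGLISH`, `FROM_ENGLISH`, and `is_supported_pair` are hypothetical, and the sets are a snapshot of this page rather than a live query of the API.

```python
# Source languages that can be translated into en-US (first table).
TO_ENGLISH = {
    "ar-EG", "ar-x-gulf", "ar-x-levant", "ar-x-maghrebi", "ca-ES", "cy-GB",
    "de-DE", "es-419", "es-ES", "es-US", "et-EE", "fr-CA", "fr-FR", "fa-IR",
    "id-ID", "it-IT", "ja-JP", "lv-LV", "mn-MN", "nl-NL", "pt-BR", "ru-RU",
    "sl-SI", "sv-SE", "ta-IN", "tr-TR", "zh-Hans-CN",
}

# Target languages that en-US can be translated into (second table).
FROM_ENGLISH = {
    "ar-EG", "ar-x-gulf", "ar-x-levant", "ar-x-maghrebi", "ca-ES", "cy-GB",
    "de-DE", "et-EE", "fa-IR", "id-ID", "ja-JP", "lv-LV", "mn-MN", "sl-SI",
    "sv-SE", "ta-IN", "tr-TR", "zh-Hans-CN",
}


def is_supported_pair(source: str, target: str) -> bool:
    """Return True if the source -> target pair appears in the tables above."""
    if target == "en-US":
        return source in TO_ENGLISH
    if source == "en-US":
        return target in FROM_ENGLISH
    # Every listed pair has English on one side.
    return False


print(is_supported_pair("fr-FR", "en-US"))
print(is_supported_pair("en-US", "fr-FR"))
print(sorted(TO_ENGLISH - FROM_ENGLISH))
```

The last line prints the languages that, per the tables, can be translated into English but not from English.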

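Feature support and limitations

Chirp 2 supports the following features:

Feature	Description
Automatic punctuation	Generated automatically by the model; can optionally be disabled.
Automatic capitalization	Generated automatically by the model; can optionally be disabled.
Speech adaptation (biasing)	Provide hints to the model as simple words or phrases to improve recognition accuracy for specific terms or proper nouns. Class tokens and custom classes are not supported.
Word timings (timestamps)	Generated automatically by the model; can optionally be enabled. Transcription quality and speed may degrade slightly.
Language-agnostic transcription	The model automatically infers the language spoken in the audio file and transcribes it in the most prevalent language.
Language-specific translation	The model automatically translates from the spoken language to the target language.
Forced normalization	If defined in the request body, the API performs string replacements on specific terms or phrases, ensuring consistency in the transcription.
Word-level confidence scores	The API returns a value, but it is not truly a confidence score. For translation, confidence scores are not returned.

Chirp 2 does not support the following features:

Feature	Description
Diarization	Not supported
Language detection	Not supported
Profanity filter	Not supported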

Transcribe using Chirp 2

Learn how to use Chirp 2 for your transcription and translation needs.

Perform streaming speech recognition

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_streaming_chirp2(
    audio_file: str
) -> list[cloud_speech.StreamingRecognizeResponse]:
    """Transcribes audio from an audio file stream using the Chirp 2 model of Google Cloud Speech-to-Text V2 API.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        list[cloud_speech.StreamingRecognizeResponse]: The streaming responses from the
        Speech-to-Text API V2 containing the transcription results.
    """

    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        content = f.read()

    # In practice, stream should be a generator yielding chunks of audio data
    chunk_length = max(1, len(content) // 5)  # Avoid a zero step for very short files
    stream = [
        content[start : start + chunk_length]
        for start in range(0, len(content), chunk_length)
    ]
    audio_requests = (
        cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
    )

    recognition_config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),        
        language_codes=["en-US"],
        model="chirp_2",
    )
    streaming_config = cloud_speech.StreamingRecognitionConfig(
        config=recognition_config
    )
    config_request = cloud_speech.StreamingRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        streaming_config=streaming_config,
    )

    def requests(config, audio):
        # Generator: the first request carries the config, the rest carry audio
        yield config
        yield from audio

    # Transcribes the audio into text
    responses_iterator = client.streaming_recognize(
        requests=requests(config_request, audio_requests)
    )
    responses = []
    for response in responses_iterator:
        responses.append(response)
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")

    return responses
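
The comment in the sample above notes that, in practice, the stream should be a generator yielding chunks of audio data rather than a list built from a fully loaded file. A minimal sketch of such a generator follows. The name `iter_audio_chunks` is hypothetical, and the default chunk size of 8192 bytes is an arbitrary illustrative value, not a documented limit; check the API's request size limits before choosing one.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Lazily yield fixed-size chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # End of stream.
        yield chunk


# Stand-in for an open audio file: 20,000 bytes of silence.
fake_audio = io.BytesIO(b"\x00" * 20000)
print([len(chunk) for chunk in iter_audio_chunks(fake_audio)])
```

Each yielded chunk can then be wrapped in `cloud_speech.StreamingRecognizeRequest(audio=chunk)`, as the sample above does for its list of chunks.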

Perform synchronous speech recognition

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_sync_chirp2(
    audio_file: str
) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp 2 model of Google Cloud Speech-to-Text V2 API.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """

    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_2",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

Perform batch speech recognition

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_batch_chirp2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 2 model of Google Cloud Speech-to-Text V2 API.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The batch results from the Speech-to-Text API
        containing the transcription results for the input file.
    """

    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_2",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text    
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript

Use Chirp 2 features

Explore the latest features with code examples.

Perform language-agnostic transcription

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_sync_chirp2_auto_detect_language(
    audio_file: str
) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file and auto-detects the spoken language using Chirp 2.
    Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
    information on which audio encodings are supported.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["auto"],  # Set language code to auto to detect language.
        model="chirp_2",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")

    return response
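
The sample above prints `result.language_code` for each result. When a response contains several results, a tally can make the inferred language easier to read. The sketch below is a hypothetical helper that only relies on the `results` and `language_code` attributes already used in the sample. The `SimpleNamespace` objects are stand-ins for a real response so that the snippet runs without calling the API.

```python
from collections import Counter
from types import SimpleNamespace


def tally_languages(response) -> Counter:
    """Count how many results were transcribed in each reported language code."""
    return Counter(
        result.language_code for result in response.results if result.language_code
    )


# Stand-in response object (illustrative data, not real API output).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(language_code="en-US"),
        SimpleNamespace(language_code="en-US"),
        SimpleNamespace(language_code=""),
    ]
)
print(tally_languages(fake_response).most_common(1))
```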

Perform speech translation

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def translate_sync_chirp2(
    audio_file: str
) -> cloud_speech.RecognizeResponse:
    """Translates an audio file using Chirp 2.
    Args:
        audio_file (str): Path to the local audio file to be translated.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the translated results.
    """

    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["fr-FR"],  # Set the source language code of the audio.
        translation_config=cloud_speech.TranslationConfig(target_language="en-US"),  # Set the target language code.
        model="chirp_2",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Translates the audio into text in the target language
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Translated transcript: {result.alternatives[0].transcript}")

    return response

Enable word-level timestamps

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_sync_chirp2_with_timestamps(
    audio_file: str
) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp 2 model of Google Cloud Speech-to-Text V2 API, providing word-level timestamps for each transcribed word. 
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """

    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_2",
        features=cloud_speech.RecognitionFeatures(
            enable_word_time_offsets=True, # Enabling word-level timestamps
        )
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
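
The sample above enables word time offsets but prints only the transcript. The following sketch shows one way to list the per-word timings. It assumes that each alternative exposes a `words` list whose items have `word`, `start_offset`, and `end_offset` attributes, and that the Python client surfaces the offsets as `datetime.timedelta` values. Verify both assumptions against the client library reference. The function name `format_word_timings` is hypothetical, and the `SimpleNamespace` objects stand in for a real response.

```python
from datetime import timedelta
from types import SimpleNamespace


def format_word_timings(response) -> list[str]:
    """Return one 'start - end  word' line per word of each top alternative."""
    lines = []
    for result in response.results:
        if not result.alternatives:
            continue  # Skip results that carry no transcript.
        for word in result.alternatives[0].words:
            start = word.start_offset.total_seconds()
            end = word.end_offset.total_seconds()
            lines.append(f"{start:7.2f}s - {end:7.2f}s  {word.word}")
    return lines


# Stand-in response object (illustrative data, not real API output).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(
                    words=[
                        SimpleNamespace(
                            word="hello",
                            start_offset=timedelta(seconds=0),
                            end_offset=timedelta(seconds=0.5),
                        ),
                        SimpleNamespace(
                            word="world",
                            start_offset=timedelta(seconds=0.5),
                            end_offset=timedelta(seconds=1.1),
                        ),
                    ]
                )
            ]
        )
    ]
)
print("\n".join(format_word_timings(fake_response)))
```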

Improve accuracy with model adaptation

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_sync_chirp2_model_adaptation(
    audio_file: str
) -> cloud_speech.RecognizeResponse:
    """Transcribes an audio file using the Chirp 2 model with adaptation, improving accuracy for specific audio characteristics or vocabulary.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"
    Returns:
        cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
        the transcription results.
    """

    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-central1-speech.googleapis.com",
        )
    )

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_2",
        # Use model adaptation with an inline phrase set
        adaptation=cloud_speech.SpeechAdaptation(
            phrase_sets=[
                cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                    inline_phrase_set=cloud_speech.PhraseSet(
                        phrases=[
                            {"value": "alphabet"},
                            {"value": "cell phone service"},
                        ]
                    )
                )
            ]
        ),
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-central1/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
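
Chirp 2 adaptation accepts only simple words and phrases. Class tokens and custom classes are not supported. The sketch below builds the `{"value": ...}` entries used in the sample above from a list of strings, and rejects anything that looks like a class token. It assumes that class tokens and custom class references are written with a `$` (for example `$MONTH`), which is how they appear elsewhere in Speech-to-Text. The name `build_inline_phrases` is hypothetical.

```python
def build_inline_phrases(phrases: list[str]) -> list[dict]:
    """Normalize hint phrases into {"value": ...} entries for an inline phrase set."""
    entries = []
    seen = set()
    for phrase in phrases:
        text = " ".join(phrase.split())  # Collapse repeated whitespace.
        if not text:
            continue  # Ignore empty hints.
        if "$" in text:
            # Assumed marker for class tokens and custom classes (see lead-in).
            raise ValueError(f"class tokens are not supported by Chirp 2: {text!r}")
        key = text.casefold()
        if key in seen:
            continue  # Drop case-insensitive duplicates.
        seen.add(key)
        entries.append({"value": text})
    return entries


print(build_inline_phrases(["alphabet", "  cell   phone service ", "Alphabet"]))
```

The returned list can be passed as the `phrases` argument of `cloud_speech.PhraseSet`, in the same shape as the dictionaries in the sample above.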

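Use Chirp 2 in the Google Cloud console

Ensure that you have signed up for a Google Cloud account and created a project.
Go to [Speech] in the Google Cloud console.
Enable the API if it is not already enabled.
Make sure that you have an STT console workspace. If you don't have one, create one:
1. Go to the [Transcriptions] page and click [New Transcription].
2. Open the [Workspace] drop-down and click [New Workspace] to create a workspace for transcription.
3. In the [Create a new workspace] navigation sidebar, click [Browse].
4. Click to create a new bucket.
5. Enter a name for the bucket and click [Continue].
6. Click [Create] to create the Cloud Storage bucket.
7. After the bucket is created, click [Select] to select the bucket for use.
8. Click [Create] to finish creating the workspace for the Speech-to-Text API V2 console.
Perform a transcription on your actual audio:

On the [New Transcription] page, select your audio file either by uploading it ([Local upload]) or by specifying an existing Cloud Storage file ([Cloud Storage]).
Click [Continue] to move to the [Transcription options].
1. Select the spoken language to use for recognition with Chirp from your previously created recognizer.
2. In the [Model] drop-down, select [Chirp - Universal Speech Model].
3. In the [Recognizer] drop-down, select your newly created recognizer.
4. Click [Submit] to run your first recognition request using Chirp.
View your Chirp 2 transcription result:
1. On the [Transcriptions] page, click the name of the transcription to view its result.
2. On the [Transcription details] page, view the transcription result and optionally play back the audio in the browser.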

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

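Console

Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for the tasks in this document, deleting it also deletes any other work you've done in the project.
Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.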

In the Google Cloud console, go to the Manage resources page.

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

What's next

Transcribe short audio files.
Learn how to transcribe streaming audio.
Learn how to transcribe long audio files.
For best performance, accuracy, and other tips, see the best practices documentation.