Automatically detect language

This page describes how to specify multiple language codes in transcription requests sent to Speech-to-Text.

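In some cases, you cannot know for certain which language an audio recording contains. For example, if you publish a service, app, or product in a country with multiple official languages, you might receive audio input from users in a variety of languages. This makes it difficult to specify a single language code for a transcription request.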

Multiple language recognition

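Speech-to-Text lets you specify a set of alternative languages that your audio data might contain. When you send a transcription request, you can include a list of additional languages alongside the primary language. Speech-to-Text then transcribes the audio using whichever of the specified languages best fits the sample, and labels the transcription results with the predicted language code.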

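This feature is best suited to apps that need to transcribe short utterances such as voice commands or search queries. In addition to your primary language, you can list up to three alternative languages from those that Speech-to-Text supports (four languages in total).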

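Even when you specify alternative languages, you must still provide a primary language code in the languageCode field. Keep the number of languages you request to a minimum: the fewer alternative language codes you request, the more likely Speech-to-Text is to select the correct language. Specifying only a single language produces the best results.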
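The constraints above (one primary language plus at most three alternatives) can be checked before a request is sent. The following is a minimal sketch, not part of the official samples: the helper name `build_recognition_config` and the constant `MAX_ALTERNATIVE_LANGUAGES` are hypothetical, and the sketch assumes you build the REST request body as a plain dictionary. Only the `languageCode` and `alternativeLanguageCodes` field names come from this page.

```python
import json

# Limit described on this page: a primary language plus up to three alternatives.
MAX_ALTERNATIVE_LANGUAGES = 3


def build_recognition_config(primary, alternatives=(), **extra):
    """Return a RecognitionConfig-shaped dict for a REST request body.

    Hypothetical helper: drops alternatives that repeat the primary language
    (or each other) and rejects lists that exceed the documented limit.
    """
    seen = {primary.lower()}
    unique = []
    for code in alternatives:
        if code.lower() not in seen:
            seen.add(code.lower())
            unique.append(code)

    if len(unique) > MAX_ALTERNATIVE_LANGUAGES:
        raise ValueError(
            f"At most {MAX_ALTERNATIVE_LANGUAGES} alternative languages are "
            f"allowed, got {len(unique)}"
        )

    config = {"languageCode": primary}
    if unique:
        config["alternativeLanguageCodes"] = unique
    config.update(extra)  # for example encoding or model
    return config


config = build_recognition_config(
    "en-US", ["fr-FR", "de-DE", "en-US"], encoding="LINEAR16"
)
print(json.dumps(config, indent=2))
```

Dropping a duplicate of the primary language keeps the candidate list as short as possible, which matches the advice above to request as few languages as you can.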

Enable language recognition in audio transcription requests

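To specify alternative languages in a transcription request, set the alternativeLanguageCodes field of the request's RecognitionConfig to a list of language codes. Speech-to-Text supports alternative language codes for all speech recognition methods: speech:recognize, speech:longrunningrecognize, and streaming.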

Use a local file

Protocol

For complete details, see the speech:recognize API endpoint.

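To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following example shows a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud CLI. For instructions on installing the gcloud CLI, setting up a project with a service account, and obtaining an access token, see the quickstart.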

The following example shows how to request transcription of an audio file that might contain speech in English, French, or German.

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    --data '{
    "config": {
        "encoding": "LINEAR16",
        "languageCode": "en-US",
        "alternativeLanguageCodes": ["fr-FR", "de-DE"],
        "model": "command_and_search"
    },
    "audio": {
        "uri": "gs://cloud-samples-tests/speech/commercial_mono.wav"
    }
}' > multi-language.txt

If the request succeeds, the server returns a 200 OK HTTP status code and the response in JSON format, which is saved to a file named multi-language.txt.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "hi I'd like to buy a Chromecast I'm ...",
          "confidence": 0.9466864
        }
      ],
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " let's go with the black one",
          "confidence": 0.9829583
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
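Each entry in `results` carries the language code that Speech-to-Text selected for that portion of the audio. The following is a minimal sketch of reading those labels from a saved response, assuming the response body has the shape shown above. The function name `detected_languages` and the inline `SAMPLE_RESPONSE` string are hypothetical; in practice you would load the contents of multi-language.txt instead.

```python
import json
from collections import Counter

# Stand-in for the contents of multi-language.txt (same shape as the response above).
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "alternatives": [
        {"transcript": "hi I'd like to buy a Chromecast I'm ...", "confidence": 0.9466864}
      ],
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {"transcript": " let's go with the black one", "confidence": 0.9829583}
      ],
      "languageCode": "en-us"
    }
  ]
}
"""


def detected_languages(response):
    """Return (language_code, transcript) pairs, one per non-empty result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results without any transcript
        top = alternatives[0]  # first alternative is the most likely one
        pairs.append((result.get("languageCode", ""), top.get("transcript", "").strip()))
    return pairs


pairs = detected_languages(json.loads(SAMPLE_RESPONSE))
counts = Counter(code for code, _ in pairs)

for code, transcript in pairs:
    print(f"[{code}] {transcript}")
print("Languages seen:", dict(counts))
```

Counting the labels per response is one simple way to notice when a request that lists several candidate languages keeps resolving to the same one, in which case the extra candidates can probably be removed.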

Java

/**
 * Transcribe a local audio file with multi-language recognition
 *
 * @param fileName the path to the audio file
 */
public static void transcribeMultiLanguage(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  // Get the contents of the local audio file
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {

    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();
    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      System.out.format("Transcript : %s\n\n", alternative.getTranscript());
    }
  }
}

Node.js

const fs = require('fs');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 44100,
  languageCode: 'en-US',
  alternativeLanguageCodes: ['es-ES', 'en-US'],
};

const audio = {
  content: fs.readFileSync(fileName).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

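// `await` is only valid inside an async function in a CommonJS module,
// so the request is wrapped in one.
async function transcribe() {
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

transcribe().catch(console.error);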

Python

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

speech_file = "resources/multi.wav"
first_lang = "en-US"
second_lang = "es"

with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    audio_channel_count=2,
    language_code=first_lang,
    alternative_language_codes=[second_lang],
)

print("Waiting for operation to complete...")
response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print("-" * 20)
    print(f"First alternative of result {i}: {alternative}")
    print(f"Transcript: {alternative.transcript}")

Use a remote file

Java

/**
 * Transcribe a remote audio file with multi-language recognition
 *
 * @param gcsUri the path to the remote audio file
 */
public static void transcribeMultiLanguageGcs(String gcsUri) throws Exception {
  try (SpeechClient speechClient = SpeechClient.create()) {

    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();

    // Set the remote path for the audio file
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speechClient.longRunningRecognizeAsync(config, audio);

    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }

    for (SpeechRecognitionResult result : response.get().getResultsList()) {

      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);

      // Print out the result
      System.out.printf("Transcript : %s\n\n", alternative.getTranscript());
    }
  }
}

Node.js

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'Path to GCS audio file, e.g. gs://bucket/audio.wav';

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 44100,
  languageCode: 'en-US',
  alternativeLanguageCodes: ['es-ES', 'en-US'],
};

const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

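// `await` is only valid inside an async function in a CommonJS module,
// so the request is wrapped in one.
async function transcribe() {
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

transcribe().catch(console.error);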
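
Python

This page does not include a Python sample for remote files. The following is a minimal sketch of how one might look, not an official sample. It assumes the `google-cloud-speech` client accepts plain dictionaries for `config` and `audio`, and that the audio is a WAV or FLAC file so the encoding and sample rate can be read from the file header. The function name `transcribe_gcs_multi_language` and the `timeout` default are hypothetical. The client is passed in as a parameter so the function can be exercised without credentials.

```python
def transcribe_gcs_multi_language(client, gcs_uri, primary, alternatives, timeout=300):
    """Transcribe a Cloud Storage audio file with alternative languages.

    Returns a list of (language_code, transcript) pairs, one per result.
    `client` is expected to be a speech_v1p1beta1.SpeechClient (assumption).
    """
    config = {
        "language_code": primary,
        "alternative_language_codes": list(alternatives),
    }
    audio = {"uri": gcs_uri}

    # Start the long-running operation and block until it finishes.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)

    # Keep the most likely alternative of each result, with its language label.
    return [
        (result.language_code, result.alternatives[0].transcript)
        for result in response.results
        if result.alternatives
    ]


# Example usage (requires the google-cloud-speech package and credentials):
#
#   from google.cloud import speech_v1p1beta1 as speech
#
#   pairs = transcribe_gcs_multi_language(
#       speech.SpeechClient(), "gs://bucket/audio.wav", "en-US", ["es-ES"]
#   )
#   for language_code, transcript in pairs:
#       print(f"[{language_code}] {transcript}")
```

The `gs://bucket/audio.wav` URI is a placeholder. For raw audio without a header, the config would also need the encoding and sample rate, as in the local-file samples above.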