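Automatic language detection

This page describes how to provide multiple language codes for audio transcription requests sent to Cloud Speech-to-Text.

In some situations, you don't know for certain which languages your audio recordings contain. For example, if you publish your service, app, or product in a country or region with multiple official languages, you may receive audio input from users in a variety of languages. This makes it difficult to specify a single language code for transcription requests.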

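Multi-language recognition

Speech-to-Text lets you specify a set of alternative languages that your audio data might contain. When you send an audio transcription request to Speech-to-Text, you can provide a list of additional languages that the audio might include. If you include a list of languages in your request, Speech-to-Text attempts to transcribe the audio in whichever of the supplied languages best fits the sample. Speech-to-Text then labels the transcription results with the predicted language code.

This feature is best suited to apps that transcribe short utterances such as voice commands or search queries. In addition to your primary language, you can list up to three alternative languages from those that Speech-to-Text supports (four languages in total).

Even though you can specify alternative languages for your speech transcription request, you must still provide a primary language code in the languageCode field. You should also keep the number of languages in the request to a minimum. The fewer alternative language codes you supply, the more likely Speech-to-Text is to select the correct one. Specifying just a single language produces the best results.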
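
The limits described above can be checked before a request is sent. The following is a minimal sketch, not part of the Speech-to-Text client library: build_recognition_config is a hypothetical helper that assembles the config portion of a request body, drops alternatives that merely repeat the primary language, and rejects more than three alternatives.

```python
MAX_ALTERNATIVE_LANGUAGES = 3  # primary language plus up to three alternatives


def build_recognition_config(primary_language, alternative_languages=()):
    """Return a dict shaped like the `config` object of a recognize request.

    Hypothetical helper for illustration; it is not part of any client library.
    """
    if not primary_language:
        raise ValueError("a primary languageCode is always required")

    # Drop duplicates and anything that repeats the primary language,
    # comparing case-insensitively while preserving the caller's order.
    seen = {primary_language.lower()}
    alternatives = []
    for code in alternative_languages:
        if code.lower() not in seen:
            seen.add(code.lower())
            alternatives.append(code)

    if len(alternatives) > MAX_ALTERNATIVE_LANGUAGES:
        raise ValueError(
            f"at most {MAX_ALTERNATIVE_LANGUAGES} alternative languages are allowed, "
            f"got {len(alternatives)}"
        )

    config = {"languageCode": primary_language}
    if alternatives:
        config["alternativeLanguageCodes"] = alternatives
    return config


if __name__ == "__main__":
    print(build_recognition_config("en-US", ["fr-FR", "de-DE", "en-US"]))
```

Omitting the alternativeLanguageCodes key when no alternatives remain keeps the single-language case, which gives the best results, identical to an ordinary request.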

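Enable language recognition in audio transcription requests

To specify alternative languages in your audio transcription, set the alternativeLanguageCodes field of the request's RecognitionConfig parameters to a list of language codes. Speech-to-Text supports alternative language codes for all speech recognition methods: speech:recognize, speech:longrunningrecognize, and streaming.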

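Use a local file

Protocol

For complete details, refer to the speech:recognize API endpoint.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following example shows a POST request made with curl. The example uses the access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.

The following example shows how to request transcription of an audio file that may contain speech in English, French, or German.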

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    --data '{
    "config": {
        "encoding": "LINEAR16",
        "languageCode": "en-US",
        "alternativeLanguageCodes": ["fr-FR", "de-DE"],
        "model": "command_and_search"
    },
    "audio": {
        "uri": "gs://cloud-samples-tests/speech/commercial_mono.wav"
    }
}' > multi-language.txt

If the request succeeds, the server returns a 200 OK HTTP status code and the response in JSON format, which is saved to a file named multi-language.txt.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "hi I'd like to buy a Chromecast I'm ...",
          "confidence": 0.9466864
        }
      ],
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " let's go with the black one",
          "confidence": 0.9829583
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
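
Each result in the response carries the language code that Speech-to-Text predicted for that portion of the audio. As a rough sketch of how the saved response could be inspected, the snippet below parses a sample shaped like the response above and pairs each result's languageCode with its most likely transcript. The function name detected_languages and the shortened sample transcripts are illustrative assumptions.

```python
import json

# Sample shaped like the response above (transcripts shortened for illustration).
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "alternatives": [
        {"transcript": "hi I'd like to buy a Chromecast", "confidence": 0.9466864}
      ],
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {"transcript": " let's go with the black one", "confidence": 0.9829583}
      ],
      "languageCode": "en-us"
    }
  ]
}
"""


def detected_languages(response_text):
    """Return (languageCode, top transcript) pairs from a recognize response."""
    response = json.loads(response_text)
    pairs = []
    # `results` may be missing if nothing was recognized, so default to a list.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # The first alternative is the most probable one.
        transcript = alternatives[0].get("transcript", "").strip()
        pairs.append((result.get("languageCode"), transcript))
    return pairs


if __name__ == "__main__":
    for language, transcript in detected_languages(SAMPLE_RESPONSE):
        print(f"{language}: {transcript}")
```

To run it against the file written by the curl command, pass the file contents instead of the sample, for example detected_languages(open("multi-language.txt").read()).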

Java

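// Imports needed by this sample (place them at the top of the enclosing class file).
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;

/**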
 * Transcribe a local audio file with multi-language recognition
 *
 * @param fileName the path to the audio file
 */
public static void transcribeMultiLanguage(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  // Get the contents of the local audio file
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {

    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();
    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      System.out.format("Transcript : %s\n\n", alternative.getTranscript());
    }
  }
}

Node.js

const fs = require('fs');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 44100,
  languageCode: 'en-US',
  alternativeLanguageCodes: ['es-ES', 'en-US'],
};

const audio = {
  content: fs.readFileSync(fileName).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

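// `await` is only valid inside an async function in a CommonJS module,
// so the request is wrapped in one.
async function transcribeMultiLanguage() {
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

transcribeMultiLanguage().catch(console.error);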

Python

from google.cloud import speech_v1p1beta1
import io

def sample_recognize(local_file_path):
    """
    Transcribe a short audio file with language detected from a list of possible
    languages

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1p1beta1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.flac'

    # The language of the supplied audio. Even though additional languages are
    # provided by alternative_language_codes, a primary language is still required.
    language_code = "fr"

    # Specify up to 3 additional languages as possible alternative languages
    # of the supplied audio.
    alternative_language_codes_element = "es"
    alternative_language_codes_element_2 = "en"
    alternative_language_codes = [
        alternative_language_codes_element,
        alternative_language_codes_element_2,
    ]
    config = {
        "language_code": language_code,
        "alternative_language_codes": alternative_language_codes,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

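    # Keyword arguments work across client library versions; newer releases
    # no longer accept config and audio as positional arguments.
    response = client.recognize(config=config, audio=audio)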
    for result in response.results:
        # The language_code which was detected as the most likely being spoken in the audio
        print(u"Detected language: {}".format(result.language_code))
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))

Use a remote file

Java

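// Imports needed by this sample (place them at the top of the enclosing class file).
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata;
import com.google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse;
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import java.util.ArrayList;

/**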
 * Transcribe a remote audio file with multi-language recognition
 *
 * @param gcsUri the path to the remote audio file
 */
public static void transcribeMultiLanguageGcs(String gcsUri) throws Exception {
  try (SpeechClient speechClient = SpeechClient.create()) {

    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();

    // Set the remote path for the audio file
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speechClient.longRunningRecognizeAsync(config, audio);

    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }

    for (SpeechRecognitionResult result : response.get().getResultsList()) {

      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);

      // Print out the result
      System.out.printf("Transcript : %s\n\n", alternative.getTranscript());
    }
  }
}

Node.js

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'gs://bucket/audio.wav'; // path to the audio file in Cloud Storage

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 44100,
  languageCode: 'en-US',
  alternativeLanguageCodes: ['es-ES', 'en-US'],
};

const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

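// `await` is only valid inside an async function in a CommonJS module,
// so the request is wrapped in one.
async function transcribeMultiLanguageGcs() {
  // Start the long-running operation, then wait for it to finish.
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}

transcribeMultiLanguageGcs().catch(console.error);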
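
Python

The remote-file samples above cover Java and Node.js only. For symmetry with the local-file section, here is a hedged Python sketch of the same long-running request. It assumes version 2.x or later of the google-cloud-speech package, where RecognitionConfig and RecognitionAudio are exposed on the speech_v1p1beta1 module and requests take keyword arguments. The function names, the 300-second timeout, and the config values (mirrored from the Node.js sample) are illustrative choices rather than part of the original samples.

```python
def summarize_results(results):
    """Return (language_code, top transcript) pairs for results that have alternatives."""
    return [
        (result.language_code, result.alternatives[0].transcript)
        for result in results
        if result.alternatives
    ]


def transcribe_gcs_multi_language(gcs_uri, primary="en-US", alternatives=("es-ES",)):
    """Transcribe a Cloud Storage audio file, letting the service pick the language.

    Assumes google-cloud-speech 2.x or later and application default credentials.
    """
    # Imported lazily so the helper above can be used without the library installed.
    from google.cloud import speech_v1p1beta1 as speech

    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44100,
        language_code=primary,  # a primary language is still required
        alternative_language_codes=list(alternatives),  # up to three
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)

    # Start the long-running operation and block until it completes.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=300)  # illustrative timeout, in seconds

    return summarize_results(response.results)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        for language, transcript in transcribe_gcs_multi_language(sys.argv[1]):
            print(f"Detected language: {language}")
            print(f"Transcript: {transcript}")
```

Keeping summarize_results separate from the API call makes the result handling easy to exercise with stand-in objects, without credentials or network access.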