Detecting language spoken automatically

This page describes how to provide multiple language codes for audio transcription requests sent to Speech-to-Text.

In some situations, you don't know for certain what language your audio recordings contain. For example, if you publish your service, app, or product in a country with multiple official languages, you can potentially receive audio input from users in a variety of languages. This can make specifying a single language code for transcription requests significantly more difficult.

Multiple language recognition

Speech-to-Text offers a way for you to specify a set of alternative languages that your audio data might contain. When you send an audio transcription request to Speech-to-Text, you can provide a list of additional languages that the audio data might include. If you include a list of languages in your request, Speech-to-Text attempts to transcribe the audio based upon the language that best fits the sample from the alternates you provide. Speech-to-Text then labels the transcription results with the predicted language code.

This feature is ideal for apps that need to transcribe short statements like voice commands or search. You can list up to three alternative languages from among those that Speech-to-Text supports in addition to your primary language (for four languages total).

Even though you can specify alternative languages for your speech transcription request, you must still provide a primary language code in the languageCode field. Also, keep the number of languages you request to a bare minimum: the fewer alternative language codes you provide, the more successfully Speech-to-Text can select the correct one. Specifying just a single language produces the best results.
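For illustration, a minimal request configuration with one primary language and two alternatives might look like this as a plain Python dict (the field names match the RecognitionConfig message; the language values here are examples only):

```python
# Illustrative sketch: a multi-language RecognitionConfig as a plain dict.
# The primary language is always required; alternatives are capped at three.
config = {
    "language_code": "en-US",  # primary language (required)
    "alternative_language_codes": ["fr-FR", "de-DE"],  # up to 3 alternatives
}
```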

Enabling language recognition in audio transcription requests

To specify alternative languages in your audio transcription, you must set the alternativeLanguageCodes field to a list of language codes in the RecognitionConfig parameters for the request. Cloud Speech-to-Text supports alternative language codes for all speech recognition methods: speech:recognize, speech:longrunningrecognize, and streaming recognition.


Refer to the speech:recognize API endpoint for complete details.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.

The following example shows how to request transcription of an audio file that may include speech in English, French, or German.

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    --data '{
    "config": {
        "languageCode": "en-US",
        "alternativeLanguageCodes": ["fr-FR", "de-DE"],
        "model": "command_and_search"
    },
    "audio": {
        "uri": "gs://YOUR_BUCKET/your-audio-file.flac"
    }
}' "https://speech.googleapis.com/v1p1beta1/speech:recognize" > multi-language.txt

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format, saved to a file named multi-language.txt.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "hi I'd like to buy a Chromecast I'm ...",
          "confidence": 0.9466864
        }
      ],
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " let's go with the black one",
          "confidence": 0.9829583
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
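Each result carries the language code that Speech-to-Text predicted for it. As a quick illustration (not part of the official samples), here is how you might read the detected language and transcript out of a response like the one above after parsing its JSON in Python:

```python
import json

# A trimmed response in the same shape as the example above.
raw = """
{
  "results": [
    {
      "alternatives": [
        {"transcript": "hi I'd like to buy a Chromecast", "confidence": 0.9466864}
      ],
      "languageCode": "en-us"
    }
  ]
}
"""

response = json.loads(raw)
for result in response["results"]:
    # The first alternative is the most likely transcript.
    best = result["alternatives"][0]
    print(result["languageCode"], "->", best["transcript"])
```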


/**
 * Transcribe a local audio file with multi-language recognition
 *
 * @param fileName the path to the audio file
 */
public static void transcribeMultiLanguage(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  // Get the contents of the local audio file
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {
    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    ArrayList<String> languageList = new ArrayList<>();
    languageList.add("es-ES");
    languageList.add("en-US");

    // Configure request to enable multiple languages
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("ja-JP")
            .addAllAlternativeLanguageCodes(languageList)
            .build();
    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      System.out.format("Transcript : %s\n\n", alternative.getTranscript());
    }
  }
}

const fs = require('fs');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

async function transcribeMultiLanguage() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 44100,
    languageCode: 'en-US',
    alternativeLanguageCodes: ['es-ES', 'en-US'],
  };

  const audio = {
    content: fs.readFileSync(fileName).toString('base64'),
  };

  const request = {
    config: config,
    audio: audio,
  };

  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}


from google.cloud import speech_v1p1beta1
import io

def sample_recognize(local_file_path):
    """
    Transcribe a short audio file with language detected from a list of possible
    languages

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1p1beta1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.flac'

    # The language of the supplied audio. Even though additional languages are
    # provided by alternative_language_codes, a primary language is still required.
    language_code = "fr"

    # Specify up to 3 additional languages as possible alternative languages
    # of the supplied audio.
    alternative_language_codes_element = "es"
    alternative_language_codes_element_2 = "en"
    alternative_language_codes = [
        alternative_language_codes_element,
        alternative_language_codes_element_2,
    ]
    config = {
        "language_code": language_code,
        "alternative_language_codes": alternative_language_codes,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # The language_code which was detected as the most likely being spoken in the audio
        print(u"Detected language: {}".format(result.language_code))
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))
