Transcribing Short Audio Files

This page demonstrates how to transcribe a short audio file to text using synchronous speech recognition.

Synchronous speech recognition returns the recognized text for short audio (less than ~1 minute) in the response as soon as it is processed. To process a speech recognition request for long audio, use Asynchronous Speech Recognition.

You can send audio content directly to Cloud Speech-to-Text, or Cloud Speech-to-Text can process audio content that already resides in Google Cloud Storage. See also the audio limits for synchronous speech recognition requests.

Speech-to-Text v1 is officially released and is generally available from the https://speech.googleapis.com/v1/speech endpoint. The client libraries are released as Alpha and will likely change in backward-incompatible ways; they are currently not recommended for production use.

These samples require that you have set up gcloud and have created and activated a service account. For information about setting up gcloud and about creating and activating a service account, see the Quickstart.

Performing Synchronous Speech Recognition on a Local File

Here is an example of performing synchronous speech recognition on a local audio file:

Protocol

Refer to the speech:recognize API endpoint for complete details.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the Quickstart.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'config': {
    'encoding': 'LINEAR16',
    'sampleRateHertz': 16000,
    'languageCode': 'en-US',
    'enableWordTimeOffsets': false
  },
  'audio': {
    'content': '/9j/7QBEUGhvdG9zaG9...base64-encoded-audio-content...fXNWzvDEeYxxxzj/Coa6Bax//Z'
  }
}" "https://speech.googleapis.com/v1/speech:recognize"
  

See the RecognitionConfig reference documentation for more information on configuring the request body.

The audio content supplied in the request body is base64-encoded. For more information on how to base64-encode audio, see Base64 Encoding Audio Content. For more information on the content field, see RecognitionAudio.
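
The following Python sketch (not part of the official samples; the helper name and file paths are illustrative assumptions) shows one way to base64-encode a local audio file and assemble a request body like the one above:

import base64
import json

def build_recognize_body(audio_path, sample_rate_hertz=16000, language_code='en-US'):
    """Return a speech:recognize JSON body with base64-encoded audio content."""
    # Read the raw audio bytes and base64-encode them for the 'content' field.
    with open(audio_path, 'rb') as audio_file:
        encoded_audio = base64.b64encode(audio_file.read()).decode('utf-8')
    return json.dumps({
        'config': {
            'encoding': 'LINEAR16',
            'sampleRateHertz': sample_rate_hertz,
            'languageCode': language_code,
        },
        'audio': {'content': encoded_audio},
    })

# For example, write the body to a file that curl can send with --data @request.json:
# with open('request.json', 'w') as f:
#     f.write(build_recognize_body('/path/to/audio.raw'))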

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
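
As an illustrative sketch (an assumption, not from the official samples), this Python snippet shows one way a client might pull the transcript and confidence fields out of a response like the one above:

import json

def extract_transcripts(response_text):
    """Yield (transcript, confidence) pairs from a speech:recognize JSON response."""
    response = json.loads(response_text)
    for result in response.get('results', []):
        for alternative in result.get('alternatives', []):
            yield alternative.get('transcript'), alternative.get('confidence')

# Example usage:
# for transcript, confidence in extract_transcripts(response_text):
#     print(transcript, confidence)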

gcloud Command

Refer to the recognize command for complete details.

To perform speech recognition on a local file, use the gcloud command-line tool, passing in the local file path of the file to transcribe.

gcloud ml speech recognize PATH-TO-LOCAL-FILE --language-code='en-US'

If the request is successful, the server returns a response in JSON format:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}

C#

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

static object SyncRecognize(string filePath)
{
    var speech = SpeechClient.Create();
    var response = speech.Recognize(new RecognitionConfig()
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,
        LanguageCode = "en",
    }, RecognitionAudio.FromFile(filePath));
    foreach (var result in response.Results)
    {
        foreach (var alternative in result.Alternatives)
        {
            Console.WriteLine(alternative.Transcript);
        }
    }
    return 0;
}

Go

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

  1. Import the Speech API:

    "golang.org/x/net/context"
    
    speech "cloud.google.com/go/speech/apiv1"
    speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1"

  2. Create a client:

    client, err := speech.NewClient(ctx)
    if err != nil {
    	log.Fatal(err)
    }

  3. Make the request:

    data, err := ioutil.ReadFile(file)
    if err != nil {
    	log.Fatal(err)
    }
    
    // Send the contents of the audio file with the encoding and
    // sample rate information to be transcribed.
    resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
    	Config: &speechpb.RecognitionConfig{
    		Encoding:        speechpb.RecognitionConfig_LINEAR16,
    		SampleRateHertz: 16000,
    		LanguageCode:    "en-US",
    	},
    	Audio: &speechpb.RecognitionAudio{
    		AudioSource: &speechpb.RecognitionAudio_Content{Content: data},
    	},
    })

  4. Print the result:

    // Print the results.
    for _, result := range resp.Results {
    	for _, alt := range result.Alternatives {
    		fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
    	}
    }

Java

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

public static void syncRecognizeFile(String fileName) throws Exception {
  try (SpeechClient speech = SpeechClient.create()) {
    Path path = Paths.get(fileName);
    byte[] data = Files.readAllBytes(path);
    ByteString audioBytes = ByteString.copyFrom(data);

    // Configure request with local raw PCM audio
    RecognitionConfig config = RecognitionConfig.newBuilder()
        .setEncoding(AudioEncoding.LINEAR16)
        .setLanguageCode("en-US")
        .setSampleRateHertz(16000)
        .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder()
        .setContent(audioBytes)
        .build();

    // Use blocking call to get audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
  }
}

Node.js

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

// Imports the Google Cloud client library
const fs = require('fs');
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};
const audio = {
  content: fs.readFileSync(filename).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
client
  .recognize(request)
  .then(data => {
    const response = data[0];
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    console.log(`Transcription: `, transcription);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

use Google\Cloud\Speech\SpeechClient;

/**
 * Transcribe an audio file using Google Cloud Speech API
 * Example:
 * ```
 * transcribe_sync('/path/to/audiofile.wav');
 * ```.
 *
 * @param string $audioFile path to an audio file.
 * @param string $languageCode The language of the content to
 *     be recognized. Accepts BCP-47 (e.g., `"en-US"`, `"es-ES"`).
 * @param array $options configuration options.
 *
 * @return string the text transcription
 */
function transcribe_sync($audioFile, $languageCode = 'en-US', $options = [])
{
    // Create the speech client
    $speech = new SpeechClient([
        'languageCode' => $languageCode,
    ]);

    // Make the API call
    $results = $speech->recognize(
        fopen($audioFile, 'r'),
        $options
    );

    // Print the results
    foreach ($results as $result) {
        $alternative = $result->alternatives()[0];
        printf('Transcript: %s' . PHP_EOL, $alternative['transcript']);
        printf('Confidence: %s' . PHP_EOL, $alternative['confidence']);
    }
}

Python

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

def transcribe_file(speech_file):
    """Transcribe the given audio file."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()

    audio = types.RecognitionAudio(content=content)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US')

    response = client.recognize(config, audio)
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))

Ruby

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

# project_id      = "Your Google Cloud project ID"
# audio_file_path = "Path to file on which to perform speech recognition"

require "google/cloud/speech"

speech = Google::Cloud::Speech.new project: project_id
audio  = speech.audio audio_file_path, encoding:    :linear16,
                                       sample_rate: 16000,
                                       language:    "en-US"

results = audio.recognize

results.each do |result|
  puts "Transcription: #{result.transcript}"
end

Performing Synchronous Speech Recognition on a Remote File

For your convenience, the Speech-to-Text API can perform synchronous speech recognition directly on an audio file located in Google Cloud Storage, without the need to send the contents of the audio file in the body of your request.

Here is an example of performing synchronous speech recognition on a file located in Cloud Storage:

Protocol

Refer to the speech:recognize API endpoint for complete details.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the Quickstart.

curl -X POST -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'config': {
    'encoding': 'LINEAR16',
    'sampleRateHertz': 16000,
    'languageCode': 'en-US'
  },
  'audio': {
    'uri': 'gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME'
  }
}" "https://speech.googleapis.com/v1/speech:recognize"

See the RecognitionConfig reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}

gcloud Command

Refer to the recognize command for complete details.

To perform speech recognition on a remote file, use the gcloud command-line tool, passing in the Cloud Storage URI of the file to transcribe.

gcloud ml speech recognize 'gs://cloud-samples-tests/speech/brooklyn.flac' \
--language-code='en-US'

If the request is successful, the server returns a response in JSON format:

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.9840146,
          "transcript": "how old is the Brooklyn Bridge"
        }
      ]
    }
  ]
}

C#

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

static object SyncRecognizeGcs(string storageUri)
{
    var speech = SpeechClient.Create();
    var response = speech.Recognize(new RecognitionConfig()
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,
        LanguageCode = "en",
    }, RecognitionAudio.FromStorageUri(storageUri));
    foreach (var result in response.Results)
    {
        foreach (var alternative in result.Alternatives)
        {
            Console.WriteLine(alternative.Transcript);
        }
    }
    return 0;
}

Go

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

  1. Import the Speech API:

    "golang.org/x/net/context"
    
    speech "cloud.google.com/go/speech/apiv1"
    speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1"

  2. Create a client:

    client, err := speech.NewClient(ctx)
    if err != nil {
    	log.Fatal(err)
    }

  3. Make the request:

    // Send the request with the URI (gs://...)
    // and sample rate information to be transcribed.
    resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
    	Config: &speechpb.RecognitionConfig{
    		Encoding:        speechpb.RecognitionConfig_LINEAR16,
    		SampleRateHertz: 16000,
    		LanguageCode:    "en-US",
    	},
    	Audio: &speechpb.RecognitionAudio{
    		AudioSource: &speechpb.RecognitionAudio_Uri{Uri: gcsURI},
    	},
    })

  4. Print the result:

    // Print the results.
    for _, result := range resp.Results {
    	for _, alt := range result.Alternatives {
    		fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
    	}
    }

Java

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

public static void syncRecognizeGcs(String gcsUri) throws Exception {
  // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
  try (SpeechClient speech = SpeechClient.create()) {
    // Builds the request for remote FLAC file
    RecognitionConfig config = RecognitionConfig.newBuilder()
        .setEncoding(AudioEncoding.FLAC)
        .setLanguageCode("en-US")
        .setSampleRateHertz(16000)
        .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder()
        .setUri(gcsUri)
        .build();

    // Use blocking call for getting audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
  }
}

Node.js

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};
const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
client
  .recognize(request)
  .then(data => {
    const response = data[0];
    const transcription = response.results
      .map(result => result.alternatives[0].transcript)
      .join('\n');
    console.log(`Transcription: `, transcription);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

use Google\Cloud\Speech\SpeechClient;
use Google\Cloud\Storage\StorageClient;

/**
 * Transcribe an audio file using Google Cloud Speech API
 * Example:
 * ```
 * transcribe_sync_gcs('your-bucket-name', 'audiofile.wav');
 * ```.
 *
 * @param string $bucketName The Cloud Storage bucket name.
 * @param string $objectName The Cloud Storage object name.
 * @param string $languageCode The language of the content to
 *     be recognized. Accepts BCP-47 (e.g., `"en-US"`, `"es-ES"`).
 * @param array $options configuration options.
 *
 * @return string the text transcription
 */
function transcribe_sync_gcs($bucketName, $objectName, $languageCode = 'en-US', $options = [])
{
    // Create the speech client
    $speech = new SpeechClient([
        'languageCode' => $languageCode,
    ]);

    // Fetch the storage object
    $storage = new StorageClient();
    $object = $storage->bucket($bucketName)->object($objectName);

    // Make the API call
    $results = $speech->recognize(
        $object,
        $options
    );

    // Print the results
    foreach ($results as $result) {
        $alternative = $result->alternatives()[0];
        printf('Transcript: %s' . PHP_EOL, $alternative['transcript']);
        printf('Confidence: %s' . PHP_EOL, $alternative['confidence']);
    }
}

Python

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

def transcribe_gcs(gcs_uri):
    """Transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US')

    response = client.recognize(config, audio)
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))

Ruby

For more on installing and creating a Speech-to-Text client, refer to Speech-to-Text Client Libraries.

# project_id   = "Your Google Cloud project ID"
# storage_path = "Path to file in Cloud Storage, e.g. gs://bucket/audio.raw"

require "google/cloud/speech"

speech = Google::Cloud::Speech.new project: project_id
audio  = speech.audio storage_path, encoding:    :linear16,
                                    sample_rate: 16000,
                                    language:    "en-US"

results = audio.recognize

results.each do |result|
  puts "Transcription: #{result.transcript}"
end
