Enabling word-level confidence

You can specify that Cloud Speech-to-Text indicate a value of accuracy, or confidence level, for individual words in a transcription.

Word-level confidence

When the Cloud Speech-to-Text transcribes an audio clip, it also measures the degree of accuracy for the response. The response sent from Cloud Speech-to-Text states the confidence level for the entire transcription request as a number between 0.0 and 1.0. The following code sample shows an example of the confidence level value returned by Cloud Speech-to-Text.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.96748614
        }
      ]
    }
  ]
}

In addition to the confidence level of the entire transcription, Cloud Speech-to-Text can also provide the confidence level of individual words within the transcription. The response then includes WordInfo details in the transcription, indicating the confidence level for individual words as shown in the following example.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startTime": "0s",
              "endTime": "0.300s",
              "word": "how",
              "confidence": SOME NUMBER
            },
            ...
          ]
        }
      ]
    }
  ]
}

Enabling word-level confidence in a request

The following code snippet demonstrates how to enable word-level confidence in a transcription request to Cloud Speech-to-Text.

Protocol

Refer to the speech:recognize API endpoint for complete details.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.

The following example show how to send a POST request using curl, where the body of the request enables word-level confidence.

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1p1beta1/speech:recognize \
    --data '{
    "config": {
        "encoding":"FLAC",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "enableWordTimeOffsets": true,
        "enableWordConfidence": true
    },
    "audio": {
        "uri":"gs://cloud-samples-tests/speech/brooklyn.flac"
    }
}' > word-level-confidence.txt

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format, saved to a file named word-level-confidence.txt.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startTime": "0s",
              "endTime": "0.300s",
              "word": "how",
              "confidence": 0.98762906
            },
            {
              "startTime": "0.300s",
              "endTime": "0.600s",
              "word": "old",
              "confidence": 0.96929157
            },
            {
              "startTime": "0.600s",
              "endTime": "0.800s",
              "word": "is",
              "confidence": 0.98271006
            },
            {
              "startTime": "0.800s",
              "endTime": "0.900s",
              "word": "the",
              "confidence": 0.98271006
            },
            {
              "startTime": "0.900s",
              "endTime": "1.100s",
              "word": "Brooklyn",
              "confidence": 0.98762906
            },
            {
              "startTime": "1.100s",
              "endTime": "1.500s",
              "word": "Bridge",
              "confidence": 0.98762906
            }
          ]
        }
      ],
      "languageCode": "en-us"
    }
  ]
}

Java

/**
 * Transcribe a local audio file with word level confidence
 *
 * @param fileName the path to the local audio file
 */
public static void transcribeWordLevelConfidence(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speechClient = SpeechClient.create()) {
    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    // Configure request to enable word level confidence
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setSampleRateHertz(16000)
            .setLanguageCode("en-US")
            .setEnableWordConfidence(true)
            .build();
    // Perform the transcription request
    RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

    // Print out the results
    for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternatives(0);
      System.out.format("Transcript : %s\n", alternative.getTranscript());
      System.out.format(
          "First Word and Confidence : %s %s \n",
          alternative.getWords(0).getWord(), alternative.getWords(0).getConfidence());
    }
  }
}

Node.js

const fs = require('fs');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

const config = {
  encoding: `FLAC`,
  sampleRateHertz: 16000,
  languageCode: `en-US`,
  enableWordConfidence: true,
};

const audio = {
  content: fs.readFileSync(fileName).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
const confidence = response.results
  .map(result => result.alternatives[0].confidence)
  .join(`\n`);
console.log(`Transcription: ${transcription} \n Confidence: ${confidence}`);

console.log(`Word-Level-Confidence:`);
const words = response.results.map(result => result.alternatives[0]);
words[0].words.forEach(a => {
  console.log(` word: ${a.word}, confidence: ${a.confidence}`);
});

Python

from google.cloud import speech_v1p1beta1
import io


def sample_recognize(local_file_path):
    """
    Print confidence level for individual words in a transcription of a short audio
    file.

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1p1beta1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.flac'

    # When enabled, the first result returned by the API will include a list
    # of words and the confidence level for each of those words.
    enable_word_confidence = True

    # The language of the supplied audio
    language_code = "en-US"
    config = {
        "enable_word_confidence": enable_word_confidence,
        "language_code": language_code,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    # The first result includes confidence levels per word
    result = response.results[0]
    # First alternative is the most probable result
    alternative = result.alternatives[0]
    print(u"Transcript: {}".format(alternative.transcript))
    # Print the confidence level of each word
    for word in alternative.words:
        print(u"Word: {}".format(word.word))
        print(u"Confidence: {}".format(word.confidence))

Trang này có hữu ích không? Hãy cho chúng tôi biết đánh giá của bạn:

Gửi phản hồi về...

Bạn cần trợ giúp? Truy cập trang hỗ trợ của chúng tôi.