Enable word-level confidence

You can specify that Speech-to-Text report an accuracy value, or confidence level, for individual words in a transcription.

Word-level confidence

When Speech-to-Text transcribes an audio clip, it also measures the degree of accuracy of the response. The response sent by Speech-to-Text states a confidence level for the entire transcription request as a number between 0.0 and 1.0. The following sample shows the confidence level value returned by Speech-to-Text.

    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "how old is the Brooklyn Bridge",
              "confidence": 0.96748614
            }
          ]
        }
      ]
    }
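
As a minimal sketch of how a client might read this value, the following Python snippet parses the sample response above with the standard library and returns the transcript and overall confidence of the first alternative of each result. The helper name `top_alternatives` is hypothetical and not part of any Speech-to-Text library.

```python
import json

# Sample response copied from the example above.
RESPONSE_JSON = """
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.96748614
        }
      ]
    }
  ]
}
"""


def top_alternatives(response):
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            best = alternatives[0]
            pairs.append((best.get("transcript", ""), best.get("confidence", 0.0)))
    return pairs


response = json.loads(RESPONSE_JSON)
for transcript, confidence in top_alternatives(response):
    print(f"{transcript!r}: {confidence:.2f}")
```

Results without any alternatives are skipped, so the helper also tolerates an empty response.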
    

In addition to the confidence level of the entire transcription, Speech-to-Text can also provide the confidence level of individual words within the transcription. The response then includes WordInfo details in the transcription, which give the confidence level of individual words, as the following example shows.

    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "how old is the Brooklyn Bridge",
              "confidence": 0.98360395,
              "words": [
                {
                  "startTime": "0s",
                  "endTime": "0.300s",
                  "word": "how",
                  "confidence": 0.98762906
                },
                ...
              ]
            }
          ]
        }
      ]
    }
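
The snippet below is a small, hedged sketch of how the `words` list of one alternative could be turned into (word, confidence) pairs in Python. The helper name `word_confidences` is hypothetical, and the two sample entries use values from the full response shown later in this document. The sketch assumes that `words` may be absent when word-level confidence is not enabled, and returns an empty list in that case.

```python
import json

# One alternative with two WordInfo entries (values from the full response
# shown later in this document).
ALTERNATIVE_JSON = """
{
  "transcript": "how old is the Brooklyn Bridge",
  "confidence": 0.98360395,
  "words": [
    {"startTime": "0s", "endTime": "0.300s", "word": "how", "confidence": 0.98762906},
    {"startTime": "0.300s", "endTime": "0.600s", "word": "old", "confidence": 0.96929157}
  ]
}
"""


def word_confidences(alternative):
    """Return a list of (word, confidence) pairs for one alternative.

    Returns an empty list when the alternative has no "words" field.
    """
    return [
        (info["word"], info.get("confidence"))
        for info in alternative.get("words", [])
    ]


alternative = json.loads(ALTERNATIVE_JSON)
for word, confidence in word_confidences(alternative):
    print(f"{word}: {confidence}")
```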
    

Enable word-level confidence in a request

The following code snippets show how to enable word-level confidence in a transcription request to Speech-to-Text, using local and remote files.

Use a local file

Protocol

For complete details, see the speech:recognize API endpoint.

To perform synchronous speech recognition, send a POST request and provide the appropriate request body. The following example shows a POST request using curl. The example uses the access token for a service account set up for the project with the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.

The following example shows how to send a POST request with curl, where the body of the request enables word-level confidence.

    curl -s -H "Content-Type: application/json" \
        -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
        https://speech.googleapis.com/v1p1beta1/speech:recognize \
        --data '{
        "config": {
            "encoding": "FLAC",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            "enableWordTimeOffsets": true,
            "enableWordConfidence": true
        },
        "audio": {
            "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
        }
    }' > word-level-confidence.txt
    

If the request succeeds, the server returns a 200 OK HTTP status code and the response in JSON format. The response is saved to a file named word-level-confidence.txt.

    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "how old is the Brooklyn Bridge",
              "confidence": 0.98360395,
              "words": [
                {
                  "startTime": "0s",
                  "endTime": "0.300s",
                  "word": "how",
                  "confidence": 0.98762906
                },
                {
                  "startTime": "0.300s",
                  "endTime": "0.600s",
                  "word": "old",
                  "confidence": 0.96929157
                },
                {
                  "startTime": "0.600s",
                  "endTime": "0.800s",
                  "word": "is",
                  "confidence": 0.98271006
                },
                {
                  "startTime": "0.800s",
                  "endTime": "0.900s",
                  "word": "the",
                  "confidence": 0.98271006
                },
                {
                  "startTime": "0.900s",
                  "endTime": "1.100s",
                  "word": "Brooklyn",
                  "confidence": 0.98762906
                },
                {
                  "startTime": "1.100s",
                  "endTime": "1.500s",
                  "word": "Bridge",
                  "confidence": 0.98762906
                }
              ]
            }
          ],
          "languageCode": "en-us"
        }
      ]
    }
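
One possible use of these values is to flag words that fall below a confidence threshold so they can be reviewed. The following Python sketch does this for the response above, using only the standard library. The helper names `parse_offset` and `low_confidence_words` and the threshold of 0.97 are assumptions chosen for illustration, not part of the API.

```python
import json

# Abbreviated copy of the response above. To use the saved file instead:
#     with open("word-level-confidence.txt") as f:
#         SAMPLE = json.load(f)
SAMPLE = json.loads("""
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {"startTime": "0s", "endTime": "0.300s", "word": "how", "confidence": 0.98762906},
            {"startTime": "0.300s", "endTime": "0.600s", "word": "old", "confidence": 0.96929157},
            {"startTime": "0.600s", "endTime": "0.800s", "word": "is", "confidence": 0.98271006},
            {"startTime": "0.800s", "endTime": "0.900s", "word": "the", "confidence": 0.98271006},
            {"startTime": "0.900s", "endTime": "1.100s", "word": "Brooklyn", "confidence": 0.98762906},
            {"startTime": "1.100s", "endTime": "1.500s", "word": "Bridge", "confidence": 0.98762906}
          ]
        }
      ],
      "languageCode": "en-us"
    }
  ]
}
""")


def parse_offset(offset):
    """Convert a duration string such as "0.300s" to seconds as a float."""
    return float(offset.rstrip("s"))


def low_confidence_words(response, threshold):
    """Return (word, start, end, confidence) for words below the threshold.

    Only the first (most likely) alternative of each result is inspected.
    """
    flagged = []
    for result in response.get("results", []):
        for alternative in result.get("alternatives", [])[:1]:
            for info in alternative.get("words", []):
                confidence = info.get("confidence", 0.0)
                if confidence < threshold:
                    flagged.append((
                        info["word"],
                        parse_offset(info["startTime"]),
                        parse_offset(info["endTime"]),
                        confidence,
                    ))
    return flagged


for word, start, end, confidence in low_confidence_words(SAMPLE, 0.97):
    print(f"{word} ({start}s to {end}s): {confidence}")
```

With the assumed threshold of 0.97, only the word "old" is flagged in this sample. A suitable threshold depends on the audio and the application.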
    

Java

    /**
     * Transcribe a local audio file with word level confidence
     *
     * @param fileName the path to the local audio file
     */
    public static void transcribeWordLevelConfidence(String fileName) throws Exception {
      Path path = Paths.get(fileName);
      byte[] content = Files.readAllBytes(path);

      try (SpeechClient speechClient = SpeechClient.create()) {
        RecognitionAudio recognitionAudio =
            RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
        // Configure request to enable word level confidence
        RecognitionConfig config =
            RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.LINEAR16)
                .setSampleRateHertz(16000)
                .setLanguageCode("en-US")
                .setEnableWordConfidence(true)
                .build();
        // Perform the transcription request
        RecognizeResponse recognizeResponse = speechClient.recognize(config, recognitionAudio);

        // Print out the results
        for (SpeechRecognitionResult result : recognizeResponse.getResultsList()) {
          // There can be several alternative transcripts for a given chunk of speech. Just use the
          // first (most likely) one here.
          SpeechRecognitionAlternative alternative = result.getAlternatives(0);
          System.out.format("Transcript : %s\n", alternative.getTranscript());
          System.out.format(
              "First Word and Confidence : %s %s \n",
              alternative.getWords(0).getWord(), alternative.getWords(0).getConfidence());
        }
      }
    }

Node.js

    const fs = require('fs');

    // Imports the Google Cloud client library
    const speech = require('@google-cloud/speech').v1p1beta1;

    // Creates a client
    const client = new speech.SpeechClient();

    /**
     * TODO(developer): Uncomment the following line before running the sample.
     */
    // const fileName = 'Local path to audio file, e.g. /path/to/audio.raw';

    // `await` is only valid inside an async function in a CommonJS script,
    // so the request is wrapped in one.
    async function transcribeWithWordConfidence() {
      const config = {
        encoding: 'FLAC',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
        enableWordConfidence: true,
      };

      // Local audio content is sent inline as a base64-encoded string
      const audio = {
        content: fs.readFileSync(fileName).toString('base64'),
      };

      const request = {
        config: config,
        audio: audio,
      };

      const [response] = await client.recognize(request);
      const transcription = response.results
        .map(result => result.alternatives[0].transcript)
        .join('\n');
      const confidence = response.results
        .map(result => result.alternatives[0].confidence)
        .join('\n');
      console.log(`Transcription: ${transcription} \n Confidence: ${confidence}`);

      // Print each word of the first result with its confidence
      console.log('Word-Level-Confidence:');
      const words = response.results.map(result => result.alternatives[0]);
      words[0].words.forEach(a => {
        console.log(` word: ${a.word}, confidence: ${a.confidence}`);
      });
    }

    transcribeWithWordConfidence().catch(console.error);

Python

    from google.cloud import speech_v1p1beta1
    import io

    def sample_recognize(local_file_path):
        """
        Print confidence level for individual words in a transcription of a short audio
        file.

        Args:
          local_file_path Path to local audio file, e.g. /path/audio.wav
        """

        client = speech_v1p1beta1.SpeechClient()

        # local_file_path = 'resources/brooklyn_bridge.flac'

        # When enabled, the first result returned by the API will include a list
        # of words and the confidence level for each of those words.
        enable_word_confidence = True

        # The language of the supplied audio
        language_code = "en-US"
        config = {
            "enable_word_confidence": enable_word_confidence,
            "language_code": language_code,
        }
        with io.open(local_file_path, "rb") as f:
            content = f.read()
        audio = {"content": content}

        # Keyword arguments work across client library versions
        response = client.recognize(config=config, audio=audio)
        # The first result includes confidence levels per word
        result = response.results[0]
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))
        # Print the confidence level of each word
        for word in alternative.words:
            print(u"Word: {}".format(word.word))
            print(u"Confidence: {}".format(word.confidence))

    

Use a remote file

Java

    /**
     * Transcribe a remote audio file with word level confidence
     *
     * @param gcsUri path to the remote audio file
     */
    public static void transcribeWordLevelConfidenceGcs(String gcsUri) throws Exception {
      try (SpeechClient speechClient = SpeechClient.create()) {

        // Configure request to enable word level confidence
        RecognitionConfig config =
            RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.FLAC)
                .setSampleRateHertz(44100)
                .setLanguageCode("en-US")
                .setEnableWordConfidence(true)
                .build();

        // Set the remote path for the audio file
        RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

        // Use non-blocking call for getting file transcription
        OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
            speechClient.longRunningRecognizeAsync(config, audio);

        while (!response.isDone()) {
          System.out.println("Waiting for response...");
          Thread.sleep(10000);
        }
        // Just print the first result here.
        SpeechRecognitionResult result = response.get().getResultsList().get(0);

        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        // Print out the result
        System.out.printf("Transcript : %s\n", alternative.getTranscript());
        System.out.format(
            "First Word and Confidence : %s %s \n",
            alternative.getWords(0).getWord(), alternative.getWords(0).getConfidence());
      }
    }

Node.js

    // Imports the Google Cloud client library
    const speech = require('@google-cloud/speech').v1p1beta1;

    // Creates a client
    const client = new speech.SpeechClient();

    /**
     * TODO(developer): Uncomment the following line before running the sample.
     */
    // const gcsUri = 'Path to GCS audio file, e.g. gs://bucket/audio.wav';

    // `await` is only valid inside an async function in a CommonJS script,
    // so the request is wrapped in one.
    async function transcribeGcsWithWordConfidence() {
      const config = {
        encoding: 'FLAC',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
        enableWordConfidence: true,
      };

      // Remote audio is referenced by its Cloud Storage URI
      const audio = {
        uri: gcsUri,
      };

      const request = {
        config: config,
        audio: audio,
      };

      const [response] = await client.recognize(request);
      const transcription = response.results
        .map(result => result.alternatives[0].transcript)
        .join('\n');
      const confidence = response.results
        .map(result => result.alternatives[0].confidence)
        .join('\n');
      console.log(`Transcription: ${transcription} \n Confidence: ${confidence}`);

      // Print each word of the first result with its confidence
      console.log('Word-Level-Confidence:');
      const words = response.results.map(result => result.alternatives[0]);
      words[0].words.forEach(a => {
        console.log(` word: ${a.word}, confidence: ${a.confidence}`);
      });
    }

    transcribeGcsWithWordConfidence().catch(console.error);