StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of one or more StreamingRecognizeResponse messages is streamed back to the client.
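
The stream is consumed by iterating over the responses returned from the bidirectional call. Below is a minimal sketch using the Python client library (google-cloud-speech); the audio_chunks() generator is a placeholder, and the exact client surface and field spellings may differ from the message version documented on this page.

from google.cloud import speech

client = speech.SpeechClient()

# Recognition settings here are assumptions for the sketch (16 kHz LINEAR16, US English).
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=True,  # request the interim results illustrated below
)

def audio_chunks():
    """Placeholder generator; replace with real raw audio byte chunks."""
    yield b""

requests = (
    speech.StreamingRecognizeRequest(audio_content=chunk)
    for chunk in audio_chunks()
)

# Each item yielded by the call is one StreamingRecognizeResponse.
for response in client.streaming_recognize(config=streaming_config, requests=requests):
    for result in response.results:
        print(result.is_final, result.alternatives[0].transcript)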

Here's an example of a series of ten StreamingRecognizeResponses that might be returned while processing audio:

  1. endpointerType: START_OF_SPEECH

  2. results { alternatives { transcript: "tube" } stability: 0.01 } resultIndex: 0

  3. results { alternatives { transcript: "to be a" } stability: 0.01 } resultIndex: 0

  4. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 } resultIndex: 0

  5. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } isFinal: true } resultIndex: 0

  6. results { alternatives { transcript: " that's" } stability: 0.01 } resultIndex: 1

  7. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 } resultIndex: 1

  8. endpointerType: END_OF_SPEECH

  9. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } isFinal: true } resultIndex: 1

  10. endpointerType: END_OF_AUDIO

Notes:

  • Only two of the above responses (#5 and #9) contain final results; they are indicated by isFinal: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".

  • The others contain interim results. #4 and #7 each contain two interim results: the first portion has high stability and is less likely to change, while the second portion has low stability and is very likely to change. A UI designer might choose to show only high-stability results, as in the sketch that follows these notes.

  • The resultIndex indicates the portion of audio that has already had final results returned and is no longer being processed. For example, the results in #6 and later correspond to the portion of audio after "to be or not to be".
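
As a concrete illustration of the first two notes, the sketch below accumulates the isFinal transcripts and displays only high-stability interim text. It assumes responses is an iterable of StreamingRecognizeResponse messages with snake_case attribute access (is_final, stability), which is how client libraries typically expose the fields shown in the JSON representation below; the 0.8 threshold is an arbitrary choice.

def follow_transcript(responses, stability_threshold=0.8):
    """Print a running transcript from a stream of StreamingRecognizeResponse messages."""
    finalized = ""  # concatenation of all isFinal transcripts seen so far
    for response in responses:
        interim = ""
        for result in response.results:
            transcript = result.alternatives[0].transcript
            if result.is_final:
                # Final portions are settled and can simply be appended.
                finalized += transcript
            elif result.stability >= stability_threshold:
                # Show only high-stability interim text; low-stability text
                # is very likely to change, so it is left out of the display.
                interim += transcript
        print(finalized + interim)
    return finalized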

JSON representation
{
  "error": {
    object(Status)
  },
  "results": [
    {
      object(StreamingRecognitionResult)
    }
  ],
  "resultIndex": number,
  "endpointerType": enum(EndpointerType),
}
Fields
error

object(Status)

[Output-only] If set, returns a google.rpc.Status message that specifies the error for the operation.

results[]

object(StreamingRecognitionResult)

[Output-only] This repeated list contains zero or more results that correspond to consecutive portions of the audio currently being processed. It contains zero or one isFinal=true result (the newly settled portion), followed by zero or more isFinal=false results.
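
As an example, a consumer can split each response into its settled portion and the remaining interim portions. This is a sketch, with response as an assumed StreamingRecognizeResponse and snake_case field access.

def split_results(response):
    """Separate the (at most one) isFinal result from the interim results."""
    final = [r for r in response.results if r.is_final]
    interim = [r for r in response.results if not r.is_final]
    return (final[0] if final else None), interim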

resultIndex

number

[Output-only] Indicates the lowest index in the results array that has changed. The repeated StreamingRecognitionResult results overwrite past results at this index and higher.
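
For example, a client that maintains its own array of results can apply each response with a single slice assignment. This is a sketch; all_results and response are assumed inputs, and result_index is the snake_case spelling of resultIndex.

def apply_response(all_results, response):
    """Overwrite locally kept results from resultIndex onward."""
    all_results[response.result_index:] = list(response.results)
    return all_results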

endpointerType

enum(EndpointerType)

[Output-only] Indicates the type of endpointer event.
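
A client typically reacts to these events, for example to update a microphone indicator or to finish reading the stream. The sketch below dispatches on the event's string name, using the values shown in the example above (START_OF_SPEECH, END_OF_SPEECH, END_OF_AUDIO); how the raw enum value is mapped to its name depends on the client library.

def handle_endpointer(event_name: str) -> None:
    """React to an endpointer event by its name."""
    if event_name == "START_OF_SPEECH":
        print("Speech detected; interim results will start arriving.")
    elif event_name == "END_OF_SPEECH":
        print("Speech ended; a final result for the utterance may still follow.")
    elif event_name == "END_OF_AUDIO":
        print("All supplied audio has been processed; the stream is complete.")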
