Class StreamingRecognitionResult (0.8.0)

StreamingRecognitionResult(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Contains a speech recognition result corresponding to a portion of the audio that is currently being processed, or an indication that this is the end of the single requested utterance.

Example:

  1. transcript: "tube"

  2. transcript: "to be a"

  3. transcript: "to be"

  4. transcript: "to be or not to be" is_final: true

  5. transcript: " that's"

  6. transcript: " that is"

  7. message_type: END_OF_SINGLE_UTTERANCE

  8. transcript: " that is the question" is_final: true

Only two of the responses contain final results (#4 and #8, indicated by is_final: true). Concatenating their transcripts yields the full transcript: "to be or not to be that is the question".
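The concatenation described above can be sketched in a few lines of Python. This is a minimal illustration only: `Result` is a hypothetical stand-in that carries just the three fields used here, not the library's `StreamingRecognitionResult` class, and message types are represented as plain strings for brevity.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for a streaming result (illustration only)."""
    message_type: str = "TRANSCRIPT"
    transcript: str = ""
    is_final: bool = False


def full_transcript(results):
    """Join the transcripts of final TRANSCRIPT results in arrival order."""
    return "".join(
        r.transcript
        for r in results
        if r.message_type == "TRANSCRIPT" and r.is_final
    )


# The eight results from the example above.
results = [
    Result(transcript="tube"),
    Result(transcript="to be a"),
    Result(transcript="to be"),
    Result(transcript="to be or not to be", is_final=True),
    Result(transcript=" that's"),
    Result(transcript=" that is"),
    Result(message_type="END_OF_SINGLE_UTTERANCE"),
    Result(transcript=" that is the question", is_final=True),
]

print(full_transcript(results))
```

Interim results (#1 to #3, #5 and #6) are skipped because a later result for the same audio may revise them. The leading space in result #8 is what keeps the two final pieces separated when joined.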

In each response we populate:

  • for TRANSCRIPT: transcript and possibly is_final.

  • for END_OF_SINGLE_UTTERANCE: only message_type.
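Because the populated fields depend on the message type, a consumer would typically branch on `message_type` before reading anything else. The sketch below assumes a stand-in `MessageType` enum that mirrors the two names used on this page (the numeric values are illustrative) and uses `SimpleNamespace` objects in place of real results. `summarize` is a hypothetical helper, not part of the library.

```python
from enum import Enum
from types import SimpleNamespace


class MessageType(Enum):
    """Stand-in enum; names follow this page, values are illustrative."""
    TRANSCRIPT = 1
    END_OF_SINGLE_UTTERANCE = 2


def summarize(result):
    """Describe a result, reading only the fields valid for its type."""
    if result.message_type is MessageType.TRANSCRIPT:
        # transcript is populated; is_final may be set.
        kind = "final" if result.is_final else "interim"
        return f"{kind}: {result.transcript!r}"
    if result.message_type is MessageType.END_OF_SINGLE_UTTERANCE:
        # Only message_type is populated for this kind of result.
        return "end of single utterance"
    return "unrecognized message type"


interim = SimpleNamespace(
    message_type=MessageType.TRANSCRIPT, transcript="to be", is_final=False
)
marker = SimpleNamespace(message_type=MessageType.END_OF_SINGLE_UTTERANCE)

print(summarize(interim))
print(summarize(marker))
```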

Attributes:

message_type:
    Type of the result message.
transcript (str):
    Transcript text representing the words that the user spoke.
    Populated if and only if ``message_type`` = ``TRANSCRIPT``.
is_final (bool):
    If ``false``, the ``StreamingRecognitionResult`` represents an
    interim result that may change. If ``true``, the recognizer
    will not return any further hypotheses about this piece of
    the audio. May only be populated for ``message_type`` =
    ``TRANSCRIPT``.
confidence (float):
    The Speech confidence between 0.0 and 1.0 for the current
    portion of audio. A higher number indicates an estimated
    greater likelihood that the recognized words are correct.
    The default of 0.0 is a sentinel value indicating that
    confidence was not set.

    This field is typically only provided if ``is_final`` is
    true and you should not rely on it being accurate or even
    set.
stability (float):
    An estimate of the likelihood that the speech recognizer
    will not change its guess about this interim recognition
    result:

    -  If the value is unspecified or 0.0, Dialogflow didn't
       compute the stability. In particular, Dialogflow will
       only provide stability for ``TRANSCRIPT`` results with
       ``is_final = false``.
    -  Otherwise, the value is in (0.0, 1.0] where 0.0 means
       completely unstable and 1.0 means completely stable.
speech_word_info (Sequence[]):
    Word-specific information for the words recognized by Speech
    in ``transcript``. Populated if and only if ``message_type`` =
    ``TRANSCRIPT`` and [InputAudioConfig.enable_word_info] is set.
speech_end_offset (google.protobuf.duration_pb2.Duration):
    Time offset of the end of this Speech recognition result
    relative to the beginning of the audio. Only populated for
    ``message_type`` = ``TRANSCRIPT``.
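
The sentinel semantics of ``confidence`` and ``stability`` are easy to misread, so here is a hedged sketch of how a caller might interpret them. Both helpers and the 0.8 threshold are hypothetical choices for illustration, and results are modelled with `SimpleNamespace` objects and string message types rather than the library's own classes.

```python
from types import SimpleNamespace


def confidence_or_none(result):
    """Return the confidence, or None when it is the 0.0 'not set' sentinel."""
    return result.confidence if result.confidence > 0.0 else None


def is_stable_interim(result, threshold=0.8):
    """True for an interim transcript whose stability was computed
    (non-zero) and is at least `threshold` (an arbitrary example value)."""
    return (
        result.message_type == "TRANSCRIPT"
        and not result.is_final
        and result.stability > 0.0
        and result.stability >= threshold
    )


interim = SimpleNamespace(
    message_type="TRANSCRIPT", is_final=False, confidence=0.0, stability=0.9
)
final = SimpleNamespace(
    message_type="TRANSCRIPT", is_final=True, confidence=0.87, stability=0.0
)

print(confidence_or_none(interim), is_stable_interim(interim))
print(confidence_or_none(final), is_stable_interim(final))
```

Treating 0.0 as "unknown" rather than "zero confidence" or "completely unstable" follows the attribute descriptions above. A UI could, for instance, display only interim text that passes `is_stable_interim` to reduce flicker.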


Inheritance: builtins.object > proto.message.Message > StreamingRecognitionResult