Classes
Adaptation
Service that implements Google Cloud Speech Adaptation API.
Adaptation.AdaptationBase
Base class for server-side implementations of Adaptation
Adaptation.AdaptationClient
Client for Adaptation
AdaptationClient
Adaptation client wrapper, for convenient use.
AdaptationClientBuilder
Builder class for AdaptationClient to provide simple configuration of credentials, endpoint etc.
AdaptationClientImpl
Adaptation client wrapper implementation, for convenient use.
AdaptationSettings
Settings for AdaptationClient instances.
CreateCustomClassRequest
Message sent by the client for the CreateCustomClass method.
CreatePhraseSetRequest
Message sent by the client for the CreatePhraseSet method.
CustomClass
A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.
CustomClass.Types
Container for nested types declared in the CustomClass message type.
CustomClass.Types.ClassItem
An item of the class.
CustomClassName
Resource name for the CustomClass resource.
DeleteCustomClassRequest
Message sent by the client for the DeleteCustomClass method.
DeletePhraseSetRequest
Message sent by the client for the DeletePhraseSet method.
GetCustomClassRequest
Message sent by the client for the GetCustomClass method.
GetPhraseSetRequest
Message sent by the client for the GetPhraseSet method.
LanguageCodes
A helper class forming a hierarchy of supported language codes, via nested classes. All language codes are eventually represented as string constants. This is simply a code-convenient form of the table at https://cloud.google.com/speech/docs/languages. It is regenerated regularly, but not guaranteed to be complete at any moment in time; if the language you wish to use is present in the table but not covered here, please use the listed language code as a hard-coded string until this class catches up.
LanguageCodes.Afrikaans
Language codes for Afrikaans.
LanguageCodes.Amharic
Language codes for Amharic.
LanguageCodes.Arabic
Language codes for Arabic.
LanguageCodes.Armenian
Language codes for Armenian.
LanguageCodes.Azerbaijani
Language codes for Azerbaijani.
LanguageCodes.Basque
Language codes for Basque.
LanguageCodes.Bengali
Language codes for Bengali.
LanguageCodes.Bulgarian
Language codes for Bulgarian.
LanguageCodes.Catalan
Language codes for Catalan.
LanguageCodes.ChineseCantonese
Language codes for Chinese, Cantonese.
LanguageCodes.ChineseMandarin
Language codes for Chinese, Mandarin.
LanguageCodes.Croatian
Language codes for Croatian.
LanguageCodes.Czech
Language codes for Czech.
LanguageCodes.Danish
Language codes for Danish.
LanguageCodes.Dutch
Language codes for Dutch.
LanguageCodes.English
Language codes for English.
LanguageCodes.Filipino
Language codes for Filipino.
LanguageCodes.Finnish
Language codes for Finnish.
LanguageCodes.French
Language codes for French.
LanguageCodes.Galician
Language codes for Galician.
LanguageCodes.Georgian
Language codes for Georgian.
LanguageCodes.German
Language codes for German.
LanguageCodes.Greek
Language codes for Greek.
LanguageCodes.Gujarati
Language codes for Gujarati.
LanguageCodes.Hebrew
Language codes for Hebrew.
LanguageCodes.Hindi
Language codes for Hindi.
LanguageCodes.Hungarian
Language codes for Hungarian.
LanguageCodes.Icelandic
Language codes for Icelandic.
LanguageCodes.Indonesian
Language codes for Indonesian.
LanguageCodes.Italian
Language codes for Italian.
LanguageCodes.Japanese
Language codes for Japanese.
LanguageCodes.Javanese
Language codes for Javanese.
LanguageCodes.Kannada
Language codes for Kannada.
LanguageCodes.Khmer
Language codes for Khmer.
LanguageCodes.Korean
Language codes for Korean.
LanguageCodes.Lao
Language codes for Lao.
LanguageCodes.Latvian
Language codes for Latvian.
LanguageCodes.Lithuanian
Language codes for Lithuanian.
LanguageCodes.Malay
Language codes for Malay.
LanguageCodes.Malayalam
Language codes for Malayalam.
LanguageCodes.Marathi
Language codes for Marathi.
LanguageCodes.Nepali
Language codes for Nepali.
LanguageCodes.NorwegianBokmal
Language codes for Norwegian Bokmål.
LanguageCodes.Persian
Language codes for Persian.
LanguageCodes.Polish
Language codes for Polish.
LanguageCodes.Portuguese
Language codes for Portuguese.
LanguageCodes.Romanian
Language codes for Romanian.
LanguageCodes.Russian
Language codes for Russian.
LanguageCodes.Serbian
Language codes for Serbian.
LanguageCodes.Sinhala
Language codes for Sinhala.
LanguageCodes.Slovak
Language codes for Slovak.
LanguageCodes.Slovenian
Language codes for Slovenian.
LanguageCodes.Spanish
Language codes for Spanish.
LanguageCodes.Sundanese
Language codes for Sundanese.
LanguageCodes.Swahili
Language codes for Swahili.
LanguageCodes.Swedish
Language codes for Swedish.
LanguageCodes.Tamil
Language codes for Tamil.
LanguageCodes.Telugu
Language codes for Telugu.
LanguageCodes.Thai
Language codes for Thai.
LanguageCodes.Turkish
Language codes for Turkish.
LanguageCodes.Ukrainian
Language codes for Ukrainian.
LanguageCodes.Urdu
Language codes for Urdu.
LanguageCodes.Vietnamese
Language codes for Vietnamese.
LanguageCodes.Zulu
Language codes for Zulu.
ListCustomClassesRequest
Message sent by the client for the ListCustomClasses method.
ListCustomClassesResponse
Message returned to the client by the ListCustomClasses method.
ListPhraseSetRequest
Message sent by the client for the ListPhraseSet method.
ListPhraseSetResponse
Message returned to the client by the ListPhraseSet method.
LongRunningRecognizeMetadata
Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
LongRunningRecognizeRequest
The top-level message sent by the client for the LongRunningRecognize method.
LongRunningRecognizeResponse
The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.
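The long-running flow above can be sketched as follows. This is a minimal sketch assuming the Google.Cloud.Speech.V1 NuGet package and application default credentials; the bucket URI, sample rate, and encoding are illustrative placeholders, not values from this document.

```csharp
using Google.Cloud.Speech.V1;

class LongRunningExample
{
    static void Main()
    {
        SpeechClient client = SpeechClient.Create();
        RecognitionConfig config = new RecognitionConfig
        {
            Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
            SampleRateHertz = 16000,
            LanguageCode = LanguageCodes.English.UnitedStates
        };
        RecognitionAudio audio = RecognitionAudio.FromStorageUri("gs://my-bucket/audio.raw");

        var operation = client.LongRunningRecognize(config, audio);
        // PollUntilCompleted repeatedly calls GetOperation until done;
        // LongRunningRecognizeMetadata reports progress in the meantime.
        var completed = operation.PollUntilCompleted();
        LongRunningRecognizeResponse response = completed.Result;
        foreach (SpeechRecognitionResult result in response.Results)
        {
            System.Console.WriteLine(result.Alternatives[0].Transcript);
        }
    }
}
```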
PhraseSet
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
PhraseSet.Types
Container for nested types declared in the PhraseSet message type.
PhraseSet.Types.Phrase
A phrase containing words and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits.
List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month").
To refer to pre-built classes, use the class' symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class' id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class' id wrapped in ${} (e.g. ${my-months}).
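The inline-class syntax above can be sketched in C#. This is a hypothetical example assuming the Google.Cloud.Speech.V1 package; the id "my-months", the item values, and the boost are placeholders chosen for illustration.

```csharp
using Google.Cloud.Speech.V1;

class CustomClassHintExample
{
    static SpeechAdaptation BuildAdaptation()
    {
        // Inline custom class: custom_class_id must be unique across all
        // class resources and inline classes in the request.
        var months = new CustomClass
        {
            CustomClassId = "my-months",
            Items =
            {
                new CustomClass.Types.ClassItem { Value = "january" },
                new CustomClass.Types.ClassItem { Value = "february" }
            }
        };
        var phraseSet = new PhraseSet
        {
            Phrases =
            {
                // ${my-months} refers to the inline class defined above.
                new PhraseSet.Types.Phrase { Value = "i was born in ${my-months}", Boost = 10 }
            }
        };
        return new SpeechAdaptation
        {
            CustomClasses = { months },
            PhraseSets = { phraseSet }
        };
    }
}
```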
Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint with matching us or eu location value.
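Selecting a regional endpoint can be sketched with the builder class. This assumes the Google.Cloud.Speech.V1 package; the endpoint string follows the regional-endpoint pattern for Speech-to-Text, and resource names used with this client would then carry the matching eu location.

```csharp
using Google.Cloud.Speech.V1;

class RegionalEndpointExample
{
    static SpeechClient CreateEuClient()
    {
        // Point the client at the eu regional endpoint instead of the
        // default global speech.googleapis.com endpoint.
        return new SpeechClientBuilder
        {
            Endpoint = "eu-speech.googleapis.com"
        }.Build();
    }
}
```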
PhraseSetName
Resource name for the PhraseSet resource.
RecognitionAudio
Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits.
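The content/uri oneof can be sketched via the library's factory helpers. This assumes the Google.Cloud.Speech.V1 package; the file path and bucket URI are placeholders.

```csharp
using Google.Cloud.Speech.V1;

class RecognitionAudioExample
{
    // Local file: populates the content field with the raw bytes.
    static RecognitionAudio FromLocalFile() =>
        RecognitionAudio.FromFile("audio.flac");

    // Cloud Storage object: populates the uri field instead.
    // Setting both content and uri (or neither) yields INVALID_ARGUMENT.
    static RecognitionAudio FromCloudStorage() =>
        RecognitionAudio.FromStorageUri("gs://my-bucket/audio.flac");
}
```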
RecognitionConfig
Provides information to the recognizer that specifies how to process the request.
RecognitionConfig.Types
Container for nested types declared in the RecognitionConfig message type.
RecognitionMetadata
Description of audio data to be recognized.
RecognitionMetadata.Types
Container for nested types declared in the RecognitionMetadata message type.
RecognizeRequest
The top-level message sent by the client for the Recognize method.
RecognizeResponse
The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.
SpeakerDiarizationConfig
Config to enable speaker diarization.
Speech
Service that implements Google Cloud Speech API.
Speech.SpeechBase
Base class for server-side implementations of Speech
Speech.SpeechClient
Client for Speech
SpeechAdaptation
Speech adaptation configuration.
SpeechAdaptation.Types
Container for nested types declared in the SpeechAdaptation message type.
SpeechAdaptation.Types.ABNFGrammar
SpeechAdaptationInfo
Information on speech adaptation use in results.
SpeechClient
Speech client wrapper, for convenient use.
SpeechClient.StreamingRecognizeStream
Bidirectional streaming methods for StreamingRecognize(CallSettings, BidirectionalStreamingSettings).
SpeechClientBuilder
Builder class for SpeechClient to provide simple configuration of credentials, endpoint etc.
SpeechClientImpl
Speech client wrapper implementation, for convenient use.
SpeechContext
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
SpeechRecognitionResult
A speech recognition result corresponding to a portion of the audio.
SpeechSettings
Settings for SpeechClient instances.
StreamingRecognitionConfig
Provides information to the recognizer that specifies how to process the request.
StreamingRecognitionConfig.Types
Container for nested types declared in the StreamingRecognitionConfig message type.
StreamingRecognitionConfig.Types.VoiceActivityTimeout
Events that a timeout can be set on for voice activity.
StreamingRecognitionResult
A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.
StreamingRecognizeRequest
The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.
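The required ordering can be sketched as follows. This is a hedged sketch assuming the Google.Cloud.Speech.V1 and Google.Protobuf packages; the audio chunk, sample rate, and encoding are placeholders.

```csharp
using Google.Cloud.Speech.V1;
using Google.Protobuf;
using System.Threading.Tasks;

class StreamingOrderExample
{
    static async Task SendAsync(SpeechClient client, byte[] audioChunk)
    {
        var stream = client.StreamingRecognize();

        // First message: streaming_config only, no audio_content.
        await stream.WriteAsync(new StreamingRecognizeRequest
        {
            StreamingConfig = new StreamingRecognitionConfig
            {
                Config = new RecognitionConfig
                {
                    Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                    SampleRateHertz = 16000,
                    LanguageCode = LanguageCodes.English.UnitedStates
                },
                InterimResults = true
            }
        });

        // All subsequent messages: audio_content only, no streaming_config.
        await stream.WriteAsync(new StreamingRecognizeRequest
        {
            AudioContent = ByteString.CopyFrom(audioChunk)
        });

        // Signal that no further requests will be written.
        await stream.WriteCompleteAsync();
    }
}
```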
StreamingRecognizeResponse
StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client.
Here's an example of a series of StreamingRecognizeResponse messages that might be returned while processing audio:
1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 }
   results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 }
   alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 }
   results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 }
   alternatives { transcript: " that was the question" } is_final: true }
Notes:
- Only two of the above responses (#4 and #7) contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
- The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high-stability results.
- The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
- In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
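Collecting the full transcript from final results can be sketched as below. This assumes the Google.Cloud.Speech.V1 package and a GAX version in which GetResponseStream() exposes the responses as an async sequence; the stream is one already opened via SpeechClient.StreamingRecognize().

```csharp
using Google.Cloud.Speech.V1;
using System.Text;
using System.Threading.Tasks;

class StreamingResponseExample
{
    static async Task<string> CollectTranscriptAsync(
        SpeechClient.StreamingRecognizeStream stream)
    {
        var transcript = new StringBuilder();
        await foreach (StreamingRecognizeResponse response in stream.GetResponseStream())
        {
            foreach (StreamingRecognitionResult result in response.Results)
            {
                // Interim results (is_final: false) may still change;
                // only final results are appended to the transcript.
                if (result.IsFinal)
                {
                    transcript.Append(result.Alternatives[0].Transcript);
                }
            }
        }
        return transcript.ToString();
    }
}
```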
StreamingRecognizeResponse.Types
Container for nested types declared in the StreamingRecognizeResponse message type.
TranscriptNormalization
Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
TranscriptNormalization.Types
Container for nested types declared in the TranscriptNormalization message type.
TranscriptNormalization.Types.Entry
A single replacement configuration.
TranscriptOutputConfig
Specifies an optional destination for the recognition results.
UpdateCustomClassRequest
Message sent by the client for the UpdateCustomClass method.
UpdatePhraseSetRequest
Message sent by the client for the UpdatePhraseSet method.
WordInfo
Word-specific information for recognized words.
Enums
CustomClassName.ResourceNameType
The possible contents of CustomClassName.
PhraseSetName.ResourceNameType
The possible contents of PhraseSetName.
RecognitionAudio.AudioSourceOneofCase
Enum of possible cases for the "audio_source" oneof.
RecognitionConfig.Types.AudioEncoding
The encoding of the audio data sent in the request.
All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.
For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.
The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.
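The header-versus-explicit-encoding distinction above can be sketched with two configs. This assumes the Google.Cloud.Speech.V1 package; the sample rate is a placeholder.

```csharp
using Google.Cloud.Speech.V1;

class AudioEncodingExample
{
    // FLAC or WAV: the file header describes the audio, so Encoding can be
    // left at its default (EncodingUnspecified).
    static RecognitionConfig FlacConfig() => new RecognitionConfig
    {
        LanguageCode = LanguageCodes.English.UnitedStates
    };

    // Raw PCM (LINEAR16) has no header, so the encoding and sample rate
    // must be supplied explicitly.
    static RecognitionConfig RawPcmConfig() => new RecognitionConfig
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,
        LanguageCode = LanguageCodes.English.UnitedStates
    };
}
```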
RecognitionMetadata.Types.InteractionType
Use case categories that the audio recognition request can be described by.
RecognitionMetadata.Types.MicrophoneDistance
Enumerates the types of capture settings describing an audio file.
RecognitionMetadata.Types.OriginalMediaType
The original media the speech was recorded on.
RecognitionMetadata.Types.RecordingDeviceType
The type of device the speech was recorded with.
StreamingRecognizeRequest.StreamingRequestOneofCase
Enum of possible cases for the "streaming_request" oneof.
StreamingRecognizeResponse.Types.SpeechEventType
Indicates the type of speech event.
TranscriptOutputConfig.OutputTypeOneofCase
Enum of possible cases for the "output_type" oneof.