The RecognitionConfig message provides information to the recognizer that specifies how to process the request.

JSON representation

{
  "encoding": enum(AudioEncoding),
  "sampleRate": number,
  "languageCode": string,
  "maxAlternatives": number,
  "profanityFilter": boolean,
  "speechContext": {
    object(SpeechContext)
  }
}


Fields

encoding
[Required] Encoding of audio data sent in all RecognitionAudio messages.



sampleRate
[Required] Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of re-sampling).



languageCode
[Optional] The language of the supplied audio as a BCP-47 language tag. Example: "en-GB". If omitted, defaults to "en-US". See Language Support for a list of the currently supported language codes.



maxAlternatives
[Optional] Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. If omitted, defaults to 1.



profanityFilter
[Optional] If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". If set to false or omitted, profanities won't be filtered out.



speechContext
[Optional] A means to provide context to assist the speech recognition.
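The fields above can be assembled into a complete request config. The sketch below builds an illustrative RecognitionConfig as a plain dictionary matching the JSON representation; the specific values chosen (LINEAR16, 16000 Hz, "en-US") are example choices, not requirements.

```python
import json

# Illustrative RecognitionConfig assembled from the documented fields.
# Field names follow the JSON representation; values are example choices.
config = {
    "encoding": "LINEAR16",    # lossless; recommended where possible
    "sampleRate": 16000,       # optimal rate; must match the audio source
    "languageCode": "en-US",   # BCP-47 tag; "en-US" is also the default
    "maxAlternatives": 1,      # 0-30; 0 or 1 returns at most one hypothesis
    "profanityFilter": False,  # leave profanities unfiltered
}

print(json.dumps(config, indent=2))
```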


AudioEncoding

Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono) audio. Only FLAC includes a header that describes the bytes of audio that follow the header. The other encodings are raw audio bytes with no header.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). Recognition accuracy may be reduced if lossy codecs (such as AMR, AMR_WB and MULAW) are used to capture or transmit the audio, particularly if background noise is present.

ENCODING_UNSPECIFIED Not specified. Will return google.rpc.Code.INVALID_ARGUMENT.
LINEAR16 Uncompressed 16-bit signed little-endian samples (Linear PCM). This is the only encoding that may be used by speech.asyncrecognize.

FLAC Free Lossless Audio Codec. This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec. 16-bit and 24-bit samples are supported; not all fields in STREAMINFO are supported.

MULAW 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AMR Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
AMR_WB Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.
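To illustrate the LINEAR16 layout described above (uncompressed signed 16-bit little-endian mono samples), the sketch below synthesizes one second of audio with Python's standard library and writes it to a WAV container. The file name and tone parameters are arbitrary choices for the example; the raw frames, minus the WAV header, are the byte layout LINEAR16 expects.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # the rate the docs call optimal

# One second of a 440 Hz sine tone as signed 16-bit samples.
samples = [
    int(32767 * 0.3 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
    for t in range(SAMPLE_RATE)
]
pcm = struct.pack("<%dh" % len(samples), *samples)  # little-endian 16-bit

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)          # all encodings are mono
    wav.setsampwidth(2)          # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(pcm)
```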


SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

JSON representation

{
  "phrases": [
    string
  ]
}


phrases
[Optional] A list of strings containing word and phrase "hints" so that the speech recognition is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits.
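A SpeechContext is just a list of phrase strings, as the JSON representation above shows. The sketch below builds one; the phrase strings themselves are hypothetical examples, not taken from the documentation.

```python
import json

# Illustrative SpeechContext: hint the recognizer toward expected commands.
# The phrases here are hypothetical examples.
speech_context = {
    "phrases": ["turn on the lights", "set a timer", "weather forecast"],
}

print(json.dumps(speech_context))
```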
