RecognitionConfig

The RecognitionConfig message provides information to the recognizer that specifies how to process the request.

JSON representation
{
  "encoding": enum(AudioEncoding),
  "sampleRate": number,
  "languageCode": string,
  "maxAlternatives": number,
  "profanityFilter": boolean,
  "speechContext": {
    object(SpeechContext)
  }
}
Fields
encoding

enum(AudioEncoding)

[Required] Encoding of audio data sent in all RecognitionAudio messages.

sampleRate

number

[Required] Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source rather than resampling.

languageCode

string

[Optional] The language of the supplied audio as a BCP-47 language tag (https://www.rfc-editor.org/rfc/bcp/bcp47.txt). Example: "en-GB". If omitted, defaults to "en-US". See Language Support for a list of the currently supported language codes.

maxAlternatives

number

[Optional] Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of 1. If omitted, defaults to 1.

profanityFilter

boolean

[Optional] If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". If set to false or omitted, profanities won't be filtered out.

speechContext

object(SpeechContext)

[Optional] A means to provide context to assist the speech recognition.
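
The field constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names build_recognition_config and effective_max_alternatives are hypothetical, and only the field names, value ranges, and defaults are taken from this page.

```python
import json


def build_recognition_config(encoding, sample_rate, language_code=None,
                             max_alternatives=None, profanity_filter=None,
                             phrases=None):
    """Return a dict shaped like the RecognitionConfig JSON representation.

    Hypothetical helper: it only enforces the ranges documented above.
    """
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sampleRate must be between 8000 and 48000 Hz")
    if max_alternatives is not None and not 0 <= max_alternatives <= 30:
        raise ValueError("maxAlternatives must be between 0 and 30")

    # encoding and sampleRate are the two [Required] fields.
    config = {"encoding": encoding, "sampleRate": sample_rate}

    # Optional fields are left out when unset so the documented
    # server-side defaults ("en-US", 1 alternative, no filtering) apply.
    if language_code is not None:
        config["languageCode"] = language_code
    if max_alternatives is not None:
        config["maxAlternatives"] = max_alternatives
    if profanity_filter is not None:
        config["profanityFilter"] = profanity_filter
    if phrases:
        config["speechContext"] = {"phrases": list(phrases)}
    return config


def effective_max_alternatives(max_alternatives=None):
    """Upper bound on alternatives per result: omitted, 0 and 1 all mean 1."""
    return max(1, max_alternatives or 1)


config = build_recognition_config("FLAC", 16000, language_code="en-GB",
                                  max_alternatives=5)
print(json.dumps(config, indent=2))
```

Leaving optional fields out of the dict, rather than sending explicit defaults, keeps the request body small and lets the server apply the defaults described above.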

AudioEncoding

Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono) audio. Only FLAC includes a header that describes the bytes of audio that follow the header. The other encodings are raw audio bytes with no header.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). Recognition accuracy may be reduced if lossy codecs (such as AMR, AMR_WB and MULAW) are used to capture or transmit the audio, particularly if background noise is present.

Enums
ENCODING_UNSPECIFIED Not specified. Returns the error google.rpc.Code.INVALID_ARGUMENT.
LINEAR16 Uncompressed 16-bit signed little-endian samples (Linear PCM). This is the only encoding that may be used by speech.asyncrecognize.
FLAC This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression, so recognition accuracy is not compromised by a lossy codec. The FLAC (Free Lossless Audio Codec) stream encoding is specified at http://flac.sourceforge.net/documentation.html. 16-bit and 24-bit samples are supported. Not all fields in STREAMINFO are supported.

MULAW 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AMR Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
AMR_WB Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.
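
As an illustration of the enum constraints above, the sketch below checks an encoding and sample rate pair on the client and packs integer samples into the headerless LINEAR16 byte layout. The helper names check_encoding and pack_linear16 are hypothetical and not part of the API; the fixed rates for AMR and AMR_WB and the 8000-48000 range come from this page.

```python
import struct

# Fixed sample rates stated in the enum descriptions above.
REQUIRED_SAMPLE_RATE = {"AMR": 8000, "AMR_WB": 16000}
USABLE_ENCODINGS = {"LINEAR16", "FLAC", "MULAW", "AMR", "AMR_WB"}


def check_encoding(encoding, sample_rate):
    """Raise ValueError if the pair contradicts the documented constraints."""
    if encoding not in USABLE_ENCODINGS:
        # ENCODING_UNSPECIFIED lands here; the server would reject it
        # with INVALID_ARGUMENT, so fail early on the client.
        raise ValueError("unsupported encoding: %r" % (encoding,))
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sampleRate must be between 8000 and 48000 Hz")
    required = REQUIRED_SAMPLE_RATE.get(encoding)
    if required is not None and sample_rate != required:
        raise ValueError("%s requires sampleRate %d Hz" % (encoding, required))


def pack_linear16(samples):
    """Pack ints in [-32768, 32767] as raw 16-bit signed little-endian PCM.

    No header is written, matching the LINEAR16 description above.
    """
    return struct.pack("<%dh" % len(samples), *samples)


check_encoding("AMR_WB", 16000)   # passes silently
raw = pack_linear16([0, 1, -1])   # b'\x00\x00\x01\x00\xff\xff'
print(len(raw), "bytes")
```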

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

JSON representation
{
  "phrases": [
    string
  ]
}
Fields
phrases[]

string

[Optional] A list of strings containing word and phrase "hints" so that the speech recognizer is more likely to recognize them. This can be used to improve accuracy for specific words and phrases, for example, when the user typically speaks specific commands. It can also be used to add words to the vocabulary of the recognizer. See usage limits.
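
A SpeechContext is just a list of phrase strings nested under the speechContext field of the config. The sketch below shows one way to assemble it; build_speech_context is a hypothetical helper, and stripping whitespace and dropping duplicates are choices made for this example rather than requirements stated by the API. The usage limits referenced above are not enforced here.

```python
import json


def build_speech_context(phrases):
    """Return a dict shaped like the SpeechContext JSON representation.

    Hypothetical helper: trims each phrase, drops empty strings and
    repeated phrases, and keeps the original order.
    """
    cleaned = []
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase and phrase not in cleaned:
            cleaned.append(phrase)
    return {"phrases": cleaned}


# Command-style hints, as in the "specific commands" case described above.
context = build_speech_context([" play music ", "stop", "stop", ""])

# The context is nested inside a RecognitionConfig under "speechContext".
config = {
    "encoding": "LINEAR16",
    "sampleRate": 16000,
    "speechContext": context,
}
print(json.dumps(config, indent=2))
```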
