- Resource: Recognizer
  - RecognitionConfig
    - AutoDetectDecodingConfig
    - ExplicitDecodingConfig
      - AudioEncoding
    - RecognitionFeatures
      - MultiChannelMode
      - SpeakerDiarizationConfig
    - SpeechAdaptation
      - AdaptationPhraseSet
    - TranscriptNormalization
      - Entry
    - TranslationConfig
    - DenoiserConfig
  - State
- Methods
Resource: Recognizer
A Recognizer message. Stores recognition configuration and metadata.
| JSON representation |
|---|
| { "name": string, "uid": string, "displayName": string, "model": string, "languageCodes": [ string ], "defaultRecognitionConfig": { object (RecognitionConfig) }, "annotations": { string: string, ... }, "state": enum (State), "createTime": string, "updateTime": string, "deleteTime": string, "expireTime": string, "etag": string, "reconciling": boolean, "kmsKeyName": string, "kmsKeyVersionName": string } |
| Fields | |
|---|---|
| name | Output only. Identifier. The resource name of the Recognizer. Format: projects/{project}/locations/{location}/recognizers/{recognizer}. |
| uid | Output only. System-assigned unique identifier for the Recognizer. |
| displayName | User-settable, human-readable name for the Recognizer. Must be 63 characters or less. |
| model | Optional. This field is now deprecated. Prefer the model field in the defaultRecognitionConfig. Which model to use for recognition requests. Select the model best suited to your domain to get best results. Guidance for choosing which model to use can be found in the Transcription Models Documentation, and the models supported in each region can be found in the Table of Supported Models. |
| languageCodes[] | Optional. This field is now deprecated. Prefer the languageCodes field in the defaultRecognitionConfig. The language of the supplied audio as a BCP-47 language tag. Supported languages for each model are listed in the Table of Supported Models. If additional languages are provided, the recognition result will contain recognition in the most likely language detected, and will include the language tag of that language. When you create or update a Recognizer, these values are stored in normalized BCP-47 form. For example, "en-us" is stored as "en-US". |
| defaultRecognitionConfig | Default configuration to use for requests with this Recognizer. This can be overwritten by inline configuration in the RecognizeRequest.config field. |
| annotations | Allows users to store small amounts of arbitrary data. Both the key and the value must be 63 characters or less each. At most 100 annotations. An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }. |
| state | Output only. The Recognizer lifecycle state. |
| createTime | Output only. Creation time. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| updateTime | Output only. The most recent time this Recognizer was modified. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| deleteTime | Output only. The time at which this Recognizer was requested for deletion. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| expireTime | Output only. The time at which this Recognizer will be purged. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| etag | Output only. This checksum is computed by the server based on the value of other fields. This may be sent on update, undelete, and delete requests to ensure the client has an up-to-date value before proceeding. |
| reconciling | Output only. Whether or not this Recognizer is in the process of being updated. |
| kmsKeyName | Output only. The KMS key name with which the Recognizer is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}. |
| kmsKeyVersionName | Output only. The KMS key version name with which the Recognizer is encrypted. The expected format is projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{crypto_key}/cryptoKeyVersions/{crypto_key_version}. |
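For orientation, here is a minimal sketch of a Recognizer as the API might return it. The project ID, recognizer ID, UID, etag, and timestamps are made-up illustrative values, and the latest_long model is assumed to be available in your region; only the field names come from the schema above:

```json
{
  "name": "projects/example-project/locations/global/recognizers/example-recognizer",
  "uid": "f3d9c9e0-1111-2222-3333-444455556666",
  "displayName": "Example recognizer",
  "defaultRecognitionConfig": {
    "model": "latest_long",
    "languageCodes": ["en-US"],
    "autoDecodingConfig": {}
  },
  "state": "ACTIVE",
  "createTime": "2014-10-02T15:01:23Z",
  "updateTime": "2014-10-02T15:01:23Z",
  "etag": "a1b2c3",
  "reconciling": false
}
```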
RecognitionConfig
Provides information to the Recognizer that specifies how to process the recognition request.
| JSON representation |
|---|
| { "model": string, "languageCodes": [ string ], "features": { object (RecognitionFeatures) }, "adaptation": { object (SpeechAdaptation) }, "transcriptNormalization": { object (TranscriptNormalization) }, "translationConfig": { object (TranslationConfig) }, "denoiserConfig": { object (DenoiserConfig) }, // Union field decoding_config can be only one of the following: "autoDecodingConfig": { object (AutoDetectDecodingConfig) }, "explicitDecodingConfig": { object (ExplicitDecodingConfig) } // End of list of possible types for union field decoding_config. } |
| Fields | |
|---|---|
| model | Optional. Which model to use for recognition requests. Select the model best suited to your domain to get best results. Guidance for choosing which model to use can be found in the Transcription Models Documentation, and the models supported in each region can be found in the Table of Supported Models. |
| languageCodes[] | Optional. The language of the supplied audio as a BCP-47 language tag. Language tags are normalized to BCP-47 before they are used, e.g. "en-us" becomes "en-US". Supported languages for each model are listed in the Table of Supported Models. If additional languages are provided, the recognition result will contain recognition in the most likely language detected, and will include the language tag of that language. |
| features | Speech recognition features to enable. |
| adaptation | Speech adaptation context that weights recognizer predictions for specific words and phrases. |
| transcriptNormalization | Optional. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts. |
| translationConfig | Optional. Configuration used to automatically run translation on the given audio to the desired language, for supported models. |
| denoiserConfig | Optional. Denoiser config. May not be supported for all models and may have no effect. |
| Union field decoding_config. Decoding parameters for audio being sent for recognition. decoding_config can be only one of the following: | |
| autoDecodingConfig | Automatically detect decoding parameters. Preferred for supported formats. |
| explicitDecodingConfig | Explicitly specified decoding parameters. Required if using headerless PCM audio (linear16, mulaw, alaw). |
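As a sketch, a RecognitionConfig that relies on automatic format detection and enables one recognition feature. The latest_long model is assumed to be available in your region:

```json
{
  "model": "latest_long",
  "languageCodes": ["en-US"],
  "features": {
    "enableAutomaticPunctuation": true
  },
  "autoDecodingConfig": {}
}
```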
AutoDetectDecodingConfig
This type has no fields.
Automatically detected decoding parameters. Supported for the following encodings:
- WAV_LINEAR16: 16-bit signed little-endian PCM samples in a WAV container.
- WAV_MULAW: 8-bit companded mulaw samples in a WAV container.
- WAV_ALAW: 8-bit companded alaw samples in a WAV container.
- RFC4867_5_AMR: AMR frames with an rfc4867.5 header.
- RFC4867_5_AMRWB: AMR-WB frames with an rfc4867.5 header.
- FLAC: FLAC frames in the "native FLAC" container format.
- MP3: MPEG audio frames with optional (ignored) ID3 metadata.
- OGG_OPUS: Opus audio frames in an Ogg container.
- WEBM_OPUS: Opus audio frames in a WebM container.
- MP4_AAC: AAC audio frames in an MP4 container.
- M4A_AAC: AAC audio frames in an M4A container.
- MOV_AAC: AAC audio frames in an MOV container.
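Since this type has no fields, selecting automatic detection in a RecognitionConfig amounts to setting an empty object:

```json
{ "autoDecodingConfig": {} }
```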
ExplicitDecodingConfig
Explicitly specified decoding parameters.
| JSON representation |
|---|
| { "encoding": enum (AudioEncoding), "sampleRateHertz": integer, "audioChannelCount": integer } |
| Fields | |
|---|---|
| encoding | Required. Encoding of the audio data sent for recognition. |
| sampleRateHertz | Optional. Sample rate in Hertz of the audio data sent for recognition. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that's not possible, use the native sample rate of the audio source (instead of resampling). Note that this field is marked as OPTIONAL for backward compatibility reasons. It is (and has always been) effectively REQUIRED. |
| audioChannelCount | Optional. Number of channels present in the audio data sent for recognition. Note that this field is marked as OPTIONAL for backward compatibility reasons. It is (and has always been) effectively REQUIRED. The maximum allowed value is 8. |
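For example, a plausible configuration for headerless 8 kHz mono telephone audio (MULAW is one of the headerless encodings that requires explicit decoding parameters):

```json
{
  "explicitDecodingConfig": {
    "encoding": "MULAW",
    "sampleRateHertz": 8000,
    "audioChannelCount": 1
  }
}
```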
AudioEncoding
Supported audio data encodings.
| Enums | |
|---|---|
| AUDIO_ENCODING_UNSPECIFIED | Default value. This value is unused. |
| LINEAR16 | Headerless 16-bit signed little-endian PCM samples. |
| MULAW | Headerless 8-bit companded mulaw samples. |
| ALAW | Headerless 8-bit companded alaw samples. |
| AMR | AMR frames with an rfc4867.5 header. |
| AMR_WB | AMR-WB frames with an rfc4867.5 header. |
| FLAC | FLAC frames in the "native FLAC" container format. |
| MP3 | MPEG audio frames with optional (ignored) ID3 metadata. |
| OGG_OPUS | Opus audio frames in an Ogg container. |
| WEBM_OPUS | Opus audio frames in a WebM container. |
| MP4_AAC | AAC audio frames in an MP4 container. |
| M4A_AAC | AAC audio frames in an M4A container. |
| MOV_AAC | AAC audio frames in an MOV container. |
RecognitionFeatures
Available recognition features.
| JSON representation |
|---|
| { "profanityFilter": boolean, "enableWordTimeOffsets": boolean, "enableWordConfidence": boolean, "enableAutomaticPunctuation": boolean, "enableSpokenPunctuation": boolean, "enableSpokenEmojis": boolean, "multiChannelMode": enum (MultiChannelMode), "diarizationConfig": { object (SpeakerDiarizationConfig) }, "maxAlternatives": integer } |
| Fields | |
|---|---|
| profanityFilter | If set to true, the server will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, for example, "f***". If set to false or omitted, profanities won't be filtered out. |
| enableWordTimeOffsets | If true, the top result includes a list of words and the start and end time offsets (timestamps) for those words. If false, no word-level time offset information is returned. The default is false. |
| enableWordConfidence | If true, the top result includes a list of words and the confidence for those words. If false, no word-level confidence information is returned. The default is false. |
| enableAutomaticPunctuation | If true, adds punctuation to recognition result hypotheses. This feature is only available in select languages. The default false value does not add punctuation to result hypotheses. |
| enableSpokenPunctuation | The spoken punctuation behavior for the call. If true, replaces spoken punctuation with the corresponding symbols in the request. For example, "how are you question mark" becomes "how are you?". If false, spoken punctuation is not replaced. |
| enableSpokenEmojis | The spoken emoji behavior for the call. If true, adds spoken emoji formatting for the request. This will replace spoken emojis with the corresponding Unicode symbols in the final transcript. If false, spoken emojis are not replaced. |
| multiChannelMode | Mode for recognizing multi-channel audio. |
| diarizationConfig | Configuration to enable speaker diarization. To enable diarization, set this field to an empty SpeakerDiarizationConfig message. |
| maxAlternatives | Maximum number of recognition hypotheses to be returned. The server may return fewer than maxAlternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one. |
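Putting a few of these together, a hypothetical features portion of a RecognitionConfig that filters profanity and requests word-level timing and confidence:

```json
{
  "features": {
    "profanityFilter": true,
    "enableWordTimeOffsets": true,
    "enableWordConfidence": true,
    "enableAutomaticPunctuation": true,
    "maxAlternatives": 1
  }
}
```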
MultiChannelMode
Options for how to recognize multi-channel audio.
| Enums | |
|---|---|
| MULTI_CHANNEL_MODE_UNSPECIFIED | Default value for the multi-channel mode. If the audio contains multiple channels, only the first channel will be transcribed; other channels will be ignored. |
| SEPARATE_RECOGNITION_PER_CHANNEL | If selected, each channel in the provided audio is transcribed independently. This cannot be selected if the selected model is latest_short. |
SpeakerDiarizationConfig
Configuration to enable speaker diarization.
| JSON representation |
|---|
| { "minSpeakerCount": integer, "maxSpeakerCount": integer } |
| Fields | |
|---|---|
| minSpeakerCount | Optional. The system automatically determines the number of speakers. This value is not currently used. |
| maxSpeakerCount | Optional. The system automatically determines the number of speakers. This value is not currently used. |
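Since both fields are currently unused, enabling diarization amounts to setting an empty message, as described under diarizationConfig above:

```json
{ "features": { "diarizationConfig": {} } }
```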
SpeechAdaptation
Provides "hints" to the speech recognizer to favor specific words and phrases in the results. PhraseSets can be specified as an inline resource, or a reference to an existing PhraseSet resource.
| JSON representation |
|---|
| { "phraseSets": [ { object (AdaptationPhraseSet) } ], "customClasses": [ { object (CustomClass) } ] } |
| Fields | |
|---|---|
| phraseSets[] | A list of inline or referenced PhraseSets. |
| customClasses[] | A list of inline CustomClasses. Existing CustomClass resources can be referenced directly in a PhraseSet. |
AdaptationPhraseSet
A biasing PhraseSet, which can be either a string referencing the name of an existing PhraseSet resource, or an inline definition of a PhraseSet.
| JSON representation |
|---|
| { // Union field value can be only one of the following: "phraseSet": string, "inlinePhraseSet": { object (PhraseSet) } // End of list of possible types for union field value. } |
| Fields | |
|---|---|
| Union field value. value can be only one of the following: | |
| phraseSet | The name of an existing PhraseSet resource. The user must have read access to the resource and it must not be deleted. |
| inlinePhraseSet | An inline defined PhraseSet. |
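A sketch showing both union branches side by side: the first entry references an existing PhraseSet resource (the resource name is a made-up example), the second defines one inline. The phrases[].value and boost fields are assumed from the PhraseSet message, which is documented separately:

```json
{
  "adaptation": {
    "phraseSets": [
      { "phraseSet": "projects/example-project/locations/global/phraseSets/example-phrase-set" },
      { "inlinePhraseSet": { "phrases": [ { "value": "Recognizer", "boost": 10 } ] } }
    ]
  }
}
```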
TranscriptNormalization
Transcription normalization configuration. Use transcription normalization to automatically replace parts of the transcript with phrases of your choosing. For StreamingRecognize, this normalization only applies to stable partial transcripts (stability > 0.8) and final transcripts.
| JSON representation |
|---|
| { "entries": [ { object (Entry) } ] } |
| Fields | |
|---|---|
| entries[] | A list of replacement entries. We will perform replacement with one entry at a time. For example, the second entry in ["cat" => "dog", "mountain cat" => "mountain dog"] will never be applied because we will always process the first entry before it. At most 100 entries. |
Entry
A single replacement configuration.
| JSON representation |
|---|
| { "search": string, "replace": string, "caseSensitive": boolean } |
| Fields | |
|---|---|
| search | What to replace. Max length is 100 characters. |
| replace | What to replace with. Max length is 100 characters. |
| caseSensitive | Whether the search is case sensitive. |
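Using the "cat" => "dog" example from entries[] above, written out as a full TranscriptNormalization message (note the second entry can never fire, because the first is always applied before it):

```json
{
  "entries": [
    { "search": "cat", "replace": "dog", "caseSensitive": false },
    { "search": "mountain cat", "replace": "mountain dog", "caseSensitive": false }
  ]
}
```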
TranslationConfig
Translation configuration. Use to translate the given audio into text for the desired language.
| JSON representation |
|---|
| { "targetLanguage": string } |
| Fields | |
|---|---|
| targetLanguage | Required. The language code to translate to. |
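A minimal sketch, assuming "en-US" is a supported translation target for your chosen model:

```json
{ "translationConfig": { "targetLanguage": "en-US" } }
```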
DenoiserConfig
Denoiser config. May not be supported for all models and may have no effect.
| JSON representation |
|---|
| { "denoiseAudio": boolean, "snrThreshold": number } |
| Fields | |
|---|---|
| denoiseAudio | Denoise audio before sending to the transcription model. |
| snrThreshold | Signal-to-Noise Ratio (SNR) threshold for the denoiser. Here, SNR means the loudness of the speech signal. Audio with an SNR below this threshold, meaning the speech is too quiet, will be prevented from being sent to the transcription model. If snrThreshold is 0, no filtering will be applied. |
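A sketch that turns the denoiser on with an illustrative threshold; the value 10 is a made-up example, not a recommended setting:

```json
{ "denoiserConfig": { "denoiseAudio": true, "snrThreshold": 10 } }
```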
State
Set of states that define the lifecycle of a Recognizer.
| Enums | |
|---|---|
| STATE_UNSPECIFIED | The default value. This value is used if the state is omitted. |
| ACTIVE | The Recognizer is active and ready for use. |
| DELETED | This Recognizer has been deleted. |
Methods
| Methods | |
|---|---|
| batchRecognize | Performs batch asynchronous speech recognition: send a request with N audio files and receive a long-running operation that can be polled to see when the transcriptions are finished. |
| create | Creates a Recognizer. |
| delete | Deletes the Recognizer. |
| get | Returns the requested Recognizer. |
| list | Lists Recognizers. |
| patch | Updates the Recognizer. |
| recognize | Performs synchronous speech recognition: receive results after all audio has been sent and processed. |
| undelete | Undeletes the Recognizer. |