Class RecognitionMetadata (2.25.1)

RecognitionMetadata(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Description of audio data to be recognized.


The use case most closely describing the audio content to be recognized.
industry_naics_code_of_audio int
The industry vertical to which this speech recognition request most closely applies. This is most indicative of the topics contained in the audio. Use the 6-digit NAICS code to identify the industry vertical - see
The audio type that most closely describes the audio being recognized.
The original media the speech was recorded on.
The type of device the speech was recorded with.
recording_device_name str
The device used to make the recording. Examples 'Nexus 5X' or 'Polycom SoundStation IP 6000' or 'POTS' or 'VoIP' or 'Cardioid Microphone'.
original_mime_type str
Mime type of the original audio file. For example audio/m4a, audio/x-alaw-basic, audio/mp3, audio/3gpp. A list of possible audio mime types is maintained at
audio_topic str
Description of the content. Eg. "Recordings of federal supreme court hearings from 2012".




Use case categories that the audio recognition request can be described by.

Values: INTERACTION_TYPE_UNSPECIFIED (0): Use case is either unknown or is something other than one of the other values below. DISCUSSION (1): Multiple people in a conversation or discussion. For example in a meeting with two or more people actively participating. Typically all the primary people speaking would be in the same room (if not, see PHONE_CALL) PRESENTATION (2): One or more persons lecturing or presenting to others, mostly uninterrupted. PHONE_CALL (3): A phone-call or video-conference in which two or more people, who are not in the same room, are actively participating. VOICEMAIL (4): A recorded message intended for another person to listen to. PROFESSIONALLY_PRODUCED (5): Professionally produced audio (eg. TV Show, Podcast). VOICE_SEARCH (6): Transcribe spoken questions and queries into text. VOICE_COMMAND (7): Transcribe voice commands, such as for controlling a device. DICTATION (8): Transcribe speech to text to create a written document, such as a text-message, email or report.



Enumerates the types of capture settings describing an audio file.

Values: MICROPHONE_DISTANCE_UNSPECIFIED (0): Audio type is not known. NEARFIELD (1): The audio was captured from a closely placed microphone. Eg. phone, dictaphone, or handheld microphone. Generally if there speaker is within 1 meter of the microphone. MIDFIELD (2): The speaker if within 3 meters of the microphone. FARFIELD (3): The speaker is more than 3 meters away from the microphone.



The original media the speech was recorded on.

Values: ORIGINAL_MEDIA_TYPE_UNSPECIFIED (0): Unknown original media type. AUDIO (1): The speech data is an audio recording. VIDEO (2): The speech data originally recorded on a video.



The type of device the speech was recorded with.

Values: RECORDING_DEVICE_TYPE_UNSPECIFIED (0): The recording device is unknown. SMARTPHONE (1): Speech was recorded on a smartphone. PC (2): Speech was recorded using a personal computer or tablet. PHONE_LINE (3): Speech was recorded over a phone line. VEHICLE (4): Speech was recorded in a vehicle. OTHER_OUTDOOR_DEVICE (5): Speech was recorded outdoors. OTHER_INDOOR_DEVICE (6): Speech was recorded indoors.