This page describes how to use Speech-to-Text to transcribe audio files
that include more than one channel. Multi-channel recognition is available for
most, but not all, audio encodings supported by Speech-to-Text. For
information about how many channels are recognized in audio files of each
encoding type, see audioChannelCount.
If you are using AutoDetectDecodingConfig, you do not have to specify how many
audio channels the file has; the count is determined automatically. You need
to specify the audio channel count only when using ExplicitDecodingConfig.
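The difference between the two decoding configurations can be sketched as a JSON-style config fragment. This is a minimal illustration, not a complete request: the helper name build_recognition_config is hypothetical, the camelCase field names other than audioChannelCount (autoDecodingConfig, explicitDecodingConfig, encoding, sampleRateHertz, features.multiChannelMode) are assumed to mirror the REST representation of the recognition config, and required fields such as the language and model are omitted.

```python
import json
from typing import Optional


def build_recognition_config(
    channel_count: Optional[int] = None,
    encoding: str = "LINEAR16",
    sample_rate_hertz: int = 16000,
) -> dict:
    """Build a partial recognition config for multi-channel audio.

    Hypothetical helper. Field names are assumptions based on the REST
    representation of the config; check them against the API reference.
    """
    config = {
        # Assumed feature flag that requests one transcript per channel.
        "features": {"multiChannelMode": "SEPARATE_RECOGNITION_PER_CHANNEL"},
    }
    if channel_count is None:
        # Auto-detection: the channel count is not specified by the caller.
        config["autoDecodingConfig"] = {}
    else:
        if channel_count < 1:
            raise ValueError("channel_count must be at least 1")
        # Explicit decoding: the caller must state the channel count.
        config["explicitDecodingConfig"] = {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "audioChannelCount": channel_count,
        }
    return config


if __name__ == "__main__":
    print(json.dumps(build_recognition_config(), indent=2))
    print(json.dumps(build_recognition_config(channel_count=2), indent=2))
```

Only the explicit variant carries audioChannelCount; the auto-detect variant leaves the decoding parameters empty.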
Multi-channel audio typically includes a separate channel for each speaker present on the recording. For example, audio of two people talking over the phone might contain two channels, with each party's line recorded on its own channel.
When you send a request with multiple channels, Speech-to-Text
returns a result that identifies the different channels
present in the audio, labeling the alternatives for each result with
the channelTag field.
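As a rough sketch of how the channelTag label might be used, the following groups transcripts by channel. It assumes the response has been parsed into a dictionary in which each entry of results carries a channelTag and a list of alternatives with a transcript; the function name transcripts_by_channel and the sample data are hypothetical and are not real service output.

```python
from collections import defaultdict


def transcripts_by_channel(response: dict) -> dict:
    """Join the top alternative of each result, grouped by channelTag.

    Assumes a JSON-style response shape; results without alternatives
    are skipped, and a missing channelTag is treated as channel 0.
    """
    by_channel = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        tag = result.get("channelTag", 0)
        # The first alternative is taken as the most likely transcript.
        by_channel[tag].append(alternatives[0].get("transcript", ""))
    return {tag: " ".join(parts) for tag, parts in sorted(by_channel.items())}


# Illustrative data only: a two-channel phone call.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "hello"}], "channelTag": 1},
        {"alternatives": [{"transcript": "hi there"}], "channelTag": 2},
        {"alternatives": [{"transcript": "how can I help"}], "channelTag": 1},
    ]
}

if __name__ == "__main__":
    for tag, text in transcripts_by_channel(sample_response).items():
        print(f"channel {tag}: {text}")
```

Grouping by tag keeps each speaker's side of the call together even when results from different channels are interleaved in the response.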