Transcribe audio with multiple channels


This page describes how to use Speech-to-Text to transcribe audio files that include more than one channel. Multi-channel recognition is available for most, but not all, audio encodings supported by Speech-to-Text. For information about how many channels are recognized in audio files of each encoding type, see audioChannelCount.

If you use AutoDetectDecodingConfig, you do not have to specify how many audio channels the file has; the count is determined automatically. You need to specify the audio channel count only when you use ExplicitDecodingConfig.
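When the channel count has to be given explicitly, one way to find it for a WAV file is to read it from the file header. The following is a minimal sketch that uses only Python's standard `wave` module. The helper name `explicit_decoding_fields` is hypothetical, and the dictionary keys (`encoding`, `sampleRateHertz`, `audioChannelCount`) are an assumption about how the ExplicitDecodingConfig fields are spelled in a JSON request; check them against the API reference before relying on them.

```python
import io
import wave


def explicit_decoding_fields(wav_bytes: bytes) -> dict:
    """Read sample rate and channel count from a WAV header.

    Hypothetical helper: the returned keys mirror what an
    ExplicitDecodingConfig is assumed to look like in JSON form.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        # LINEAR16 means 16-bit PCM, i.e. 2 bytes per sample.
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        return {
            "encoding": "LINEAR16",
            "sampleRateHertz": wav_file.getframerate(),
            "audioChannelCount": wav_file.getnchannels(),
        }


def make_silent_wav(channels: int, sample_rate: int = 16000) -> bytes:
    """Build a short silent 16-bit WAV in memory for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        # 160 frames of silence; each frame holds one sample per channel.
        wav_file.writeframes(b"\x00\x00" * channels * 160)
    return buffer.getvalue()


if __name__ == "__main__":
    print(explicit_decoding_fields(make_silent_wav(channels=2)))
```

With AutoDetectDecodingConfig none of this is needed, because the service inspects the audio itself.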

Audio data usually includes a channel for each speaker present on the recording. For example, audio of two people talking over the phone might contain two channels, where each line is recorded separately.

When you send a request with multiple channels, Speech-to-Text returns a result that identifies the different channels present in the audio, labeling the alternatives for each result with the channelTag field.
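To illustrate how the channelTag field can be used, the sketch below groups transcripts by channel. It works on a plain dictionary rather than a client library object. The response shape (a `results` list whose entries carry `channelTag` and an `alternatives` list with a `transcript`) is an assumption modeled on the JSON form of a recognition response, and the sample data is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def transcripts_by_channel(response: dict) -> Dict[int, List[str]]:
    """Group the top transcript of each result under its channel tag."""
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no transcript
        # Results without a tag are collected under 0 (assumed convention).
        channel = result.get("channelTag", 0)
        grouped[channel].append(alternatives[0]["transcript"])
    return dict(grouped)


# Invented sample: a two-channel phone call, one speaker per channel.
SAMPLE_RESPONSE = {
    "results": [
        {"alternatives": [{"transcript": "hello"}], "channelTag": 1},
        {"alternatives": [{"transcript": "hi there"}], "channelTag": 2},
        {"alternatives": [{"transcript": "how are you"}], "channelTag": 1},
    ]
}

if __name__ == "__main__":
    for channel, lines in transcripts_by_channel(SAMPLE_RESPONSE).items():
        print(f"channel {channel}: {' / '.join(lines)}")
```

Because each channel in the phone-call example corresponds to one speaker, grouping by channel tag gives a per-speaker transcript without any speaker diarization.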