This page describes how to use Speech-to-Text to transcribe audio files
that include more than one channel. Multi-channel recognition is available for
most, but not all, audio encodings supported by Speech-to-Text. For
information about how many channels are recognized in audio files of each
encoding type, see
audioChannelCount
.
Audio data usually includes a channel for each speaker present on the recording. For example, audio of two people talking over the phone might contain two channels, where each line is recorded separately.
To transcribe audio data that includes multiple channels, you must provide the
number of channels in your request to the Speech-to-Text API. In your request, set the
audioChannelCount
field in your request to the number of channels present in
your audio.
When you send a request with multiple channels, Speech-to-Text
returns a result to you that identifies the different channels
present in the audio, labeling the alternatives for each result with
the channelTag
field.
The following code sample demonstrates how to transcribe audio that contains multiple channels.
Protocol
Refer to the speech:recognize
API endpoint for complete details.
To perform synchronous speech recognition, make a POST
request and provide the
appropriate request body. The following shows an example of a POST
request using
curl
. The example uses the Google Cloud CLI to generate an access
token. For instructions on installing the gcloud CLI,
see the quickstart.
The following example show how to send a POST
request using curl
,
where the body of the request specifies the number of channels
present on the audio sample.
curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ -H "Content-Type: application/json; charset=utf-8" \ --data '{ "config": { "encoding": "LINEAR16", "languageCode": "en-US", "audioChannelCount": 2, "enableSeparateRecognitionPerChannel": true }, "audio": { "uri": "gs://cloud-samples-tests/speech/commercial_stereo.wav" } }' "https://speech.googleapis.com/v1/speech:recognize" > multi-channel.txt
If the request is successful, the server returns a 200 OK
HTTP
status code and the response in JSON format, saved to a file
named multi-channel.json
.
{ "results": [ { "alternatives": [ { "transcript": "hi I'd like to buy a Chromecast I'm always wondering whether you could help me with that", "confidence": 0.8991147 } ], "channelTag": 1, "languageCode": "en-us" }, { "alternatives": [ { "transcript": "certainly which color would you like we have blue black and red", "confidence": 0.9408236 } ], "channelTag": 2, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " let's go with the black one", "confidence": 0.98783094 } ], "channelTag": 1, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " would you like the new Chromecast Ultra model or the regular Chromecast", "confidence": 0.9573053 } ], "channelTag": 2, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " regular Chromecast is fine thank you", "confidence": 0.9671048 } ], "channelTag": 1, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " okay sure would you like to ship it regular or Express", "confidence": 0.9544821 } ], "channelTag": 2, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " express please", "confidence": 0.9487205 } ], "channelTag": 1, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " terrific it's on the way thank you", "confidence": 0.97655964 } ], "channelTag": 2, "languageCode": "en-us" }, { "alternatives": [ { "transcript": " thank you very much bye", "confidence": 0.9735077 } ], "channelTag": 1, "languageCode": "en-us" } ] }
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.