This page describes how to transcribe audio files that include more than one channel using Speech-to-Text.
Audio data often includes a separate channel for each speaker on the recording. For example, audio of two people talking over the phone might contain two channels, with each line recorded separately.
To transcribe audio data that includes multiple channels, you must provide the number of channels in your request to the Speech-to-Text API. Set the audioChannelCount field in your request to the number of channels present in your audio. The example request on this page also sets enableSeparateRecognitionPerChannel to true so that each channel is recognized separately.
When you send a request with multiple channels, Speech-to-Text returns results that identify the different channels present in the audio, labeling each result with the channelTag field.
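As a minimal sketch of the request body described above, the following Python builds the JSON payload with the channel count set. The field names (audioChannelCount, enableSeparateRecognitionPerChannel, encoding, languageCode) are taken from the curl request on this page; the helper name build_recognize_request and its defaults are hypothetical and exist only for illustration. The sketch does not send anything to the API.

```python
import json


def build_recognize_request(gcs_uri, channel_count,
                            language_code="en-US", encoding="LINEAR16"):
    """Return a request body dict for speech:recognize (hypothetical helper)."""
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    return {
        "config": {
            "encoding": encoding,
            "languageCode": language_code,
            # Number of channels present in the audio file.
            "audioChannelCount": channel_count,
            # Set to true here, mirroring the curl example on this page.
            "enableSeparateRecognitionPerChannel": True,
        },
        "audio": {"uri": gcs_uri},
    }


body = build_recognize_request(
    "gs://cloud-samples-tests/speech/commercial_stereo.wav", 2
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the body of the POST request shown in the Protocol section.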
The following code sample demonstrates how to transcribe audio that contains multiple channels.
Protocol
Refer to the speech:recognize API endpoint for complete details.
To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the quickstart.
The following example shows how to send a POST request using curl, where the body of the request specifies the number of channels present in the audio sample.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
    "config": {
      "encoding": "LINEAR16",
      "languageCode": "en-US",
      "audioChannelCount": 2,
      "enableSeparateRecognitionPerChannel": true
    },
    "audio": {
      "uri": "gs://cloud-samples-tests/speech/commercial_stereo.wav"
    }
  }' \
  "https://speech.googleapis.com/v1/speech:recognize" > multi-channel.txt
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format, saved to a file named multi-channel.txt.
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "hi I'd like to buy a Chromecast I'm always wondering whether you could help me with that",
          "confidence": 0.8991147
        }
      ],
      "channelTag": 1,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": "certainly which color would you like we have blue black and red",
          "confidence": 0.9408236
        }
      ],
      "channelTag": 2,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " let's go with the black one",
          "confidence": 0.98783094
        }
      ],
      "channelTag": 1,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " would you like the new Chromecast Ultra model or the regular Chromecast",
          "confidence": 0.9573053
        }
      ],
      "channelTag": 2,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " regular Chromecast is fine thank you",
          "confidence": 0.9671048
        }
      ],
      "channelTag": 1,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " okay sure would you like to ship it regular or Express",
          "confidence": 0.9544821
        }
      ],
      "channelTag": 2,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " express please",
          "confidence": 0.9487205
        }
      ],
      "channelTag": 1,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " terrific it's on the way thank you",
          "confidence": 0.97655964
        }
      ],
      "channelTag": 2,
      "languageCode": "en-us"
    },
    {
      "alternatives": [
        {
          "transcript": " thank you very much bye",
          "confidence": 0.9735077
        }
      ],
      "channelTag": 1,
      "languageCode": "en-us"
    }
  ]
}
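Because each result in the response carries a channelTag, one way to read the output is to group transcripts by channel. The following is a rough sketch using only the Python standard library; the function name transcripts_by_channel is hypothetical, and the sample dict is a shortened copy of the response above, not a new API result. It assumes the first alternative of each result is the one of interest.

```python
from collections import defaultdict


def transcripts_by_channel(response):
    """Group the first alternative of each result by its channelTag."""
    channels = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no transcript
        text = alternatives[0].get("transcript", "").strip()
        channels[result.get("channelTag")].append(text)
    return dict(channels)


# Shortened copy of the response shown above.
sample = {
    "results": [
        {"alternatives": [{"transcript": " let's go with the black one",
                           "confidence": 0.98783094}],
         "channelTag": 1, "languageCode": "en-us"},
        {"alternatives": [{"transcript": " okay sure would you like to ship it regular or Express",
                           "confidence": 0.9544821}],
         "channelTag": 2, "languageCode": "en-us"},
        {"alternatives": [{"transcript": " express please",
                           "confidence": 0.9487205}],
         "channelTag": 1, "languageCode": "en-us"},
    ]
}

grouped = transcripts_by_channel(sample)
for tag, lines in sorted(grouped.items()):
    print(f"channel {tag}: {lines}")
```

To run this against the saved output instead of the inline sample, load multi-channel.txt with json.load and pass the resulting dict to the function.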