Learn about troubleshooting steps that you might find helpful if you run into problems using Speech-to-Text.
Cannot authenticate to Speech-to-Text
You might receive an error message indicating that your "Application Default Credentials" are unavailable or you might be wondering how to get an API key to use when calling Speech-to-Text.
Speech-to-Text uses Application Default Credentials for authentication.
You must have a service account for your project, download the key (JSON file) for your service account to your development environment, and then set the location of that JSON file to an environment variable named GOOGLE_APPLICATION_CREDENTIALS.
This environment variable must be available within the context in which you call the Speech-to-Text API. For example, if you set the variable from within a terminal session but run your code in the debugger of your IDE, the execution context of your code might not have access to the variable. In that circumstance, your request to Speech-to-Text might fail for lack of proper authentication.
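Application Default Credentials reads the key file location from the GOOGLE_APPLICATION_CREDENTIALS environment variable. To confirm that the variable is actually visible where your code runs (for example, inside your IDE's debugger), run the check from that same process. The following is a minimal sketch using only the Python standard library. The function name check_adc_key_file is hypothetical. The script only inspects the local setup; it does not contact Google or prove that the key is valid.

```python
import json
import os


def check_adc_key_file(env=None):
    """Return a list of problems with GOOGLE_APPLICATION_CREDENTIALS.

    An empty list means the variable is set in this process, points to an
    existing file, and that file looks like a service account JSON key.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        # Not set in *this* execution context, e.g. an IDE debugger that did
        # not inherit the variable from your terminal session.
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set in this process"]
    if not os.path.isfile(path):
        return [f"key file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        return [f"cannot read key file as JSON: {exc}"]
    # Service account key files contain "type": "service_account".
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return ["key file does not look like a service account key"]
    return []


if __name__ == "__main__":
    problems = check_adc_key_file()
    print("\n".join(problems) if problems else "GOOGLE_APPLICATION_CREDENTIALS looks OK")
```

Run it both from your terminal and from the IDE run or debug configuration. If the two results differ, the variable is not reaching your code's execution context.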
Speech-to-Text returns an empty response
If a transcript is not returned (for example, you receive an empty response) and no errors have occurred, it's likely that the audio is not using the proper encoding.
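If you call the REST API and parse the JSON yourself, an empty response is one with no results field. A successful transcription instead carries results, each containing alternatives with a transcript. The following minimal Python sketch assumes that recognize response shape and separates the two cases. The helper name transcripts_from_response is hypothetical.

```python
import json


def transcripts_from_response(response_json):
    """Return the top transcript of each result, or [] for an empty response."""
    if isinstance(response_json, str):
        response = json.loads(response_json)
    else:
        response = response_json
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        # Alternatives are ranked; the first one is the most probable.
        if alternatives and alternatives[0].get("transcript"):
            transcripts.append(alternatives[0]["transcript"])
    return transcripts


body = "{}"  # an empty response body
if not transcripts_from_response(body):
    print("Empty response: check the audio encoding and RecognitionConfig")
```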
Play the file and listen to the output. Is the audio clear and the speech intelligible?
To play files, you can use the SoX (Sound eXchange) play command. A few examples based on different audio encodings are shown below.
FLAC files include a header that indicates the sample rate, encoding type, and number of channels, so they can be played as follows:
play audio.flac
LINEAR16 files do not include a header. To play them, you must specify the sample rate, encoding type, and number of channels. The LINEAR16 encoding must be 16-bit, signed-integer, little-endian.
play --channels=1 --bits=16 --rate=16000 --encoding=signed-integer \
  --endian=little audio.raw
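A LINEAR16 file has no header, so any program that reads it must assume the same layout that the play command above is told about. The following minimal Python sketch decodes headerless 16-bit, signed, little-endian mono bytes. The function name read_linear16 is hypothetical, and the 16000 Hz default is an assumption you must match to your own audio.

```python
import struct


def read_linear16(raw, sample_rate=16000):
    """Decode headerless LINEAR16 (16-bit, signed, little-endian, mono) bytes.

    Returns (samples, duration_seconds). sample_rate must be supplied by you,
    because a raw file does not record it.
    """
    if len(raw) % 2:
        raise ValueError("LINEAR16 data must contain whole 2-byte samples")
    count = len(raw) // 2
    # "<" = little-endian, "h" = signed 16-bit integer
    samples = struct.unpack(f"<{count}h", raw)
    return list(samples), count / sample_rate


samples, seconds = read_linear16(b"\x01\x00\xfe\xff")
print(samples, seconds)  # [1, -2] 0.000125
```

Reading the same bytes as big-endian (">h") would give 256 and -257 instead of 1 and -2. This shows how a wrong byte-order assumption turns speech into noise.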
MULAW files also do not include a header and often use a lower sample rate.
play --channels=1 --rate=8000 --encoding=u-law audio.raw
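MULAW stores each sample in a single byte using G.711 μ-law companding, so a mono MULAW file at 8000 Hz holds 8000 bytes per second. The following minimal Python sketch expands μ-law bytes to signed 16-bit linear values with the standard G.711 decoding steps. The function names are hypothetical.

```python
def ulaw_to_linear(byte):
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    # Rebuild the magnitude, then remove the encoder's bias of 0x84 (132).
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_mulaw(raw):
    """Decode headerless mono MULAW bytes (one byte per sample)."""
    return [ulaw_to_linear(b) for b in raw]


print(decode_mulaw(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```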
The Speech-to-Text service currently supports only one audio channel.
Check that the audio encoding of your data matches the parameters you sent in RecognitionConfig. For example, if your request specified "sampleRateHertz":16000, the audio data parameters listed by the SoX play command should match these parameters, as follows:
Encoding: FLAC
Channels: 1 @ 16-bit
Samplerate: 16000Hz
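You can also check these values from code instead of reading them off the SoX listing. FLAC stores them in its mandatory STREAMINFO metadata block, which follows the 4-byte "fLaC" marker. The following minimal Python sketch parses that block and compares it with the sampleRateHertz you sent. The function names are hypothetical, and the sketch assumes the file starts directly with "fLaC" (for example, no ID3 tag in front).

```python
import struct


def read_flac_streaminfo(data):
    """Return (sample_rate, channels, bits_per_sample) from the start of a FLAC file."""
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream starting with a STREAMINFO block")
    # Byte 4 is the first metadata block header: high bit = last-block flag,
    # low 7 bits = block type (0 = STREAMINFO); bytes 5-7 hold its length.
    if (data[4] & 0x7F) != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    info = data[8:42]  # the 34-byte STREAMINFO body
    # After min/max block size (2 + 2 bytes) and min/max frame size (3 + 3
    # bytes), 64 bits pack: sample rate (20), channels - 1 (3),
    # bits per sample - 1 (5), total samples (36).
    (packed,) = struct.unpack(">Q", info[10:18])
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    return sample_rate, channels, bits_per_sample


def matches_request(data, sample_rate_hertz):
    """True if the FLAC header is mono, 16-bit, at the requested sampleRateHertz."""
    rate, channels, bits = read_flac_streaminfo(data)
    return rate == sample_rate_hertz and channels == 1 and bits == 16
```

To check a real file, pass in its first 42 bytes, for example read_flac_streaminfo(open("audio.flac", "rb").read(42)). A file that matches the listing above should give (16000, 1, 16).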
If the SoX listing shows a sample rate other than 16000Hz, change the sampleRateHertz value in InitialRecognizeRequest to match. If the encoding is not FLAC or the channels are not 1 @ 16-bit, you cannot use this file directly and will need to convert it to a compatible encoding (see the next step).
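The rule here is: if only the sample rate differs, update the request; if the encoding is not FLAC or the channels are not 1 @ 16-bit, convert the file. The following minimal Python sketch writes that rule down. The function name plan_fix is hypothetical, and the request config is represented as a plain dict.

```python
def plan_fix(config, encoding, sample_rate, channels, bits):
    """Apply the troubleshooting rule to values reported by SoX.

    Returns ("ok", config), ("update", new_config), or ("convert", None).
    """
    if encoding != "FLAC" or channels != 1 or bits != 16:
        # Not usable directly: convert the file first (for example, with sox).
        return "convert", None
    if config.get("sampleRateHertz") != sample_rate:
        # Only the sample rate differs: fix the request, not the audio.
        return "update", {**config, "sampleRateHertz": sample_rate}
    return "ok", config


print(plan_fix({"encoding": "FLAC", "sampleRateHertz": 16000}, "FLAC", 44100, 1, 16))
# ('update', {'encoding': 'FLAC', 'sampleRateHertz': 44100})
```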
If your audio file is not FLAC-encoded, try converting it to FLAC using SoX, and then repeat the steps above to play the file and verify the encoding, sampleRateHertz, and channels. The following examples convert various audio file formats to FLAC:
sox audio.wav --channels=1 --bits=16 audio.flac
sox audio.ogg --channels=1 --bits=16 audio.flac
sox audio.au --channels=1 --bits=16 audio.flac
sox audio.aiff --channels=1 --bits=16 audio.flac
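To convert many files at once, you can drive the same sox invocations from a script. The following minimal Python sketch assumes the sox binary is on your PATH and uses only the options from the commands above. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def sox_to_flac_command(src):
    """Build the sox argv that converts src to mono 16-bit FLAC beside it."""
    src = Path(src)
    return ["sox", str(src), "--channels=1", "--bits=16", str(src.with_suffix(".flac"))]


def convert_all(paths):
    for path in paths:
        if Path(path).suffix.lower() == ".flac":
            continue  # already FLAC; avoid overwriting the input file
        cmd = sox_to_flac_command(path)
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if sox exits with an error.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    convert_all(sys.argv[1:])
```

For example, saved under a name of your choosing such as convert_to_flac.py, you could run it as python convert_to_flac.py audio.wav audio.ogg. Then play each resulting .flac file to verify it as described above.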
To convert a raw file to FLAC, you need to know its audio encoding. For example, to convert stereo, 16-bit, signed, little-endian audio at 16000Hz to mono FLAC:
sox --channels=2 --bits=16 --rate=16000 --encoding=signed-integer \
  --endian=little audio.raw --channels=1 --bits=16 audio.flac
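Nothing in a raw file records its layout. One rough sanity check on the parameters you give sox is to compare the duration they imply with the length you expect. For the stereo 16-bit 16000Hz example above, each second of audio takes 16000 × 2 channels × 2 bytes = 64000 bytes. The following minimal Python sketch does this arithmetic. The function names are hypothetical.

```python
import os


def raw_duration_seconds(num_bytes, rate, channels, bits):
    """Duration implied by a headerless PCM file of num_bytes bytes."""
    bytes_per_second = rate * channels * (bits // 8)
    return num_bytes / bytes_per_second


def raw_file_duration(path, rate=16000, channels=2, bits=16):
    """Same check applied to a file on disk; parameters are your assumptions."""
    return raw_duration_seconds(os.path.getsize(path), rate, channels, bits)


print(raw_duration_seconds(640000, rate=16000, channels=2, bits=16))  # 10.0
# The same bytes read as mono would claim twice the length:
print(raw_duration_seconds(640000, rate=16000, channels=1, bits=16))  # 20.0
```

If the implied duration is clearly wrong for a recording whose length you know, the channel count, bit depth, or sample rate you assumed probably is too.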
Unexpected results from speech recognition
If the results returned by Speech-to-Text are not what you expected: