Cloud Speech-to-Text On-Device

Microphone placement

This section lists the minimum specifications for the microphone and the audio system in a Speech implementation. Microphone quality directly affects the performance of the Speech library.

1.1. Number and placement

Must have at least one microphone facing the user.
(Recommended) Two microphones with a center-to-center spacing between 66 mm and 71 mm, on a flat surface facing the user.
Place the microphones far from sources of sound, such as the device's speakers, to minimize audio coupling between the microphones and speakers.

1.2. Audio preprocessing

Must provide the speech application access to the raw microphone signal.
Don't perform time-variant or non-linear processing on the audio.
(Recommended) Don't perform beamforming, other microphone-combining techniques, or other audio preprocessing on the audio signal provided to the speech application.

1.3. Sample Rate

Implementations must use a microphone sampling rate of 16 kHz.

1.4. Input Performance

The microphone:

Must be able to capture sound at 94 dB SPL without saturation.
Must be capable of capturing typical sounds in the linear region of the microphone's sensitivity.
Must have an Acoustic Overload Point (AOP) at least 10 dB higher than the speaker-generated input to the microphone between 125 Hz and 8 kHz. In Google testing, digital microphones with an AOP greater than 130 dB have performed well.
Must have Total Harmonic Distortion (THD) of 1% or less for a 94 dB SPL signal between 100 Hz and 8 kHz.
Must have a flat frequency response, within +/- 3 dB, measured in 1/1 octaves from 125 Hz to 8 kHz. Microphones that don't achieve a flat response may be corrected through filters, which must be linear and time invariant.
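
The numeric limits above lend themselves to simple automated checks. The following is a minimal sketch, not part of the Speech library: the function names are hypothetical, the measured levels are made-up example values, and flatness is assumed to be judged relative to the 1 kHz band, which this section does not specify.

```python
import math

# Octave-band centers from 125 Hz to 8 kHz (1/1-octave spacing).
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def aop_headroom_ok(aop_db_spl, speaker_level_db_spl, margin_db=10.0):
    """True if the AOP is at least margin_db above the speaker-generated level."""
    return aop_db_spl - speaker_level_db_spl >= margin_db


def thd_percent(fundamental_rms, harmonic_rms):
    """THD in percent: RMS sum of the harmonics divided by the fundamental."""
    harmonics = math.sqrt(sum(h * h for h in harmonic_rms))
    return 100.0 * harmonics / fundamental_rms


def response_is_flat(levels_db, reference_hz=1000, tolerance_db=3.0):
    """True if every octave band is within tolerance_db of the reference band.

    Assumption: the 1 kHz band is the reference for "flat".
    """
    reference = levels_db[reference_hz]
    return all(
        abs(levels_db[band] - reference) <= tolerance_db
        for band in OCTAVE_BANDS_HZ
    )


# Made-up example measurements (band center in Hz -> level in dB).
measured = {125: -28.1, 250: -26.9, 500: -26.2, 1000: -26.0,
            2000: -25.7, 4000: -25.1, 8000: -23.4}

print(aop_headroom_ok(130.0, 115.0))
print(thd_percent(1.0, [0.003, 0.004]) <= 1.0)
print(response_is_flat(measured))
```

In practice the band levels and harmonic amplitudes would come from calibrated measurement equipment; the sketch covers only the pass/fail arithmetic.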

1.5. Input Signal

The input signal to the speech application from the microphone must meet the following requirements:

Must configure the microphone sensitivity parameter accurately, based on calibrated readings of the microphone.
- Record the frequency response of the DUT (device under test) microphone while playing the signal from a calibrated reference speaker.
- The input signal to the microphones is 94 dB SPL.
- At 16 kHz and a bit depth of 24-32 bits, the RMS level of the input signal to the speech application is within +/- 3 dB of the microphone's sensitivity.
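
The last check above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: samples are floats normalized to [-1.0, 1.0], levels are expressed as 20·log10(RMS / full scale), and the -26 dB sensitivity is a hypothetical example value. Other full-scale conventions shift the result by a fixed offset of about 3 dB, so the same convention must be used for both the sensitivity figure and the measurement.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to full scale, for floats in [-1.0, 1.0]."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def matches_sensitivity(samples, sensitivity_db, tolerance_db=3.0):
    """True if the RMS level is within tolerance_db of the sensitivity."""
    return abs(rms_level_db(samples) - sensitivity_db) <= tolerance_db


# Synthetic stand-in for a one-second recording of a 1 kHz, 94 dB SPL tone
# captured at 16 kHz. A real check would use samples from the DUT instead.
SAMPLE_RATE_HZ = 16000
SENSITIVITY_DB = -26.0  # hypothetical sensitivity at 94 dB SPL
peak = math.sqrt(2.0) * 10.0 ** (SENSITIVITY_DB / 20.0)
tone = [
    peak * math.sin(2.0 * math.pi * 1000 * n / SAMPLE_RATE_HZ)
    for n in range(SAMPLE_RATE_HZ)
]

print(round(rms_level_db(tone), 2))
print(matches_sensitivity(tone, SENSITIVITY_DB))
```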

1.6. Bit Depth

The bit depth of the microphone signal provided to the speech application must be at least 16 bits.
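
As a quick sanity check of the sample rate and bit depth requirements, the header of a captured recording can be inspected. The sketch below assumes the capture was saved as an uncompressed PCM WAV file; the function name and constants are hypothetical, and only Python's standard `wave` module is used.

```python
import wave

REQUIRED_SAMPLE_RATE_HZ = 16000  # from the sample rate requirement
MIN_BIT_DEPTH = 16               # from the bit depth requirement


def check_wav_format(path):
    """Return (rate_ok, depth_ok) for a PCM WAV file at the given path."""
    with wave.open(path, "rb") as wav:
        rate_ok = wav.getframerate() == REQUIRED_SAMPLE_RATE_HZ
        # getsampwidth() returns bytes per sample; convert to bits.
        depth_ok = wav.getsampwidth() * 8 >= MIN_BIT_DEPTH
    return rate_ok, depth_ok
```

A file header only describes the stored format. It does not show whether the audio was resampled or requantized earlier in the capture path, so this complements a review of the audio pipeline and does not replace it.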