Speech-to-Text is priced based on the amount of audio successfully processed by the service each month, measured in 15-second increments, with each request rounded up to the next increment. If the API returns a response, the audio sent in the request was successfully processed. This includes an empty response, which indicates that the API processed the audio but could not transcribe it. Requests that result in an error do not count as successfully processed and therefore don't incur any cost.
The pricing table below applies to applications on personal systems (for example, phones, tablets, laptops, desktops). Please contact us for approval and pricing to use the Speech-to-Text API on embedded devices (for example, cars, TVs, appliances, or speakers).
You can view your current billing status, including usage and your current bill, in the console. For more details about managing your account, see the Cloud billing documentation or billing and payments support.
The prices in the table below apply to minutes of audio processed per month.
| Feature | Standard models (all models except enhanced video and phone call): 0-60 minutes | Standard models: over 60 minutes up to 1 million minutes | Enhanced models (video, phone call): 0-60 minutes | Enhanced models: over 60 minutes up to 1 million minutes |
|---|---|---|---|---|
| Speech Recognition (without Data Logging - default) | Free | $0.006 / 15 seconds ** | Free | $0.009 / 15 seconds ** |
| Speech Recognition (with Data Logging opt-in) | Free | $0.004 / 15 seconds ** | Free | $0.006 / 15 seconds ** |
** Each request is rounded up to the nearest increment of 15 seconds.
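As a rough illustration of how the table could be applied, the sketch below estimates a monthly charge from a total of already-rounded billed minutes. The names `RATES`, `FREE_MINUTES`, and `monthly_cost` are hypothetical, and the sketch assumes the 60 free minutes are deducted once from the total for a single pricing category; the page does not say how the free tier is shared when both standard and enhanced models are used in the same month.

```python
from decimal import Decimal

# Price per 15-second increment, keyed by (model tier, data logging opted in).
# Values are copied from the pricing table; the key names are illustrative.
RATES = {
    ("standard", False): Decimal("0.006"),
    ("standard", True): Decimal("0.004"),
    ("enhanced", False): Decimal("0.009"),
    ("enhanced", True): Decimal("0.006"),
}
FREE_MINUTES = 60          # first 60 minutes each month are free
INCREMENTS_PER_MINUTE = 4  # four 15-second increments per minute


def monthly_cost(billed_minutes, model="standard", data_logging=False):
    """Estimate the monthly charge in USD for one pricing category.

    billed_minutes is the month's total after per-request rounding.
    Assumption: the free tier is subtracted once from that total.
    """
    rate = RATES[(model, data_logging)]
    chargeable = max(Decimal(0), Decimal(str(billed_minutes)) - FREE_MINUTES)
    return chargeable * INCREMENTS_PER_MINUTE * rate


print(monthly_cost(100))                                    # standard, no logging
print(monthly_cost(100, model="enhanced", data_logging=True))
```

Using `Decimal` rather than floats avoids binary rounding noise in currency amounts. The sketch does not enforce the 1 million minute monthly cap.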
Speech-to-Text pricing is determined by the following factors:
- Whether recognition is performed using a standard or enhanced model.
- Whether you have opted in to data logging.
- The number of channels in the audio being recognized.
Speech-to-Text offers multiple machine learning models that can be used for speech recognition. Two of these models (the enhanced phone call and video models) provide improved recognition performance tailored for their respective uses and can produce higher quality results when used correctly. Check the supported languages page to find out whether enhanced models are available for your language.
By opting in to data logging, you can allow Google to record audio data sent to Speech-to-Text. This data helps Google improve the machine learning models used for speech transcription. Customers who opt in to data logging benefit from lower Speech-to-Text pricing.
Each audio channel is billed separately. If you send requests with multiple channels, you are billed for the total length of audio processed across all channels. This accounting differs from how monthly usage limits are tracked: usage limits don't take multiple channels into account and are determined only by the length of the audio file. For example, if you send a request with 30 seconds of audio and 4 channels, you are billed for 120 seconds, but only 30 seconds count against your monthly quota. See the quotas & limits page for more details.
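The difference between billed time and quota time can be sketched in a few lines. The function name `billable_and_quota_seconds` is hypothetical, and the sketch deliberately leaves out 15-second rounding, because the page does not state whether rounding is applied per channel or to the summed total.

```python
def billable_and_quota_seconds(duration_seconds, channels):
    """Return (seconds billed, seconds counted against quota) for one request.

    Billing multiplies the audio length by the channel count; the monthly
    usage limit counts only the length of the audio file.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    billable = duration_seconds * channels
    quota = duration_seconds
    return billable, quota


# The example from the text: 30 seconds of audio with 4 channels.
print(billable_and_quota_seconds(30, 4))
```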
Each request is rounded up to the next increment of 15 seconds. For example, if you make three separate requests, each containing 7 seconds of audio, you are billed $0.018 USD for 45 seconds (3 × 15 seconds) of audio at the standard rate of $0.006 per 15 seconds. Fractions of a second are included when rounding up, so 15.14 seconds is rounded up and billed as 30 seconds.
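The rounding rule can be checked with a short sketch. The helper names `billed_seconds` and `requests_cost` are hypothetical, the default rate is the standard rate without data logging taken from the table, and the free monthly tier is ignored so that the arithmetic matches the example in the text.

```python
import math
from decimal import Decimal

INCREMENT_SECONDS = 15


def billed_seconds(duration_seconds):
    """Round one request's audio length up to the next 15-second increment."""
    if duration_seconds <= 0:
        return 0
    return math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS


def requests_cost(durations, rate_per_increment=Decimal("0.006")):
    """Sum the cost of several requests, rounding each one separately."""
    increments = sum(billed_seconds(d) // INCREMENT_SECONDS for d in durations)
    return increments * rate_per_increment


print(billed_seconds(7))         # one 7-second request
print(billed_seconds(15.14))     # just over one increment
print(requests_cost([7, 7, 7]))  # three separate 7-second requests
```

Because each request is rounded on its own, three 7-second requests are billed as 45 seconds, whereas a single 21-second request would be billed as 30 seconds.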
Usage is capped at 1 million minutes of audio per month. If you need more than that, we would like to understand more about your needs. Please submit a Speech-to-Text quota request for your project.
Google Cloud Platform costs
If you store audio files to be recognized in Google Cloud Storage, or use other Google Cloud Platform resources in tandem with Speech-to-Text, such as Google App Engine instances, then you will also be billed for the use of those services. See the Google Cloud Platform pricing calculator to determine other costs based on current rates.
- Read the Speech-to-Text documentation.
- Get started with Speech-to-Text.
- Try the Pricing calculator.
- Learn about Speech-to-Text solutions and use cases.