Cloud Speech-to-Text is priced monthly based on the amount of audio the service successfully processes, measured in 15-second increments and rounded up.

To view your current billing status in the Cloud Console, including usage and your current bill, see the Billing page. For more details about managing your account, see the Cloud Billing documentation or Billing and payments support.

Pricing Table

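Standard models (all models except video and enhanced phone):
  • Speech Recognition without data logging (default): 0-60 minutes free; over 60 minutes, up to 1 million minutes: $0.006 / 15 seconds **
  • Speech Recognition with data logging opt-in: 0-60 minutes free; over 60 minutes, up to 1 million minutes: $0.004 / 15 seconds **

Premium models* (video, enhanced phone):
  • Speech Recognition without data logging (default): 0-60 minutes free; over 60 minutes, up to 1 million minutes: $0.009 / 15 seconds **
  • Speech Recognition with data logging opt-in: 0-60 minutes free; over 60 minutes, up to 1 million minutes: $0.006 / 15 seconds **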

* Premium models are currently available in US English only.

** Each request is rounded up to the nearest increment of 15 seconds.
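As a rough illustration of how the free minutes and the per-15-second rates combine, the sketch below estimates one month's charge. It assumes that the input total has already been rounded request by request, that the free 60 minutes are deducted from that rounded total, and that all of the month's audio is billed at a single rate; how the free allowance is shared when you mix model tiers or data-logging settings is not covered. `monthly_cost` is a hypothetical helper, not part of any Google API.

```python
from decimal import Decimal

FREE_SECONDS = 60 * 60          # 0-60 minutes are free
CAP_SECONDS = 1_000_000 * 60    # the table covers usage up to 1 million minutes
INCREMENT_SECONDS = 15

def monthly_cost(billed_seconds: int, rate_per_15s: Decimal = Decimal("0.006")) -> Decimal:
    """Estimate one month's charge in USD (hypothetical helper).

    billed_seconds: the month's total audio after each request has already
    been rounded up to a multiple of 15 seconds.
    rate_per_15s: price per 15-second increment; the default is the
    standard-model rate without data logging.
    Assumption: the free 60 minutes are deducted from the rounded total.
    """
    if billed_seconds % INCREMENT_SECONDS != 0:
        raise ValueError("billed_seconds must be a multiple of 15")
    if billed_seconds > CAP_SECONDS:
        raise ValueError("usage above 1 million minutes needs a quota request")
    chargeable = max(0, billed_seconds - FREE_SECONDS)
    return (chargeable // INCREMENT_SECONDS) * rate_per_15s

# 100 minutes of audio leaves 40 billable minutes (160 increments).
print(monthly_cost(100 * 60))                                 # 0.960 (standard, no data logging)
print(monthly_cost(100 * 60, rate_per_15s=Decimal("0.009")))  # 1.440 (premium, no data logging)
```

Using `Decimal` rather than floats keeps the sub-cent rates exact when they are multiplied by large increment counts.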

The cost of using Cloud Speech-to-Text depends on three factors:

  • The type of recognition model you use: standard or premium
  • Whether you opt in to data logging
  • The number of channels in your source audio

Cloud Speech-to-Text can use several types of machine learning models for speech recognition. Two of them, the enhanced phone call model and the video model, provide improved recognition performance and are billed at the premium rates. Each model is tailored to a specific use case and produces higher-quality results when applied to the kind of audio it was designed for.

With data logging, customers allow Google to record the audio data they send to Cloud Speech-to-Text. Google uses this data to improve the machine learning models used for speech transcription. Customers who opt in to data logging pay lower Cloud Speech-to-Text prices.
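The model-type and data-logging factors together amount to a rate lookup. The sketch below assumes `model` strings such as `"video"` and `"phone_call"` plus a `use_enhanced` flag, modeled loosely on the API's recognition config fields; the mapping of those values to billing tiers is an illustrative assumption drawn from the pricing table, and `is_premium` and `rate_per_15s` are hypothetical helpers.

```python
from decimal import Decimal

def is_premium(model: str, use_enhanced: bool) -> bool:
    """Assumed tier mapping: video and enhanced phone are premium, all else standard."""
    if model == "video":
        return True
    return model == "phone_call" and use_enhanced

def rate_per_15s(model: str, use_enhanced: bool, data_logging: bool) -> Decimal:
    """USD per 15-second increment beyond the free minutes, per the pricing table."""
    if is_premium(model, use_enhanced):
        return Decimal("0.006") if data_logging else Decimal("0.009")
    return Decimal("0.004") if data_logging else Decimal("0.006")

# Per-increment saving from opting in to data logging on a premium model.
print(rate_per_15s("video", False, False) - rate_per_15s("video", False, True))  # 0.003
```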

This pricing is for applications on personal systems (e.g., phones, tablets, laptops, desktops). Please contact us for approval and pricing to use the Speech-to-Text API on embedded devices (e.g., cars, TVs, appliances, or speakers).

Each request is rounded up to the next 15-second increment. For example, if you make three separate requests, each containing 7 seconds of audio, you are billed for 45 seconds (3 × 15 seconds) of audio, which costs $0.018 USD at the standard-model rate without data logging once your free 60 minutes are used up. Fractions of a second count toward the rounding: 15.14 seconds, for instance, is rounded up and billed as 30 seconds.
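The rounding rule can be checked with a few lines of Python. This minimal sketch rounds each request up on its own and prices the result at the standard-model rate without data logging, ignoring the free minutes as the example above does; `billed_seconds` and `cost_of_requests` are hypothetical names.

```python
import math
from decimal import Decimal

INCREMENT_SECONDS = 15

def billed_seconds(duration_seconds: float) -> int:
    """Round one request's audio length up to the next 15-second increment."""
    return math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS

def cost_of_requests(durations, rate_per_15s: Decimal = Decimal("0.006")) -> Decimal:
    """USD charge for a list of requests, each rounded separately (free tier ignored)."""
    increments = sum(billed_seconds(d) // INCREMENT_SECONDS for d in durations)
    return increments * rate_per_15s

print(sum(billed_seconds(d) for d in [7, 7, 7]))  # 45
print(cost_of_requests([7, 7, 7]))                # 0.018
print(billed_seconds(15.14))                      # 30
```

Rounding per request is what makes many short requests comparatively expensive: the same 21 seconds of audio sent as a single request would be billed as 30 seconds rather than 45.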

Usage is capped at 1 million minutes of audio per month. If you need more than that, we would like to understand your needs: please submit a Cloud Speech-to-Text Quota Request for your project.

Each audio channel is billed separately. If you set audioChannelCount to a value greater than 1, as you would when requesting separate recognition for each channel, each channel is billed individually and the request is billed for the cumulative audio across all channels.
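A sketch of multichannel billing, under a stated assumption: the text above does not say whether rounding happens per channel or on the combined total, so this hypothetical helper rounds each channel up to 15 seconds separately and then sums the channels. The `audio_channel_count` parameter mirrors the `audioChannelCount` setting; everything else is illustrative.

```python
import math

INCREMENT_SECONDS = 15

def billed_seconds_multichannel(duration_seconds: float, audio_channel_count: int) -> int:
    """Billed audio for one request with separate recognition per channel.

    Assumption: each channel is as long as the request audio and is rounded
    up to a 15-second increment on its own before the channels are summed.
    """
    if audio_channel_count < 1:
        raise ValueError("audio_channel_count must be at least 1")
    per_channel = math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS
    return per_channel * audio_channel_count

print(billed_seconds_multichannel(40, 2))  # 90: two channels of 45 billed seconds each
```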

Google Cloud Platform Costs

If you store audio files to be recognized in Google Cloud Storage, or use other Google Cloud Platform resources in tandem with Cloud Speech-to-Text, such as Google App Engine instances, then you will also be billed for the use of those services. See the Google Cloud Platform Pricing Calculator to determine other costs based on current rates.

