Speech-to-Text pricing

Speech-to-Text is priced based on the amount of audio successfully processed by the service each month, measured in increments of one second. If the API returns a response, the audio sent in the request was successfully processed. This includes an empty response, which indicates that the API processed the audio but could not transcribe it. Requests that result in a server error do not count as successfully processed and therefore don't incur any cost.

You can view your current billing status, including usage and your current bill, in the Google Cloud console. For more details about managing your account, see the Cloud Billing documentation or Cloud Billing support.

Speech-to-Text V2 API

The prices in the tables below apply to minutes of audio processed per month for the Speech-to-Text V2 API.

Standard recognition models

Category	Model	0 to 500,000 minutes	500,000 to 1,000,000 minutes	1,000,000 to 2,000,000 minutes	2,000,000 minutes and above
Recognition (sku:3099-B70F-0949)	Standard	$0.016 / 1 minute, per 1 month / account	$0.01 / 1 minute, per 1 month / account	$0.008 / 1 minute, per 1 month / account	$0.004 / 1 minute, per 1 month / account
Recognition (Logged) (sku:4292-8666-5DBB)	Standard	$0.012 / 1 minute, per 1 month / account	$0.0075 / 1 minute, per 1 month / account	$0.006 / 1 minute, per 1 month / account	$0.003 / 1 minute, per 1 month / account
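
As a rough illustration of how the tiers in the table above might combine, the following sketch estimates a monthly charge for the V2 Recognition SKU. It assumes that each tier's rate applies only to the minutes that fall inside that tier (graduated pricing), which the table does not state explicitly. The names `V2_STANDARD_TIERS` and `tiered_cost` are hypothetical and are not part of any Google Cloud library.

```python
from decimal import Decimal

# (upper bound of the tier in minutes, USD per minute); None means no upper bound.
# Rates are copied from the V2 "Recognition" row of the table above.
V2_STANDARD_TIERS = [
    (500_000, Decimal("0.016")),
    (1_000_000, Decimal("0.01")),
    (2_000_000, Decimal("0.008")),
    (None, Decimal("0.004")),
]


def tiered_cost(minutes, tiers=V2_STANDARD_TIERS):
    """Estimate a monthly charge in USD.

    Assumption: each rate applies only to the minutes inside its own tier.
    """
    minutes = Decimal(minutes)
    cost = Decimal("0")
    lower = Decimal("0")  # start of the current tier
    for upper, rate in tiers:
        if minutes <= lower:
            break  # no usage reaches this tier
        top = minutes if upper is None else min(minutes, Decimal(upper))
        cost += (top - lower) * rate
        lower = top
    return cost


# 500,000 minutes at $0.016 plus 250,000 minutes at $0.01
print(tiered_cost(750_000))
```

Under that assumption, 750,000 minutes in a month would cost $8,000 for the first tier plus $2,500 for the second. Decimal arithmetic is used so that small per-minute rates do not pick up floating-point rounding error.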

Medical models

Category	Model	0 to 60 minutes	60 minutes and above
Medical Dictation (sku:6649-62EF-CB8F)	Medical²	$0 (Free) / 1 minute, per 1 month / account	$0.078 / 1 minute, per 1 month / account
Medical Conversation (sku:7247-19E1-FB4D)	Medical²	$0 (Free) / 1 minute, per 1 month / account	$0.078 / 1 minute, per 1 month / account

Dynamic batch models

Category	Model	Per minute
Dynamic Batch Recognition (sku:7700-6778-EF8E)	Standard¹	$0.003 / 1 minute, per 1 month / account
Dynamic Batch Recognition (Logged) (sku:1315-DEF9-28A6)	Standard¹	$0.00225 / 1 minute, per 1 month / account

Speech-to-Text V1 API

The prices in the table below apply to minutes of audio processed per month for the Speech-to-Text v1 API.

Category	Model	0 to 60 minutes	60 minutes and above
Speech Recognition (with data logging) sku:67F5-A183-E319	Standard¹	$0 (Free) / 1 minute, per 1 month / account	$0.016 / 1 minute, per 1 month / account
Speech Recognition (without data logging) sku:FD95-66F5-3F5F	Standard¹	$0 (Free) / 1 minute, per 1 month / account	$0.024 / 1 minute, per 1 month / account
Speech Recognition (without data logging) sku:6649-62EF-CB8F	Medical²	$0 (Free) / 1 minute, per 1 month / account	$0.078 / 1 minute, per 1 month / account
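
To make the free tier concrete, here is a minimal sketch that estimates a V1 monthly charge from the table above. It assumes the 60 free minutes are deducted once per SKU per month before the per-minute rate applies. The plan keys and the function name `v1_monthly_cost` are hypothetical labels chosen for this example.

```python
from decimal import Decimal

FREE_MINUTES = 60  # free monthly allowance shown in the V1 table

# USD per minute above the free allowance, copied from the V1 table.
V1_RATES = {
    "standard_logged": Decimal("0.016"),
    "standard_unlogged": Decimal("0.024"),
    "medical_unlogged": Decimal("0.078"),
}


def v1_monthly_cost(minutes, plan):
    """Estimate a V1 monthly charge in USD for one SKU.

    Assumption: the free allowance is subtracted before billing starts.
    """
    billable = max(Decimal(minutes) - FREE_MINUTES, Decimal("0"))
    return billable * V1_RATES[plan]


# 1,000 minutes without data logging: 940 billable minutes at $0.024
print(v1_monthly_cost(1000, "standard_unlogged"))
```

Usage at or below 60 minutes yields a charge of zero under this model.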

Standard¹ models include: default, command_and_search, latest_short, latest_long, phone_call, video, chirp (Speech-to-Text V2 only)
Medical² models include: medical_conversation, medical_dictation
Each request is rounded up to the nearest one-second increment.

Pricing factors

Speech-to-Text pricing is determined by the following factors:

The number of channels in the audio being recognized
The length and amount of audio you send
The recognition model you are using
The batch method you are using
The API version you are using

Multiple channels

Each audio channel is billed separately. If you send requests with multiple channels, you are billed for the total length of audio processed across all channels. This accounting differs from how monthly usage limits are tracked: usage limits don't take multiple channels into account and are determined only by the length of the audio file. For example, if you send a request with 30 seconds of audio and 4 channels, you are billed for 120 seconds, but only 30 seconds count against your monthly quota. See the quotas and limits page for more details.
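
The difference between billed time and quota time can be sketched as follows. This example assumes the round-up to the next whole second is applied to each channel before the channels are summed; the page only states that each request is rounded up, so the order of operations is an assumption. Both function names are hypothetical.

```python
import math


def billed_seconds(duration_seconds, channels=1):
    """Seconds billed for one request.

    Assumption: the duration is rounded up per channel, then summed.
    """
    return math.ceil(duration_seconds) * channels


def quota_seconds(duration_seconds):
    """Seconds counted against the monthly usage limit.

    Channels are ignored; only the length of the audio file counts.
    """
    return duration_seconds


# The example from the text: 30 seconds of audio with 4 channels.
print(billed_seconds(30, channels=4), quota_seconds(30))
```

For the 30-second, 4-channel request described above, this gives 120 billed seconds and 30 quota seconds.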

Dynamic batch

The Speech-to-Text V2 API has an option to use dynamic batch. Dynamic batch processes audio at a lower level of urgency. If you enable dynamic batch, you will be billed at a discounted rate.
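
To give a sense of the size of the discount, the sketch below compares the V2 Recognition rate with the Dynamic Batch Recognition rate from the tables above. It is limited to workloads within the first standard tier (up to 500,000 minutes), where a single standard rate applies. The function name `dynamic_batch_savings` is hypothetical.

```python
from decimal import Decimal

STANDARD_RATE = Decimal("0.016")       # V2 Recognition, first tier, USD per minute
DYNAMIC_BATCH_RATE = Decimal("0.003")  # V2 Dynamic Batch Recognition, USD per minute
FIRST_TIER_LIMIT = 500_000             # minutes covered by the first standard tier


def dynamic_batch_savings(minutes):
    """Estimated monthly savings in USD from using dynamic batch.

    Only valid within the first standard tier, where one rate applies.
    """
    if minutes > FIRST_TIER_LIMIT:
        raise ValueError("estimate only covers the first pricing tier")
    return Decimal(minutes) * (STANDARD_RATE - DYNAMIC_BATCH_RATE)


# 10,000 minutes: $160 at the standard rate versus $30 with dynamic batch
print(dynamic_batch_savings(10_000))
```

The trade-off is latency: dynamic batch processes audio at a lower level of urgency, so this comparison only matters for workloads that can wait for results.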

Large workloads

For customers with very large workloads, additional volume discounts may be available. Please contact sales to learn more.

Google Cloud pricing

If you store audio files to be recognized in Google Cloud Storage, or use other Google Cloud resources in tandem with Speech-to-Text, such as Google App Engine instances, then you will also be billed for the use of those services. See Google Cloud's pricing calculator to determine other costs based on current rates.

What's next

Read the Speech-to-Text documentation
Get started with Speech-to-Text
Try the pricing calculator
Learn about Speech-to-Text solutions and use cases

Request a custom quote

With Google Cloud's pay-as-you-go pricing, you only pay for the services you use. Connect with our sales team to get a custom quote for your organization.
