Quotas and limits

This document contains the current API restrictions and usage limits for Cloud Speech-to-Text. This page will be updated to reflect any changes to these restrictions and usage limits. We reserve the right to change these limits.

You can request a quota increase if necessary. See the Cloud quota page for more information on viewing and managing your quota.

After you submit your request, Google may contact you for more information and will inform you whether your request is approved or denied.

Content Limits

Audio content is sent to Speech-to-Text either directly in the content field of the request or by reference, as a Google Cloud Storage URI in the uri field of the request. All synchronous requests sent to the API are limited to 10 MB. For the StreamingRecognize method, each request can carry at most 15 KB of audio. Exceeding either limit returns an error.
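
The size limits above can be checked on the client before anything is sent. The following is a minimal sketch, not part of the Speech-to-Text client library: the function and constant names are hypothetical, and the limits are interpreted in decimal units (an assumption chosen to stay on the conservative side). The check covers only the raw audio bytes, so any request overhead is not accounted for.

```python
# Limits quoted in the text above. Decimal units are an assumption,
# chosen so the checks stay conservative under either interpretation.
MAX_SYNC_REQUEST_BYTES = 10 * 1000 * 1000  # 10 MB per synchronous request
MAX_STREAM_CHUNK_BYTES = 15 * 1000         # 15 KB of audio per streaming request


def fits_sync_request(audio: bytes) -> bool:
    """Return True if the raw audio is within the synchronous size limit."""
    return len(audio) <= MAX_SYNC_REQUEST_BYTES


def split_into_chunks(audio: bytes, chunk_size: int = MAX_STREAM_CHUNK_BYTES):
    """Yield consecutive slices of audio, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

Each slice produced by split_into_chunks would then be placed in its own streaming request.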

The API enforces the following limits on the size of this content. These limits are subject to change.

Content Limit            Audio Length
Synchronous Requests     ~1 Minute
Streaming Requests       ~5 Minutes**

** If you need to stream content for more than 5 minutes, see the endless streaming tutorial.

For StreamingRecognize requests, audio must be sent at a rate that approximates real time.
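
One way to approximate real time is to wait, after each chunk, for about as long as that chunk takes to play. The sketch below assumes uncompressed 16-bit mono PCM audio, where duration follows directly from the byte count. The helper names are hypothetical, and the sleep parameter exists only so the pacing can be replaced in tests.

```python
import time


def chunk_duration_seconds(chunk_bytes: int, sample_rate_hz: int,
                           bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Playback duration of a chunk of uncompressed PCM audio."""
    return chunk_bytes / (sample_rate_hz * bytes_per_sample * channels)


def paced_chunks(chunks, sample_rate_hz: int, sleep=time.sleep):
    """Yield each chunk, then wait roughly as long as it takes to play."""
    for chunk in chunks:
        yield chunk
        # Assumes 16-bit mono PCM; compressed formats need a different estimate.
        sleep(chunk_duration_seconds(len(chunk), sample_rate_hz))
```

At a 16000 Hz sample rate, for example, a 3200-byte chunk corresponds to 0.1 seconds of audio, so the generator pauses for about that long before yielding the next chunk.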

Attempting to process content in excess of these content limits will produce an error. For more information, see Error messages and Troubleshooting.

Within any request, you may also supply PhraseSet and CustomClass resources for speech adaptation. The following limits apply to these resources:

Speech Adaptation Limit                                  Value
Maximum allowable phrase boost value                     20
Phrases in a PhraseSet                                   1200
Maximum number of items in a CustomClass                 500
Maximum characters per CustomClass item                  500
Maximum number of PhraseSets per SpeechAdaptation        20
Maximum number of CustomClasses per SpeechAdaptation     20
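
The limits in this table lend themselves to a client-side check before a request is built. The following is a sketch under stated assumptions: phrase sets are modeled as plain lists of (phrase, boost) tuples and custom classes as plain lists of strings. These shapes and the check_adaptation name are illustrative only and are not the API's own message types.

```python
# Limits taken from the table above.
MAX_BOOST = 20
MAX_PHRASES_PER_PHRASE_SET = 1200
MAX_ITEMS_PER_CUSTOM_CLASS = 500
MAX_CHARS_PER_CUSTOM_CLASS_ITEM = 500
MAX_PHRASE_SETS = 20
MAX_CUSTOM_CLASSES = 20


def check_adaptation(phrase_sets, custom_classes):
    """Return a list of limit violations (empty if none).

    phrase_sets: list of lists of (phrase, boost) tuples (hypothetical shape).
    custom_classes: list of lists of item strings (hypothetical shape).
    """
    problems = []
    if len(phrase_sets) > MAX_PHRASE_SETS:
        problems.append(f"{len(phrase_sets)} PhraseSets exceeds {MAX_PHRASE_SETS}")
    for i, phrases in enumerate(phrase_sets):
        if len(phrases) > MAX_PHRASES_PER_PHRASE_SET:
            problems.append(f"PhraseSet {i}: {len(phrases)} phrases exceeds "
                            f"{MAX_PHRASES_PER_PHRASE_SET}")
        for phrase, boost in phrases:
            if boost > MAX_BOOST:
                problems.append(f"PhraseSet {i}: boost {boost} for {phrase!r} "
                                f"exceeds {MAX_BOOST}")
    if len(custom_classes) > MAX_CUSTOM_CLASSES:
        problems.append(f"{len(custom_classes)} CustomClasses exceeds "
                        f"{MAX_CUSTOM_CLASSES}")
    for i, items in enumerate(custom_classes):
        if len(items) > MAX_ITEMS_PER_CUSTOM_CLASS:
            problems.append(f"CustomClass {i}: {len(items)} items exceeds "
                            f"{MAX_ITEMS_PER_CUSTOM_CLASS}")
        for item in items:
            if len(item) > MAX_CHARS_PER_CUSTOM_CLASS_ITEM:
                problems.append(f"CustomClass {i}: item of {len(item)} characters "
                                f"exceeds {MAX_CHARS_PER_CUSTOM_CLASS_ITEM}")
    return problems
```

Returning every violation, rather than stopping at the first, lets a caller report all problems in one pass.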

Request Limits

The current API usage limits for Speech-to-Text are as follows (and are subject to change):

Type of Limit                                                      Usage Limit
Resource requests per 60 seconds (per region)                      100
Operation requests per 60 seconds (per region)                     150
Synchronous recognition requests per 60 seconds (per region)       300
Streaming recognition requests per 60 seconds (per region) *       3000
Streaming recognition sessions per 60 seconds (per region) *       60

* The StreamingRecognize method has a quota limit of 60 concurrent sessions per minute. StreamingRecognize also has a limit of 3000 requests per minute, which applies to all concurrent sessions together. The initial StreamingRecognize request for a session does not count against the request quota.

Requests and/or attempts at audio processing in excess of these limits will produce an error. For more information, see Error messages and Troubleshooting.
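
A client can reduce the chance of hitting these errors by throttling itself. The following is a minimal sliding-window sketch. The class name is hypothetical, the limiter is not thread-safe, and the service's own quota accounting may differ from this client-side approximation.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events in any `window_seconds` interval."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._events = deque()  # timestamps of accepted events, oldest first

    def try_acquire(self) -> bool:
        """Record an event and return True if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) < self.limit:
            self._events.append(now)
            return True
        return False
```

For instance, SlidingWindowLimiter(300) would mirror the synchronous recognition limit in the table above. A caller that receives False can wait and try again instead of sending a request that is likely to be rejected.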

These limits apply to each Speech-to-Text developer project and are shared across all applications and IP addresses using a given developer project.