Quotas and limits

This document lists the current API restrictions and usage limits for Speech-to-Text. We reserve the right to change these limits, and this page will be updated to reflect any changes.

You can request a quota increase if necessary. See the Google Cloud quota page for more information on viewing and managing your quota.

After submitting your request, Google might contact you for more information, and inform you whether your request is approved or denied.

Content limits

Synchronous requests

Synchronous recognition requests (using the Recognize method) accept audio data either inline in the content field of the request or as a Cloud Storage URI in the uri field of the request. Audio sent to a synchronous request is limited to 10 MB or 1 minute of audio duration (whichever is reached first). For more information on synchronous recognition, see the synchronous recognition overview.
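As a rough illustration of the "whichever is reached first" rule, the following sketch checks a clip against both synchronous limits before sending it. The function names are hypothetical and not part of any client library. The sketch assumes headerless 16-bit linear PCM audio, and it assumes the decimal reading of 10 MB (10,000,000 bytes), which stays within the limit under either interpretation.

```python
MAX_SYNC_BYTES = 10 * 1000 * 1000  # 10 MB, decimal interpretation (assumption)
MAX_SYNC_SECONDS = 60.0            # 1 minute of audio


def pcm16_duration_seconds(num_bytes: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Duration of headerless 16-bit linear PCM audio (2 bytes per sample)."""
    bytes_per_second = sample_rate_hz * channels * 2
    return num_bytes / bytes_per_second


def fits_synchronous_limits(num_bytes: int, duration_seconds: float) -> bool:
    """Return True only if the audio is within both the size and duration limits."""
    return num_bytes <= MAX_SYNC_BYTES and duration_seconds <= MAX_SYNC_SECONDS


# 45 s of 16 kHz mono audio: 1,440,000 bytes, within both limits.
short_clip_ok = fits_synchronous_limits(1_440_000, pcm16_duration_seconds(1_440_000, 16000))

# 55 s of 48 kHz stereo audio: under 1 minute, but 10,560,000 bytes exceeds the size limit.
large_clip_ok = fits_synchronous_limits(10_560_000, pcm16_duration_seconds(10_560_000, 48000, 2))
```

At low sample rates the duration limit is usually reached first. At high sample rates or with multiple channels, the size limit can be reached before 1 minute has elapsed.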

Streaming requests

Streaming recognition requests (using the StreamingRecognize method) only accept inline audio in the audio field of the request. Each request in the stream is limited to 25 KB of audio. A stream can remain open for up to 5 minutes, and the audio must be sent at a rate that approximates real time. If you need to stream content for longer than 5 minutes, see the endless streaming tutorial. For more information on streaming recognition, see the streaming recognition overview.
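One way to respect the per-request audio limit is to split a buffer into chunks before streaming. The sketch below is a minimal illustration, not client-library code: iter_audio_chunks is a hypothetical helper, and the 25 KB limit is assumed to mean 25,000 bytes, the smaller of the two common interpretations.

```python
from typing import Iterator

MAX_STREAM_CHUNK_BYTES = 25 * 1000  # 25 KB, decimal interpretation (assumption)


def iter_audio_chunks(audio: bytes, chunk_size: int = MAX_STREAM_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes."""
    if chunk_size <= 0 or chunk_size > MAX_STREAM_CHUNK_BYTES:
        raise ValueError(f"chunk_size must be between 1 and {MAX_STREAM_CHUNK_BYTES}")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# 60,000 bytes of silence split into chunks of 25,000, 25,000, and 10,000 bytes.
chunk_lengths = [len(chunk) for chunk in iter_audio_chunks(b"\x00" * 60_000)]
```

To approximate real time, a caller would typically wait roughly the duration of each chunk between sends. For 16-bit mono PCM at 16 kHz, for example, a 3,200-byte chunk holds 0.1 seconds of audio.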

Batch requests

Batch recognition requests (using the BatchRecognize method) only accept audio as a Cloud Storage URI in the uri field of the request. Each BatchRecognizeRequest can contain up to 15 files to transcribe. Each file can be up to 8 hours in duration. For more information on batch recognition, see the batch recognition overview.
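When there are more than 15 files to transcribe, the list of URIs has to be spread over several requests. The following sketch shows one way to group them. group_uris_for_batch is a hypothetical helper, and the bucket and object names are placeholders.

```python
from typing import List, Sequence

MAX_FILES_PER_BATCH_REQUEST = 15


def group_uris_for_batch(uris: Sequence[str],
                         max_files: int = MAX_FILES_PER_BATCH_REQUEST) -> List[List[str]]:
    """Split `uris` into consecutive groups of at most `max_files` entries."""
    if max_files <= 0 or max_files > MAX_FILES_PER_BATCH_REQUEST:
        raise ValueError(f"max_files must be between 1 and {MAX_FILES_PER_BATCH_REQUEST}")
    return [list(uris[i:i + max_files]) for i in range(0, len(uris), max_files)]


# 32 placeholder URIs become three groups of 15, 15, and 2 files.
example_uris = [f"gs://example-bucket/audio-{n}.wav" for n in range(32)]
group_sizes = [len(group) for group in group_uris_for_batch(example_uris)]
```

Each group would then be sent as its own BatchRecognizeRequest, subject to the batch request rate limit.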

Multiple language recognition

Multiple language recognition is only available in the global, US, and EU Speech-to-Text endpoints.

Adaptation

Within any request, you may also supply PhraseSet and CustomClass resources. The following limits apply to these resources:

Speech Adaptation Limit                                Value
Maximum allowable phrase boost value                   20
Phrases in a PhraseSet                                 1,200
Phrases per request                                    5,000
Characters per phrase                                  100
Total characters per request                           100,000
Maximum number of items in a CustomClass               500
Maximum characters per CustomClass item                500
Maximum number of PhraseSets per SpeechAdaptation      20
Maximum number of CustomClasses per SpeechAdaptation   20
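The phrase-related limits above can be checked on the client before a request is sent. The sketch below is an illustration under stated assumptions, not the service's validation logic. It models each phrase set as a plain list of (text, boost) pairs rather than as a real PhraseSet resource, counts characters with Python's len, and assumes the per-request character total covers phrase text only. CustomClass limits are left out for brevity.

```python
from typing import List, Sequence, Tuple

MAX_BOOST = 20
MAX_PHRASES_PER_PHRASE_SET = 1200
MAX_PHRASES_PER_REQUEST = 5000
MAX_CHARS_PER_PHRASE = 100
MAX_TOTAL_CHARS_PER_REQUEST = 100_000
MAX_PHRASE_SETS_PER_ADAPTATION = 20

Phrase = Tuple[str, float]  # (phrase text, boost value); a stand-in for a real phrase entry


def find_adaptation_violations(phrase_sets: Sequence[Sequence[Phrase]]) -> List[str]:
    """Return a human-readable message for every limit the phrase sets exceed."""
    problems: List[str] = []
    if len(phrase_sets) > MAX_PHRASE_SETS_PER_ADAPTATION:
        problems.append(f"{len(phrase_sets)} phrase sets exceeds {MAX_PHRASE_SETS_PER_ADAPTATION}")

    total_phrases = 0
    total_chars = 0
    for index, phrases in enumerate(phrase_sets):
        if len(phrases) > MAX_PHRASES_PER_PHRASE_SET:
            problems.append(f"phrase set {index}: {len(phrases)} phrases exceeds "
                            f"{MAX_PHRASES_PER_PHRASE_SET}")
        for text, boost in phrases:
            total_phrases += 1
            total_chars += len(text)
            if len(text) > MAX_CHARS_PER_PHRASE:
                problems.append(f"phrase set {index}: phrase of {len(text)} characters "
                                f"exceeds {MAX_CHARS_PER_PHRASE}")
            if boost > MAX_BOOST:
                problems.append(f"phrase set {index}: boost {boost} exceeds {MAX_BOOST}")

    if total_phrases > MAX_PHRASES_PER_REQUEST:
        problems.append(f"{total_phrases} phrases in request exceeds {MAX_PHRASES_PER_REQUEST}")
    if total_chars > MAX_TOTAL_CHARS_PER_REQUEST:
        problems.append(f"{total_chars} characters in request exceeds "
                        f"{MAX_TOTAL_CHARS_PER_REQUEST}")
    return problems


# A single short phrase with a moderate boost violates nothing.
no_problems = find_adaptation_violations([[("weather forecast", 10.0)]])
```

Returning a list of messages instead of raising on the first failure lets a caller report every violation at once.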

Resource limits

The current API resource limits for Speech-to-Text are as follows (and are subject to change):

Type of Limit                                Usage Limit
Number of recognizers (per region)           5,000
Number of custom classes (per region)        5,000
Number of phrase sets (per region)           5,000

Request limits

The current API usage limits for Speech-to-Text are as follows (and are subject to change):

Type of Limit                                                  Usage Limit
Resource requests per 60 seconds (per region)                  100
Operation requests per 60 seconds (per region)                 150
Synchronous recognition requests per 60 seconds (per region)   300
Streaming recognition requests per 60 seconds (per region) *   3,000
Streaming recognition sessions per 5 minutes (per region) *    300
Batch recognition requests per 60 seconds (per region)         150

* Streaming recognition has a quota limit of 300 concurrent sessions per 5 minutes and a limit of 3,000 requests per minute, which applies to all concurrent sessions together. The initial configuration request for a session does not count against the request quota.

These limits apply to each Speech-to-Text developer project, and are shared across all applications and IP addresses using a given developer project.
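Because the request limits are counted per time window and shared across a project, a client may want to throttle itself before the service rejects requests. The following is a minimal sliding-window sketch. SlidingWindowLimiter is a hypothetical class, and how the service itself measures its windows is not stated here, so this is only a conservative client-side approximation. It also tracks a single process, while the real quota is shared by every application in the project.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any trailing `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_requests = max_requests
        self._window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of accepted requests

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else return False."""
        now = self._clock()
        # Forget requests that have aged out of the trailing window.
        while self._sent and now - self._sent[0] >= self._window_seconds:
            self._sent.popleft()
        if len(self._sent) < self._max_requests:
            self._sent.append(now)
            return True
        return False


# Example: budget for synchronous recognition requests (300 per 60 seconds).
sync_limiter = SlidingWindowLimiter(max_requests=300, window_seconds=60.0)
first_request_allowed = sync_limiter.try_acquire()
```

A caller would check try_acquire() before each request and wait or queue the work when it returns False. The clock parameter exists only so that the limiter can be tested with a fake time source.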