This document contains the current API restrictions and usage limits for Speech-to-Text. This page will be updated to reflect any changes to these restrictions and usage limits. We reserve the right to change these limits.
You can request a quota increase if necessary. See the Google Cloud quota page for more information on viewing and managing your quota.
After you submit your request, Google might contact you for more information and will inform you whether your request is approved or denied.
Content limits
Synchronous requests
Synchronous recognition requests (using the Recognize method) accept audio data either inline in the content field of the request or as a Cloud Storage URI in the uri field of the request. Audio sent to a synchronous request is limited to 10 MB or 1 minute of audio duration (whichever is reached first). For more information on synchronous recognition, see the synchronous recognition overview.
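As a rough pre-flight check, the sketch below tests a payload against both synchronous limits before a request is sent. The helper name `fits_sync_limits` is hypothetical and not part of any Speech-to-Text client library. It assumes the caller already knows the audio duration, and it reads 10 MB conservatively as 10,000,000 bytes, which is an assumption rather than a documented definition.

```python
# Hypothetical helper; not part of any Speech-to-Text client library.
SYNC_MAX_BYTES = 10 * 1000 * 1000  # assumption: 10 MB read conservatively as 10,000,000 bytes
SYNC_MAX_SECONDS = 60.0            # 1 minute of audio


def fits_sync_limits(num_bytes: int, duration_seconds: float) -> bool:
    """Return True only if the audio is within both the size and duration limits."""
    return num_bytes <= SYNC_MAX_BYTES and duration_seconds <= SYNC_MAX_SECONDS
```

Audio that fails this check is a candidate for streaming or batch recognition instead.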
Streaming requests
Streaming recognition requests (using the StreamingRecognize method) only accept inline audio in the audio field of the request. Each request in the stream is limited to 25 KB of audio. A stream can remain open for up to 5 minutes, and the audio must be sent at a rate that approximates real time. If you need to stream content for longer than 5 minutes, see the endless streaming tutorial. For more information on streaming recognition, see the streaming recognition overview.
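One way to stay under the per-request size limit is to split the audio into fixed-size chunks before sending it. The generator below is a minimal sketch. The name `chunk_audio` is hypothetical, and the 25,000-byte chunk size is a conservative reading of 25 KB, which is an assumption. It does not pace the chunks or call the API.

```python
from typing import Iterator

STREAM_CHUNK_BYTES = 25_000  # assumption: conservative reading of the 25 KB limit


def chunk_audio(audio: bytes, chunk_size: int = STREAM_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

To approximate real time, a caller would also need to wait between chunks. For example, assuming 16 kHz, 16-bit, mono PCM audio (32,000 bytes per second), a 25,000-byte chunk holds about 0.78 seconds of audio.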
Batch requests
Batch recognition requests (using the BatchRecognize method) only accept audio as a Cloud Storage URI in the uri field of the request. Each BatchRecognizeRequest can contain up to 15 files to transcribe. Each file can be up to 8 hours in duration. For more information on batch recognition, see the batch recognition overview.
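When there are more than 15 files to transcribe, the URIs have to be spread across several requests. The sketch below only groups the URIs. The function name `group_uris` and the example bucket are hypothetical, and building the actual BatchRecognizeRequest objects is left to the caller.

```python
from typing import List, Sequence

MAX_FILES_PER_BATCH = 15  # per-request file limit stated above


def group_uris(uris: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> List[List[str]]:
    """Split `uris` into consecutive groups of at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(uris[i:i + size]) for i in range(0, len(uris), size)]
```

Each resulting group can then be placed in its own request, subject to the batch request quota listed under Request limits.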
Multiple language recognition
Multiple language recognition is only available in the global, US, and EU Speech-to-Text endpoints.
Adaptation
Within any request, you may also supply PhraseSet and CustomClass resources. The following limits apply to these resources:
Speech Adaptation Limit | Value |
---|---|
Maximum allowable phrase boost value | 20 |
Phrases in a PhraseSet | 1,200 |
Phrases per request | 5,000 |
Characters per phrase | 100 |
Total characters per request | 100,000 |
Maximum number of items in a CustomClass | 500 |
Maximum characters per CustomClass item | 500 |
Maximum number of PhraseSets per SpeechAdaptation | 20 |
Maximum number of CustomClasses per SpeechAdaptation | 20 |
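Some of the limits in this table can be checked locally before a PhraseSet is created. The sketch below covers three of them: boost value, phrases per PhraseSet, and characters per phrase. The function name `check_phrase_set` and the `(text, boost)` tuple layout are hypothetical. It also assumes that characters can be counted with Python's `len`, which may not match how the service counts them.

```python
from typing import List, Tuple

MAX_BOOST = 20               # maximum allowable phrase boost value
MAX_PHRASES_PER_SET = 1200   # phrases in a PhraseSet
MAX_CHARS_PER_PHRASE = 100   # characters per phrase


def check_phrase_set(phrases: List[Tuple[str, float]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no limit was exceeded."""
    problems: List[str] = []
    if len(phrases) > MAX_PHRASES_PER_SET:
        problems.append(f"{len(phrases)} phrases exceeds {MAX_PHRASES_PER_SET}")
    for text, boost in phrases:
        # Assumption: len() counts characters the same way the service does.
        if len(text) > MAX_CHARS_PER_PHRASE:
            problems.append(f"phrase of {len(text)} characters exceeds {MAX_CHARS_PER_PHRASE}")
        if boost > MAX_BOOST:
            problems.append(f"boost {boost} exceeds {MAX_BOOST}")
    return problems
```

The per-request totals (5,000 phrases and 100,000 characters) span all PhraseSets in a request, so they would need a separate check over the whole request.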
Resource limits
The current API resource limits for Speech-to-Text are as follows (and are subject to change):
Type of Limit | Usage Limit |
---|---|
Number of recognizers (per region) | 5,000 |
Number of custom classes (per region) | 5,000 |
Number of phrase sets (per region) | 5,000 |
Request limits
The current API usage limits for Speech-to-Text are as follows (and are subject to change):
Type of Limit | Usage Limit |
---|---|
Resource requests per 60 seconds (per region) | 100 |
Operation requests per 60 seconds (per region) | 150 |
Synchronous recognition requests per 60 seconds (per region) | 300 |
Streaming recognition requests per 60 seconds (per region) * | 3,000 |
Streaming recognition sessions per 5 minutes (per region) * | 300 |
Batch recognition requests per 60 seconds (per region) | 150 |
* Streaming recognition has a quota limit of 300 concurrent sessions per 5 minutes and a limit of 3,000 requests per minute, which applies to all concurrent sessions together. The initial configuration request for a session does not count against the request quota.
These limits apply to each Speech-to-Text developer project and are shared across all applications and IP addresses using a given developer project.
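A client can reduce the chance of hitting a per-minute quota by throttling itself. The class below is a minimal sliding-window sketch. `SlidingWindowLimiter` and its `clock` parameter are hypothetical names, and nothing here is part of the Speech-to-Text API. Because quotas are shared across a project, a limiter inside one process only approximates the real remaining quota when other applications use the same project.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions within any `window_seconds` window."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock                   # injectable so the limiter can be tested
        self._calls: Deque[float] = deque()   # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

For the synchronous recognition quota in the table above, a caller might construct `SlidingWindowLimiter(300, 60.0)` and skip or delay a request whenever `try_acquire()` returns False.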