This page explains dynamic shared quota (DSQ) and how DSQ is different from Provisioned Throughput.
Introduction to dynamic shared quota
Dynamic shared quota (DSQ) distributes available on-demand capacity among all queries being processed by Google Cloud services for specific models. This capability eliminates the need to set quota limits and to submit quota increase requests (QIRs).
With DSQ, requests from all customers that target the same regional or multi-regional endpoint draw on the same available capacity. No per-project quota applies; instead, the available capacity is distributed among the projects that are sending requests.
Because DSQ capacity is shared, requests can be rejected when capacity runs short. Provisioned Throughput is the only way to ensure high availability for your application and to get predictable service levels for your production workloads. For more information, see Provisioned Throughput.
Supported models
This section lists the models that support dynamic shared quota (DSQ). DSQ is enabled by default for these models.
Google models
The following table lists the Google models (and versions) that support DSQ:
Model | DSQ release date | Status |
---|---|---|
Gemini 1.5 Flash (gemini-1.5-flash-002) | September 24, 2024 | Live |
Gemini 1.5 Pro (gemini-1.5-pro-002) | September 24, 2024 | Live |
DSQ quotas aren't listed in the Quotas & System Limits page in the Google Cloud console.
Troubleshoot DSQ errors
When there isn't enough capacity to serve your query, you might receive a 429 error. To troubleshoot errors that might occur, see Error code 429.
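A common way to handle a transient 429 is to retry the request with exponential backoff and jitter. This page doesn't prescribe that approach, so treat the following as a minimal sketch. The names `TooManyRequestsError` and `call_with_backoff`, and the default attempt count and delays, are hypothetical and aren't part of any Google Cloud client library. In real code, catch whichever exception your client raises for HTTP 429.

```python
import random
import time


class TooManyRequestsError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send_request(), retrying on TooManyRequestsError.

    The wait before retry number n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)], so that many clients
    retrying at once don't all hit the endpoint at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except TooManyRequestsError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            upper_bound = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, upper_bound))
```

To use the sketch, wrap the call that sends your query in a zero-argument function and pass it as `send_request`. The `sleep` parameter exists only so that the wait can be replaced in tests.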
What's next
- To learn more about Gemini models that support DSQ, see Gemini models.
- To learn more about Generative AI quotas and limits, see Generative AI on Vertex AI rate limits.
- To learn more about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.