Dynamic shared quota

This page explains dynamic shared quota (DSQ) and how DSQ is different from Provisioned Throughput.

Introduction to dynamic shared quota

Dynamic shared quota (DSQ) distributes the available on-demand capacity for specific models among all the queries that Google Cloud services are processing. With DSQ, you don't need to set quota limits or submit quota increase requests (QIRs).

DSQ handles requests from all customers that are sent to the same regional or multi-regional endpoint. There are no predefined quota limits; instead, the available capacity is distributed among projects.

Because DSQ capacity is shared, the capacity available to your project can vary with overall demand. Provisioned Throughput is the only way to ensure high availability for your application and to get predictable service levels for your production workloads. For more information, see Provisioned Throughput.

Supported models

This section lists the models that support dynamic shared quota (DSQ). DSQ is enabled by default for these models.

Google models

The following table lists the Google models (and versions) that support DSQ:

Model	DSQ release date	Status
Gemini 1.5 Flash (`gemini-1.5-flash-002`)	September 24, 2024	Live
Gemini 1.5 Pro (`gemini-1.5-pro-002`)	September 24, 2024	Live

DSQ quotas aren't listed on the Quotas & System Limits page in the Google Cloud console.

Troubleshoot DSQ errors

When there isn't enough capacity to serve your query, you might receive a 429 error. To troubleshoot errors that might occur, see Error code 429.
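A common way to handle intermittent 429 errors on the client side is to retry with exponential backoff and jitter. The following is a minimal sketch of that pattern, not an official client implementation. `ResourceExhaustedError`, `call_with_backoff`, and `send_request` are hypothetical names used only for illustration; replace the exception with whatever your client library raises for a 429 response, and tune the delays for your workload.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send_request(), retrying when it raises ResourceExhaustedError.

    Between attempts, waits a random time between 0 and
    min(max_delay, base_delay * 2**attempt) seconds. Re-raises the
    error if the final attempt also fails.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

To use the sketch, wrap the call that sends your query in a zero-argument function, for example `call_with_backoff(lambda: my_client_call(prompt))`, where `my_client_call` is your own function. The random jitter spreads out retries so that many clients that fail at the same moment don't all retry at once.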

What's next

To learn more about Gemini models that support DSQ, see Gemini models.
To learn more about Generative AI quotas and limits, see Generative AI on Vertex AI rate limits.
To learn more about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.