[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-25。"],[],[],null,["Dynamic shared quota (DSQ) was introduced to serve your pay-as-you-go (PayGo)\nrequests with greater flexibility to adapt to your workload needs without having\nto manage quotas and quota increase requests (QIR). With DSQ, there are no predefined\nquota limits on your usage. Instead, DSQ provides access to a large, shared pool of\nresources, dynamically allocated based on real-time availability of resources and\nreal-time demand across all customers of that model. When more customers are active,\neach customer gets a lower amount of throughput. Similarly, if there are fewer customers,\neach customer might get higher throughput.\n\nSupported models\n\nThe following Gemini models and their [supervised fine-tuned](/vertex-ai/generative-ai/docs/models/gemini-use-supervised-tuning) models support DSQ:\n\n- [Gemini 2.5 Flash Image Preview](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash#image) (Preview)\n- [Gemini 2.5 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-lite)\n- [Gemini 2.0 Flash with Live API](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash#live-api) (Preview)\n- [Gemini 2.0 Flash with image generation](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash) (Preview)\n- [Gemini 2.5 Pro](/vertex-ai/generative-ai/docs/models/gemini/2-5-pro)\n- [Gemini 2.5 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)\n- [Gemini 2.0 Flash](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash)\n- [Gemini 2.0 Flash-Lite](/vertex-ai/generative-ai/docs/models/gemini/2-0-flash-lite)\n\nThe following legacy Gemini models support DSQ:\n\n- Gemini 1.5 Pro\n- Gemini 1.5 Flash\n\nThe following Imagen models support DSQ:\n\n- Imagen 4\n- Imagen 4 Fast\n- Imagen 4 Ultra\n\nHow DSQ works\n\nDynamic shared quota (DSQ) adapts to your traffic patterns and needs and\nminimizes usage frictions. Your project's access to resources under DSQ is not\ncapped by an arbitrary number we set. Instead, it's determined by the overall\ncapacity of the shared pool and the current collective demand from all customers.\nThis model is designed to offer significant flexibility, allowing your workloads\nto burst and consume more resources when available. Conversely, it also allows\nall customers of the shared pool to have a chance to access resources when\navailable without requiring to configure per customer quota.\n\nTo ensure a fair and stable experience for all users in the shared resource\nenvironment, Dynamic Shared Quota intelligently manages how requests are handled,\nespecially during periods of very high demand from isolated sources. Rather than\na fixed cap, DSQ employs a dynamic prioritization approach. This means that while\nthe system is designed to accommodate bursts, unusually large and rapid spikes in\ntraffic from a single source may be handled with a different priority than more\nconsistent, steady traffic. This sophisticated management ensures that broad user\nactivity and regular workloads are protected from transient, extreme spikes,\npromoting overall system stability and equitable access.\n\nGemini requests with multi-modal inputs are subject to the\ncorresponding system rate limits that include\n[image](/vertex-ai/generative-ai/docs/multimodal/image-understanding#supported_models),\n[audio](/vertex-ai/generative-ai/docs/multimodal/audio-understanding#supported_models),\n[video](/vertex-ai/generative-ai/docs/multimodal/video-understanding#supported_models), and\n[document](/vertex-ai/generative-ai/docs/multimodal/document-understanding#supported_models).\n\nTo help ensure high availability for your application and to get predictable\nservice levels for your production workloads, see\n[Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput).\n\nUnderstanding Resource Exhaustion 429 errors under DSQ\n\nWe understand that encountering a 'resource exhausted' 429 error can be\nfrustrating and might lead you to suspect you are hitting some sort of quota\nlimit. However, with DSQ, this is not the case. These errors indicate that the\noverall shared pool of resources for that specific type (e.g., a particular\nmodel in a specific region) at a specific time is experiencing extremely high\ndemand from many users simultaneously. Think of it like trying to get on a very\npopular train during peak rush hour. There isn't a 'ticket limit' specifically\nfor you, but the train itself might be momentarily full. It's a temporary state\nof contention for resources, not a fixed limit imposed on your project.\n\nDSQ is constantly working to manage and distribute the available capacity fairly\nand efficiently. When you receive such an error, it means instantaneous demand\nhas outstripped the available supply in that shared pool. Unlike a hard\nquota where you'd be blocked even if resources were idle elsewhere, DSQ aims to\ngive you access whenever resources are free. The exhaustion error is a reflection\nof the entire system's current load, not a ceiling on your account.\n\nWe recommend implementing retry mechanisms, as availability in this dynamic\nenvironment can change quickly. For more tactics of handling Resource Exhaustion\nerrors, see [A guide to handling 429 errors](/blog/products/ai-machine-learning/learn-how-to-handle-429-resource-exhaustion-errors-in-your-llms)\nor [Error code 429](/vertex-ai/generative-ai/docs/provisioned-throughput/error-code-429).\n\nWhat's next\n\n- To learn about quotas and limits for Vertex AI, see [Vertex AI quotas and limits](/vertex-ai/docs/quotas).\n- To learn more about Google Cloud quotas and system limits, see the [Cloud Quotas documentation](/docs/quotas/overview)."]]