If the number of your requests exceeds the capacity allocated to process
requests, then error code 429 is returned. The following table displays the
error message generated by each type of quota framework:
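| Quota framework        | Message                                                 |
|------------------------|---------------------------------------------------------|
| Pay-as-you-go          | Resource exhausted, please try again later.             |
| Provisioned Throughput | Too many requests. Exceeded the Provisioned Throughput. |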
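As a rough illustration, client code could use the two messages listed above to tell which quota framework produced a 429. This is a minimal sketch: the function name `quota_framework_for_429` and the substring-matching approach are assumptions made for the example, not part of any Vertex AI client library.

```python
from typing import Optional

# Message strings copied from the list of 429 messages above.
PAYG_MESSAGE = "Resource exhausted, please try again later."
PT_MESSAGE = "Too many requests. Exceeded the Provisioned Throughput."


def quota_framework_for_429(status_code: int, message: str) -> Optional[str]:
    """Return the quota framework that a 429 error message points to.

    Hypothetical helper: returns None when the status code is not 429
    or when the message matches neither known string.
    """
    if status_code != 429:
        return None
    if PT_MESSAGE in message:
        return "Provisioned Throughput"
    if PAYG_MESSAGE in message:
        return "Pay-as-you-go"
    return None
```

Matching on substrings rather than exact equality tolerates messages that arrive wrapped in additional error detail.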
With a Provisioned Throughput subscription, you can reserve an
amount of throughput for specific generative AI models. If you don't have a
Provisioned Throughput subscription and resources aren't available
to your application, then an error code 429 is returned. Even without
reserved capacity, you can retry the request. However, the failed request
isn't counted against your error rate as described in your service level
agreement (SLA).
For projects that have purchased Provisioned Throughput,
Vertex AI measures a project's throughput and reserves the purchased
amount of throughput for the project's actual usage.
For standard Provisioned Throughput, when you use less than your
purchased amount, errors that might otherwise be 429 are returned as 5XX and
count toward the SLA error rate. For Single Zone Provisioned Throughput,
when you use less than your purchased amount, capacity-related 429 errors are
treated as 5XX but don't count toward the SLA error rate. When you exceed your
purchased amount, the additional requests are processed on-demand as pay-as-you-go.
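The rules above can be summarized as a small decision function. This is only a sketch of the documented behavior: the names `CapacityErrorOutcome` and `capacity_error_outcome` and the `pt_type` values are hypothetical. Treating overage traffic like ordinary pay-as-you-go traffic (429, not counted toward the SLA) is an inference from the text, and so is treating usage exactly equal to the purchased amount as "within" it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityErrorOutcome:
    surfaced_as: str         # "429" or "5XX"
    counts_toward_sla: bool  # whether the error counts toward the SLA error rate


def capacity_error_outcome(pt_type: str, used: float, purchased: float) -> CapacityErrorOutcome:
    """Describe how a capacity-related error is surfaced.

    pt_type is a hypothetical label: "none", "standard", or "single_zone".
    used and purchased are throughput amounts in the same unit.
    """
    if pt_type not in ("none", "standard", "single_zone"):
        raise ValueError(f"unknown pt_type: {pt_type!r}")
    # No subscription, or usage above the purchased amount: the request is
    # handled as pay-as-you-go (inference for the overage case).
    if pt_type == "none" or used > purchased:
        return CapacityErrorOutcome("429", False)
    # Within the purchased amount on standard Provisioned Throughput.
    if pt_type == "standard":
        return CapacityErrorOutcome("5XX", True)
    # Within the purchased amount on Single Zone Provisioned Throughput.
    return CapacityErrorOutcome("5XX", False)
```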
Pay-as-you-go
On the pay-as-you-go quota framework, you have the following options for
resolving 429 errors:
- Use the global endpoint instead of a regional endpoint whenever possible.
- Implement a retry strategy by using truncated exponential backoff.
- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses dynamic shared quota, smoothing traffic and reducing large spikes can help. For more information, see Dynamic shared quota (DSQ).
- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see Provisioned Throughput.
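One common way to handle transient 429 errors on pay-as-you-go is to retry with truncated exponential backoff. The following is a minimal sketch, not an official client implementation. It assumes a hypothetical `send_request` callable that returns an `(status_code, body)` tuple, and the default delays and attempt count are illustrative values only.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send_request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send_request, retrying on HTTP 429 with truncated exponential backoff.

    send_request is a hypothetical callable supplied by the caller. The last
    response is returned even if it is still a 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send_request()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        # The delay doubles on each retry and is capped (truncated) at max_delay.
        delay = min(base_delay * (2 ** attempt), max_delay)
        # Up to one second of random jitter keeps clients from retrying in lockstep.
        sleep(delay + random.uniform(0, 1))
        status, body = send_request()
    return status, body
```

A caller would wrap its own request function, for example `call_with_backoff(lambda: my_client_call(prompt))`, where `my_client_call` stands in for whatever code issues the prediction request. The injectable `sleep` parameter exists mainly so the retry schedule can be tested without waiting.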
Provisioned Throughput
To correct the 429 error generated by Provisioned Throughput, do one of the
following:
- Use the Default behavior example, which doesn't set a header in prediction requests. Any overages are processed on-demand and billed as pay-as-you-go.
- Increase the number of GSUs in your Provisioned Throughput subscription.
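To make the header option concrete, the sketch below builds request headers for the default behavior and for the two explicit modes. The header name `X-Vertex-AI-LLM-Request-Type` and the values `dedicated` and `shared` are assumptions recalled from the Provisioned Throughput usage documentation and are not stated on this page, so verify them before relying on this. The helper `request_headers` is hypothetical.

```python
from typing import Dict, Optional

# ASSUMPTION: header name and values are not given on this page; confirm them
# against the Provisioned Throughput usage documentation.
REQUEST_TYPE_HEADER = "X-Vertex-AI-LLM-Request-Type"
ALLOWED_MODES = ("dedicated", "shared")


def request_headers(mode: Optional[str] = None) -> Dict[str, str]:
    """Build extra HTTP headers for a prediction request.

    mode=None is the default behavior described above: no header is set, so
    overages are processed on-demand and billed as pay-as-you-go.
    mode="dedicated" or mode="shared" (assumed values) pins the request to
    Provisioned Throughput only or pay-as-you-go only.
    """
    if mode is None:
        return {}
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be None or one of {ALLOWED_MODES}")
    return {REQUEST_TYPE_HEADER: mode}
```

If a request pinned to dedicated capacity is what triggers the 429, calling `request_headers()` with no argument returns an empty dictionary, which corresponds to the default behavior of letting overages fall through to pay-as-you-go.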
What's next
- To learn more about dynamic shared quota, see Dynamic shared quota.
- To learn more about Provisioned Throughput, see Provisioned Throughput.
- To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and system limits, see the Cloud Quotas documentation.
- To learn more about API errors, see API errors.
Last updated 2025-08-26 UTC.