When the number of requests sent to a model exceeds the available processing capacity, Vertex AI returns a 429 error code, indicating that the resource is exhausted. The specific error message and resolution path depend on whether you are using the pay-as-you-go service or have purchased Provisioned Throughput.
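The following is a minimal sketch of how a 429 typically surfaces when calling a model with the Vertex AI Python SDK. The project ID, location, and model name are placeholders, and the SDK's underlying google-api-core library maps HTTP 429 to the ResourceExhausted exception.

```python
# Minimal sketch: observing a 429 when calling a Vertex AI model with the
# Python SDK. Project ID, location, and model name are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core import exceptions

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")  # placeholder model name

try:
    response = model.generate_content("Summarize the benefits of caching.")
    print(response.text)
except exceptions.ResourceExhausted as err:
    # google-api-core surfaces HTTP 429 as ResourceExhausted; the error
    # message differs between pay-as-you-go and Provisioned Throughput.
    print(f"429 received: {err}")
```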
Understanding the 429 error
The following table compares how the 429 error is handled in the pay-as-you-go and Provisioned Throughput quota frameworks.
| Feature | Pay-as-you-go | Provisioned Throughput |
| --- | --- | --- |
| Error Message | Resource exhausted, please try again later. | Too many requests. Exceeded the Provisioned Throughput. |
| Cause | The number of requests exceeds the available capacity in the shared resource pool. | The number of requests exceeds your reserved throughput capacity. |
| SLA Impact | Requests that receive a 429 error are not counted against your error rate as described in the service level agreement (SLA). | Errors for usage below your purchased throughput are returned as 5XX and count against the SLA. Errors for usage above your purchased throughput are treated as pay-as-you-go and don't count against the SLA. |
With a Provisioned Throughput subscription, you reserve a specific amount of throughput for your models. Without a subscription, your requests are served from the shared resource pool; if shared resources are unavailable, you receive a 429 error. Although you don't have reserved capacity, you can retry the request.
For projects with Provisioned Throughput, Vertex AI reserves the purchased throughput for your project's usage. When you use less than your purchased amount, errors that might otherwise be 429 are returned as 5XX errors and count toward the SLA error rate. When you exceed your purchased amount, the additional requests are processed on demand as pay-as-you-go.
How to resolve 429 errors
The steps to resolve a 429 error vary depending on your quota framework.
Pay-as-you-go
On the pay-as-you-go quota framework, you have the following options to resolve 429 errors:
- Use the global endpoint: Whenever possible, use the global endpoint instead of a regional endpoint.
- Implement a retry strategy: Use truncated exponential backoff to retry requests, as shown in the sketch after this list.
- Request a quota increase: If your model uses quotas, you can submit a Quota Increase Request (QIR).
- Smooth traffic: If your model uses Dynamic Shared Quota (DSQ), smoothing traffic and reducing large spikes can help. For more information, see Dynamic shared quota.
- Subscribe to Provisioned Throughput: For a more consistent level of service, subscribe to Provisioned Throughput. For more information, see Provisioned Throughput.
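The sketch below shows one way to implement truncated exponential backoff with jitter around a model call. The retry limits, the jitter, and the model.generate_content call are illustrative assumptions, not recommended values for every workload.

```python
# Sketch: truncated exponential backoff with jitter for retrying 429s.
# MAX_RETRIES and MAX_BACKOFF_SECONDS are illustrative values.
import random
import time

from google.api_core import exceptions

MAX_RETRIES = 5           # give up after this many attempts
MAX_BACKOFF_SECONDS = 32  # truncate the wait so it never grows unbounded

def generate_with_backoff(model, prompt):
    """Call model.generate_content, retrying on 429 with capped backoff."""
    for attempt in range(MAX_RETRIES):
        try:
            return model.generate_content(prompt)
        except exceptions.ResourceExhausted:
            if attempt == MAX_RETRIES - 1:
                raise  # out of retries; surface the 429 to the caller
            # Wait min(2^attempt + jitter, MAX_BACKOFF_SECONDS) before retrying.
            delay = min(2 ** attempt + random.random(), MAX_BACKOFF_SECONDS)
            time.sleep(delay)
```

If you prefer not to hand-roll the loop, google-api-core also ships a Retry helper (google.api_core.retry.Retry) that applies a comparable backoff policy to retriable exceptions.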
Provisioned Throughput
To resolve a 429 error when you have a Provisioned Throughput subscription, you can do the following:
- Allow on-demand processing: Use the default behavior by not setting a header in your prediction requests. Any overages are processed on demand and billed as pay-as-you-go, as shown in the sketch after this list.
- Increase reserved capacity: Increase the number of GSUs in your Provisioned Throughput subscription.
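The following is a rough sketch of a REST prediction request that relies on the default behavior described above, so no traffic-type header is set and any overage spills over to pay-as-you-go. The project ID, location, and model ID are placeholders, and the header name mentioned in the comment is an assumption to verify against the current Vertex AI documentation.

```python
# Sketch: a generateContent request that relies on the default overflow
# behavior (no traffic-type header set), so overages are billed on demand.
import google.auth
import google.auth.transport.requests
import requests

PROJECT_ID = "your-project-id"   # placeholder
LOCATION = "us-central1"         # placeholder
MODEL_ID = "gemini-1.5-flash"    # placeholder model name

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/{MODEL_ID}:generateContent"
)
headers = {"Authorization": f"Bearer {credentials.token}"}
# No traffic-type header is set, so Provisioned Throughput is consumed first
# and any overage is processed on demand. Setting a traffic-type header (for
# example X-Vertex-AI-LLM-Request-Type: dedicated; header name assumed, check
# the Vertex AI docs) would instead restrict the request to reserved capacity.
body = {"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}

response = requests.post(url, headers=headers, json=body)
print(response.status_code)
```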
What's next
- To learn more about dynamic shared quota, see Dynamic shared quota.
- To learn more about Provisioned Throughput, see Provisioned Throughput.
- To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.
- To learn more about API errors, see API errors.