Error code 429

When the number of requests sent to a model exceeds the available processing capacity, Vertex AI returns a 429 error code, indicating that the resource is exhausted. The specific error message and resolution path depend on whether you are using the pay-as-you-go service or have purchased Provisioned Throughput.

Understanding the 429 Error

The following table compares how the 429 error is handled in the pay-as-you-go and Provisioned Throughput quota frameworks.

  • Error message
      • Pay-as-you-go: "Resource exhausted, please try again later."
      • Provisioned Throughput: "Too many requests. Exceeded the Provisioned Throughput."
  • Cause
      • Pay-as-you-go: The number of requests exceeds the available capacity in the shared resource pool.
      • Provisioned Throughput: The number of requests exceeds your reserved throughput capacity.
  • SLA impact
      • Pay-as-you-go: Requests that receive a 429 error are not counted against your error rate as described in the service level agreement (SLA).
      • Provisioned Throughput: Errors for usage below your purchased throughput are returned as 5XX and count against the SLA. Errors for usage above your purchased throughput are treated as pay-as-you-go and don't count against the SLA.

With a Provisioned Throughput subscription, you reserve a specific amount of throughput for your models. Without a subscription, you receive a 429 error whenever the shared resources are unavailable. You don't have reserved capacity in that case, but you can retry the request.

For projects with Provisioned Throughput, Vertex AI reserves the purchased throughput for your project's usage. When you use less than your purchased amount, errors that might otherwise be 429 are returned as 5XX and count toward the SLA error rate. When you exceed your purchased amount, the additional requests are processed on demand and billed as pay-as-you-go.
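The spillover rules above can be summarized in a simplified model. This is a sketch, not how Vertex AI actually schedules traffic: the names TrafficSplit, split_traffic, and classify_failure are hypothetical, and throughput is expressed in abstract units per interval rather than in the service's real metering.

```python
from dataclasses import dataclass


@dataclass
class TrafficSplit:
    """How one interval's demand divides between reserved and on-demand capacity."""

    dedicated: float  # served from the purchased Provisioned Throughput
    on_demand: float  # overage, processed on demand and billed as pay-as-you-go


def split_traffic(demand: float, purchased: float) -> TrafficSplit:
    """Hypothetical model: demand up to the purchased amount uses the reservation,
    and anything above it spills over to the shared pay-as-you-go pool."""
    if demand < 0 or purchased < 0:
        raise ValueError("demand and purchased throughput must be non-negative")
    dedicated = min(demand, purchased)
    return TrafficSplit(dedicated=dedicated, on_demand=demand - dedicated)


def classify_failure(within_purchased: bool) -> tuple[str, bool]:
    """Return (status class, counts against SLA) for a request that fails."""
    if within_purchased:
        return ("5XX", True)  # failure on reserved capacity: an SLA error
    return ("429", False)  # failure on overage: treated like pay-as-you-go


print(split_traffic(demand=120.0, purchased=100.0))
# TrafficSplit(dedicated=100.0, on_demand=20.0)
```

In this model, exceeding the reservation never blocks traffic by itself; it only changes how the extra requests are billed and how their failures are counted.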

How to Resolve 429 Errors

The steps to resolve a 429 error vary depending on your quota framework.

Pay-as-you-go

Under the pay-as-you-go quota framework, you have the following options to resolve 429 errors:

  • Use the global endpoint: Whenever possible, use the global endpoint instead of a regional endpoint.
  • Implement a retry strategy: Use truncated exponential backoff to retry requests.
  • Request a quota increase: If your model uses quotas, you can submit a Quota Increase Request (QIR).
  • Smooth traffic: If your model uses Dynamic Shared Quota (DSQ), smoothing traffic and reducing large spikes can help. For more information, see Dynamic shared quota.
  • Subscribe to Provisioned Throughput: For a more consistent level of service, subscribe to Provisioned Throughput. For more information, see Provisioned Throughput.
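The retry strategy above can be sketched as follows, assuming your client raises an exception when it receives a 429. ResourceExhaustedError and call_with_backoff are hypothetical names, not part of any Google library. The delay doubles on each attempt and is truncated at max_delay. The "full jitter" randomization used here is one common variant and is an assumption: other descriptions of truncated exponential backoff instead add a small random number of milliseconds to 2^n seconds.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for however your client surfaces an HTTP 429."""


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `send` on 429 using truncated exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Exponential growth (1s, 2s, 4s, ...), truncated at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Random wait in [0, cap] so many clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")


# Usage: a fake request that fails twice with a 429, then succeeds.
attempts = {"n": 0}


def flaky_request() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ResourceExhaustedError("Resource exhausted, please try again later.")
    return "ok"


print(call_with_backoff(flaky_request, sleep=lambda seconds: None))  # ok
```

Injecting the sleep function keeps the retry logic testable without real waits. Capping both the number of attempts and the maximum delay ensures that a sustained capacity shortage eventually surfaces as an error instead of stalling the caller indefinitely.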

Provisioned Throughput

To resolve a 429 error when you have a Provisioned Throughput subscription, you can do the following:

  • Allow on-demand processing: Keep the default behavior by not setting the X-Vertex-AI-LLM-Request-Type header in your prediction requests. Any usage above your reserved throughput is then processed on demand and billed as pay-as-you-go.
  • Increase reserved capacity: Increase the number of generative AI scale units (GSUs) in your Provisioned Throughput subscription.

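The on-demand option above depends on which headers a prediction request carries. The following minimal sketch builds request headers and leaves the request-type header unset by default. The header name X-Vertex-AI-LLM-Request-Type and the value "dedicated" (Provisioned Throughput only, so overages return 429 instead of spilling over) reflect our understanding of the Vertex AI Provisioned Throughput documentation and should be confirmed there. The helper build_headers is hypothetical.

```python
from typing import Dict, Optional

# Header name as we understand it from the Vertex AI Provisioned Throughput docs.
REQUEST_TYPE_HEADER = "X-Vertex-AI-LLM-Request-Type"


def build_headers(access_token: str, request_type: Optional[str] = None) -> Dict[str, str]:
    """Build headers for a prediction request (hypothetical helper).

    With request_type left as None, the header is omitted, which keeps the
    default behavior: overages beyond Provisioned Throughput are processed
    on demand and billed as pay-as-you-go.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    if request_type is not None:
        # For example, "dedicated" restricts the request to Provisioned
        # Throughput, so usage above the reservation returns a 429 instead.
        headers[REQUEST_TYPE_HEADER] = request_type
    return headers


default_headers = build_headers("ACCESS_TOKEN")
print(REQUEST_TYPE_HEADER in default_headers)  # False
```

In practice, the access token could come from running gcloud auth print-access-token. To resolve 429 errors through on-demand processing, audit your clients and remove any request-type header that forces dedicated-only traffic.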