If the number of your requests exceeds the capacity allocated to process requests, then error code 429 is returned. The following table lists the error message that each type of quota framework returns:
Quota framework | Message |
---|---|
Pay-as-you-go | Resource exhausted, please try again later. |
Provisioned Throughput | Too many requests. Exceeded the Provisioned Throughput. |
With a Provisioned Throughput subscription, you can reserve an amount of throughput for specific generative AI models. If you don't have a Provisioned Throughput subscription and resources aren't available to your application, then error code 429 is returned. Although you don't have reserved capacity, you can retry your request. However, a request that fails with 429 isn't counted against your error rate as described in your service level agreement (SLA).
For projects that have purchased Provisioned Throughput, Vertex AI measures a project's throughput and reserves that amount of throughput so that it's available. When you're using less than your purchased throughput amount, errors that might otherwise be returned as 429 are returned as 5XX and are counted as part of the error rate that is described in the SLA.
Pay-as-you-go
On the pay-as-you-go quota framework, you have the following options for resolving 429 errors:
- Implement a retry strategy by using truncated exponential backoff.
- If you've set a consumer override and configured it to control cost, then increase the limit. For more information, see Dynamic shared quota.
- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see Provisioned Throughput.
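The first option, retrying with truncated exponential backoff, can be sketched as follows. This is a minimal, client-agnostic illustration under stated assumptions, not an official Vertex AI implementation: `ResourceExhaustedError` is a hypothetical placeholder for whatever exception your client library raises on an HTTP 429 response, `call_with_backoff` is a hypothetical helper name, and the defaults (5 retries, 1-second base delay, 32-second cap, up to 1 second of random jitter) are illustrative values rather than documented recommendations.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,      # assumed value: retries after the first attempt
    base_delay: float = 1.0,   # assumed value: seconds before the first retry
    max_delay: float = 32.0,   # assumed value: cap that "truncates" the growth
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `request`, retrying 429 errors with truncated exponential backoff."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ResourceExhaustedError:
            if attempt == max_retries:
                raise  # Out of retries: surface the 429 to the caller.
            # Delay doubles each attempt (1s, 2s, 4s, ...) but never exceeds max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Random jitter (0 to 1s) keeps many clients from retrying in lockstep.
            sleep(delay + rng.uniform(0, 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_request() -> str:
        # Simulates a request that is rate-limited twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ResourceExhaustedError("Resource exhausted, please try again later.")
        return "ok"

    print(call_with_backoff(flaky_request, base_delay=0.1))
```

Capping the delay keeps a long run of 429 errors from producing unbounded waits, and the final re-raise lets the caller decide what to do once retries are exhausted, for example falling back to one of the other options in the list above.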
What's next
- To learn more about dynamic shared quota, see Dynamic shared quota.
- To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.