You might receive quota errors for a number of reasons, such as exceeding quota values or not setting the quota on a project correctly. If you want to be alerted when errors happen, you can create custom alerts for specific quota errors, as described in Set up quota alerts.
Exceeding rate quotas
Rate quotas reset after a predefined time interval that is specific to each service. For more information, see the quotas documentation for the specific service.
Exceeding quota values
If your project exceeds its maximum quota value while using a service, Google Cloud returns an error based on how you accessed the service:
- If you exceed a quota value with an API request, Google Cloud returns an HTTP 413 REQUEST ENTITY TOO LARGE status code. Note that when you use the BigQuery legacy streaming API in a production environment, you might receive a 413 REQUEST ENTITY TOO LARGE status code if your HTTP requests are larger than 10 MB. You might also receive this error if you exceed 300 MB per second. For more information, see Streaming inserts.
- If you exceed a quota value with an HTTP/REST request, Google Cloud returns an HTTP 429 TOO MANY REQUESTS status code.
- If you exceed a quota for Compute Engine, Google Cloud typically returns an HTTP 403 QUOTA_EXCEEDED status code, whether the request was made through the API, HTTP/REST, or gRPC. If the quota is a rate quota, then 403 RATE_LIMIT_EXCEEDED is returned.
- If you exceed a quota value using gRPC, Google Cloud returns a ResourceExhausted error. How this error appears to you depends on the service.
- If you exceed a quota value using a Google Cloud CLI command, the gcloud CLI outputs a quota-exceeded error message and returns with the exit code 1.
- If you receive a QUOTA_EXCEEDED message during a service rollout, see the following section.
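As a rough illustration, the signals in the list above can be folded into a single triage function. This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, the 403 reason string is assumed to be available from the error body, and the gRPC status is passed by its canonical name, RESOURCE_EXHAUSTED.

```python
from typing import Optional


def describe_quota_error(
    http_status: Optional[int] = None,
    reason: Optional[str] = None,
    grpc_code: Optional[str] = None,
    gcloud_exit_code: Optional[int] = None,
) -> str:
    """Map the failure signals listed above to a short description.

    Hypothetical helper: pass whichever signal your client exposes.
    """
    if grpc_code == "RESOURCE_EXHAUSTED":
        # gRPC surfaces an exceeded quota as the ResourceExhausted status.
        return "gRPC quota exceeded"
    if http_status == 403 and reason == "RATE_LIMIT_EXCEEDED":
        return "Compute Engine rate quota exceeded"
    if http_status == 403 and reason == "QUOTA_EXCEEDED":
        return "Compute Engine quota exceeded"
    if http_status == 429:
        return "quota exceeded (too many requests)"
    if http_status == 413:
        # Also returned for oversized BigQuery legacy streaming requests.
        return "quota exceeded or request too large"
    if gcloud_exit_code == 1:
        # Exit code 1 is not specific to quota; inspect the error output.
        return "gcloud failed; check output for a quota-exceeded message"
    return "not recognized as a quota error"


print(describe_quota_error(http_status=403, reason="RATE_LIMIT_EXCEEDED"))
# Compute Engine rate quota exceeded
```

Because gcloud exit code 1 signals any failure, the sketch treats it only as a hint to read the command output, not as proof of a quota error.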
Exceeding quota values during a service rollout
Google Cloud sometimes changes the default quota values for resources and APIs. These changes take place gradually, which means that during the rollout of a new default quota, the quota value that appears in the Google Cloud console might not reflect the new quota value that is available to you.
If a quota rollout is in progress, you might receive an error message that states: "The future limit is the new default quota that will be available after a service rollout completes." If you see this error message, the cited quota value and future value are correct, even if what appears in the Google Cloud console is different.
For additional information, view the audit logs and look for a QUOTA_EXCEEDED message similar to the following:

```json
"status": {
  ...
  "message": "QUOTA_EXCEEDED",
  "details": [
    {
      ...
      "value": {
        "quotaExceeded": {
          ...
          "futureLimit": FUTUREVALUE
        }
      }
    }
  ]
},
```
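If you export audit log entries as JSON, a small helper can pull the future limit out of a status object shaped like the excerpt above. This is a sketch that assumes that exact nesting (details, then value, then quotaExceeded, then futureLimit); the function name and the sample value 600 are hypothetical, and fields elided with "..." in the excerpt are simply ignored.

```python
import json
from typing import Any, Optional


def find_future_limit(status: dict) -> Optional[Any]:
    """Return the first futureLimit in a QUOTA_EXCEEDED status, or None.

    Assumes the layout of the audit log excerpt; other fields are ignored.
    """
    if status.get("message") != "QUOTA_EXCEEDED":
        return None
    for detail in status.get("details", []):
        quota = detail.get("value", {}).get("quotaExceeded", {})
        if "futureLimit" in quota:
            # Returned unchanged; no particular JSON type is assumed.
            return quota["futureLimit"]
    return None


# Hypothetical status object; 600 stands in for FUTUREVALUE.
sample = json.loads(
    '{"message": "QUOTA_EXCEEDED",'
    ' "details": [{"value": {"quotaExceeded": {"futureLimit": 600}}}]}'
)
print(find_future_limit(sample))  # 600
```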
To view charts that show current and peak usage, go to the Quotas & System Limits page and then click Monitoring. You might need to go to the end of the table.
If you need more quota, you can request a quota adjustment.
API error messages
If your quota project (also called a billing project) isn't set correctly, API requests might return error messages that are similar to the following:
- User credentials not supported by this API
- API not enabled in the project
- No quota project set
These and other errors can often be fixed by setting the quota project. For more information, see Quota project overview.
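For direct REST calls, one way to name the quota project explicitly is the X-Goog-User-Project request header. The sketch below only builds the request with Python's standard library and sends nothing; the endpoint URL, the access token, and the project ID are hypothetical placeholders.

```python
import urllib.request


def build_request(url: str, access_token: str, quota_project: str) -> urllib.request.Request:
    """Build (but do not send) a REST request that names a quota project."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            # Names the project that quota and billing are charged to.
            "X-Goog-User-Project": quota_project,
        },
    )


req = build_request(
    "https://example.googleapis.com/v1/resource",  # hypothetical endpoint
    "ACCESS_TOKEN",  # placeholder token
    "my-quota-project",  # hypothetical project ID
)
# urllib normalizes header names with str.capitalize().
print(req.get_header("X-goog-user-project"))  # my-quota-project
```

Client libraries and the gcloud CLI usually set the quota project for you; the header sketch is mainly useful when you call an API by hand.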
Error code 429
If the number of your requests exceeds the capacity allocated to process requests, then error code 429 is returned. The following table displays the error message generated by each type of quota framework:
Quota framework | Message |
---|---|
Pay-as-you-go | Resource exhausted, please try again later. |
Provisioned Throughput | Too many requests. Exceeded the provisioned throughput. |
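When logging 429 errors, it can help to record which quota framework produced them. The sketch below matches the two messages from the table above by substring; it assumes the message text appears verbatim in the error body, which might not hold if the wording changes, so prefer structured error fields where your client exposes them. The names are hypothetical.

```python
from typing import Optional

# 429 messages from the table above, keyed to the quota framework that returns them.
FRAMEWORK_BY_429_MESSAGE = {
    "Resource exhausted, please try again later.": "Pay-as-you-go",
    "Too many requests. Exceeded the provisioned throughput.": "Provisioned Throughput",
}


def framework_for_429(message: str) -> Optional[str]:
    """Return the quota framework whose 429 message appears in `message`, or None."""
    for known, framework in FRAMEWORK_BY_429_MESSAGE.items():
        # Substring match, since a client may wrap the message in more text.
        if known in message:
            return framework
    return None


print(framework_for_429("429 Resource exhausted, please try again later."))
# Pay-as-you-go
```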
With a Provisioned Throughput subscription, you can reserve an amount of throughput for specific generative AI models. If you don't have a Provisioned Throughput subscription and resources aren't available to your application, then error code 429 is returned. Although you don't have reserved capacity, you can try your request again. However, the failed request isn't counted against your error rate as described in your service level agreement (SLA).
For projects that have purchased Provisioned Throughput, Vertex AI measures a project's throughput and reserves that amount of throughput so that it's available. When you're using less than your purchased throughput amount, errors that might otherwise return as 429 are returned as 5XX and are counted as part of the error rate that is described in the SLA.
Pay-as-you-go
On the pay-as-you-go quota framework, you have the following options for resolving 429 errors:
- Implement a retry strategy by using truncated exponential backoff.
- If you've set a consumer override and configured it to control cost, then increase the limit. For more information, see Dynamic shared quota.
- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see Provisioned Throughput.
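The first option, truncated exponential backoff, can be sketched as follows: after the n-th failed attempt, wait 2^n seconds plus up to one second of random jitter, cap each wait at a maximum backoff, and give up after a fixed number of retries. This is a minimal sketch under assumptions: RetryableError is a hypothetical exception your client would raise on a 429, and the defaults (5 retries, 32-second cap) are illustrative rather than prescribed values. The sleep function is injectable so the demo records delays instead of waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error raised by a client when it receives a 429."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    max_backoff: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying RetryableError with truncated exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            # 2^attempt seconds plus up to 1 s of jitter, truncated at max_backoff.
            delay = min(2 ** attempt + random.random(), max_backoff)
            sleep(delay)
    raise AssertionError("unreachable")


# Usage: a stand-in call that fails twice with a 429 before succeeding.
attempts: List[int] = []


def flaky_request() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RetryableError("429 Resource exhausted")
    return "ok"


recorded_delays: List[float] = []
print(call_with_backoff(flaky_request, sleep=recorded_delays.append))  # ok
```

The random jitter spreads out retries from many clients so they don't all hit the service at the same moment, and the cap keeps a long run of failures from producing unbounded waits.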
Provisioned Throughput
To correct the error generated by Provisioned Throughput, do the following:
- Use the default example, which doesn't set a header in prediction requests. Any overages go to on-demand (pay-as-you-go).
- Increase the number of GSUs in your Provisioned Throughput subscription.