Troubleshoot quota errors

You might receive quota errors for a number of reasons, such as exceeding quota values or not setting the quota on a project correctly. If you want to be alerted when errors happen, you can create custom alerts for specific quota errors, as described in Set up quota alerts.

Exceeding rate quotas

Rate quotas reset after a predefined time interval that is specific to each service. For more information, see the quotas documentation for the specific service.

Exceeding quota values

If your project exceeds its maximum quota value while using a service, Google Cloud returns an error based on how you accessed the service:

  • If you exceed a quota value with an API request, Google Cloud returns an HTTP 413 REQUEST ENTITY TOO LARGE status code. When you use the BigQuery legacy streaming API in a production environment, you might also receive a 413 REQUEST ENTITY TOO LARGE status code if your HTTP requests are larger than 10 MB or if you exceed 300 MB per second. For more information, see Streaming inserts.
  • If you exceed a quota value with an HTTP/REST request, Google Cloud returns an HTTP 429 TOO MANY REQUESTS status code.
  • If you exceed a quota for Compute Engine, Google Cloud typically returns an HTTP 403 QUOTA_EXCEEDED status code, whether the request came from the API, HTTP/REST, or gRPC. If the quota is a rate quota, then 403 RATE_LIMIT_EXCEEDED is returned.
  • If you exceed a quota value using gRPC, Google Cloud returns a ResourceExhausted error. How this error appears to you depends on the service.
  • If you exceed a quota value using a Google Cloud CLI command, the gcloud CLI outputs a quota-exceeded error message and returns exit code 1.
  • If you receive a QUOTA_EXCEEDED message during a service rollout, see Exceeding quota values during a service rollout.
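
The mapping in the preceding list can be sketched as a small lookup that client code might use when logging failures. This is a minimal illustration, not part of any Google Cloud library: the function name `describe_quota_error`, its parameters, and the returned labels are hypothetical, and it assumes the caller has already extracted the HTTP status, the error reason string, or the gRPC status name from the response.

```python
from typing import Optional


def describe_quota_error(http_status: Optional[int] = None,
                         reason: str = "",
                         grpc_code: str = "") -> Optional[str]:
    """Return a short label for a quota-related error, or None otherwise.

    Hypothetical helper: the labels are illustrative, and the checks
    mirror only the cases listed in this section.
    """
    # Accept both "ResourceExhausted" and "RESOURCE_EXHAUSTED" spellings.
    if grpc_code.replace("_", "").lower() == "resourceexhausted":
        return "quota exceeded (gRPC)"
    if http_status == 429:
        return "quota exceeded (HTTP/REST)"
    if http_status == 413:
        # 413 can also mean the request body itself was too large.
        return "quota exceeded or request too large"
    if http_status == 403 and reason == "RATE_LIMIT_EXCEEDED":
        return "Compute Engine rate quota exceeded"
    if http_status == 403 and reason == "QUOTA_EXCEEDED":
        return "Compute Engine quota exceeded"
    return None
```

A caller could, for example, log `describe_quota_error(http_status=403, reason="QUOTA_EXCEEDED")` next to the raw response so that quota failures are easy to filter.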

Exceeding quota values during a service rollout

Google Cloud sometimes changes the default quota values for resources and APIs. These changes take place gradually, which means that during the rollout of a new default quota, the quota value that appears in the Google Cloud console might not reflect the new quota value that is available to you.

If a quota rollout is in progress, you may receive an error message that states: "The future limit is the new default quota that will be available after a service rollout completes." If you see this error message, the cited quota value and future value are correct, even if what appears in the Google Cloud console is different.

  • For additional information, view the audit logs and look for a QUOTA_EXCEEDED message.

        "status": {
          ...
          "message": "QUOTA_EXCEEDED",
          "details": [
            {
              ...
              "value": {
                "quotaExceeded": {
                  ...
                  "futureLimit": FUTUREVALUE
                }
              }
            }
          ]
        },
    
  • To view charts that show current and peak usage, go to the Quotas & System Limits page and then click Monitoring. You might need to go to the end of the table.

  • If you need more quota, you can request a quota adjustment.
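
As a rough illustration of the audit log check described above, the following sketch pulls every `futureLimit` value out of a log entry whose status message is QUOTA_EXCEEDED. It assumes the entry has already been parsed into a Python dictionary and that the `status` object is reachable at the top level of the dictionary you pass in. Because parts of the structure are elided in the sample, the search is recursive instead of relying on a fixed path. The function names are hypothetical.

```python
from typing import Any, Iterator, List


def find_future_limits(node: Any) -> Iterator[Any]:
    """Yield every value stored under a "futureLimit" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "futureLimit":
                yield value
            else:
                yield from find_future_limits(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_future_limits(item)


def future_limits_for_quota_error(entry: dict) -> List[Any]:
    """Return futureLimit values if the entry reports QUOTA_EXCEEDED."""
    status = entry.get("status", {})
    if status.get("message") != "QUOTA_EXCEEDED":
        return []
    return list(find_future_limits(status))


# Illustrative entry only; the value 500 is made up for this example.
sample_entry = {
    "status": {
        "message": "QUOTA_EXCEEDED",
        "details": [{"value": {"quotaExceeded": {"futureLimit": 500}}}],
    }
}
print(future_limits_for_quota_error(sample_entry))
```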

API error messages

If your quota project (also called a billing project) isn't set correctly, API requests might return error messages that are similar to the following:

  • User credentials not supported by this API
  • API not enabled in the project
  • No quota project set

These and other errors can often be fixed by setting the quota project. For more information, see Quota project overview.
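
For REST calls, one way to name the quota project explicitly is the `X-Goog-User-Project` request header. The sketch below only builds the headers and sends nothing. The function name and the placeholder values are hypothetical, and it assumes you already hold a valid OAuth access token and that the caller is permitted to use the named project for quota.

```python
from typing import Dict


def build_request_headers(access_token: str, quota_project_id: str) -> Dict[str, str]:
    """Build headers that attribute quota usage to quota_project_id.

    Hypothetical helper: it assumes the target API honors the
    X-Goog-User-Project header for quota and billing attribution.
    """
    if not quota_project_id:
        raise ValueError("quota_project_id must not be empty")
    return {
        "Authorization": f"Bearer {access_token}",
        "X-Goog-User-Project": quota_project_id,
    }


# Placeholder values for illustration only.
headers = build_request_headers("ACCESS_TOKEN", "my-quota-project")
```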

Error code 429

If the number of your requests exceeds the capacity allocated to process requests, then error code 429 is returned. The following table displays the error message generated by each type of quota framework:

Quota framework          Message
Pay-as-you-go            Resource exhausted, please try again later.
Provisioned Throughput   Too many requests. Exceeded the provisioned throughput.

With a Provisioned Throughput subscription, you can reserve an amount of throughput for specific generative AI models. If you don't have a Provisioned Throughput subscription and resources aren't available to your application, then error code 429 is returned. You don't have reserved capacity, but you can try your request again. These 429 responses aren't counted against your error rate as described in your service level agreement (SLA).

For projects that have purchased Provisioned Throughput, Vertex AI measures a project's throughput and reserves that amount of throughput so that it's available. When you're using less than your purchased throughput amount, errors that might otherwise return as 429 are returned as 5XX and are counted as part of the error rate that is described in the SLA.
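
Because a 429 response can be retried, client code often wraps the request in a retry loop. The following is a minimal sketch of truncated exponential backoff with jitter, a common general-purpose retry technique and not something this page prescribes. The function `call_with_backoff`, the `send` callable (assumed to return a status code and a body), and the default delays are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 32.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    Hypothetical helper: send() is assumed to perform one request and
    return (status_code, body). The last response is returned either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            # Double the ceiling on each attempt, capped at max_delay,
            # then wait a random time up to that ceiling (jitter).
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    return status, body
```

The `sleep` parameter exists so that the waiting behavior can be replaced in tests. In normal use you would leave it at its default.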

Pay-as-you-go

On the pay-as-you-go quota framework, you have the following options for resolving 429 errors:

Provisioned Throughput

To correct the error generated by Provisioned Throughput, do the following:

  • Use the default example, which doesn't set a header in prediction requests. Any overages are then processed on demand (pay-as-you-go).
  • Increase the number of GSUs in your Provisioned Throughput subscription.