About quotas

Cloud Endpoints provides quotas, which let you control the rate at which applications can call your API. Setting a quota lets you specify usage limits that protect your API from an excessive number of requests from calling applications. Excessive requests might be caused by a simple typo or by an inefficiently designed system that makes needless calls to your API. Regardless of the cause, blocking traffic from a source once it reaches a certain level is necessary for the overall health of your API. By setting a quota, you ensure that one application cannot negatively impact other applications that use your API.

This page provides an overview of the key functionality provided by quotas.

Requests are tied to the consumer project

After you configure a quota, Endpoints tracks the number of requests per minute per consumer Google Cloud project. Each application that calls your API must:

Have a Google Cloud project.
Have enabled your API in their Google Cloud project.
Send an API key with each request to your API. This lets Endpoints identify the Google Cloud project that the calling application is associated with and increment the request counter for that project.
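
As a minimal sketch of the last requirement, the following Python helper attaches an API key to a request URL. It assumes the API accepts the key in a query parameter named key; where the key is actually expected depends on how the API is configured. The function name with_api_key and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url: str, api_key: str) -> str:
    """Return `url` with the API key set as the `key` query parameter.

    Assumption: the API reads the key from a query parameter named `key`.
    """
    parts = urlsplit(url)
    # Keep existing query parameters, but drop any stale `key` value.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "key"
    ]
    query.append(("key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical endpoint and placeholder key, for illustration only.
    print(with_api_key("https://example.com/v1/items?limit=5", "YOUR_API_KEY"))
```

Because every request carries the key, all calls made with keys from the same project count against that project's quota.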

You can either have your API consumers create their own projects in the Google Cloud console, or you can create the projects for them. Because Endpoints enforces quotas per project, you must have one project for each API consumer.

Limit the number of requests per minute

By setting a quota, you can limit the number of requests per minute to your entire API or only to specific methods. If the client code from a consumer project exceeds the limit that you have configured, the request is rejected before it gets to your API, and an HTTP status code of 429 Too Many Requests is returned. Calling applications must handle the 429 status code and use exponential backoff or some other retry logic to decrease the rate of calls to your API.
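
One way a calling application might handle the 429 status code is sketched below in Python. This is an illustration of exponential backoff with jitter, not a prescribed client implementation. The function call_with_backoff, its parameters, and the send callable (which stands in for whatever code issues the HTTP request and returns a status code and body) are all hypothetical.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry with exponential backoff while it returns 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the base wait on each retry, cap it, then add random
            # jitter so that many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    # Still rate limited after the final attempt; let the caller decide.
    return status, body
```

Passing sleep as a parameter keeps the sketch easy to test without real waiting. Because the quota is counted per minute, a cap on the delay in the range of tens of seconds is a reasonable starting point.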

Configure one or more quotas

You can configure one or more named quotas and specify a different rate limit for each quota. For example, you could have some methods in your API that are resource-intensive (such as a method that runs a complex query and returns a large list of results), and other methods that are fast and lightweight. You might want to configure two quotas with different rate limits, and associate the resource-intensive methods with one quota, and the lightweight methods with the other quota.

Configure a cost

When you associate a method with a quota, you always specify a cost for the request. This allows different methods to consume the same quota at different rates. You can use costs as an alternative to configuring different quotas. For example, suppose you configure a quota with a limit of 1000 requests per minute. For the lightweight methods, you configure a cost of 1, which means clients can call the lightweight methods 1000 times per minute. For the resource-intensive methods, you configure a cost of 2, which means that each time the client calls the method, the request counter is incremented by 2, until the limit of 1000 is reached. In effect, this limits the resource-intensive methods to 500 requests per minute.
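
The arithmetic in this example can be checked with a small simulation. The Python sketch below is a toy model of a per-project, per-minute counter, written only to illustrate how costs consume a shared limit. It does not describe how Endpoints is implemented, and the names QuotaCounter, allow, and max_calls are hypothetical.

```python
from collections import defaultdict


class QuotaCounter:
    """Toy per-project, per-minute request counter (illustration only)."""

    def __init__(self, limit_per_minute: int) -> None:
        self.limit = limit_per_minute
        # (project, minute) -> quota units consumed so far
        self.used = defaultdict(int)

    def allow(self, project: str, cost: int, minute: int = 0) -> bool:
        """Charge `cost` units to the project, or refuse if over the limit."""
        key = (project, minute)
        if self.used[key] + cost > self.limit:
            return False  # in the real service, the caller would receive a 429
        self.used[key] += cost
        return True


def max_calls(limit: int, cost: int) -> int:
    """Count how many calls of a given cost fit into one minute's limit."""
    quota = QuotaCounter(limit)
    calls = 0
    while quota.allow("consumer-project", cost):
        calls += 1
    return calls


if __name__ == "__main__":
    print(max_calls(1000, 1))  # 1000 lightweight calls per minute
    print(max_calls(1000, 2))  # 500 resource-intensive calls per minute
```

Because both kinds of method draw from the same counter in this model, a client that mixes them gets fewer of each: for example, 400 calls at cost 2 leave room for only 200 calls at cost 1 in the same minute.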

Override the configured quota

The Endpoints > Services page displays the quota configured for each method in your API. If needed, you can override the configured limit for a specific consumer project. To set an override, enter the project number of the consumer project on the Endpoints > Services page. If you don't have access to the consumer project whose limit you want to override, ask someone who has access to that project for its project number.