Rate Limiting

This page describes how to use Service Infrastructure to implement rate limiting for managed services that are integrated with the Service Management API.

A managed service can serve many service consumers. To protect system capacity and ensure fair usage, a managed service often uses rate limiting to distribute its capacity among its service consumers. The Service Management and Service Control APIs allow you to manage and enforce rate limiting.

Configuring rate limits

To use the rate limiting feature, configure _quota metrics_ and _quota limits_ in the service configuration for your service producer project.

Currently, the only supported rate limit is the number of requests per minute per service consumer, where the service consumer is a Google Cloud Platform project identified by an API key, a project ID, or a project number. For rate limiting, a request is an opaque concept: a service can count each HTTP request as a request, or count each byte of payload as a request. The rate limiting feature is independent of the semantics of a request.

Quota metrics

A metric is a named counter for measuring a certain value over time. For example, the number of HTTP requests a service receives is a metric. A quota metric is a metric that is used for quota and rate limiting purposes. When a service handles an activity, one or more quota metrics may increase. When the metric value reaches the predefined quota limit, the service should reject the activity with a 429 error.

Quota limits

A quota limit represents an enforceable limit on a quota metric. For example, the number of requests per service consumer per minute is a quota limit. At this time, the only supported type of quota limit is per minute per consumer, specifically, 1/min/{project}.

The actual rate limit for a (service, consumer) pair is controlled by three settings:

  • The default limit specified for the managed service.
  • The service producer override for the service consumer.
  • The service consumer override for the service consumer.

The effective rate limit is:

  • The default limit if there is no override.
  • The service producer override if there is a service producer override, but no service consumer override.
  • The minimum of the service consumer override and the default limit if there is a service consumer override, but no service producer override.
  • The minimum of the service consumer override and the service producer override if there are both service producer and service consumer overrides.
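
The four rules above can be written as a small function. This is a minimal sketch for illustration only: the function name and parameters are hypothetical and are not part of the Service Management or Service Control APIs, and `None` is assumed to mean "no override set".

```python
from typing import Optional


def effective_rate_limit(
    default_limit: int,
    producer_override: Optional[int] = None,
    consumer_override: Optional[int] = None,
) -> int:
    """Return the effective limit for one (service, consumer) pair.

    Hypothetical helper: None means that the override is not set.
    """
    # The producer override, when present, replaces the default limit.
    ceiling = default_limit if producer_override is None else producer_override
    if consumer_override is None:
        return ceiling
    # A consumer override can only lower the limit, never raise it.
    return min(consumer_override, ceiling)
```

For example, with a default limit of 1000, a consumer override of 2000 still yields 1000, while a producer override of 5000 combined with the same consumer override yields 2000.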

Enforcing rate limiting

To enforce rate limiting, each server that belongs to a managed service needs to call the Service Control API services.allocateQuota method regularly. If the response of the services.allocateQuota method indicates that the usage is above the limit, the server should reject the incoming request with a 429 error. For more information, see the reference documentation for the services.allocateQuota method.

Each server should use batching, caching, and predictive logic to improve system performance and reliability. In general, a server should call the services.allocateQuota method no more than once per second for the same (service, consumer, metric) tuple.
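
One possible shape for the batching and caching described above is a local aggregator that accumulates usage per (service, consumer, metric) tuple and forwards it at most once per interval, reusing the last decision in between. This is a sketch under assumptions, not a prescribed design: the class, the `flush` callback (assumed to wrap a services.allocateQuota call and return True when usage is within the limit), and the injectable clock are all hypothetical. It is not thread-safe and omits predictive logic.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (service, consumer, metric)


class QuotaAggregator:
    """Batches usage locally and flushes each key at most once per interval."""

    def __init__(
        self,
        flush: Callable[[Key, int], bool],
        interval_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical callback: sends the batched amount upstream and
        # returns True if the consumer is still within its limit.
        self._flush = flush
        self._interval_s = interval_s
        self._clock = clock
        self._pending: Dict[Key, int] = defaultdict(int)
        self._last_flush: Dict[Key, float] = {}
        self._allowed: Dict[Key, bool] = {}

    def check(self, key: Key, amount: int = 1) -> bool:
        """Record usage and return the most recent decision for the key."""
        self._pending[key] += amount
        now = self._clock()
        last = self._last_flush.get(key)
        if last is None or now - last >= self._interval_s:
            # First call for this key, or the interval has elapsed:
            # send everything accumulated so far in one upstream call.
            self._allowed[key] = self._flush(key, self._pending[key])
            self._pending[key] = 0
            self._last_flush[key] = now
        return self._allowed[key]
```

The trade-off is that a cached decision can be up to one interval stale, so a consumer may briefly exceed its limit before the next flush reports it.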

The following example demonstrates how to call the services.allocateQuota method to check for rate limiting. The important request parameters that must be set correctly are the service name, the consumer id, the metric name, and the metric value. The services.allocateQuota method will try to increase the usage by the specified amount for the (service, consumer, metric) tuple. If the increased usage goes above the limit, an error is returned.

gcurl -d '{
  "allocateOperation": {
    "operationId": "123e4567-e89b-12d3-a456-426655440000",
    "methodName": "google.example.hello.v1.HelloService.GetHello",
    "consumerId": "project:endpointsapis-consumer",
    "quotaMetrics": [{
      "metricName": "endpointsapis.appspot.com/requests",
      "metricValues": [{
        "int64Value": 1
      }]
    }],
    "quotaMode": "NORMAL"
  }
}' https://servicecontrol.googleapis.com/v1/services/endpointsapis.appspot.com:allocateQuota
{
  "operationId": "123e4567-e89b-12d3-a456-426655440000",
  "quotaMetrics": [
    {
      "metricName": "serviceruntime.googleapis.com/api/consumer/quota_used_count",
      "metricValues": [
        {
          "labels": {
            "/quota_name": "endpointsapis.appspot.com/requests"
          },
          "int64Value": "1"
        }
      ]
    }
  ],
  "serviceConfigId": "2017-09-10r0"
}

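A server would normally build this request body in code rather than by hand. The sketch below assembles the same fields as the gcurl example; the helper name is hypothetical, and generating a fresh UUID for each operationId is an assumption based on the format of the example value. Sending the request and authenticating are out of scope here.

```python
import json
import uuid
from typing import Any, Dict


def build_allocate_quota_body(
    method_name: str, consumer_id: str, metric_name: str, amount: int = 1
) -> Dict[str, Any]:
    """Build a services.allocateQuota request body (hypothetical helper)."""
    return {
        "allocateOperation": {
            # Assumption: a new UUID per operation, as in the example value.
            "operationId": str(uuid.uuid4()),
            "methodName": method_name,
            "consumerId": consumer_id,
            "quotaMetrics": [
                {
                    "metricName": metric_name,
                    "metricValues": [{"int64Value": amount}],
                }
            ],
            "quotaMode": "NORMAL",
        }
    }


if __name__ == "__main__":
    body = build_allocate_quota_body(
        "google.example.hello.v1.HelloService.GetHello",
        "project:endpointsapis-consumer",
        "endpointsapis.appspot.com/requests",
    )
    print(json.dumps(body, indent=2))
```
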
Error handling

If the HTTP response code is 200 and the response contains a RESOURCE_EXHAUSTED QuotaError, your server should reject the request with a 429 error. If the response doesn't contain any quota error, your server should continue serving the incoming requests. For all other quota errors, your server should reject the request with a 409 error. Because of the security risks, be very careful about what error information you include in the error message.

For all other HTTP response codes, it is likely that your server has a programming bug. Your server should continue to serve the incoming requests while you debug the problem. If the services.allocateQuota method returns any unexpected error, your service should log the error and accept the incoming requests. You can debug the error later.
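
The decision rules in this section can be summarized in one function. This is an illustrative sketch, not an official client: the function and logger names are hypothetical, and it assumes the parsed response lists quota errors under an `allocateErrors` field whose entries carry a `code` string, which you should verify against the services.allocateQuota reference.

```python
import logging
from typing import Any, Dict, Optional

log = logging.getLogger("quota")


def rejection_status(
    http_status: int, body: Optional[Dict[str, Any]]
) -> Optional[int]:
    """Return the HTTP status to reject with, or None to keep serving."""
    if http_status != 200:
        # Unexpected response code: log it and keep serving the request.
        log.error("allocateQuota returned unexpected HTTP %s", http_status)
        return None
    # Assumed field name for the list of QuotaError entries.
    errors = (body or {}).get("allocateErrors", [])
    if not errors:
        return None
    if any(e.get("code") == "RESOURCE_EXHAUSTED" for e in errors):
        return 429
    # Any other quota error.
    return 409
```

Returning only a status code, rather than the raw quota error, keeps the upstream error details out of the message sent to the caller.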

Fail Open

The rate limiting feature protects your managed service from being overloaded and distributes your service capacity fairly among service consumers. Because most service consumers should not reach their rate limits during normal operations, your managed service should accept all incoming requests if the rate limiting feature is unavailable. This behavior is known as fail open. It prevents the rate limiting system from affecting the availability of your service.

If you use the services.allocateQuota method, your service must ignore 500, 503, and 504 errors without any retry. To prevent a hard dependency on the rate limiting feature, the Service Control API regularly injects a limited number of errors.
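
A fail-open wrapper might look like the sketch below. It assumes a hypothetical `call_allocate_quota` callable that performs exactly one request and returns the HTTP status together with a flag saying whether usage is within the limit. Neither name comes from the Service Control API.

```python
from typing import Callable, Tuple


def should_accept(call_allocate_quota: Callable[[], Tuple[int, bool]]) -> bool:
    """Make a single attempt and accept the request whenever the check fails."""
    try:
        # Exactly one attempt: no retry loop around this call.
        status, within_limit = call_allocate_quota()
    except Exception:
        # Timeouts and connection failures are treated as "rate limiting
        # unavailable", so the request is accepted (fail open).
        return True
    if status != 200:
        # Covers 500, 503, and 504, including injected errors.
        return True
    return within_limit
```

Only a successful response that reports usage above the limit leads to a rejection; every failure path accepts the request.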
