Rate Limiting

This page describes how to use Google Service Control to implement rate limiting for managed services that are integrated with Google Service Management.

A managed service can serve many consumers. In order to protect system capacity and ensure fair usage, a managed service often uses rate limiting to distribute its capacity among its consumers. Google Service Management and Google Service Control provide public APIs to manage and enforce rate limiting.

Configuring rate limits

To use the rate limiting feature, the service producer must configure quota metrics and quota limits in their service configuration.

Currently, the only supported rate limit is the number of requests per minute per consumer, where the consumer is a Google project identified by an API key, a project ID, or a project number. For rate limiting purposes, a request is an opaque unit: a service can count each HTTP request as one request, or each byte of payload as one request. The rate limiting feature does not depend on what a request means.

Quota metrics

A metric is a named counter for measuring a certain value over time. For example, the number of HTTP requests a service receives is a metric. A quota metric is a metric used for quota and rate limiting purposes. When an activity occurs with a service, one or more quota metrics may increase. When an activity would push a metric value above its predefined quota limit, the service should reject that activity with an HTTP 429 (Too Many Requests) error.

Quota limits

A quota limit represents an enforceable limit on a quota metric. For example, the number of requests per consumer per minute is a quota limit. Currently, the only supported type of quota limit is per minute per consumer, expressed with the unit 1/min/{project}.
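To make these two concepts concrete, here is a minimal local sketch of a quota metric counted against a 1/min/{project} quota limit. It is only an illustration and assumes fixed one-minute windows. The class PerMinuteQuota and its methods are hypothetical; in a real deployment, Service Control does the counting. The amount parameter reflects that a "request" is whatever unit the service chooses, such as one HTTP request or one byte of payload.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple


class PerMinuteQuota:
    """Hypothetical local model of a 1/min/{project} quota limit.

    Usage of one quota metric is counted per consumer in fixed one-minute
    windows. Old windows are never evicted; this is an illustration only.
    """

    def __init__(self, limit_per_minute: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit_per_minute
        self.clock = clock
        # (consumer, minute index) -> metric value recorded in that minute
        self.usage: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def try_consume(self, consumer: str, amount: int = 1) -> bool:
        """Increase the metric by `amount`; False means reject with HTTP 429."""
        key = (consumer, int(self.clock() // 60))
        if self.usage[key] + amount > self.limit:
            return False  # would go above the limit: usage is not recorded
        self.usage[key] += amount
        return True
```

Each consumer gets its own counter, so one project exhausting its quota does not affect another project in the same minute.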

The actual rate limit for a (service, consumer) pair is controlled by three settings:

  • The default limit specified for the service.
  • The producer override for the consumer.
  • The consumer override for the consumer.

The effective rate limit is:

  • The default limit, if there is no override.
  • The producer override, if there is a producer override but no consumer override.
  • The minimum of the consumer override and the default limit, if there is a consumer override but no producer override.
  • The minimum of the consumer override and the producer override, if there are both producer and consumer overrides.
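The precedence rules above can be written as a small function. This is a sketch for reasoning about the rules, not part of any Google API. The function name and the use of None to mean "no override is set" are assumptions.

```python
from typing import Optional


def effective_rate_limit(default_limit: int,
                         producer_override: Optional[int] = None,
                         consumer_override: Optional[int] = None) -> int:
    """Return the effective per-minute limit for one (service, consumer) pair.

    None means that the corresponding override is not set.
    """
    # The producer override, when present, replaces the default limit.
    ceiling = producer_override if producer_override is not None else default_limit
    if consumer_override is None:
        return ceiling
    # A consumer override can lower the limit but never raise it above
    # the producer override (or the default limit if there is none).
    return min(consumer_override, ceiling)
```

For example, with a default limit of 100, no producer override, and a consumer override of 200, the effective limit stays at 100. A consumer override can only tighten the limit.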

Enforcing rate limiting

To enforce rate limiting, each server that belongs to a managed service needs to call the Service Control API services.allocateQuota method regularly. If the services.allocateQuota response indicates that the usage is above the limit, the server should reject the incoming request with an HTTP 429 error. For more information, see the services.allocateQuota reference.

Each server should use batching, caching, and predictive logic to improve system performance and reliability. In general, a server should call the services.allocateQuota method at most once per second for the same (service, consumer, metric) tuple.
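One possible way to follow this advice is a small per-server cache keyed by the (service, consumer, metric) tuple. The sketch below rests on several assumptions:

- allocate is a hypothetical wrapper around the real services.allocateQuota call that returns False when usage is over the limit.
- Requests arriving between calls are admitted or rejected based on the most recent result, a simple form of predictive logic.
- The usage of admitted requests is batched into the next call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# (service, consumer, metric) tuple used as the cache key.
Key = Tuple[str, str, str]


@dataclass
class _Entry:
    pending: int = 0                   # admitted usage not yet reported
    allowed: bool = True               # result of the most recent call
    last_call: float = float("-inf")   # clock time of the most recent call


class BatchingQuotaChecker:
    """Sketch of a per-server cache in front of services.allocateQuota.

    `allocate(service, consumer, metric, amount) -> bool` is a hypothetical
    wrapper around the real API call; it returns False when usage is over
    the limit. Each tuple triggers at most one call per `min_interval`.
    """

    def __init__(self, allocate: Callable[[str, str, str, int], bool],
                 min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.allocate = allocate
        self.min_interval = min_interval
        self.clock = clock
        self.entries: Dict[Key, _Entry] = {}

    def check(self, service: str, consumer: str, metric: str,
              amount: int = 1) -> bool:
        entry = self.entries.setdefault((service, consumer, metric), _Entry())
        now = self.clock()
        if now - entry.last_call >= self.min_interval:
            # Report the batched usage plus this request in a single call.
            entry.allowed = self.allocate(service, consumer, metric,
                                          entry.pending + amount)
            entry.pending = 0
            entry.last_call = now
        elif entry.allowed:
            # Predict from the last result and batch the usage for later.
            entry.pending += amount
        return entry.allowed
```

The trade-off is that a consumer may briefly exceed its limit by up to one interval's worth of traffic per server. In exchange, the server makes far fewer calls to Service Control.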

The following example demonstrates how to call the services.allocateQuota method to check for rate limiting. The request parameters that must be set correctly are the service name, the consumer ID, the metric name, and the metric value. The services.allocateQuota method tries to increase the usage of the (service, consumer, metric) tuple by the specified amount. If the increased usage goes above the limit, the method returns an error.

NOTE: The service consumer can be specified using a project ID, a project number, or an API key. The following example uses a project ID.

gcurl -d '{
  "allocateOperation": {
    "operationId": "123e4567-e89b-12d3-a456-426655440000",
    "methodName": "google.example.hello.v1.HelloService.GetHello",
    "consumerId": "project:endpointsapis-consumer",
    "quotaMetrics": [{
      "metricName": "endpointsapis.appspot.com/requests",
      "metricValues": [{
        "int64Value": 1
      }]
    }],
    "quotaMode": "NORMAL"
  }
}' https://servicecontrol.googleapis.com/v1/services/endpointsapis.appspot.com:allocateQuota

A successful response looks like this:

{
  "operationId": "123e4567-e89b-12d3-a456-426655440000",
  "quotaMetrics": [
    {
      "metricName": "serviceruntime.googleapis.com/api/consumer/quota_used_count",
      "metricValues": [
        {
          "labels": {
            "/quota_name": "endpointsapis.appspot.com/requests"
          },
          "int64Value": "1"
        }
      ]
    }
  ],
  "serviceConfigId": "2017-09-10r0"
}
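The same call can be made from Python using only the standard library. This sketch rests on the following assumptions:

- You already have an OAuth 2.0 access token for the Service Control API; obtaining it is not shown.
- A fresh UUID is generated for each operationId.
- The one-second timeout is a hypothetical value.

The helper names build_allocate_request and allocate_quota are illustrative.

```python
import json
import urllib.request
import uuid

SERVICE_CONTROL = "https://servicecontrol.googleapis.com/v1/services"


def build_allocate_request(consumer_id: str, metric_name: str, value: int,
                           method_name: str) -> dict:
    """Build a request body with the same shape as the gcurl example."""
    return {
        "allocateOperation": {
            # Assumption: a new UUID per call, like the example operationId.
            "operationId": str(uuid.uuid4()),
            "methodName": method_name,
            "consumerId": consumer_id,
            "quotaMetrics": [{
                "metricName": metric_name,
                "metricValues": [{"int64Value": value}],
            }],
            "quotaMode": "NORMAL",
        }
    }


def allocate_quota(service_name: str, body: dict, access_token: str,
                   timeout: float = 1.0) -> dict:
    """POST the body to services.allocateQuota and return the parsed JSON.

    `access_token` is assumed to be a valid OAuth 2.0 token. urllib raises
    urllib.error.HTTPError for non-2xx responses.
    """
    request = urllib.request.Request(
        f"{SERVICE_CONTROL}/{service_name}:allocateQuota",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, allocate_quota("endpointsapis.appspot.com", build_allocate_request("project:endpointsapis-consumer", "endpointsapis.appspot.com/requests", 1, "google.example.hello.v1.HelloService.GetHello"), token) should return a parsed response shaped like the one above.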

Error handling

If the HTTP response code is 200 and the response contains a RESOURCE_EXHAUSTED QuotaError, your server should reject the request with an HTTP 429 error. If the response doesn't contain any quota error, your server should continue serving incoming requests. For all other quota errors, your server should reject the request with an HTTP 409 error. Because of the security risk, be very careful about which error information you include in the error message.

For all other HTTP response codes, your server most likely has a programming bug. We recommend that your server continue to serve incoming requests while you debug the problem. If the services.allocateQuota method returns any unexpected error, your service should log the error and accept the incoming requests, so that you can debug the error later.
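The rules in this section can be summarized as one decision function. It is a sketch that assumes quota errors are reported in the response's allocateErrors list, each with a code field; check the services.allocateQuota reference for the exact schema. The function name is hypothetical. It returns None when the request should be served, or the HTTP status to reject it with.

```python
import logging
from typing import Optional

log = logging.getLogger("quota")


def rejection_status(http_status: int, body: Optional[dict]) -> Optional[int]:
    """Return the HTTP status to reject the incoming request with, or None to serve it.

    Assumes quota errors are listed under "allocateErrors" in the
    allocateQuota response, each with a "code" field.
    """
    if http_status != 200:
        # Probably a bug on our side: log it and keep serving while debugging.
        log.error("allocateQuota returned HTTP %d", http_status)
        return None
    errors = (body or {}).get("allocateErrors", [])
    if not errors:
        return None  # no quota error: continue serving
    if any(err.get("code") == "RESOURCE_EXHAUSTED" for err in errors):
        return 429  # over the rate limit
    return 409  # any other quota error
```

When rejecting, send the client a generic message rather than the raw QuotaError details, in line with the security caution above.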

Fail open

The rate limiting feature protects your service from overload and distributes its capacity fairly among consumers. Because most consumers should not reach their rate limits during normal operation, your service should accept all incoming requests when the rate limiting feature itself is unavailable. This approach is known as failing open, and it prevents the rate limiting system from affecting the availability of your service.

If you use the services.allocateQuota REST API, your service must ignore 500, 503, and 504 errors without retrying. To keep services from taking a hard dependency on the rate limiting feature, the Service Control API regularly injects a limited number of errors.
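A fail-open wrapper might look like the sketch below. It assumes the call is made with urllib, which raises urllib.error.HTTPError for non-2xx responses. The wrapper takes a hypothetical zero-argument check function and behaves as follows:

- It makes exactly one attempt, with no retry loop.
- It treats 500, 503, and 504 as expected failures.
- It admits the request whenever Service Control cannot give an answer.

```python
import logging
import urllib.error
from typing import Callable

log = logging.getLogger("quota")

# Server errors to ignore without retrying.
IGNORED_STATUSES = (500, 503, 504)


def allowed_or_fail_open(check: Callable[[], bool]) -> bool:
    """Run one rate limit check; admit the request if it cannot complete.

    `check` is a hypothetical function that calls services.allocateQuota via
    urllib and returns False only when the request must be rejected.
    It is invoked exactly once: there is no retry loop.
    """
    try:
        return check()
    except urllib.error.HTTPError as err:
        if err.code not in IGNORED_STATUSES:
            # Unexpected status: log it so the problem can be debugged later.
            log.error("allocateQuota failed with HTTP %d", err.code)
        return True  # fail open
    except OSError as err:
        # Connection failures and timeouts (URLError is an OSError subclass).
        log.warning("allocateQuota unavailable: %s", err)
        return True  # fail open
```

Catching HTTPError before OSError matters, because HTTPError is itself a subclass of URLError and therefore of OSError.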