Rate limiting overview

Google Cloud Armor provides capabilities to help protect your Google Cloud applications against a variety of Layer 3 and Layer 7 attacks. Rate-based rules help you protect your applications from a large volume of requests that flood your instances and block access for legitimate users.

Rate limiting can do the following:

  • Prevent any particular client from exhausting application resources.
  • Protect your application instances from erratic and unpredictable spikes in the rate of client requests.

In addition, when a small number of clients send a high volume of traffic to a resource, rate limiting keeps those spikes from affecting your other clients, so that your resources can handle as many requests as possible.

Google Cloud Armor has two types of rate-based rules:

  1. Throttle: You can enforce a maximum request limit per client or across all clients by throttling individual clients to a user-configured threshold.
  2. Rate-based ban: You can rate limit requests that match a rule on a per-client basis and then temporarily ban those clients for a configured period of time if they exceed a user-configured threshold.

You can preview the effects of rate limiting rules in a security policy by using preview mode and examining your request logs.

Identifying clients for rate limiting

Google Cloud Armor identifies individual clients for rate limiting by using the following key types for aggregating requests and enforcing rate limits:

  • ALL: A single key for all clients whose requests satisfy the rule match condition.
  • IP: A single key for each client source IP address whose requests satisfy the rule match condition.
  • HTTP-HEADER: A single key for each unique value of the HTTP header whose name you configure.
  • XFF-IP: A single key for each original source IP address of the client, as specified in the X-Forwarded-For header.
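As a rough illustration of how the key types above partition requests, the following Python sketch derives an aggregation key from a request. It is a simplified model, not Google Cloud Armor's implementation: the function name rate_limit_key, its parameters, and the fallback behavior when a header is missing are assumptions made for this example.

```python
from typing import Mapping, Optional


def rate_limit_key(
    key_type: str,
    source_ip: str,
    headers: Mapping[str, str],
    header_name: Optional[str] = None,
) -> str:
    """Return the aggregation key for one request (illustrative model only)."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    if key_type == "ALL":
        # Every matching request shares one counter.
        return "ALL"
    if key_type == "IP":
        return source_ip
    if key_type == "HTTP-HEADER":
        if header_name is None:
            raise ValueError("HTTP-HEADER needs a configured header name")
        # Assumption: requests without the header share the ALL key.
        return lowered.get(header_name.lower(), "ALL")
    if key_type == "XFF-IP":
        forwarded = lowered.get("x-forwarded-for", "")
        # Assumption: the first list entry is the original client address,
        # and the source IP is used when the header is missing or empty.
        first = forwarded.split(",")[0].strip()
        return first or source_ip
    raise ValueError(f"unknown key type: {key_type}")
```

Requests that map to the same key share one counter, so the choice of key type decides whether a threshold applies per client, per header value, or to all matching traffic together.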

Throttling traffic

The throttle action in a rule allows you to enforce a per-client request threshold to protect backend services. This rule enforces the threshold to limit traffic from each client that satisfies the match conditions in the rule. The threshold is configured as a specified number of requests in a specified time interval.

For example, you might set the request threshold to 2,000 requests within 1,200 seconds (20 minutes). If a client sends 2,500 requests within any 1,200-second period, approximately 20% of the client's traffic (500 of the 2,500 requests) is denied until the permitted request volume is at or below the configured threshold.

You set these parameters to control the action:

  • rate_limit_threshold: The number of requests per client allowed within a specified time interval. The minimum value is 10 and the maximum value is 10,000.
    • interval_sec: The number of seconds in the time interval. The value must be 60, 120, 180, 240, 300, 600, 900, 1200, 1800, 2700, or 3600 seconds.
  • exceed_action: When a request exceeds the rate_limit_threshold, Google Cloud Armor denies the request and returns the specified HTTP response code. For throttle rules, we recommend using the 429 (Too Many Requests) response code.
  • conform_action: This is always an allow action. It is the action to take when the number of requests is under the rate_limit_threshold.
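The value ranges listed above can be checked before a rule is submitted. The sketch below is a hypothetical local helper (validate_throttle_params is not part of any Google Cloud library) that encodes only the limits stated in this section.

```python
# Interval values permitted for interval_sec, as listed in this section.
ALLOWED_INTERVALS_SEC = {60, 120, 180, 240, 300, 600, 900, 1200, 1800, 2700, 3600}


def validate_throttle_params(rate_limit_threshold: int, interval_sec: int) -> None:
    """Raise ValueError if a value falls outside the ranges stated above."""
    if not 10 <= rate_limit_threshold <= 10_000:
        raise ValueError("rate_limit_threshold must be between 10 and 10,000")
    if interval_sec not in ALLOWED_INTERVALS_SEC:
        allowed = sorted(ALLOWED_INTERVALS_SEC)
        raise ValueError(f"interval_sec must be one of {allowed}")


# The 2,000 requests per 1,200 seconds example passes the check.
validate_throttle_params(rate_limit_threshold=2000, interval_sec=1200)
```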

When a client's traffic rate is at or under the rate_limit_threshold, requests follow the conform_action, which is always allow. The request is immediately sent to the backend service. When a client's traffic rate exceeds the specified rate_limit_threshold, requests over the limit are blocked for the rest of the threshold interval. The deny error code is set in the exceed_action parameter.
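The throttle behavior described above can be approximated with a fixed-window counter per key. This is only a sketch under stated assumptions: Google Cloud Armor's actual enforcement is approximate and its algorithm is not described here, the ThrottleRule class is hypothetical, and the "deny-429" string is simply a label for a deny with the 429 response code.

```python
class ThrottleRule:
    """Fixed-window throttle for one rule (a simplified model)."""

    def __init__(self, rate_limit_threshold, interval_sec, exceed_action="deny-429"):
        self.rate_limit_threshold = rate_limit_threshold
        self.interval_sec = interval_sec
        self.exceed_action = exceed_action
        self._windows = {}  # key -> (window index, requests seen in that window)

    def handle(self, key, now):
        """Count one request for `key` at time `now` and return the action."""
        window = int(now // self.interval_sec)
        seen_window, count = self._windows.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new interval started, so the counter resets
        count += 1
        self._windows[key] = (window, count)
        if count <= self.rate_limit_threshold:
            return "allow"  # conform_action
        return self.exceed_action


rule = ThrottleRule(rate_limit_threshold=2000, interval_sec=1200)
# 2,500 requests from one client, spread evenly over the first 1,000 seconds.
actions = [rule.handle("203.0.113.7", now=i * 0.4) for i in range(2500)]
denied = actions.count("deny-429")
print(denied, denied / len(actions))
```

In this model, the first 2,000 requests in the interval are allowed and the remaining 500 (20% of the client's traffic) are denied, which matches the example given earlier in this section.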

Banning clients based on request rates

The rate_based_ban action in a rule allows you to enforce a per-client threshold to protect backend services and temporarily ban clients that exceed the limit by rejecting all requests from the client for a configurable time period. The threshold is configured as a specified number of requests in a specified time interval. You can temporarily ban traffic that matches the specified match condition and exceeds the configured threshold.

For example, you might set the request threshold to 2,000 requests within 1,200 seconds (20 minutes). If a client sends 2,500 requests within any 1,200-second period, traffic from that client exceeding the 2,000 request threshold is blocked until the full 1,200 seconds have elapsed and for an additional number of seconds that you set as the ban duration period. If the ban duration period is set to 3600, for example, traffic from the client would be banned for 3,600 seconds (one hour) beyond the end of the threshold interval.

These parameters control the action of a rate_based_ban rule:

  • rate_limit_threshold: The number of requests per client allowed within a specified time interval. The minimum value is 10 and the maximum value is 10,000.
    • interval_sec: The number of seconds in the time interval. The value must be 60, 120, 180, 240, 300, 600, 900, 1200, 1800, 2700, or 3600 seconds.
  • exceed_action: This is always a deny action. When a client's traffic rate exceeds the specified rate_limit_threshold, the client is banned (all incoming requests are denied) for the rest of the threshold interval and for the number of seconds specified in ban_duration_sec.
  • conform_action: This is always an allow action. It is the action to take when the number of requests is under the rate_limit_threshold.
  • ban_duration_sec: This is the additional number of seconds for which a client is banned after the interval_sec period elapses. The value must be 60, 120, 180, 240, 300, 600, 900, 1200, 1800, 2700, or 3600 seconds.

When a client's request rate is under the rate_limit_threshold, the rule immediately allows the request to proceed to the backend service. When a client's traffic rate exceeds the specified rate_limit_threshold, the client is banned (all incoming requests are denied) for the rest of the threshold interval and for the next ban_duration_sec seconds, regardless of whether the client's request rate stays above the threshold during that time. The deny error code is taken from the configured exceed_action.
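To make the ban timing concrete, here is a minimal Python sketch of a rate-based ban using a fixed-window counter. It is an approximation for illustration, not Google Cloud Armor's algorithm: the RateBasedBanRule class and its return labels are hypothetical, and the model assumes the ban ends exactly ban_duration_sec seconds after the end of the interval in which the threshold was exceeded.

```python
class RateBasedBanRule:
    """Fixed-window rate-based ban for one rule (a simplified model)."""

    def __init__(self, rate_limit_threshold, interval_sec, ban_duration_sec):
        self.rate_limit_threshold = rate_limit_threshold
        self.interval_sec = interval_sec
        self.ban_duration_sec = ban_duration_sec
        self._windows = {}       # key -> (window index, requests seen)
        self._banned_until = {}  # key -> time at which the ban ends

    def handle(self, key, now):
        """Count one request for `key` at time `now` and return the action."""
        if now < self._banned_until.get(key, 0.0):
            return "deny"  # banned clients are denied without further checks
        window = int(now // self.interval_sec)
        seen_window, count = self._windows.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new interval started, so the counter resets
        count += 1
        self._windows[key] = (window, count)
        if count > self.rate_limit_threshold:
            # Ban for the rest of this interval plus ban_duration_sec.
            interval_end = (window + 1) * self.interval_sec
            self._banned_until[key] = interval_end + self.ban_duration_sec
            return "deny"
        return "allow"


rule = RateBasedBanRule(rate_limit_threshold=2000, interval_sec=1200,
                        ban_duration_sec=3600)
burst = [rule.handle("203.0.113.7", now=10.0) for _ in range(2001)]
print(burst[-1])                                # request 2,001 triggers the ban
print(rule.handle("203.0.113.7", now=4799.0))   # still inside the ban
print(rule.handle("203.0.113.7", now=4800.0))   # 1,200 + 3,600 seconds: ban over
```

With these settings, the client that exceeds the threshold during the first interval stays banned until 4,800 seconds, that is, the end of the 1,200-second interval plus the 3,600-second ban duration.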

With this configuration, it is possible to accidentally ban welcome clients that only occasionally exceed the allowable request rate. To prevent this, and ban only clients that frequently exceed the request rate, you can optionally track the total client requests against an additional, preferably longer, threshold configuration called the ban_threshold. In this mode, the client is banned for the configured ban_duration_sec seconds only if the request rate crosses the configured ban_threshold. If the request rate does not exceed the ban_threshold, the requests continue to be throttled to the rate_limit_threshold. For the purpose of the ban_threshold, the total requests from the client, consisting of all incoming requests before throttling, are counted.
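The two-threshold mode can be summarized as a small decision function. The sketch below is illustrative only: the function decide_action and its argument names are hypothetical, and it assumes the caller already tracks how many requests the client sent in the rate-limit interval and in the longer ban interval, with both counts including every incoming request before throttling.

```python
def decide_action(
    requests_in_rate_interval: int,
    requests_in_ban_interval: int,
    rate_limit_threshold: int,
    ban_threshold: int,
) -> str:
    """Return "ban", "throttle", or "allow" for the current request."""
    if requests_in_ban_interval > ban_threshold:
        # Frequent offender: deny everything for ban_duration_sec seconds.
        return "ban"
    if requests_in_rate_interval > rate_limit_threshold:
        # Occasional spike: deny only the requests over the limit.
        return "throttle"
    return "allow"


# An occasional spike is throttled, but does not trigger a ban.
print(decide_action(2500, 2500, rate_limit_threshold=2000, ban_threshold=10000))
# Sustained excess traffic crosses the ban threshold.
print(decide_action(2500, 12000, rate_limit_threshold=2000, ban_threshold=10000))
```

Checking the ban_threshold first reflects the idea that a ban overrides throttling once the longer-term threshold is crossed.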

Threshold enforcement

The configured thresholds for throttling and rate-based bans are enforced independently in each of the Google Cloud regions where your HTTP(S) backend services are deployed. For example, if your service is deployed in two regions, each of the two regions rate limits each key to the configured threshold, so your backend service might experience cross-region aggregated traffic volumes that are twice the configured threshold. If the configured threshold is set to 5,000 requests, the backend service might receive 5,000 requests from one region and 5,000 requests from the second region.
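The worst-case arithmetic in the preceding paragraph can be written out as follows. This is a back-of-the-envelope helper with a hypothetical name, assuming each region enforces the full threshold independently for the same key.

```python
def worst_case_backend_requests(rate_limit_threshold: int, regions: int) -> int:
    """Upper bound on allowed requests per key per interval across regions."""
    # Each region enforces the threshold on its own, so the bounds add up.
    return rate_limit_threshold * regions


# Two regions with a threshold of 5,000 requests each.
print(worst_case_backend_requests(rate_limit_threshold=5000, regions=2))
```

For a service deployed in two regions with a threshold of 5,000, the bound is 10,000 requests per key in each interval.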

However, for the key type IP address, it is reasonable to assume that traffic from the same client IP address is directed to the region that is closest to the region where your backends are deployed. In this case, rate limiting can be considered to be enforced at a backend service level, regardless of the regions it is deployed in.

It is important to note that the enforced rate limits are approximate and might not be strictly accurate compared to the configured thresholds. Also, in rare cases, because of internal routing behavior, it is possible that rate limiting might be enforced in more regions than the regions you are deployed in, thus impacting accuracy. For these reasons, we recommend that you use rate limiting only for abuse mitigation or maintaining application and service availability, not for enforcing strict quota or licensing requirements.

Logging

Cloud Logging records the security policy name, matched rate limit rule priority, rule ID, the associated action, and other information in your request logs. For more information about logging, see Using request logging.

What's next