Multi Cluster Ingress traffic spillover

Problem

Your Google Kubernetes Engine (GKE) Multi Cluster Ingress load balancer unnecessarily spills traffic over from region X to region Y when backend Pods scale down to 0. This increases request latency.

Environment

Multi Cluster Ingress
External Global HTTPS load balancer (LB)
Google Kubernetes Engine

Solution

Use Pod topology spread constraints to keep at least one Pod running in every zone of the region.
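
As a rough illustration of the spread constraint approach, the sketch below builds a Deployment manifest with a zone-level `topologySpreadConstraints` entry and prints it as JSON, which `kubectl apply -f` accepts. The function name `spread_deployment`, the app name `web`, and the image `example.com/web:1.0` are placeholders, not values from this article. It also assumes the replica count (and any autoscaler minimum) is at least the number of zones; otherwise a spread constraint alone cannot place a Pod in every zone.

```python
import json


def spread_deployment(name, image, replicas, zone_key="topology.kubernetes.io/zone"):
    """Build a Deployment manifest that spreads Pods evenly across zones.

    All argument values are illustrative placeholders.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "topologySpreadConstraints": [
                        {
                            # Allow at most a difference of 1 Pod between zones.
                            "maxSkew": 1,
                            "topologyKey": zone_key,
                            # Refuse to schedule a Pod that would break the spread.
                            "whenUnsatisfiable": "DoNotSchedule",
                            "labelSelector": {"matchLabels": labels},
                        }
                    ],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = spread_deployment("web", "example.com/web:1.0", replicas=3)
    print(json.dumps(manifest, indent=2))
```

Piping the output to `kubectl apply -f -` would create the Deployment; adjust `replicas` to match the number of zones in your region.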

In Q1 2022, Multi Cluster Ingress will support maxRatePerEndpoint, which can also be used to prevent the traffic spillover. This value is configured on the backendService.

Cause

The Ingress load balancer backend services are configured with Max-Rate balance mode. The HTTPS LB reroutes traffic across regions in the following five scenarios:

When all backends in a region reach capacity, traffic reroutes to other regions.
When more than half of the backends in a region are unhealthy, traffic reroutes to other regions.
When the backends in all regions are at or above capacity, traffic is balanced across all regions.
When a temporary overflow occurs during a slow autoscaling operation, traffic reroutes to other regions.
When some regions experience high latency, traffic may flow to other regions.

As an example, suppose you have 3 backend endpoints in each of the EU and US regions. When the EU region receives less traffic, 2 of the 3 EU backends scale down to 0. This triggers a reroute under the second scenario above, because more than half of the EU backends are no longer serving.
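
The example can be checked with a small toy model. This is only a sketch of the "more than half unhealthy" rule as described in this article, not the load balancer's actual algorithm; the function name `reroutes_on_health` is made up for illustration, and a backend that has scaled down to 0 is counted as unhealthy.

```python
def reroutes_on_health(healthy, total):
    """Return True when more than half of a region's backends are unhealthy.

    Simplified model: backends scaled down to 0 count as unhealthy.
    """
    unhealthy = total - healthy
    return unhealthy * 2 > total


# (healthy backends, total backends) per region, taken from the example.
regions = {"EU": (1, 3), "US": (3, 3)}

for name, (healthy, total) in regions.items():
    print(name, "reroutes:", reroutes_on_health(healthy, total))
```

With 2 of 3 EU backends gone, the EU region crosses the threshold while the US region does not.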

Traffic is rerouted to other backends, including those in the US region. Applying maxRatePerEndpoint avoids this issue because rerouting is then based on the rate capacity of each backend.
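
To make the rate-based reasoning concrete, here is a hedged sketch that treats a region's capacity as the number of serving endpoints multiplied by maxRatePerEndpoint. The function name `overflow_rate` and the numbers used are assumptions chosen for illustration; the real load balancer's capacity calculation is more involved.

```python
def overflow_rate(request_rate, endpoints, max_rate_per_endpoint):
    """Return the requests per second that exceed a region's rate capacity.

    Simplified model: capacity = endpoints * max_rate_per_endpoint.
    """
    capacity = endpoints * max_rate_per_endpoint
    return max(0, request_rate - capacity)


# One remaining EU endpoint with a hypothetical limit of 100 requests per second.
low_traffic = overflow_rate(request_rate=60, endpoints=1, max_rate_per_endpoint=100)
high_traffic = overflow_rate(request_rate=250, endpoints=1, max_rate_per_endpoint=100)

print("overflow at 60 rps:", low_traffic)
print("overflow at 250 rps:", high_traffic)
```

Under this model, light EU traffic stays in the EU region even with a single endpoint left, and only the portion above the endpoint's capacity would spill over.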