This page describes how to use a service load balancing policy to support advanced cost, latency, and resiliency optimizations for the following load balancers:
- Global external Application Load Balancer
- Cross-region internal Application Load Balancer
- Global external proxy Network Load Balancer
- Cross-region internal proxy Network Load Balancer
Cloud Service Mesh also supports advanced load balancing optimizations. For details, see Advanced load balancing overview in the Cloud Service Mesh documentation.
A service load balancing policy (`serviceLbPolicy`) is a resource associated with the load balancer's backend service. A service load balancing policy lets you customize the parameters that influence how traffic is distributed within the backends associated with a backend service:
- Customize the load balancing algorithm used to determine how traffic is distributed within a particular region or a zone.
- Enable auto-capacity draining so that the load balancer can quickly drain traffic from unhealthy backends.
- Set a failover threshold to determine when a backend is considered unhealthy. This lets traffic fail over to a different backend to avoid unhealthy backends.
Additionally, you can designate specific backends as preferred backends. These backends must be used to capacity before requests are sent to the remaining backends.
The following diagram shows how Cloud Load Balancing evaluates routing, load balancing, and traffic distribution.
Before you begin
Before reviewing the contents of this page, carefully review the Request distribution process described on the External Application Load Balancer overview page. For load balancers that are always Premium Tier, all the load balancing algorithms described on this page support spilling over between regions if a first-choice region is already full.
Supported backends
Service load balancing policies and preferred backends can be configured only on load balancers that use the supported backends as indicated in the following table.
Backend | Supported? |
---|---|
Instance groups | Yes |
Regional MIGs | No |
Zonal NEGs (`GCE_VM_IP_PORT` endpoints) | Yes |
Hybrid NEGs (`NON_GCP_PRIVATE_IP_PORT` endpoints) | Yes |
Serverless NEGs | No |
Internet NEGs | No |
Private Service Connect NEGs | Yes |
Load balancing algorithms
This section describes the load balancing algorithms that you can configure in a service load balancing policy. If you don't configure an algorithm, or if you don't configure a service load balancing policy at all, the load balancer uses `WATERFALL_BY_REGION` by default.
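If you're not sure which algorithm an existing policy uses, one way to check is to inspect the policy resource. This is a minimal sketch; it assumes the `describe` verb, which pairs with the `create` and `import` commands shown later on this page:

```
# Display the policy, including its loadBalancingAlgorithm field.
# SERVICE_LB_POLICY_NAME is the name of your policy.
gcloud network-services service-lb-policies describe SERVICE_LB_POLICY_NAME \
    --location=global
```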
Waterfall by region
`WATERFALL_BY_REGION` is the default load balancing algorithm. With this algorithm, in aggregate, all the Google Front Ends (GFEs) in the region closest to the user attempt to fill backends in proportion to their configured target capacities (modified by their capacity scalers).

Each individual second-layer GFE prefers to select backend instances or endpoints in a zone that's as close as possible (defined by network round-trip time) to the second-layer GFE. Because `WATERFALL_BY_REGION` minimizes latency between zones, at low request rates, each second-layer GFE might exclusively send requests to backends in the second-layer GFE's preferred zone.
If all the backends in the closest region are running at their configured capacity limit, traffic starts to overflow to the next closest region while still optimizing network latency.
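A backend's effective target capacity is its configured capacity modified by its capacity scaler. As a hedged illustration (the instance group name and zone are placeholders), the following sketch sets a backend's target to 80 percent utilization and then halves its effective capacity with a capacity scaler of 0.5:

```
# Target 80% utilization, then halve the effective capacity with a
# capacity scaler of 0.5. The instance group name and zone are placeholders.
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    --instance-group=example-ig \
    --instance-group-zone=us-central1-a \
    --max-utilization=0.8 \
    --capacity-scaler=0.5 \
    --global
```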
Spray to region
The `SPRAY_TO_REGION` algorithm modifies the individual behavior of each second-layer GFE so that each second-layer GFE has no preference for selecting backend instances or endpoints in a zone as close as possible to the second-layer GFE. With `SPRAY_TO_REGION`, each second-layer GFE sends requests to all backend instances or endpoints, in all zones of the region, without preference for a shorter round-trip time between the second-layer GFE and the backend instances or endpoints.
Like `WATERFALL_BY_REGION`, in aggregate, all second-layer GFEs in the region fill backends in proportion to their configured target capacities (modified by their capacity scalers).
While `SPRAY_TO_REGION` provides more uniform distribution among backends in all zones of a region, especially at low request rates, this uniform distribution comes with the following considerations:
- When backends go down (but continue to pass their health checks), more second-layer GFEs are affected, though individual impact is less severe.
- Because each second-layer GFE has no preference for one zone over another, the second-layer GFEs create more cross-zone traffic. Depending on the number of requests being processed, each second-layer GFE might create more TCP connections to the backends as well.
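If this trade-off suits your workload, you can select the algorithm when you create the policy. A minimal sketch, using the `gcloud` command described later on this page (the policy name is a placeholder):

```
# Create a policy that sprays traffic across all zones of a region.
gcloud network-services service-lb-policies create example-spray-policy \
    --load-balancing-algorithm=SPRAY_TO_REGION \
    --location=global
```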
Waterfall by zone
The `WATERFALL_BY_ZONE` algorithm modifies the individual behavior of each second-layer GFE so that each second-layer GFE has a very strong preference for backend instances or endpoints in the zone closest to the second-layer GFE. With `WATERFALL_BY_ZONE`, each second-layer GFE only sends requests to backend instances or endpoints in other zones of the region when the second-layer GFE has filled (or proportionally overfilled) the backend instances or endpoints in its most favored zone.
Like `WATERFALL_BY_REGION`, in aggregate, all second-layer GFEs in the region fill backends in proportion to their configured target capacities (modified by their capacity scalers).
The `WATERFALL_BY_ZONE` algorithm minimizes latency with the following considerations:
- `WATERFALL_BY_ZONE` does not inherently minimize cross-zone connections. The algorithm is steered by latency only.
- `WATERFALL_BY_ZONE` does not guarantee that each second-layer GFE always fills its most favored zone before filling other zones. Maintenance events can temporarily cause all traffic from a second-layer GFE to be sent to backend instances or endpoints in another zone.
- `WATERFALL_BY_ZONE` can result in less uniform distribution of requests among all backend instances or endpoints within the region as a whole. For example, backend instances or endpoints in the second-layer GFE's most favored zone might be filled to capacity while backends in other zones are not filled to capacity.
Compare load balancing algorithms
The following table compares the different load balancing algorithms.
Behavior | Waterfall by region | Spray to region | Waterfall by zone |
---|---|---|---|
Uniform capacity usage within a single region | Yes | Yes | No |
Uniform capacity usage across multiple regions | No | No | No |
Uniform traffic split from load balancer | No | Yes | No |
Cross-zone traffic distribution | Yes. Traffic is distributed evenly across zones in a region while optimizing network latency. Traffic might be sent across zones if needed. | Yes | Yes. Traffic first goes to the nearest zone until it is at capacity. Then, it goes to the next closest zone. |
Sensitivity to traffic spikes in a local zone | Average; depends on how much traffic has already been shifted to balance across zones. | Lower; single zone spikes are spread across all zones in the region. | Higher; single zone spikes are likely to be served entirely by the same zone until the load balancer is able to react. |
Auto-capacity draining
When a backend is unhealthy, you usually want to exclude it from load balancing decisions as fast as possible. Excluding unhealthy backends optimizes overall latency by sending traffic only to healthy backends.
When you enable the auto-capacity draining feature, the load balancer
automatically scales a backend's capacity to zero when less than 25 percent
of the backend's instances or endpoints are passing health checks. This
removes the unhealthy backend from the global load balancing pool.
This action is functionally equivalent to setting `backendService.capacityScaler` to `0` for a backend when you want to avoid routing traffic to that backend.
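For comparison, this is what the equivalent manual drain of a single backend looks like; the instance group name and zone are placeholders:

```
# Manually drain one backend by setting its capacity scaler to 0.
# Auto-capacity draining performs the equivalent adjustment automatically.
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    --instance-group=example-ig \
    --instance-group-zone=us-central1-a \
    --capacity-scaler=0 \
    --global
```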
If 35 percent (10 percent above the threshold) of a previously auto-drained backend's instances or endpoints are passing health checks for 60 seconds, the backend is automatically undrained and added back to the load balancing pool. This ensures that the backend is truly healthy and does not flip-flop between a drained and undrained state.
Even with auto-capacity draining enabled, the load balancer doesn't drain more than 50 percent of backends attached to a backend service, regardless of a backend's health status. Keeping 50 percent of backends attached reduces the risk of overloading healthy backends.
One use case for auto-capacity draining is minimizing the risk of overloading your preferred backends. For example, if a backend is marked preferred but most of its instances or endpoints are unhealthy, auto-capacity draining removes the backend from the load balancing pool. Instead of overloading the remaining healthy instances or endpoints in the preferred backend, auto-capacity draining shifts traffic to other backends.
You can enable auto-capacity draining as part of the service load balancing policy. For details, see Configure a service load balancing policy.
Auto-capacity draining is not supported with backends that don't use a balancing mode. This includes backends such as internet NEGs, serverless NEGs, and Private Service Connect NEGs.
Failover threshold
The load balancer determines the distribution of traffic among backends in a multi-level fashion. In the steady state, it sends traffic to backends that are selected based on one of the previously described load balancing algorithms. These backends, called primary backends, are considered optimal in terms of latency and capacity.
The load balancer also keeps track of other backends that can be used if the primary backends become unhealthy and are unable to handle traffic. These backends are called failover backends. These backends are typically nearby backends with remaining capacity.
If instances or endpoints in the primary backend become unhealthy, the load balancer doesn't shift traffic to other backends immediately. Instead, the load balancer first shifts traffic to other healthy instances or endpoints in the same backend to help stabilize traffic load. If too many endpoints in a primary backend are unhealthy, and the remaining endpoints in the same backend are not able to handle the extra traffic, the load balancer uses the failover threshold to determine when to start sending traffic to a failover backend. The load balancer tolerates unhealthiness in the primary backend up to the failover threshold. After that, traffic is shifted away from the primary backend.
The failover threshold is a value between 1 and 99, expressed as a percentage of endpoints in a backend that must be healthy. If the percentage of healthy endpoints falls below the failover threshold, the load balancer tries to send traffic to a failover backend. By default, the failover threshold is 70. For example, with the default threshold of 70, a backend with ten endpoints starts failing over when fewer than seven of its endpoints are healthy.
If the failover threshold is set too high, unnecessary traffic spills can occur due to transient health changes. If the failover threshold is set too low, the load balancer continues to send traffic to the primary backends even though there are a lot of unhealthy endpoints.
Failover decisions are localized. Each local Google Front End (GFE) behaves independently of the others. It is your responsibility to make sure that your failover backends can handle the additional traffic.
Failover traffic can result in overloaded backends. Even if a backend is unhealthy, the load balancer might still send traffic there. To exclude unhealthy backends from the pool of available backends, enable the auto-capacity drain feature.
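To tune the threshold on an existing policy, a sketch like the following should work; it assumes an `update` verb that mirrors the `create` command shown later on this page, and uses a placeholder policy name:

```
# Raise the failover threshold so that failover triggers sooner.
gcloud network-services service-lb-policies update example-lb-policy \
    --failover-health-threshold=80 \
    --location=global
```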
Preferred backends
Preferred backends are backends whose capacity you want to completely use before spilling traffic over to other backends. Any traffic over the configured capacity of preferred backends is routed to the remaining non-preferred backends. The load balancing algorithm then distributes traffic between the non-preferred backends of a backend service.
You can configure your load balancer to prefer and completely use one or more backends attached to a backend service before routing subsequent requests to the remaining backends.
Consider the following limitations when you use preferred backends:
- The backends configured as preferred backends might be farther away from clients, resulting in higher average latency for client requests. This can happen even when other, closer backends could have served the clients with lower latency.
- Certain load balancing algorithms (`WATERFALL_BY_REGION`, `SPRAY_TO_REGION`, and `WATERFALL_BY_ZONE`) don't apply to backends configured as preferred backends.
To learn how to set preferred backends, see Set preferred backends.
Configure a service load balancing policy
The service load balancing policy resource lets you configure the following fields:
- Load balancing algorithm
- Auto-capacity draining
- Failover threshold
To set a preferred backend, see Set preferred backends.
Create a policy
To create and configure a service load balancing policy, complete the following steps:
Create a service load balancing policy resource. You can do this either by using a YAML file or directly by using `gcloud` parameters.

With a YAML file. You specify service load balancing policies in a YAML file. Here is a sample YAML file that shows you how to configure a load balancing algorithm, enable auto-capacity draining, and set a custom failover threshold:
```
name: projects/PROJECT_ID/locations/global/serviceLbPolicies/SERVICE_LB_POLICY_NAME
autoCapacityDrain:
  enable: True
failoverConfig:
  failoverHealthThreshold: FAILOVER_THRESHOLD_VALUE
loadBalancingAlgorithm: LOAD_BALANCING_ALGORITHM
```
Replace the following:
- PROJECT_ID: the project ID.
- SERVICE_LB_POLICY_NAME: the name of the service load balancing policy.
- FAILOVER_THRESHOLD_VALUE: the failover threshold value. This should be a number between 1 and 99.
- LOAD_BALANCING_ALGORITHM: the load balancing algorithm to use. This can be `SPRAY_TO_REGION`, `WATERFALL_BY_REGION`, or `WATERFALL_BY_ZONE`.
After you create the YAML file, import the file to a new service load balancing policy.
```
gcloud network-services service-lb-policies import SERVICE_LB_POLICY_NAME \
    --source=PATH_TO_POLICY_FILE \
    --location=global
```
Without a YAML file. Alternatively, you can configure service load balancing policy features without using a YAML file.
To set the load balancing algorithm, enable auto-capacity draining, and set the failover threshold, use the following parameters:
```
gcloud network-services service-lb-policies create SERVICE_LB_POLICY_NAME \
    --load-balancing-algorithm=LOAD_BALANCING_ALGORITHM \
    --auto-capacity-drain \
    --failover-health-threshold=FAILOVER_THRESHOLD_VALUE \
    --location=global
```
Replace the following:
- SERVICE_LB_POLICY_NAME: the name of the service load balancing policy.
- LOAD_BALANCING_ALGORITHM: the load balancing algorithm to use. This can be `SPRAY_TO_REGION`, `WATERFALL_BY_REGION`, or `WATERFALL_BY_ZONE`.
- FAILOVER_THRESHOLD_VALUE: the failover threshold value. This should be a number between 1 and 99.
Update a backend service so that its `--service-lb-policy` field references the newly created service load balancing policy resource. A backend service can only be associated with one service load balancing policy resource.

```
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --service-lb-policy=SERVICE_LB_POLICY_NAME \
    --global
```
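To confirm the association, you can inspect the backend service. This sketch assumes the policy is surfaced in the backend service's `serviceLbPolicy` field:

```
# Show which service load balancing policy the backend service references.
gcloud compute backend-services describe BACKEND_SERVICE_NAME \
    --global \
    --format="get(serviceLbPolicy)"
```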
You can associate a service load balancing policy with a backend service while creating the backend service.
```
gcloud compute backend-services create BACKEND_SERVICE_NAME \
    --protocol=PROTOCOL \
    --port-name=NAMED_PORT_NAME \
    --health-checks=HEALTH_CHECK_NAME \
    --load-balancing-scheme=LOAD_BALANCING_SCHEME \
    --service-lb-policy=SERVICE_LB_POLICY_NAME \
    --global
```
Remove a policy
To remove a service load balancing policy from a backend service, use the following command:
```
gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --no-service-lb-policy \
    --global
```
Set preferred backends
You can configure preferred backends by using either the Google Cloud CLI or the API.
gcloud
Add a preferred backend
To set a preferred backend, use the `gcloud compute backend-services add-backend` command to set the `--preference` flag when you're adding the backend to the backend service.
```
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    ...
    --preference=PREFERENCE \
    --global
```
Replace PREFERENCE with the level of preference you want to assign to the backend. This can be either `PREFERRED` or `DEFAULT`.
The rest of the command depends on the type of backend you're using (instance group or NEG). For all the required parameters, see the `gcloud compute backend-services add-backend` command.
Update a backend's preference
To update a backend's `--preference` parameter, use the `gcloud compute backend-services update-backend` command.
```
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    ...
    --preference=PREFERENCE \
    --global
```
The rest of the command depends on the type of backend you're using (instance group or NEG). The following example command updates a backend instance group's preference and sets it to `PREFERRED`:
```
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    --instance-group=INSTANCE_GROUP_NAME \
    --instance-group-zone=INSTANCE_GROUP_ZONE \
    --preference=PREFERRED \
    --global
```
API
To set a preferred backend, set the `preference` field on each backend by using the global `backendServices` resource.
Here is a sample that shows you how to configure the backend preference:
```
name: projects/PROJECT_ID/locations/global/backendServices/BACKEND_SERVICE_NAME
...
backends:
- name: BACKEND_1_NAME
  preference: PREFERRED
  ...
- name: BACKEND_2_NAME
  preference: DEFAULT
  ...
```
Replace the following:
- PROJECT_ID: the project ID
- BACKEND_SERVICE_NAME: the name of the backend service
- BACKEND_1_NAME: the name of the preferred backend
- BACKEND_2_NAME: the name of the default backend
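To review the preference assigned to each backend afterward, one option is a `gcloud` describe with a format filter, as in this sketch:

```
# List each backend of the backend service along with its preference.
gcloud compute backend-services describe BACKEND_SERVICE_NAME \
    --global \
    --format="yaml(backends)"
```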
Troubleshooting
Traffic distribution patterns can change when you attach a new service load balancing policy to a backend service.
To debug traffic issues, use Cloud Monitoring to look at how traffic flows between the load balancer and the backend. Cloud Load Balancing logs and metrics can also help you understand load balancing behavior.
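For example, request logs for global external Application Load Balancers are recorded against the `http_load_balancer` monitored resource, so a sketch like the following surfaces recent requests; adjust the filter for your load balancer type:

```
# Read recent request logs for an external Application Load Balancer.
gcloud logging read 'resource.type="http_load_balancer"' \
    --limit=10 \
    --format=json
```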
This section summarizes a few common scenarios that you might see in the newly exposed configuration.
Traffic from a single source is sent to too many distinct backends
This is the intended behavior of the `SPRAY_TO_REGION` algorithm. However, you might experience issues caused by the wider distribution of your traffic. For example, cache hit rates might decrease because backends see traffic from a wider selection of clients. In this case, consider using other algorithms like `WATERFALL_BY_REGION`.
Traffic is not being sent to backends with lots of unhealthy endpoints
This is the intended behavior when `autoCapacityDrain` is enabled. Backends with a lot of unhealthy endpoints are drained and removed from the load balancing pool. If you don't want this behavior, you can disable auto-capacity draining. However, this means that traffic can be sent to backends with a lot of unhealthy endpoints, and requests can fail.
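A sketch of disabling draining on an existing policy; this assumes an `update` verb and the standard `--no-` negation that gcloud generates for boolean flags:

```
# Disable auto-capacity draining on an existing policy (placeholder name).
gcloud network-services service-lb-policies update example-lb-policy \
    --no-auto-capacity-drain \
    --location=global
```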
Traffic is being sent to more distant backends before closer ones
This is the intended behavior if your preferred backends are further away than your default backends. If you don't want this behavior, update the preference settings for each backend accordingly.
Traffic is not being sent to some backends when using preferred backends
This is the intended behavior when your preferred backends have not yet reached capacity. The preferred backends are assigned first based on round-trip time latency to these backends.
If you want traffic sent to other backends, you can do one of the following:
- Update preference settings for the other backends.
- Set a lower target capacity for your preferred backends. The target capacity is configured by using either the `max-rate` or the `max-utilization` field, depending on the backend service's balancing mode (see the sketch after this list).
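The following sketch lowers a preferred backend's target capacity by reducing `max-utilization`; the instance group name and zone are placeholders:

```
# Lower the preferred backend's target capacity so traffic spills over
# to non-preferred backends sooner.
gcloud compute backend-services update-backend BACKEND_SERVICE_NAME \
    --instance-group=preferred-ig \
    --instance-group-zone=us-central1-a \
    --max-utilization=0.6 \
    --global
```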
Traffic is being sent to a remote backend during transient health changes
This is the intended behavior when the failover threshold is set to a high value. If you want traffic to keep going to the primary backends when there are transient health changes, set this field to a lower value.
Healthy endpoints are overloaded when other endpoints are unhealthy
This is the intended behavior when the failover threshold is set to a low value. When endpoints are unhealthy, the traffic intended for these unhealthy endpoints is instead spread among the remaining endpoints in the same backend. If you want the failover behavior to be triggered sooner, set this field to a higher value.
Limitations
- Each backend service can only be associated with a single service load balancing policy resource.