Failover for external Application Load Balancers

This page describes how failover works for external Application Load Balancers. The failover configuration involves two load balancers: a primary load balancer and a backup load balancer. For the purpose of this discussion, the primary load balancer is the load balancer for which you want to configure failover. The backup load balancer is the load balancer that receives connections when the primary load balancer starts failing health checks.

Failover and failback are the automatic processes that shift traffic away from, and back to, a load balancer. When Cloud DNS detects an outage and routes traffic from the primary load balancer to the backup load balancer, the process is called failover. When Cloud DNS reverses this and routes traffic back to the primary load balancer, the process is called failback.

How failover works

Global to regional failover for external Application Load Balancers is handled by creating two or more regional external Application Load Balancers in the regions to which you want traffic to fail over. Only regional external Application Load Balancers can be used as backup load balancers. Regional external Application Load Balancers are not only self-contained within individual Google Cloud regions, but are also isolated from any global external Application Load Balancer or classic Application Load Balancer infrastructure running in the same region.

Regional external Application Load Balancers work best as failover load balancers for global external Application Load Balancers because both are based on Envoy proxies and process traffic in very similar ways. This is in contrast to the classic Application Load Balancer, which has notable differences in how it handles traffic.

In summary, the following failover scenarios are supported:

  • From a global external Application Load Balancer to a regional external Application Load Balancer
  • From a regional external Application Load Balancer to a regional external Application Load Balancer
  • From a classic Application Load Balancer to a regional external Application Load Balancer

Failover and failback workflow

The following setup demonstrates failover from a global external Application Load Balancer to two regional external Application Load Balancers, with one in each region where the global load balancer has deployed backends.

Failover from a global external Application Load Balancer to two regional external Application Load Balancers.

The following sections describe a typical workflow with the different components involved in a failover configuration.

  1. Detect failures in the primary load balancer

    Google Cloud uses health checks to detect whether your primary external Application Load Balancer is healthy. You configure these health checks to send probes from three source regions. These three source regions must be representative of the regions from which your clients access the load balancer. For example, if you have a global external Application Load Balancer and most of your client traffic originates from North America and Europe, you can configure probes that originate from two regions in North America and one region in Europe.

    If health checks originating from two or more of these regions fail, this triggers failover to the backup regional external Application Load Balancer.

    Additional notes:

    • You must specify exactly three source regions when you create the health check.
    • HTTP, HTTPS, and TCP health checks are supported.
    • The health check probes originate from an internet point of presence (PoP) located close to the configured Google Cloud source region.
  2. Route traffic to backup load balancers

    If the primary load balancer starts failing health checks, Google Cloud uses Cloud DNS failover routing policies to determine how to route traffic to the backup load balancers.

    The duration of the outage, or the time it takes for traffic to fail over from the primary to the backup load balancers, is determined by the DNS TTL value, the health check interval, and the health check's unhealthy threshold. For recommended settings, see Best practices.

  3. Failback to the primary load balancer

    After the health checks start passing again, failback to the primary load balancer is automatic. There is no downtime expected during failback because both the backup and primary load balancers are serving traffic.

  4. Test failover periodically

    We recommend that you periodically test the failover workflow as part of your business continuity plan. Make sure to test both gradual and instantaneous shifts in traffic from the primary to the backup load balancers. After verifying that failover works, trigger a failback to verify that traffic is rerouted back to the primary load balancer as expected. A simple way to observe these shifts from a client's perspective is shown in the sketch after this list.
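
To observe a failover or failback test from a client's perspective, you can check which forwarding rule IP address your DNS record currently resolves to and then send a test request. The following is a minimal sketch; it assumes the hypothetical record set name test.example.com used in the configuration example later on this page.

    # Check which IP address the DNS name currently resolves to, and compare it
    # against the primary and backup forwarding rule IP addresses.
    dig +short test.example.com

    # Send a test request to confirm which deployment is serving traffic.
    curl -v https://test.example.com/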

Configure failover

To configure failover, perform the following steps:

  1. Review your existing primary load balancer configuration and check that the features (such as security features, traffic management and routing features, and CDN) used by the primary load balancer are available with the backup regional external Application Load Balancer. If similar features are not available, then this load balancer might not be a good candidate for failover.
  2. Create the backup regional external Application Load Balancer with a configuration that mirrors the primary load balancer as much as possible.
  3. Create the health check and the DNS routing policy to detect outages and route traffic from the primary to the backup load balancer during failover.

Review primary load balancer configuration

Before you begin, verify that the backup regional external Application Load Balancer supports all features currently used with your primary load balancer.

To avoid traffic disruption, review the following differences:

  • GKE deployments. GKE users should note that load balancers deployed using GKE Gateway are more compatible with this failover mechanism than load balancers deployed using the GKE Ingress controller. This is because GKE Gateway supports configuration of both the global and regional external Application Load Balancers, whereas the GKE Ingress controller supports only the classic Application Load Balancer.

    For best results, use GKE Gateway to deploy both the primary and backup load balancers.

  • Cloud CDN. Regional external Application Load Balancers don't support Cloud CDN. Therefore, in the event of a failure, any operations relying on Cloud CDN are also affected. For better redundancy, we recommend configuring a third-party CDN solution that can act as a fallback to Cloud CDN.

  • Google Cloud Armor. If you use Google Cloud Armor for your primary load balancer, make sure that you also configure the same Google Cloud Armor features when configuring the backup regional external Application Load Balancer. Google Cloud Armor has different features available in the regional versus the global scope. For more information, see the Google Cloud Armor documentation.

  • SSL certificates. If you want to use a common SSL certificate for both the primary and backup load balancers, confirm that the type of SSL certificate used with the primary load balancer is compatible with the backup regional external Application Load Balancer. Review the differences between the SSL certificates available with global, regional, and classic load balancers in the SSL certificates documentation. One way to share a self-managed certificate across both load balancers is sketched after this list.

  • Backend buckets. Regional external Application Load Balancers don't support Cloud Storage buckets as backends. You can't set up failover for load balancers using backend buckets.
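
As noted in the SSL certificates item in the preceding list, one approach to sharing a self-managed certificate is to create it as both a global resource (for the primary load balancer) and a regional resource (for the backup load balancer). The following is a minimal sketch, not a verified procedure; the certificate resource names, file names, and region are hypothetical:

    # Global SSL certificate resource for the primary global load balancer.
    gcloud compute ssl-certificates create primary-cert \
        --certificate=cert.pem \
        --private-key=key.pem \
        --global

    # Regional SSL certificate resource, created from the same files, for the
    # backup regional external Application Load Balancer.
    gcloud compute ssl-certificates create backup-cert \
        --certificate=cert.pem \
        --private-key=key.pem \
        --region=us-west1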

Configure the backup load balancer

The backup load balancer is a regional external Application Load Balancer that you configure in the region where you want traffic to be redirected in the event of a failure.

Note the following considerations as you configure your backup load balancer:

  • You must configure the features of the backup regional external Application Load Balancer to be as similar as possible to the primary load balancer so that traffic is processed similarly across both deployments.

    • Global external Application Load Balancer. The regional external Application Load Balancers support most of the same features as the global external Application Load Balancers, with a few exceptions. The regional load balancer also supports the same advanced traffic management capabilities as the global load balancer, which makes it easier to achieve equivalence between the primary and backup load balancers.

    • Classic Application Load Balancer. With the classic Application Load Balancer, feature parity between the primary and backup load balancer is harder to achieve because the regional external Application Load Balancer is an Envoy-based load balancer that processes traffic differently. Make sure you test the failover and failback thoroughly before deploying to production.

    To view the specific capabilities of the regional, global, and classic Application Load Balancers, see the Load balancer feature comparison page.

    We recommend that you use an automation framework such as Terraform to help achieve and maintain consistency in load balancer configurations across both primary and backup deployments.

  • We recommend that you set up backup regional external Application Load Balancers in every region where you have backends. For example, if you fail over from a global deployment with instance groups in five regions to backup regional load balancers in three regions only, you risk overloading your backend services in these three regions while backend services in the remaining two regions are idle.

    Additionally, we recommend that you configure Cloud DNS to use weighted round robin routing policies when rerouting failover traffic from a primary global load balancer to these backup regional load balancers. Assign a weight to each backup load balancer based on the maximum sizes of the backend instance groups in its region (an illustrative weighted record set is sketched after this list).

  • Regional external Application Load Balancers support both Premium and Standard Network Service Tiers. If latency is not your primary concern during failover, we recommend that you set up the backup regional external Application Load Balancers using Standard Tier. Using Standard Tier infrastructure offers additional isolation from the Premium Tier infrastructure used by global external Application Load Balancers.

  • If you want to use the same backends for both the primary and backup load balancers, create the backup regional external Application Load Balancer in the region where the backends are located. If you've enabled autoscaling for the backend instance groups, you must fulfill the requirements for sharing backends across deployments.

  • If needed, reserve additional Envoy proxies for regional external Application Load Balancers to help ensure that, in case of a failover event, the additional traffic doesn't disrupt any other load balancer deployments in the same region. For details, see Reserve additional proxy-only subnet capacity.
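
As mentioned in the weighted routing recommendation earlier in this list, weights let you split failover traffic across backup regions in proportion to their capacity. The following is a minimal, standalone sketch of a weighted round robin (WRR) record set; it is illustrative only and separate from the failover routing policy shown later on this page, and the record name, zone, weights, and IP addresses are hypothetical:

    gcloud dns record-sets create backup.example.com. \
        --zone=my-zone-name \
        --type=A \
        --ttl=30 \
        --routing-policy-type=WRR \
        --routing-policy-data="3.0=203.0.113.10;2.0=203.0.113.20"

In this sketch, the weights 3.0 and 2.0 reflect the relative maximum sizes of the backend instance groups behind the two backup regional load balancers, so the region with more capacity receives proportionally more traffic.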

To learn how to configure a regional external Application Load Balancer, see Set up a regional external Application Load Balancer with VM instance group backends.

Reserve additional proxy-only subnet capacity

All regional Envoy-based load balancers in a region and VPC network share the same pool of Envoy proxies. In a failover event, the backup regional external Application Load Balancers see an increase in proxy usage to handle failover traffic from the primary load balancer. To help ensure that capacity is always available for the backup load balancers, we recommend that you review the size of your proxy-only subnet, calculate the estimated number of proxies needed to handle traffic in a given region, and increase capacity if needed. This also helps ensure that failover events don't disrupt other regional Envoy-based load balancers in the same region and network.

Generally, a regional external Application Load Balancer proxy can manage up to:

  • 600 (HTTP) or 150 (HTTPS) new connections per second
  • 3,000 active connections
  • 1,400 requests per second

If you're using DNS policies to split traffic across multiple backup load balancers in different regions, you must take that into account when estimating proxy requirements per region and network. A larger proxy-only subnet lets Google Cloud assign a larger number of Envoy proxies to your load balancer when necessary.
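
For example, if you expect a backup regional external Application Load Balancer to absorb roughly 14,000 HTTPS requests per second during a failover event, plan for at least 10 proxies (14,000 / 1,400 requests per second per proxy), and then confirm that the same proxy count also covers your expected rates of new and active connections. These traffic figures are hypothetical; substitute measurements from your own primary load balancer.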

You can't expand a proxy-only subnet in the same way that you would for a primary address range (with the expand-ip-range command). Instead, you must create a backup proxy-only subnet that meets your needs and then promote it to the active role.
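
The following is a minimal sketch of creating a larger backup proxy-only subnet and promoting it to the active role. The subnet name, network, region, and address range are hypothetical, and you should verify the exact flags against the proxy-only subnet documentation:

    # Create a larger proxy-only subnet in the BACKUP role.
    gcloud compute networks subnets create proxy-only-subnet-backup \
        --purpose=REGIONAL_MANAGED_PROXY \
        --role=BACKUP \
        --region=us-west1 \
        --network=lb-network \
        --range=10.130.0.0/21

    # Promote the backup subnet to the ACTIVE role; the previously active
    # subnet is drained over the specified timeout.
    gcloud compute networks subnets update proxy-only-subnet-backup \
        --region=us-west1 \
        --role=ACTIVE \
        --drain-timeout=600s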

To learn how to change the size of your proxy-only subnet, see Change the size or address range of a proxy-only subnet.

Sharing backends between primary and backup load balancers

To achieve complete infrastructural redundancy, you must introduce redundancy at both the load balancer level and the backend level. This means that you must configure your backup regional external Application Load Balancers with backends (instance groups or network endpoint groups) that don't overlap with those of the primary load balancers.

However, if you do want to share a backend instance group between the primary and backup load balancers, and autoscaling is enabled for the instance group, you must fulfill the following requirements to help ensure that failover occurs properly:

  • The autoscaler must be set up with multiple signals. We recommend using a combination of CPU scaling signals and load balancer utilization signals for the instance groups.
  • Both the global and regional backend services must use only the UTILIZATION balancing mode. The RATE balancing mode is not recommended because your instances could receive twice the configured rate when both the global and regional load balancers send traffic during failover.
  • Scale-in controls must be configured to prevent the autoscaler from prematurely scaling in the group during the downtime when traffic switches over from the global load balancer to the regional load balancer. This downtime can be as long as the DNS TTL plus the configured health check interval.

Failure to set up autoscaling correctly can result in a secondary outage during failover because the loss of traffic from the global load balancer causes the instance group to rapidly shrink.
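
The following is a minimal sketch of the requirements above for a shared zonal instance group. The resource names, zone, region, and target values are hypothetical; adapt them to your own deployment:

    # Attach the shared instance group to the global backend service using the
    # UTILIZATION balancing mode.
    gcloud compute backend-services add-backend primary-backend-service \
        --global \
        --instance-group=shared-ig \
        --instance-group-zone=us-west1-a \
        --balancing-mode=UTILIZATION \
        --max-utilization=0.8

    # Attach the same instance group to the regional backend service, also
    # using the UTILIZATION balancing mode.
    gcloud compute backend-services add-backend backup-backend-service \
        --region=us-west1 \
        --instance-group=shared-ig \
        --instance-group-zone=us-west1-a \
        --balancing-mode=UTILIZATION \
        --max-utilization=0.8

    # Autoscale on multiple signals and limit how quickly the group can scale
    # in during the DNS failover window.
    gcloud compute instance-groups managed set-autoscaling shared-ig \
        --zone=us-west1-a \
        --max-num-replicas=20 \
        --target-cpu-utilization=0.6 \
        --target-load-balancing-utilization=0.8 \
        --scale-in-control=max-scaled-in-replicas=2,time-window=600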

Configure Cloud DNS and health checks

This section describes how to use Cloud DNS and Google Cloud health checks to configure your Cloud Load Balancing environment to detect outages and route traffic to the backup load balancers.

Use the following steps to configure the required health check and routing policies:

  1. Create a health check for the primary load balancer's forwarding rule IP address.

    For a global forwarding rule:

    gcloud beta compute health-checks create http HEALTH_CHECK_NAME \
        --global \
        --source-regions=SOURCE_REGION_1,SOURCE_REGION_2,SOURCE_REGION_3 \
        --use-serving-port \
        --check-interval=HEALTH_CHECK_INTERVAL \
        --healthy-threshold=HEALTHY_THRESHOLD \
        --unhealthy-threshold=UNHEALTHY_THRESHOLD \
        --request-path=REQUEST_PATH
    

    Replace the following:

    • HEALTH_CHECK_NAME: the name of the health check
    • SOURCE_REGION_1, SOURCE_REGION_2, SOURCE_REGION_3: the three Google Cloud regions from which health check probes are sent. You must specify exactly three source regions.
    • HEALTH_CHECK_INTERVAL: the amount of time in seconds from the start of one probe issued by one prober to the start of the next probe issued by the same prober. The minimum supported value is 30 seconds. For recommended values, see Best practices.
    • HEALTHY_THRESHOLD and UNHEALTHY_THRESHOLD: the number of sequential probes that must succeed or fail for the probed endpoint to be considered healthy or unhealthy. If either is omitted, Google Cloud uses a default threshold of 2.
    • REQUEST_PATH: the URL path to which Google Cloud sends health check probe requests. If omitted, Google Cloud sends probe requests to the root path, /. If the endpoints being health-checked are private, which is not typical for external forwarding rule IP addresses, you can set this path to /afhealthz.
  2. Create a record set in Cloud DNS and apply a routing policy to it. The routing policy must be configured to resolve your domain name either to the primary load balancer's forwarding rule IP address or, in the event of a health check failure, to one of the backup load balancers' forwarding rule IP addresses.

    gcloud beta dns record-sets create DNS_RECORD_SET_NAME \
        --ttl=TIME_TO_LIVE \
        --type=RECORD_TYPE \
        --zone="MANAGED_ZONE_NAME" \
        --routing-policy-type=FAILOVER \
        --routing-policy-primary-data=PRIMARY_LOAD_BALANCER_FORWARDING_RULE \
        --routing-policy-backup-data-type=GEO \
        --routing-policy-backup-data="BACKUP_REGION_1=BACKUP_LOAD_BALANCER_1_IP_ADDRESS[;BACKUP_REGION_2=BACKUP_LOAD_BALANCER_2_IP_ADDRESS;BACKUP_REGION_3=BACKUP_LOAD_BALANCER_3_IP_ADDRESS]" \
        --health-check=HEALTH_CHECK_NAME \
        --backup-data-trickle-ratio=BACKUP_DATA_TRICKLE_RATIO
    

    Replace the following:

    • DNS_RECORD_SET_NAME: the DNS or domain name of the record set to add—for example, test.example.com
    • TIME_TO_LIVE: the time to live (TTL) for the record set in number of seconds. For recommended values, see Best practices.
    • RECORD_TYPE: the record type—for example, A
    • MANAGED_ZONE_NAME: the name of the managed zone whose record sets you want to manage—for example, my-zone-name
    • PRIMARY_LOAD_BALANCER_FORWARDING_RULE: the forwarding rule name of the primary load balancer
    • BACKUP_REGION: the regions where the backup load balancers are deployed
    • BACKUP_LOAD_BALANCER_IP_ADDRESS: the forwarding rule IP addresses of the backup load balancers corresponding to each region (see the lookup sketch after this list)
    • BACKUP_DATA_TRICKLE_RATIO: the ratio of traffic to send to the backup load balancers even when the primary load balancer is healthy. The ratio must be between 0 and 1, such as 0.1. The default is 0.
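
To find the IP addresses referenced in the routing policy, you can describe the corresponding forwarding rules. The following is a minimal sketch; the forwarding rule names and region are hypothetical:

    # IP address of the primary load balancer's global forwarding rule.
    gcloud compute forwarding-rules describe primary-https-forwarding-rule \
        --global \
        --format="get(IPAddress)"

    # IP address of a backup regional load balancer's forwarding rule.
    gcloud compute forwarding-rules describe backup-https-forwarding-rule \
        --region=us-west1 \
        --format="get(IPAddress)"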

Best practices

Here are some best practices to keep in mind when you configure the Cloud DNS record and health checks:

  • The time it takes for traffic to fail over from primary to backup load balancers (that is, the duration of the outage) depends on the DNS TTL value, the health check interval, and the health check's unhealthy threshold parameter.

    With Cloud DNS, the upper bound for this period can be calculated using the following formula:

    Duration of outage = DNS TTL + Health Check Interval * Unhealthy Threshold
    

    For a failover configuration, we recommend setting the DNS TTL to 30-60 seconds. Higher TTLs lead to longer downtimes because clients on the internet continue to access the primary external Application Load Balancer even after DNS has failed over to the backup regional external Application Load Balancer. For example, with a 30-second TTL, a 30-second health check interval, and an unhealthy threshold of 2, the upper bound for the outage is 30 + (30 × 2) = 90 seconds.

  • Configure the healthy and unhealthy threshold parameters in the health checks such that you avoid unnecessary failovers caused by transient errors. Higher thresholds increase the time it takes for traffic to fail over to backup load balancers.

  • To help ensure that your failover setup works as expected, you can set up your DNS routing policy to always send a small percentage of traffic to the backup load balancers even when the primary load balancers are healthy. This can be done by using the --backup-data-trickle-ratio parameter when you create the DNS record set.

    You can configure the percentage of the traffic sent to the backup as a fraction from 0 to 1. The typical value is 0.1, although Cloud DNS lets you send 100 percent of the traffic to the backup VIP addresses to manually trigger a failover.