High availability for regional external Application Load Balancers

This page describes how to configure a highly available deployment with regional external Application Load Balancers. To achieve high availability, deploy multiple individual regional external Application Load Balancers in regions that best support your application's traffic. This works because regional external Application Load Balancers in different regions are not only isolated from each other, they are also isolated from any global external Application Load Balancer or classic Application Load Balancer infrastructure running in the same region.

Use a Cloud DNS geolocation routing policy to route traffic to two or more load balancers in different regions. The policy routes traffic to the closest available region based on the origin of the client request. We also recommend that you use health checks so that Google Cloud can detect any regional outages and route traffic only to the healthy load balancers.

Here's a sample setup that shows two regional external Application Load Balancers in two different regions.

Figure: High availability with two regional external Application Load Balancers.

The following sections describe a typical workflow with the different components involved in this configuration.

Use health checks to detect regional failures

Google Cloud uses health checks to detect whether your load balancers are healthy. You configure these health checks to send probes from three source regions. These three source regions must be representative of the regions from which your clients access the load balancers. For example, if you have a regional external Application Load Balancer with the majority of your client traffic originating from North America and Europe, you can have probes originating from two regions in North America and one region in Europe.

Additional notes:
- You must specify exactly three source regions when you create the health check. Only global health checks can specify source regions.
- HTTP, HTTPS, and TCP health checks are supported.
- The health check probes originate from a Point of Presence (PoP) on the internet within a short distance of the configured Google Cloud source region.

Route traffic to healthy load balancers

Google Cloud uses a Cloud DNS geolocation routing policy to steer traffic to the load balancers. When all the load balancers are healthy, Cloud DNS routes traffic to the load balancer that is geographically closest to the origin of the client request.

When a load balancer in a particular region starts failing health checks, traffic is routed to available healthy load balancers in other regions.

Failback to using all the load balancers

Failback is automatic when the health checks start passing again. There is no downtime expected during failback because all the available load balancers are serving traffic.
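
The routing and failback behavior described above can be modeled with a short shell sketch. It only illustrates the selection rule (closest region first, skipping regions that fail health checks) and does not describe how Cloud DNS is implemented. The function name `pick_region`, the `region=state` input format, and the region names are hypothetical.

```shell
# Illustrative model only; hypothetical helper, not part of any Google Cloud tool.
# Usage: pick_region HEALTH_STATE REGION...
#   HEALTH_STATE: space-separated "region=healthy" or "region=unhealthy" pairs
#   REGION...:    candidate regions, ordered from closest to farthest from the client
pick_region() {
  local health_state="$1"
  shift
  local region
  for region in "$@"; do
    case " ${health_state} " in
      *" ${region}=healthy "*)
        # First healthy region in proximity order wins.
        echo "${region}"
        return 0
        ;;
    esac
  done
  # No healthy region found; this sketch does not model that case further.
  return 1
}

# All load balancers healthy: the closest region is chosen.
pick_region "us-east1=healthy europe-west1=healthy" us-east1 europe-west1    # prints us-east1

# Closest region failing health checks: traffic moves to the next region.
pick_region "us-east1=unhealthy europe-west1=healthy" us-east1 europe-west1  # prints europe-west1
```

Failback needs no extra logic in this model: as soon as `us-east1` is reported healthy again, the first call's result applies.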

Configure cross-region load balancing

Perform the following steps to configure a cross-region deployment that facilitates high availability:

Create regional external Application Load Balancers in the regions that you determine best support traffic for your application. Each of these load balancers must have the same traffic management and security configuration.
Create the health check and the DNS routing policy to steer traffic to the load balancers based on the client location, and to route traffic away from an unhealthy load balancer in case of an outage.

Create load balancers in multiple regions

Note the following considerations as you configure your additional redundant load balancers:

- Configure all the regional external Application Load Balancers with similar features so that traffic is processed consistently regardless of which load balancer serves the request. For example, make sure that you use the same type of SSL certificate, the same Google Cloud Armor policies, and the same traffic management settings for all the regional external Application Load Balancers. We recommend that you use an automation framework such as Terraform to help achieve and maintain consistency in load balancer configurations across the different regional deployments.
- We recommend that you set up regional external Application Load Balancers in every region that you determine would best support traffic for your application.
- Regional external Application Load Balancers support both Premium and Standard Network Service Tiers. We recommend that you set up the additional regional external Application Load Balancers in Premium Tier to ensure low latency.

To learn how to configure a regional external Application Load Balancer, see Set up a regional external Application Load Balancer with VM instance group backends.

Configure Cloud DNS and health checks

This section describes how to use Cloud DNS and Google Cloud health checks to configure your Cloud Load Balancing environment to detect outages and route traffic to load balancers in other regions.

Use the following steps to configure the required health check and routing policies:

Create a health check for the load balancers' forwarding rule IP addresses.
```
gcloud beta compute health-checks create http HEALTH_CHECK_NAME \
    --global \
    --source-regions=SOURCE_REGION_1,SOURCE_REGION_2,SOURCE_REGION_3 \
    --use-serving-port \
    --check-interval=HEALTH_CHECK_INTERVAL \
    --healthy-threshold=HEALTHY_THRESHOLD \
    --unhealthy-threshold=UNHEALTHY_THRESHOLD \
    --request-path=REQUEST_PATH
```
Replace the following:
- HEALTH_CHECK_NAME: the name of the health check
- SOURCE_REGION_1, SOURCE_REGION_2, and SOURCE_REGION_3: the three Google Cloud regions from which health check probes are sent. You must specify exactly three source regions.
- HEALTH_CHECK_INTERVAL: the amount of time in seconds from the start of one probe issued by one prober to the start of the next probe issued by the same prober. The minimum supported value is 30 seconds. For recommended values, see Best practices.
- HEALTHY_THRESHOLD and UNHEALTHY_THRESHOLD: specify the number of sequential probes that must succeed or fail for the endpoint to be considered healthy or unhealthy. If either is omitted, Google Cloud uses a default threshold of 2.
- REQUEST_PATH: the URL path to which Google Cloud sends health check probe requests. If omitted, Google Cloud sends probe requests to the root path, /. If the endpoints being health-checked are private, which is not typical for external forwarding rule IP addresses, you can set this path to /afhealthz.
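
The constraints listed above (exactly three source regions, a check interval of at least 30 seconds) can be checked in a script before you run the command. The following is a minimal sketch; the helper name `check_health_check_args` is hypothetical, and the region names are example values only, so confirm that the regions you choose are supported as health check source regions.

```shell
# Hypothetical pre-flight helper; not part of gcloud.
# Usage: check_health_check_args SOURCE_REGIONS CHECK_INTERVAL
#   SOURCE_REGIONS: comma-separated region list, as passed to --source-regions
#   CHECK_INTERVAL: interval in seconds, as passed to --check-interval
check_health_check_args() {
  local source_regions="$1"
  local interval="$2"
  local count

  # Count the non-empty comma-separated entries.
  count=$(printf '%s\n' "${source_regions}" | tr ',' '\n' | awk 'NF { n++ } END { print n + 0 }')

  if [ "${count}" -ne 3 ]; then
    echo "expected exactly 3 source regions, got ${count}" >&2
    return 1
  fi
  if [ "${interval}" -lt 30 ]; then
    echo "check interval must be at least 30 seconds, got ${interval}" >&2
    return 1
  fi
  return 0
}

# Example values; the regions are assumptions for illustration.
check_health_check_args "us-east1,us-west1,europe-west1" 30 && echo "ok"
```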
In Cloud DNS, create a record set and apply a geolocation routing policy to it.
```
gcloud beta dns record-sets create DNS_RECORD_SET_NAME \
    --ttl=TIME_TO_LIVE \
    --type=RECORD_TYPE \
    --zone="MANAGED_ZONE_NAME" \
    --routing-policy-type="GEO" \
    --routing-policy-data="FORWARDING_RULE_NAME_A@REGION_A;FORWARDING_RULE_NAME_B@REGION_B[,;FORWARDING_RULE_NAME_C@REGION_C]" \
    --health-check=HEALTH_CHECK_NAME
```
Replace the following:
- DNS_RECORD_SET_NAME: the DNS or domain name of the record set to add—for example, test.example.com
- TIME_TO_LIVE: the time to live (TTL), in seconds, for the record. For recommended values, see DNS considerations.
- RECORD_TYPE: the record type—for example, A
- MANAGED_ZONE_NAME: the name of the managed zone whose record sets you want to manage—for example, my-zone-name
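- FORWARDING_RULE_NAME_A, FORWARDING_RULE_NAME_B, and so on: the name of the forwarding rule for the load balancer in each region
- REGION_A, REGION_B, and so on: the region where each load balancer is deployed
- HEALTH_CHECK_NAME: the name of the health check that you created in the previous step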

Best practices

Here are some best practices to keep in mind when you configure the Cloud DNS record and health checks:

The time it takes for traffic to be routed from unhealthy to healthy load balancers (that is, the duration of the outage) depends on the DNS TTL value, the health check interval, and the health check's unhealthy threshold parameter.

With Cloud DNS, the upper bound for this period can be calculated using the following formula:
```
Duration of outage = DNS TTL + Health Check Interval * Unhealthy Threshold
```
We recommend setting the DNS TTL to 30 to 60 seconds. Higher TTLs lead to longer downtimes because clients on the internet continue to access the unhealthy load balancers even after DNS has failed over to other regions.
Configure the healthy and unhealthy threshold parameters in the health checks such that you avoid unnecessary and abrupt rerouting of traffic due to transient errors. Higher thresholds increase the time it takes for traffic to shift to load balancers in other regions.
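
As a worked example of the formula above, the following sketch computes the upper bound for a given set of values. The function name `max_failover_seconds` is hypothetical, and the input values are illustrative choices within the ranges mentioned on this page.

```shell
# Hypothetical helper that applies the formula:
#   duration of outage = DNS TTL + health check interval * unhealthy threshold
# Usage: max_failover_seconds DNS_TTL CHECK_INTERVAL UNHEALTHY_THRESHOLD
max_failover_seconds() {
  local dns_ttl="$1"
  local check_interval="$2"
  local unhealthy_threshold="$3"
  echo $(( dns_ttl + check_interval * unhealthy_threshold ))
}

# TTL of 30 seconds, 30-second interval, default unhealthy threshold of 2.
max_failover_seconds 30 30 2   # prints 90
```

With a 60-second TTL and an unhealthy threshold of 3, the same interval gives 60 + 30 * 3 = 150 seconds, which shows how both the TTL and the threshold lengthen the worst-case failover time.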