High availability options using regional PDs

This document discusses how you can use regional persistent disks to build high availability (HA) services by comparing different options for increasing service availability, as well as comparing cost, performance, and resiliency for different service architectures. Additionally, it describes failure types and recovery actions to help you decide if regional persistent disks are the correct solution for your HA service.

Regional persistent disk is a storage option that provides synchronous replication of data between two zones in a region. Regional persistent disks can be a good building block to use when you implement HA services in Compute Engine.

The benefit of regional persistent disks is that in the event of a zonal outage, where your virtual machine (VM) instance might become unavailable, you can force attach a regional persistent disk to a VM instance in a secondary zone in the same region. To perform this task, you must either start another VM instance in the same zone as the regional persistent disk that you are force attaching, or maintain a hot standby VM instance in that zone. A hot standby is a running VM instance that is identical to the one you are using. The two instances have the same data.

The force-attach operation executes in less than one minute, which makes an RTO of minutes achievable. The total RTO depends not only on the storage failover (the force attachment of the regional persistent disk), but also on whether a secondary VM instance must be created first, the time the underlying file system takes to detect the newly attached disk, the recovery time of the corresponding applications, and other factors.
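As a back-of-the-envelope illustration, the total RTO can be modeled as the sum of these components. The sketch below uses placeholder durations, not measured Compute Engine values:

```python
# Rough model of total recovery time objective (RTO) for a regional
# persistent disk failover. All durations are illustrative assumptions
# in seconds, not measured values.

def estimate_rto_seconds(
    vm_boot: float = 0.0,        # 0 if a hot standby VM is already running
    force_attach: float = 60.0,  # force attach completes in under a minute
    fs_mount: float = 10.0,      # file system detects and mounts the disk
    app_recovery: float = 120.0, # application crash recovery and warm-up
) -> float:
    """Return the estimated storage-plus-application failover time."""
    return vm_boot + force_attach + fs_mount + app_recovery

# With a hot standby, only attach, mount, and app recovery contribute.
hot_standby_rto = estimate_rto_seconds()
# Creating the standby VM on demand adds its boot time to the RTO.
on_demand_rto = estimate_rto_seconds(vm_boot=45.0)
```

The model makes the trade-off explicit: a hot standby removes the VM boot term from the RTO at the cost of running a second instance continuously.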

Design considerations

Before you start designing an HA service, understand the characteristics of the application, the file system, and the operating system. These characteristics are the basis for the design and can rule out various approaches. For example, if an application does not support application-level replication, some corresponding design options are not applicable.

Similarly, if the application, the file system, or the operating system is not crash tolerant, then using regional persistent disks or even zonal persistent disk snapshots might not be an option. Crash tolerance is defined as the ability to recover from an abrupt termination without losing or corrupting data that was already committed to a persistent disk prior to the crash.

Consider the following:

  1. Understand the effect of replication on application and write performance.
  2. Determine the service recovery time objective. Understand how quickly your service must recover from a zonal outage and the SLA requirements.
  3. Understand the cost of building a resilient and reliable service architecture. Application synchronous and asynchronous replication both use two instances of the database and VM. In this case, the following items determine the total cost:

    • VM instance costs
    • Persistent disk costs
    • Costs of maintaining application replication

    To achieve high availability with a regional persistent disk, you use the same VM instance and persistent disk components and add a regional persistent disk. Regional persistent disks cost twice as much per byte as zonal persistent disks because they are replicated in two zones.

    However, using regional persistent disks might reduce your maintenance cost because the data is automatically replicated to two replicas without the requirement of maintaining application replication.

    You can reduce hosting costs even more by starting the backup VM on demand during failover rather than maintaining it as a hot standby.
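To make the trade-off concrete, the following sketch compares the two cost structures. All prices are placeholder assumptions, not published pricing:

```python
# Compare the monthly storage and compute cost of the two HA approaches.
# All prices below are placeholder assumptions, not published GCP pricing.

ZONAL_PD_PER_GB = 0.04                    # assumed $/GB/month, zonal disk
REGIONAL_PD_PER_GB = 2 * ZONAL_PD_PER_GB  # replicated in two zones: 2x per byte
VM_PER_MONTH = 50.0                       # assumed $/month per VM instance

def app_replication_cost(disk_gb: int) -> float:
    # Two VMs, each with its own zonal disk. The cost of maintaining
    # application replication itself is excluded from this simple model.
    return 2 * VM_PER_MONTH + 2 * disk_gb * ZONAL_PD_PER_GB

def regional_pd_cost(disk_gb: int, hot_standby: bool) -> float:
    # One regional disk (2x per byte) plus one or two VMs, depending on
    # whether the backup VM runs as a hot standby or starts on demand.
    vms = 2 if hot_standby else 1
    return vms * VM_PER_MONTH + disk_gb * REGIONAL_PD_PER_GB
```

With these assumed prices, the per-byte storage cost of one regional disk equals that of two zonal disks, so skipping the hot standby removes exactly one VM's cost from the total.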

Compare cost, performance, and resiliency

The following table highlights the trade-offs in cost, performance, and resiliency for the different service architectures.

| HA service | Zonal persistent disk snapshots | Application-level synchronous replication | Application-level asynchronous replication | HA solution using regional persistent disk |
| --- | --- | --- | --- | --- |
| Protects against application, VM, zone failure¹ | ✓ | ✓ | ✓ | ✓ |
| Mitigation against application corruption (for example, application crash-intolerance)² | ✓ | ✓ | ✓ | |
| Cost | $ | $$: two instances of the database/VM running, plus the cost of maintaining and setting up application replication, plus cross-zone networking | $$: two instances of the database/VM running, plus the cost of maintaining and setting up application replication, plus cross-zone networking | $1.5x - $$: the same as application replication if you use a hot standby; lower if you spin up a backup VM on demand during failover |
| Application performance | No effect | Trade-off on application performance with synchronous replication | No effect | No effect for most applications |
| Suited for applications with a low RPO requirement (no tolerance to data loss) | Data loss depending on when the snapshot was taken | No data loss³ | Data loss because replication is asynchronous | No data loss |
| Storage recovery time from disaster⁴ | O(minutes) | O(seconds) | O(seconds) | O(seconds) to force attach the disk to a standby VM instance |

¹ Using regional persistent disks or snapshots is not sufficient to protect from and mitigate against failures and corruptions. Your application, file system, and possibly other software components must be crash consistent or use some form of quiescing.

² The replication of some applications does provide mitigation against some application corruptions. For example, MySQL master application corruption doesn't cause its replica instances to become corrupted as well. Review your application's documentation for details.

³ Data loss means unrecoverable loss of data committed to persistent storage. Any non-committed data is still lost.

⁴ Failover performance doesn't include the file system check and application recovery and load after failover.

Building HA database services using regional persistent disks

This section covers high-level concepts for building HA solutions for stateful database services (such as MySQL or PostgreSQL) using regional persistent disks and Compute Engine.

This discussion covers mitigation of single zone outages. An application can still become unavailable in case of broader outages, for example, if a whole region becomes unavailable. Depending on your needs, you might want to consider cross-regional replication techniques for even higher availability.

Database HA configurations typically have at least two VM instances. Preferably these instances are part of one or more managed instance groups:

  • A primary VM instance in the primary zone
  • A standby VM instance in a secondary zone

The primary VM instance has at least two persistent disks: a boot disk and a regional persistent disk. The regional persistent disk contains database data and any other mutable data that should be preserved in another zone in case of an outage.

The standby VM instance requires its own boot disk so that it can recover from configuration-related outages, such as those caused by an operating system upgrade. You cannot force attach a boot disk to another VM during a failover.

The primary and standby VM instances are configured to use a load balancer that directs traffic to the primary VM based on health check signals. This configuration is also known as a hot standby. Disaster recovery scenarios for data outlines other failover configurations, which might be more appropriate for your scenario.

Health checks

Health checks are implemented by the health check agent and serve two purposes:

  1. The health check agent resides within the primary and secondary VMs to monitor the instances and communicate with the load balancer to direct traffic. This is particularly useful with instance groups.
  2. The health check agent syncs with the application-specific regional control plane and makes failover decisions based on control plane behavior. The control plane must be in a zone that differs from the instance whose health it is monitoring.

The health check agent itself must be fault tolerant. For example, in the image that follows, the control plane is separated from the primary instance: the primary resides in zone us-central1-a, while the standby VM resides in zone us-central1-f.

The health check agent's role in primary and standby VM instances.
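The health-check-driven failover decision can be sketched as a small simulation: declare the primary unhealthy only after several consecutive failed probes, so a single transient error doesn't trigger an unnecessary failover. The probe callback, threshold, and class name below are illustrative assumptions, not a real Compute Engine API:

```python
# Minimal sketch of a health check agent's failover decision.
# The probe function and threshold are assumptions for illustration.

from typing import Callable

class HealthCheckAgent:
    def __init__(self, probe: Callable[[], bool], threshold: int = 3):
        self.probe = probe          # returns True if the primary is healthy
        self.threshold = threshold  # consecutive failures before failover
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if failover should be triggered."""
        if self.probe():
            self.failures = 0       # healthy probe resets the failure count
            return False
        self.failures += 1
        return self.failures >= self.threshold

# Example: two transient failures recover; three in a row trigger failover.
results = iter([True, False, False, True, False, False, False])
agent = HealthCheckAgent(lambda: next(results))
decisions = [agent.tick() for _ in range(7)]
```

Real health checks also tune probe interval and timeout; the consecutive-failure threshold shown here is the part that trades detection speed against false failovers.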


When a failure is detected within a primary VM or database, the application control plane can initiate failover to the standby VM in the secondary zone. During the failover, the regional persistent disk that is synchronously replicated to the secondary zone is force attached to the standby VM by the application control plane, and all traffic is directed to that VM based on health check signals.
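The order of operations in that failover sequence can be illustrated with simulated stand-ins for the disk, the load balancer, and the control plane action. None of the classes below are a real API; they only mirror the sequence described above:

```python
# Simulated sketch of the failover sequence: force attach the regional
# disk to the standby VM first, then shift traffic to it.

class RegionalDisk:
    def __init__(self):
        self.attached_to = "primary-vm"

    def force_attach(self, vm: str):
        # A real force attach succeeds even if the failed VM can't
        # release the disk, which is what enables zonal-outage recovery.
        self.attached_to = vm

class LoadBalancer:
    def __init__(self):
        self.backend = "primary-vm"

    def redirect(self, vm: str):
        self.backend = vm

def fail_over(disk: RegionalDisk, lb: LoadBalancer, standby: str):
    disk.force_attach(standby)  # storage fails over first
    # (application initialization / crash recovery would happen here)
    lb.redirect(standby)        # then traffic follows the data

disk, lb = RegionalDisk(), LoadBalancer()
fail_over(disk, lb, "standby-vm")
```

Attaching the disk before redirecting traffic matters: the standby must have the data and a recovered application before it starts serving requests.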

Overall failover latency, excluding failure-detection time, is the sum of the following latencies:

  • Time to force attach a regional persistent disk to a standby VM (less than one minute)
  • Time required for application initialization and crash recovery

The Disaster Recovery Building Blocks page covers the current building blocks available on Compute Engine. Regional persistent disks add another key building block for architecting HA solutions by providing disk-level replication.

Failure modes

The following table lists the different failure modes and recommended actions for services that use regional persistent disks.

| Failure category (probability) | Failure types | Action |
| --- | --- | --- |
| Zone failure (medium) | Disk-only failure in the local zone, either temporary or long lasting; Compute Engine control plane failure; power failure; networking failure | Temporary hiccups in regional disk operations are handled transparently by the regional persistent disk without a need for failover: the disk automatically detects errors and slowness, switches replication modes, and catches up data that was replicated to only one zone. In case of storage problems in the primary zone, the regional persistent disk automatically performs reads from the secondary zone, which can increase read latency; under these circumstances, the application might trigger failover based on performance. The application control plane can also trigger failover based on health check thresholds. |
| Application failure (high) | Application unresponsive; application admin actions (for example, an upgrade); human error (for example, misconfiguration of parameters such as SSL certificates or ACLs) | The application control plane can trigger failover based on health check thresholds. |
| VM failure (medium) | Infrastructure/hardware failure; VM unresponsive due to CPU contention or intermediate network interruption | VMs are usually autohealed. The application control plane can trigger failover based on health check thresholds. |
| Application corruption (low-medium) | Application data corruption (for example, due to application bugs or a bad OS upgrade) | Application recovery. |

Challenges with database replication

The following table lists some common challenges with setting up and managing application synchronous or semi-synchronous replication (like MySQL) and how they compare to block replication with regional persistent disks.

| Challenges | Application synchronous or semi-synchronous replication | Block replication with regional persistent disks |
| --- | --- | --- |
| Maintaining stable replication between the master and the failover replica | A number of things can go wrong and cause an instance to fall out of HA mode: misconfiguration of replication parameters, such as an SSL certificate mismatch or a missing ACL on the master side; high load on the master instance causing the failover replica to be unable to keep up; bugs causing replication issues, such as application issues, OS misconfiguration, or Docker failure; infrastructure failures such as CPU contention, a frozen VM, or intermediate network interruption. | Storage failures are handled by regional persistent disks, transparently to the application except for a possible fluctuation in the disk's performance. User-defined health checks are still required to reveal application or VM issues and trigger failover. |
| The end-to-end failover time is longer than desired | The failover operation doesn't have an upper bound on its duration: waiting for all transactions to be replayed can take an arbitrarily long time, depending on the schema and the load on the database. | Regional persistent disks provide synchronous replication, so the failover time is bounded by the sum of: creation of a secondary VM (unless a hot standby VM instance is already available), force attaching the regional persistent disk, and application initialization. |
| Split-brain | Requires provisions to ensure that there is only one master at a time. | Also requires provisions to ensure that there is only one master at a time. |

What's next