Design considerations for resilient workloads with regional persistent disks

Last reviewed 2020-09-23 UTC

This document explains the behaviors of, and the interactions between, a stateful application, a health check agent, and an application-specific regional control plane that monitors the application and orchestrates a zonal failover for a workload deployed with regional persistent disks.

This document is intended for application developers as a follow-up to High availability options using regional persistent disks, expanding on the design and architecture described in the section Building HA database services using regional persistent disks. We recommend that you read that document first, especially the sections about design considerations and cost comparison, performance, and resilience.

A stateless application increases resilience by having at least one secondary virtual machine (VM) instance running in a different Compute Engine zone. When the primary VM instance fails, the application continues to run on the secondary VM instance. A stateful application can persist its application state to a zonal persistent disk in order to recover its state after a VM instance restart. To be resilient, a stateful application must also make its application state available to a secondary VM instance in another zone.

The following diagram illustrates a typical two-node stateful application that is replicated across two zones. The application in each zone has a zonal persistent disk to capture the application state, and a networking connection between the VM instances to synchronize application state changes between the nodes.

A load balancer routes traffic to a primary and a secondary VM in different zones, and the application state is replicated between the VMs.

Adding a regional persistent disk

Another way to synchronize a stateful application's application state is to add a regional persistent disk. When an application writes its application state to a regional persistent disk, Google Cloud automatically replicates the block storage to a second zone. Regional persistent disks also help to ensure that the disk is attached to only one VM instance across both zones at a time.
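
For example, you can create a regional persistent disk programmatically. The following is a minimal sketch, assuming a recent version of the google-cloud-compute client library; the project, region, replica zones, disk name, and disk type are placeholder values.

```python
# Sketch: create a regional persistent disk that is replicated across two zones.
# Assumes a recent version of the google-cloud-compute client library; the
# project, region, zones, disk name, and disk type are placeholder values.
from google.cloud import compute_v1

PROJECT = "my-project"      # placeholder
REGION = "us-central1"      # placeholder
REPLICA_ZONES = [
    f"projects/{PROJECT}/zones/{REGION}-b",
    f"projects/{PROJECT}/zones/{REGION}-c",
]

def create_regional_disk(disk_name: str, size_gb: int = 200) -> None:
    """Creates a regional persistent disk replicated across the two zones."""
    client = compute_v1.RegionDisksClient()
    disk = compute_v1.Disk(
        name=disk_name,
        size_gb=size_gb,
        type_=f"projects/{PROJECT}/regions/{REGION}/diskTypes/pd-ssd",
        replica_zones=REPLICA_ZONES,
    )
    operation = client.insert(project=PROJECT, region=REGION, disk_resource=disk)
    operation.result()  # Block until the disk has been created.

if __name__ == "__main__":
    create_regional_disk("app-state-disk")
```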

The following diagram shows the architecture of a stateful database application.

A regional persistent disk is available to two VM instances across two zones.

As the preceding diagram shows, there are still two application VM instances—a primary VM instance and a secondary VM instance—deployed in two zones. Along with using a regional persistent disk for application state storage, there's now an extra entity, the application-specific regional control plane. The application-specific regional control plane decides which VM instance has the regional persistent disk attached and which VM instance is the current primary VM instance. This architecture is an active-passive configuration because only the primary VM instance can commit application state to the regional persistent disk.

VM instances and the stateful application

The preceding architecture diagram illustrates a hot, active-passive database application. The following configurations are also possible:

  • If your recovery time objective (RTO) can afford the additional latency of starting a secondary VM instance, you can save on Compute Engine costs by running only the active VM instance. In a failover, the application-specific regional control plane starts the secondary VM instance and attaches the regional persistent disk.
  • Batch or stream processing workloads can checkpoint their progress to the regional persistent disk. In a failover, the application resumes processing from the last checkpoint, as shown in the sketch after this list.
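
The following is a minimal sketch of that checkpointing pattern, assuming the regional persistent disk is mounted at /mnt/disks/app-state (a placeholder path) and that records are processed in order.

```python
# Sketch: checkpoint processing progress to the mounted regional persistent disk.
# The mount point and the per-record handling are placeholders for illustration.
import json
import os

CHECKPOINT_FILE = "/mnt/disks/app-state/checkpoint.json"  # placeholder path

def load_checkpoint() -> int:
    """Returns the last committed record offset, or 0 if no checkpoint exists."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["offset"]
    return 0

def save_checkpoint(offset: int) -> None:
    """Writes the current offset atomically so that a failover can resume from it."""
    tmp_file = CHECKPOINT_FILE + ".tmp"
    with open(tmp_file, "w") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())  # Force the write through to the regional disk.
    os.replace(tmp_file, CHECKPOINT_FILE)

def handle(record) -> None:
    """Placeholder for the actual per-record processing."""
    print("processing", record)

def process(records: list) -> None:
    offset = load_checkpoint()  # After a failover, resume from the last checkpoint.
    for i in range(offset, len(records)):
        handle(records[i])
        save_checkpoint(i + 1)
```

Checkpointing after every record keeps the sketch simple; a real workload would typically checkpoint in larger batches to limit the write overhead.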

Managing the VM instance startups

Because only a single VM instance can have a regional persistent disk attached at one time, you need to start up the VM instances and attach the regional persistent disk systematically. One best practice is to separate the VM instance and the application startup from the attachment of the regional persistent disk. The VM instance's startup scripts should not initiate the regional persistent disk attachment. Instead, the startup scripts should start the health check agent and wait for the regional persistent disk to be attached.

At startup, the VM instance needs to perform the following sequential steps, shown in the sketch after this list:

  1. Start the health check agent.
  2. Wait for the regional persistent disk to be attached.
  3. After the regional persistent disk is attached, mount the file system.
  4. After the file system is mounted, start the application.
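
The following sketch shows this startup sequence as it might be run from a startup script. The agent binary, service name, and mount point are hypothetical; on Google-provided images, an attached persistent disk appears under /dev/disk/by-id/google-DEVICE_NAME.

```python
# Sketch: VM startup sequence that starts the health check agent, then waits for
# the regional persistent disk before mounting it and starting the application.
# The agent binary, device path, mount point, and service name are placeholders.
import os
import subprocess
import time

DISK_DEVICE = "/dev/disk/by-id/google-app-state-disk"  # placeholder device path
MOUNT_POINT = "/mnt/disks/app-state"                   # placeholder mount point

def start_health_check_agent() -> None:
    # Step 1: start the agent so the control plane can observe this VM's state.
    subprocess.Popen(["/opt/app/health-check-agent"])   # hypothetical agent binary

def wait_for_disk() -> None:
    # Step 2: poll until the control plane attaches the regional persistent disk.
    while not os.path.exists(DISK_DEVICE):
        time.sleep(5)

def mount_file_system() -> None:
    # Step 3: mount the file system only after the disk is attached.
    os.makedirs(MOUNT_POINT, exist_ok=True)
    subprocess.run(["mount", DISK_DEVICE, MOUNT_POINT], check=True)

def start_application() -> None:
    # Step 4: start the application only after the file system is mounted.
    subprocess.run(["systemctl", "start", "app.service"], check=True)  # hypothetical unit

if __name__ == "__main__":
    start_health_check_agent()
    wait_for_disk()
    mount_file_system()
    start_application()
```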

These steps cover system startup, but you also need to consider a failover. During a failover, the regional persistent disk is force-attached to the secondary VM instance and forcibly detached from the primary VM instance, so I/O operations to the file system on the primary VM instance fail. At this point, you need to shut down or restart the primary VM instance.

Running the health check agent and health checks

As described in the preceding section, the VM instance waits for the regional persistent disk to be attached before starting the application. The application-specific regional control plane attaches the regional persistent disk, but only to a VM instance that's waiting for the disk to be attached. After the disk is attached, the application-specific regional control plane monitors the application's health and initiates a failover if the application becomes unhealthy.

Each VM instance has one of the following states:

  • Down
  • Starting
  • Waiting for a disk
  • Application running

The health check agent reports the current state of the VM instance. Instead of reporting all of these states through a single health check, you can run two binary health checks: a VM instance health check and an application health check. If the VM instance is ready to have the regional persistent disk attached, or if the regional persistent disk is attached and writable, the VM instance health check reports a healthy status. If the application is running and is able to write its application state to the regional persistent disk, the application health check reports a healthy status.
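
The following is a minimal sketch of a health check agent that serves the two binary checks on separate HTTP endpoints. The port, endpoint paths, device path, and probes are assumptions for illustration.

```python
# Sketch: health check agent that serves the two binary health checks over HTTP.
# The port, endpoint paths, device path, and probe logic are assumptions.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DISK_DEVICE = "/dev/disk/by-id/google-app-state-disk"   # placeholder device path
PROBE_FILE = "/mnt/disks/app-state/.health-probe"       # placeholder probe file
APP_PID_FILE = "/var/run/app.pid"                       # hypothetical liveness marker

def disk_writable() -> bool:
    """Returns True if the attached regional disk can be written to."""
    try:
        with open(PROBE_FILE, "w") as f:
            f.write("ok")
        return True
    except OSError:
        return False

def instance_healthy() -> bool:
    # Healthy if the VM is waiting for the disk, or the disk is attached and writable.
    if not os.path.exists(DISK_DEVICE):
        return True
    return disk_writable()

def application_healthy() -> bool:
    # Healthy only if the application is running and can write its state to the disk.
    return os.path.exists(APP_PID_FILE) and disk_writable()

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        checks = {
            "/health/instance": instance_healthy,
            "/health/application": application_healthy,
        }
        check = checks.get(self.path)
        self.send_response(200 if check and check() else 503)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Compute Engine HTTP health checks treat an HTTP 200 (OK) response as a success by default, so any other condition returns 503 to keep each check strictly binary.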

Using two binary health checks has several advantages:

  • You can use the Compute Engine managed health check service, which polls the health check agent and also resolves transient errors through threshold counts.
  • A managed instance group (MIG) can monitor the instance health check and autoheal an unhealthy VM instance.
  • The load balancer can monitor the application health check and route traffic to the active application instance.

You can prevent the system from reacting to a transient failure by decreasing the health check polling frequency, or by increasing the number of consecutive signals that are required before the health state changes. Both approaches delay the system's reaction to an outage and therefore increase the time to recovery. By testing and measuring these settings, you can tune the health checks to balance sensitivity to transient failures against recovery time.
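
As an illustration, the following sketch creates an HTTP health check with an explicit polling interval and healthy and unhealthy thresholds, assuming a recent version of the google-cloud-compute client library; the health check name, port, request path, and values are placeholders.

```python
# Sketch: configure the managed health check's polling interval and thresholds.
# The health check name, port, request path, and threshold values are placeholders.
from google.cloud import compute_v1

PROJECT = "my-project"  # placeholder

def create_application_health_check() -> None:
    client = compute_v1.HealthChecksClient()
    health_check = compute_v1.HealthCheck(
        name="app-health-check",
        type_="HTTP",
        check_interval_sec=10,   # lower polling frequency slows reaction to failures
        timeout_sec=5,
        healthy_threshold=2,     # consecutive successes before reporting healthy
        unhealthy_threshold=3,   # consecutive failures before reporting unhealthy
        http_health_check=compute_v1.HTTPHealthCheck(
            port=8080,
            request_path="/health/application",
        ),
    )
    operation = client.insert(project=PROJECT, health_check_resource=health_check)
    operation.result()  # Block until the health check has been created.

if __name__ == "__main__":
    create_application_health_check()
```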

Understanding the application-specific regional control plane

The final piece in the architecture is the application-specific regional control plane, which is responsible for the following two functions:

  • Managing the lifecycle of the primary and secondary VM instances.
  • Deciding if a failover is required, by monitoring the status of the application health check.

If a failover is needed, the application-specific regional control plane orchestrates the failover with the following steps (the force-attach step is sketched after this list):

  1. Checks that a secondary VM instance is running and is waiting for the regional persistent disk to be attached.
  2. Forces the attachment of the regional persistent disk to the secondary VM instance.
  3. Monitors and restarts the failed primary VM instance. When the VM instance is restarted, the control plane initiates a failback as needed.
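
The force-attach step (step 2) might look like the following sketch, assuming a recent version of the google-cloud-compute client library. The project, region, zone, instance, disk, and device names are placeholders.

```python
# Sketch: force-attach the regional persistent disk to the secondary VM instance,
# detaching it from the failed primary VM instance if necessary.
# Project, region, zone, instance, disk, and device names are placeholders.
from google.cloud import compute_v1

PROJECT = "my-project"
REGION = "us-central1"
DISK_URL = f"projects/{PROJECT}/regions/{REGION}/disks/app-state-disk"

def force_attach_disk(secondary_zone: str, secondary_instance: str) -> None:
    """Force-attaches the regional disk to the secondary VM instance."""
    client = compute_v1.InstancesClient()
    attached_disk = compute_v1.AttachedDisk(
        source=DISK_URL,
        device_name="app-state-disk",  # placeholder device name
        mode="READ_WRITE",
    )
    request = compute_v1.AttachDiskInstanceRequest(
        project=PROJECT,
        zone=secondary_zone,
        instance=secondary_instance,
        attached_disk_resource=attached_disk,
        force_attach=True,             # take the disk away from the failed primary
    )
    operation = client.attach_disk(request=request)
    operation.result()  # Block until the attach operation completes.

if __name__ == "__main__":
    force_attach_disk("us-central1-c", "app-secondary")
```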

The application-specific regional control plane itself must be highly available across the two zones where the application is running. In on-premises data centers, high availability (HA) is often achieved by deploying additional servers to build a quorum, decide which VM instance is the primary VM instance, and orchestrate failover. This approach often uses HA monitoring tools such as Heartbeat, Pacemaker, or Keepalived.

Although you can use the application-specific regional control plane anywhere in the cloud, Google Cloud offers the following managed and regionally available services that simplify the implementation of this approach:

  • Google Cloud serverless products such as App Engine, Cloud Run, and Cloud Functions, which are easy to manage and deploy.
  • Managed health checks that offload the monitoring of the application instances.
  • Managed instance groups that manage the lifecycle of the server instances.

The following diagram illustrates the use of Cloud Functions for the application-specific regional control plane, along with a stateful managed instance group and managed health checks.

An application-specific regional control plane manages the primary and secondary VMs.

The preceding diagram shows two VM instances of the application, primary and secondary. Each VM instance runs in a separate zone and is managed by a stateful regional MIG. A regional persistent disk is available across the same two zones. Two managed health check services are running. One managed health check service monitors VM instance health status and is used by the stateful MIG. The other health check service monitors application health status and is used by the load balancer's target pool.

The application-specific regional control plane reads the application health status from the target pool and interacts with the stateful regional MIG to monitor the application and to initiate attachment of the regional persistent disk to the currently healthy VM instance.
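
The control plane logic itself can be packaged as a small Cloud Function. The following is a minimal sketch using the Python Functions Framework; the three helper functions are placeholders for the health check lookup and the force-attach operation sketched earlier, and the function is assumed to be invoked periodically, for example by Cloud Scheduler.

```python
# Sketch: Cloud Functions HTTP entry point acting as the application-specific
# regional control plane. The helper functions are placeholders for illustration.
import functions_framework

def application_is_healthy() -> bool:
    """Placeholder: read the application health check status for the primary VM."""
    return True

def secondary_is_waiting_for_disk() -> bool:
    """Placeholder: check that the secondary VM reports the 'waiting for a disk' state."""
    return True

def force_attach_disk_to_secondary() -> None:
    """Placeholder: force-attach the regional persistent disk, as sketched earlier."""

@functions_framework.http
def control_plane(request):
    # Invoked periodically (for example, by Cloud Scheduler) to evaluate failover.
    if application_is_healthy():
        return ("primary healthy", 200)
    if not secondary_is_waiting_for_disk():
        return ("secondary not ready; failover deferred", 503)
    force_attach_disk_to_secondary()
    return ("failover initiated", 200)
```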

What's next