About synchronous disk replication


Regional Persistent Disk and Hyperdisk Balanced High Availability are storage options that let you implement high availability (HA) services in Compute Engine. Regional Persistent Disk and Hyperdisk Balanced High Availability synchronously replicate data between two zones in the same region and ensure HA for disk data for up to one zonal failure.

Regional Persistent Disk and Hyperdisk Balanced High Availability volumes are designed for workloads that require a lower Recovery Point Objective (RPO) and Recovery Time Objective (RTO). To learn more about RPO and RTO, see Basics of disaster recovery planning.

Regional Persistent Disk and Hyperdisk Balanced High Availability volumes are designed to work with regional managed instance groups.

This document provides an overview of how to build HA services with Regional Persistent Disk and Hyperdisk Balanced High Availability volumes.

When you decide to use Regional Persistent Disk or Hyperdisk Balanced High Availability, make sure that you compare the different options for increasing service availability and the cost, performance, and resiliency for different service architectures.

About synchronous disk replication

A Regional Persistent Disk or Hyperdisk Balanced High Availability (Preview) volume, also referred to as a replicated disk, has a primary and a secondary zone within its region where it stores disk data:

  • Primary zone is the same zone where the compute instance that you attach the disk to is located.
  • Secondary zone is an alternate zone of your choice within the same region.

Compute Engine maintains replicas of your disk in both these zones. When you write data to your disk, Compute Engine synchronously replicates that data to the disk replicas in both zones to ensure HA. The data of each zonal replica is spread across multiple physical machines within the zone to ensure durability. Zonal replicas ensure that the data of the disk remains available and provide protection against temporary outages in one of the disk zones.

Replica state for zonal replicas

Disk replica state for Regional Persistent Disk or Hyperdisk Balanced High Availability (Preview) shows you the state of a zonal replica in comparison to the content of the disk. Zonal replicas for your disks are in one of the following disk replica states at all times:

  • Synced: The replica is available, synchronously receives all the writes performed to the disk, and is up to date with all the data on the disk.
  • Catching up: The replica is available but is still catching up with the data on the disk from the other replica.
  • Out of sync: The replica is temporarily unavailable and out of sync with the data on the disk.

To learn how to check and track the replica states of your zonal replicas, see Monitor the disk replica states.

Replication states for synchronously replicated disks

Depending on the state of the individual zonal replicas, your Regional Persistent Disk or Hyperdisk Balanced High Availability (Preview) volume can be in one of the following replication states:

  • Fully replicated: Replicas in both zones are available and are synced with the latest disk data.
  • Catching up: Your zonal replicas are available, but one of the zonal replicas is catching up with the latest disk data.
  • Degraded: One of the zonal replicas has a status of out of sync due to a failure or an outage.
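
The relationship between the two zonal replica states and the replication state of the disk can be sketched in a few lines of Python. This is an illustration only, not a Compute Engine API. The enum and function names are hypothetical, and the rule that an out-of-sync replica takes precedence over a catching-up replica is an assumption made for the sketch.

```python
from enum import Enum


class ReplicaState(Enum):
    """State of one zonal replica (illustrative names, not API values)."""
    SYNCED = "synced"
    CATCHING_UP = "catching up"
    OUT_OF_SYNC = "out of sync"


class ReplicationState(Enum):
    """State of the replicated disk as a whole (illustrative names)."""
    FULLY_REPLICATED = "fully replicated"
    CATCHING_UP = "catching up"
    DEGRADED = "degraded"


def replication_state(replica_a: ReplicaState,
                      replica_b: ReplicaState) -> ReplicationState:
    """Derive the disk replication state from its two zonal replicas."""
    states = {replica_a, replica_b}
    # Assumption: an out-of-sync replica outranks a catching-up replica.
    if ReplicaState.OUT_OF_SYNC in states:
        return ReplicationState.DEGRADED
    if ReplicaState.CATCHING_UP in states:
        return ReplicationState.CATCHING_UP
    return ReplicationState.FULLY_REPLICATED
```

For example, a disk with one synced replica and one out-of-sync replica maps to the degraded state in this sketch.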

If the disk replication state is catching up or degraded, then one of the zonal replicas doesn't have all the data. If the zone of the healthy replica has an outage during this time, the disk becomes unavailable until that zone is restored.

When your Regional Persistent Disk or Hyperdisk Balanced High Availability volume is catching up, Google Cloud starts healing the zonal replica that is catching up. Google recommends that you wait for the affected zonal replica to catch up with the data on the disk. When it does, the replica state changes to Synced and the replicated disk returns to the Fully replicated state.

If the replicated disk has a status of catching up or degraded for a prolonged period of time and does not meet your organization's RPO requirements, we recommend that you take snapshots of the primary replica in either of the following ways:

  • Enable scheduled snapshots.
  • Create a manual snapshot of your Regional Persistent Disk or Hyperdisk Balanced High Availability disk.

After you create a snapshot, you can create a new Regional Persistent Disk or Hyperdisk Balanced High Availability disk by using that snapshot as the source. This restores the snapshot to the new disk. Your new disk also starts in a fully replicated state with healthy data replication.

To learn how to check the replication state of your Regional Persistent Disk or Hyperdisk Balanced High Availability disk, see Determine the replication state of disks.

Replica recovery checkpoint

A replica recovery checkpoint is a disk attribute that represents the most recent crash-consistent point in time of a fully replicated disk. Compute Engine automatically creates and maintains a single replica recovery checkpoint for each replicated disk. When a disk is fully replicated, Compute Engine keeps refreshing its checkpoint approximately every 10 minutes to ensure that the checkpoint remains updated. When the disk replication status is degraded, Compute Engine lets you create a standard snapshot from the replica recovery checkpoint of that disk. The resulting standard snapshot captures the data from the most recent crash-consistent version of the fully replicated disk.
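
The checkpoint behavior described above can be modeled with a small Python class. This is a simplified sketch, not how Compute Engine implements checkpoints. The class name, the state strings, and the exact 600-second interval are assumptions. The model also exposes the checkpoint time for illustration only; Compute Engine doesn't let you view the exact refresh timestamps.

```python
from typing import Optional


class ReplicatedDiskModel:
    """Toy model of a replicated disk and its single recovery checkpoint."""

    # Assumption: "approximately every 10 minutes" modeled as exactly 600 s.
    REFRESH_INTERVAL_S = 600

    def __init__(self) -> None:
        self.state = "fully_replicated"  # or "catching_up", "degraded"
        self.checkpoint_time_s: Optional[int] = None

    def tick(self, now_s: int) -> None:
        """Refresh the checkpoint, but only while fully replicated."""
        if self.state != "fully_replicated":
            return  # the checkpoint is frozen at its last refresh
        due = (self.checkpoint_time_s is None
               or now_s - self.checkpoint_time_s >= self.REFRESH_INTERVAL_S)
        if due:
            # Only the latest checkpoint is kept; the old one is replaced.
            self.checkpoint_time_s = now_s

    def snapshot_from_checkpoint(self) -> int:
        """Return the point in time that a checkpoint snapshot captures."""
        if self.state != "degraded":
            raise ValueError("checkpoint snapshots require a degraded disk")
        if self.checkpoint_time_s is None:
            raise ValueError("no checkpoint has been created yet")
        return self.checkpoint_time_s
```

In this model, a disk that degrades shortly before its next refresh yields a snapshot that is almost one full interval older than the moment of failure.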

In rare scenarios, when your disk is degraded, the zonal replica that is synced with the latest disk data can also fail before the out-of-sync replica catches up. In that case, you can't force-attach your disk to compute instances in either zone. Your replicated disk becomes unavailable and you must migrate the data to a new disk. If you don't have any existing standard snapshots available for your disk, you might still be able to recover your disk data from the incomplete replica by using a standard snapshot created from the replica recovery checkpoint.

Compute Engine automatically creates replica recovery checkpoints for each mounted Regional Persistent Disk or Hyperdisk Balanced High Availability (Preview) disk. You don't incur any additional charges for the creation of these checkpoints. However, you do incur any applicable charges for the snapshots and compute instances that you create when you use these checkpoints to migrate your replicated disk to functioning zones.

Learn more about how to recover your replicated disk data using a replica recovery checkpoint.

Replicated disk failover

In the event of an outage in a zone, the zone becomes inaccessible and the compute instance in that zone can't perform read or write operations on its disk. To allow the instance to keep performing read and write operations for the replicated disk, Compute Engine allows migration of disk data to the other zone where the disk has a replica. This process is called failover.

The failover process involves detaching the zonal replica from the instance in the affected zone and then attaching the zonal replica to a new instance in the secondary zone. Compute Engine synchronously replicates the data on your disk to the secondary zone to ensure a quick failover in case of a single replica failure.

Failover by application-specific regional control plane

The application-specific regional control plane is not a Google Cloud service. When you design HA service architectures, you must build your own application-specific regional control plane. This application control plane decides which instance must have the replicated disk attached and which instance is the current primary instance.

When a failure is detected in the primary instance or database of the replicated disk, the application-specific regional control plane of your HA service architecture can automatically initiate failover to the standby instance in the secondary zone. During the failover, the application-specific regional control plane reattaches the replicated disk to the standby instance in the secondary zone. Compute Engine then directs all traffic to that instance based on health check signals.

The overall failover latency, excluding failure-detection time, is the sum of the following latencies:

  • Less than 1 minute to attach a replicated disk to a standby instance
  • Time required for application initialization and crash recovery
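
Because you build the application-specific regional control plane yourself, its core decision can be sketched as follows. This is a minimal illustration under stated assumptions: `Instance`, `fail_over_if_needed`, and the two callables are hypothetical names. You supply `is_healthy` (your own health check) and `attach_replicated_disk` (whatever mechanism you use to reattach the disk). Neither is a Google Cloud API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Instance:
    """A compute instance tracked by the control plane (hypothetical type)."""
    name: str
    zone: str


def fail_over_if_needed(
    primary: Instance,
    standby: Instance,
    is_healthy: Callable[[Instance], bool],
    attach_replicated_disk: Callable[[Instance], None],
) -> Instance:
    """Return the instance that should own the replicated disk.

    If the primary fails its health check, reattach the disk to the
    standby instance in the secondary zone and make it the new primary.
    """
    if is_healthy(primary):
        return primary  # no failover needed
    if not is_healthy(standby):
        raise RuntimeError("no healthy standby instance is available")
    attach_replicated_disk(standby)  # the attach step of the failover
    return standby
```

A real control plane would also need failure detection, retries, and protection against two instances acting as primary at the same time, none of which this sketch covers.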

For more information, see Understanding the application-specific regional control plane.

The Disaster Recovery Building Blocks page covers the building blocks available on Compute Engine.

Failover by force-attach

One of the benefits of Regional Persistent Disk and Hyperdisk Balanced High Availability (Preview) is that in the unlikely event of a zonal outage, you can manually fail over your workload to another zone. When the original zone has an outage, you can't complete the disk detach operation until that zonal replica is restored. In this scenario, you might need to attach the secondary zonal replica to a new compute instance without detaching the primary zonal replica from your primary instance. This process is called force-attach.

When your compute instance in the primary zone becomes unavailable, you can force-attach your disk to an instance in the secondary zone. To perform this task, you must do one of the following:

  • Start another compute instance in the same zone as the replicated disk that you are force attaching.
  • Maintain a hot standby compute instance in that zone. A hot standby is a running instance that is identical to the one in the primary zone. The two instances have the same data.

Compute Engine executes the force-attach operation in less than one minute. The total recovery time objective (RTO) depends not only on the storage failover (the force attachment of the replicated disk), but also on other factors, including the following:

  • Whether you must first create a secondary instance
  • The length of time that it takes the underlying file system to detect a hot-attached disk
  • The recovery time of the corresponding applications
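
To see how these factors combine, the following Python sketch adds them into a rough RTO estimate. It assumes that the steps run one after another and uses 60 seconds as an upper bound for the force-attach operation, based on the "less than one minute" figure above. The function name, parameter names, and example durations are hypothetical.

```python
def estimate_rto_seconds(
    app_recovery_s: float,
    fs_detect_s: float,
    create_instance_s: float = 0.0,  # 0 if a hot standby already runs
    force_attach_s: float = 60.0,    # upper bound: "less than one minute"
) -> float:
    """Rough RTO estimate, assuming the steps run sequentially."""
    parts = (app_recovery_s, fs_detect_s, create_instance_s, force_attach_s)
    if any(p < 0 for p in parts):
        raise ValueError("durations must be non-negative")
    return float(sum(parts))


# Hot standby: no instance creation time.
hot_standby_rto = estimate_rto_seconds(app_recovery_s=120, fs_detect_s=5)

# Cold start: the secondary instance must be created first.
cold_start_rto = estimate_rto_seconds(
    app_recovery_s=120, fs_detect_s=5, create_instance_s=90
)
```

With these made-up inputs, the only difference between the two estimates is the time needed to create the secondary instance, which is why a hot standby shortens recovery.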

For more information about how to fail over your compute instance using force-attach, see Failover your replicated disk using force-attach.

Regional Persistent Disk and Hyperdisk Balanced High Availability favor workload availability, which means there are tradeoffs for data protection in the unlikely event that both disk replicas are unavailable at the same time. For more information, see Manage failures for replicated disks.

Limitations

The following sections list the limitations that apply for Regional Persistent Disk and Hyperdisk Balanced High Availability (Preview).

General limitations for replicated disks

  • You can attach regional Persistent Disk only to VMs that use E2, N1, N2, and N2D machine types.
  • You can attach Hyperdisk Balanced High Availability only to supported machine types.
  • You cannot create a regional Persistent Disk from an image.
  • In read-only mode, you can attach a regional balanced Persistent Disk to a maximum of 10 VM instances.
  • The minimum size of a regional standard Persistent Disk is 200 GiB.
  • You can only increase the size of a regional Persistent Disk or Hyperdisk Balanced High Availability volume; you can't decrease its size.
  • Regional Persistent Disk and Hyperdisk Balanced High Availability volumes have different performance characteristics than their corresponding zonal disks. For more information, see Block storage performance.
  • If you create a replicated disk by cloning a zonal disk, then the two zonal replicas aren't fully in sync at the time of creation. After creation, you can use the regional disk clone within 3 minutes, on average. However, you might need to wait for tens of minutes before the disk reaches a fully replicated state and the recovery point objective (RPO) is close to zero. Learn how to check if your replicated disk is fully replicated.
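
Some of these limitations can be checked before you request a disk. The following Python sketch validates a planned regional Persistent Disk against the machine family, image source, minimum size, and resize rules listed above. The function and constant names are hypothetical, and the sketch assumes that the machine family is the part of the machine type name before the first hyphen and that `pd-standard` identifies a standard Persistent Disk. It doesn't call any Google Cloud API.

```python
from typing import List, Optional

# Values taken from the limitations listed above.
SUPPORTED_REGIONAL_PD_FAMILIES = {"e2", "n1", "n2", "n2d"}
MIN_REGIONAL_STANDARD_PD_GIB = 200


def check_regional_pd_request(
    machine_type: str,
    disk_type: str,
    size_gib: int,
    current_size_gib: Optional[int] = None,
    source_is_image: bool = False,
) -> List[str]:
    """Return a list of limitation violations (empty if none are found)."""
    problems = []
    # Assumption: "n2d-standard-4" belongs to machine family "n2d".
    family = machine_type.split("-", 1)[0].lower()
    if family not in SUPPORTED_REGIONAL_PD_FAMILIES:
        problems.append(f"machine family {family!r} is not supported")
    if source_is_image:
        problems.append("a regional Persistent Disk can't be created from an image")
    if disk_type == "pd-standard" and size_gib < MIN_REGIONAL_STANDARD_PD_GIB:
        problems.append(
            f"regional standard disks need at least {MIN_REGIONAL_STANDARD_PD_GIB} GiB"
        )
    if current_size_gib is not None and size_gib < current_size_gib:
        problems.append("disk size can only be increased")
    return problems
```

Returning every violation at once, rather than stopping at the first one, lets a caller report all problems with a request in a single pass.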

Limitations for replica recovery checkpoints

  • A replica recovery checkpoint is part of the device metadata and doesn't show you any disk data by itself. You can only use the checkpoint as a mechanism to create a snapshot of your degraded disk. After you create the snapshot by using the checkpoint, you can use the snapshot to restore your data.
  • You can create snapshots from a replica recovery checkpoint only when your disk is degraded.
  • Compute Engine refreshes the replica recovery checkpoint of your disk only when the disk is fully replicated.
  • Compute Engine maintains only one replica recovery checkpoint for a disk and only maintains the latest version of that checkpoint.
  • You can't view the exact creation and refresh timestamps of a replica recovery checkpoint.
  • You can create a snapshot from your replica recovery checkpoint only by using the Compute Engine API.

What's next