About regional Persistent Disk


Regional Persistent Disk is a storage option that lets you implement high availability (HA) services in Compute Engine. Regional Persistent Disk synchronously replicates data between two zones in the same region and provides HA for disk data in the event of up to one zonal failure.

Regional Persistent Disk volumes are designed for workloads that require a lower Recovery Point Objective (RPO) and Recovery Time Objective (RTO). To learn more about RPO and RTO, see Basics of disaster recovery planning.

Regional Persistent Disk volumes are also designed to work with regional managed instance groups.

This document provides an overview of regional Persistent Disk and explains how you can use a regional Persistent Disk volume to build HA services.

When you decide whether to use regional Persistent Disk, compare the different options for increasing service availability, and weigh the cost, performance, and resiliency of the different service architectures.

Zonal disk replication for regional Persistent Disk

A regional Persistent Disk volume has a primary and a secondary zone within its region where it stores disk data:

  • The primary zone is the same zone as the virtual machine (VM) instance to which you attach the disk.
  • The secondary zone is an alternate zone of your choice within the same region.

Compute Engine maintains replicas of your regional Persistent Disk volume in both these zones. When you write data to your disk, Compute Engine synchronously replicates that data to the disk replicas in both zones to ensure HA. The data of each zonal replica is spread across multiple physical machines within the zone to ensure durability. Zonal replicas ensure that the data of the Persistent Disk volume remains available and provide protection against temporary outages in one of the disk zones.
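
The following Python sketch shows one way to create such a disk through the Compute Engine API by calling regionDisks.insert with two replica zones. It is a minimal illustration, not a complete procedure: the project, region, zone, disk name, and disk type are placeholders, not values from this document.

```python
# Minimal sketch: create a regional Persistent Disk volume that replicates
# between two zones by calling regionDisks.insert. Project, region, zones,
# and the disk name are assumed placeholders.
import google.auth
from googleapiclient import discovery

credentials, _ = google.auth.default()
compute = discovery.build("compute", "v1", credentials=credentials)

PROJECT = "my-project"      # assumed placeholder
REGION = "us-central1"      # assumed placeholder

disk_body = {
    "name": "my-regional-disk",
    "sizeGb": "200",
    "type": f"projects/{PROJECT}/regions/{REGION}/diskTypes/pd-balanced",
    # The primary zone (the same zone as the VM) and a secondary zone of
    # your choice in the same region.
    "replicaZones": [
        f"projects/{PROJECT}/zones/{REGION}-a",
        f"projects/{PROJECT}/zones/{REGION}-b",
    ],
}

operation = compute.regionDisks().insert(
    project=PROJECT, region=REGION, body=disk_body
).execute()
print(f"Started operation: {operation['name']}")
```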

Replica state for zonal replicas

The disk replica state for regional Persistent Disk shows you the state of a zonal replica compared to the content of the disk. The zonal replicas of your regional Persistent Disk volume are in one of the following disk replica states at all times:

  • Synced: The replica is available, synchronously receives all the writes performed to the disk, and is up to date with all the data on the disk.
  • Catching up: The replica is available but is still catching up with the data on the disk from the other replica.
  • Out of sync: The replica is temporarily unavailable and out of sync with the data on the disk.

To learn how to check and track the replica states of your zonal replicas, see Monitor the disk replica states of regional Persistent Disk volumes.
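
As a hedged illustration, the following Python sketch retrieves the disk resource with regionDisks.get. The exact response field that reports per-replica sync state isn't assumed here, so the sketch prints the full resource so that you can locate the replica details for your environment. All resource names are placeholders.

```python
# Minimal sketch: inspect a regional Persistent Disk resource with
# regionDisks.get. The overall "status" field is part of the disk resource;
# the field that carries per-replica sync state is not assumed here, so the
# full resource is printed for inspection.
import json

import google.auth
from googleapiclient import discovery

credentials, _ = google.auth.default()
compute = discovery.build("compute", "v1", credentials=credentials)

disk = compute.regionDisks().get(
    project="my-project",        # assumed placeholder
    region="us-central1",        # assumed placeholder
    disk="my-regional-disk",     # assumed placeholder
).execute()

print(disk["status"])                # overall disk status, for example READY
print(json.dumps(disk, indent=2))    # full resource, including replica details
```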

Regional Persistent Disk replication state

Depending on the state of the individual zonal replicas, your regional Persistent Disk volume can be in one of the following replication states:

  • Fully replicated: Replicas in both zones are available and are synced with the latest disk data.
  • Catching up: Both zonal replicas are available, but one of them is still catching up with the latest disk data.
  • Degraded: One of the zonal replicas is out of sync due to a failure or an outage.

If your regional Persistent Disk volume is catching up or degraded, one of the zonal replicas doesn't have all of the disk's data. Any outage during this time in the zone of the healthy replica makes the regional Persistent Disk volume unavailable until that zone is restored.

When your regional Persistent Disk volume is catching up, Google Cloud starts healing the zonal replica that is catching up. Google recommends that you wait for the affected zonal replica to catch up with the data on the disk. After the zonal replica moves to the synced state, the regional Persistent Disk volume returns to the fully replicated state. If the regional Persistent Disk volume is catching up or degraded for a prolonged period of time and doesn't meet your organization's RPO requirements, we recommend that you take a snapshot of your disk in either of the following ways:

  • Enable scheduled snapshots.
  • Create a manual snapshot for your regional Persistent Disk volume.

After the snapshot is created, you can create a new regional Persistent Disk volume from that snapshot and recover your data on the new volume. The new volume starts in a fully replicated state with healthy data replication.
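
As one possible approach, the following Python sketch takes a manual snapshot with regionDisks.createSnapshot and then restores it to a new regional disk with regionDisks.insert. All resource names are placeholders, and in practice you would wait for each operation to complete before continuing.

```python
# Minimal sketch: snapshot a regional disk, then create a new regional disk
# from that snapshot. All names are assumed placeholders, and operation
# polling is omitted for brevity.
import google.auth
from googleapiclient import discovery

credentials, _ = google.auth.default()
compute = discovery.build("compute", "v1", credentials=credentials)

PROJECT, REGION = "my-project", "us-central1"   # assumed placeholders

# 1. Create a manual snapshot of the existing regional disk.
compute.regionDisks().createSnapshot(
    project=PROJECT,
    region=REGION,
    disk="my-regional-disk",
    body={"name": "my-regional-disk-snap"},
).execute()

# 2. After the snapshot is READY, restore it to a new regional disk.
compute.regionDisks().insert(
    project=PROJECT,
    region=REGION,
    body={
        "name": "my-regional-disk-restored",
        "sourceSnapshot": f"projects/{PROJECT}/global/snapshots/my-regional-disk-snap",
        "replicaZones": [
            f"projects/{PROJECT}/zones/{REGION}-a",
            f"projects/{PROJECT}/zones/{REGION}-b",
        ],
    },
).execute()
```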

To learn how to check the replication state of your regional Persistent Disk volume, see Determine the replication state of regional Persistent Disk.

Regional Persistent Disk replica recovery checkpoint

A replica recovery checkpoint is a regional Persistent Disk attribute that represents the most recent crash-consistent point in time of a fully replicated disk. Compute Engine automatically creates and maintains a single replica recovery checkpoint for each regional Persistent Disk volume. When a regional Persistent Disk volume is fully replicated, Compute Engine keeps refreshing its checkpoint approximately every 10 minutes to ensure that the checkpoint remains updated. When the regional Persistent Disk volume becomes degraded, Compute Engine lets you create a standard snapshot from the replica recovery checkpoint of that disk. The resulting standard snapshot captures the data from the most recent crash-consistent version of the fully replicated disk.

In rare scenarios, when your disk is degraded, the zonal replica that is synced with the latest disk data can also fail before the out-of-sync replica catches up. In that case, you can't force-attach your disk to VMs in either zone; your regional Persistent Disk volume becomes unavailable and you must migrate the data to a new disk. In such scenarios, if you don't have any existing standard snapshots for your disk, you might still be able to recover your disk data from the incomplete replica by using a standard snapshot created from the replica recovery checkpoint.
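
The following Python sketch shows one way to create such a snapshot through the Compute Engine API by passing the degraded disk in the sourceDiskForRecoveryCheckpoint field of snapshots.insert. Treat the field name and the placeholder resource names as assumptions to verify against the current API reference.

```python
# Hedged sketch: create a standard snapshot from a degraded regional disk's
# replica recovery checkpoint. The sourceDiskForRecoveryCheckpoint field name
# is an assumption to confirm against the current Compute Engine API
# reference; all resource names are placeholders.
import google.auth
from googleapiclient import discovery

credentials, _ = google.auth.default()
compute = discovery.build("compute", "v1", credentials=credentials)

PROJECT, REGION = "my-project", "us-central1"   # assumed placeholders

compute.snapshots().insert(
    project=PROJECT,
    body={
        "name": "checkpoint-recovery-snap",
        # Point at the degraded regional disk; the snapshot is taken from its
        # most recent replica recovery checkpoint rather than from live data.
        "sourceDiskForRecoveryCheckpoint": (
            f"projects/{PROJECT}/regions/{REGION}/disks/my-regional-disk"
        ),
    },
).execute()
```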

Compute Engine automatically creates replica recovery checkpoints for each mounted regional Persistent Disk volume. You don't incur any additional charges for the creation of these checkpoints. However, you do incur any applicable storage charges for the creation of snapshots and VMs when you use these checkpoints to migrate your disk to functioning zones.

Learn more about how to recover your regional Persistent Disk data using a replica recovery checkpoint.

Regional Persistent Disk failover

In the event of an outage in a zone, the zone becomes inaccessible and the VM in that zone can't perform read or write operations on its disk. To let your workload keep reading from and writing to the disk, Compute Engine supports migrating the disk to the other zone where the disk has a replica. This process is called regional Persistent Disk failover. The failover process involves detaching the disk from the VM in the affected zone and then attaching it to a new VM in the other zone. Compute Engine synchronously replicates the data on your disk to the replica in the secondary zone to ensure a quick failover in case of a single replica failure.

Failover by application-specific regional control plane

The application-specific regional control plane is not a Google Cloud service. When you design HA service architectures, you must build your own application-specific regional control plane. This control plane decides which VM must have the regional Persistent Disk volume attached and which VM is the current primary VM. When a failure is detected in the primary VM, or in the database running on the regional Persistent Disk volume, the application-specific regional control plane of your HA service architecture can automatically initiate failover to the standby VM in the secondary zone. During the failover, the application-specific regional control plane reattaches the regional Persistent Disk volume to the standby VM in the secondary zone. Compute Engine then directs all traffic to that VM based on health check signals.
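
As a rough illustration only, the following Python sketch outlines the decision loop such a control plane might run. The health probe and the failover helper are hypothetical placeholders that you would implement for your own application; this is not a Google Cloud product or API.

```python
# Minimal sketch of an application-specific regional control plane's decision
# loop. This is not a Google Cloud service; primary_is_healthy() and
# fail_over_to_standby() are hypothetical placeholders for your own logic.
import time


def primary_is_healthy() -> bool:
    """Probe the primary VM or database, for example with an HTTP or SQL check."""
    raise NotImplementedError


def fail_over_to_standby() -> None:
    """Reattach the regional Persistent Disk volume to the standby VM in the
    secondary zone (see the force-attach sketch later in this document), then
    let health checks route traffic to the standby."""
    raise NotImplementedError


def control_loop(poll_interval_s: int = 10) -> None:
    """Poll the primary and trigger failover when it becomes unhealthy."""
    while True:
        if not primary_is_healthy():
            fail_over_to_standby()
            return
        time.sleep(poll_interval_s)
```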

The overall failover latency, excluding failure-detection time, is the sum of the following latencies:

  • The time required to reattach a regional Persistent Disk volume to a standby VM, which is near zero
  • The time required for application initialization and crash recovery

For more information, see Understanding the application-specific regional control plane.

The Disaster Recovery Building Blocks page covers the building blocks available on Compute Engine.

Failover by force-attach

One of the benefits of regional Persistent Disk is that, in the unlikely event of a zonal outage, you can manually fail over your workload running on regional Persistent Disk to another zone. When the original zone has an outage, you can't complete the detach operation until that zonal replica is restored. In this scenario, you might need to attach the new VM to the secondary zonal replica without detaching the VM from your primary zonal replica. This process is called force-attach.

When your VM instance in the primary zone becomes unavailable, you can force attach your disk to a VM instance in the secondary zone. To perform this task, you must do one of the following:

  • Start another VM instance in the same zone as the regional Persistent Disk volume that you are force attaching.
  • Maintain a hot standby VM instance in that zone. A hot standby is a running VM instance that is identical to the one you are using. The two instances have the same data.

Compute Engine executes the force-attach operation in less than one minute. The total recovery time objective (RTO) depends not only on the storage failover (the force attachment of the regional Persistent Disk volume), but also on other factors, including the following:

  • Whether you must first create a secondary VM instance
  • The time required for the underlying file system to detect the hot-attached disk
  • The recovery time of the corresponding applications

For more information about how to fail over your VM using force-attach, see Failover your regional Persistent Disk volume using force-attach.
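
As a hedged example, the following Python sketch force-attaches a regional disk to a standby VM in the secondary zone by calling instances.attachDisk with forceAttach=True. The project, zone, VM, and disk names are placeholders, not values from this document.

```python
# Minimal sketch: force-attach a regional disk to a standby VM in the
# secondary zone without first detaching it from the unreachable primary VM.
# All resource names are assumed placeholders.
import google.auth
from googleapiclient import discovery

credentials, _ = google.auth.default()
compute = discovery.build("compute", "v1", credentials=credentials)

PROJECT, REGION = "my-project", "us-central1"   # assumed placeholders

compute.instances().attachDisk(
    project=PROJECT,
    zone=f"{REGION}-b",        # secondary zone that holds the healthy replica
    instance="standby-vm",     # standby VM in the secondary zone
    forceAttach=True,          # attach even though the disk is still attached elsewhere
    body={
        "source": f"projects/{PROJECT}/regions/{REGION}/disks/my-regional-disk"
    },
).execute()
```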

Regional Persistent Disk favors workload availability, which means there are tradeoffs for data protection in the unlikely event that both disk replicas are unavailable at the same time. For more information, see Manage failures for regional Persistent Disk.

Limitations

The following sections list the limitations that apply to regional Persistent Disk.

General limitations for regional Persistent Disk

  • You can attach regional Persistent Disk only to VMs that use E2, N1, N2, and N2D machine types.
  • You cannot create a regional Persistent Disk from an image.
  • When using read-only mode, you can attach a regional balanced Persistent Disk to a maximum of 10 VM instances.
  • The minimum size of a regional standard Persistent Disk is 200 GiB.
  • You can only increase the size of a regional Persistent Disk volume; you can't decrease its size.
  • Regional Persistent Disk volumes have different performance characteristics than zonal Persistent Disk volumes. For more information, see Block storage performance.
  • If you create a regional Persistent Disk by cloning a zonal disk, then the two zonal replicas aren't fully in sync at the time of creation. After creation, you can use the regional disk clone within 3 minutes, on average. However, you might need to wait for tens of minutes before the disk reaches a fully replicated state and the recovery point objective (RPO) is close to zero. Learn how to check if your regional Persistent Disk is fully replicated.

Limitations for regional Persistent Disk replica recovery checkpoint

  • A replica recovery checkpoint is part of the device metadata and doesn't show you any disk data by itself. You can only use the checkpoint as a mechanism to create a snapshot of your degraded disk. After you create the snapshot by using the checkpoint, you can use the snapshot to restore your data.
  • You can create snapshots from a replica recovery checkpoint only when your disk is degraded.
  • Compute Engine refreshes the replica recovery checkpoint of your disk only when the disk is fully replicated.
  • Compute Engine maintains only one replica recovery checkpoint for a disk and only maintains the latest version of that checkpoint.
  • You can't view the exact creation and refresh timestamps of a replica recovery checkpoint.
  • You can create a snapshot from your replica recovery checkpoint only by using the Compute Engine API.

What's next