Implementing Application-Consistent Data Protection for Compute Engine Workloads
David Seidman
Group Product Manager
Compute Engine provides reliable data protection infrastructure for your most demanding workloads. This infrastructure is the foundation for high-scale, mission-critical services at Google and across industries including energy, retail, media, and finance.
Today we are announcing the availability of application-consistent Persistent Disk snapshot hooks for Linux, available for snapshots and machine images. New OS technology provides hooks for custom scripts to execute on Linux instances before and after a snapshot or machine image is taken. The API, console, and gcloud CLI reuse the existing guestFlush flag available for Windows workloads, making it simple to apply the same automation to workloads on both platforms. For documentation, read Creating a Linux application consistent snapshot.
Crash Consistent vs. Application Consistent Snapshots
When data is backed up, it can be preserved with crash consistency or application consistency. Data is considered crash consistent if its state is the same as if a machine had been suddenly unplugged. In this case, every disk I/O is either fully committed or not present on the disk. There is a point in time such that all disk I/Os acknowledged before it are captured in the backup, and no disk I/Os issued after it are part of the backup.
Snapshots and machine images are always crash consistent, regardless of what steps a user takes. You do not need to take any action to pause a workload for crash consistency. Crash consistency is good enough for many workloads, and is preferred for workloads that can’t tolerate pauses or filesystem freezes.
To achieve application-consistent protection, you must perform additional state management so that a workload operates correctly when data is restored. This may include flushing in-memory state for in-flight transactions. Application consistency is optional, because it typically incurs a brief workload pause, but it is highly recommended for database backups to ensure recoverability and speed up recovery. (Note that customers using Google’s Actifio GO can achieve application-consistent backups using GO’s own hooks.)
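The point-in-time definition above can be checked mechanically. The following is a minimal sketch, not part of Compute Engine: the `Write` record, its timestamp fields, and the `is_crash_consistent` helper are all hypothetical names introduced for illustration, and each write is treated as atomic (fully present or absent).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    name: str
    issued_at: float  # when the guest issued the I/O
    acked_at: float   # when the disk acknowledged it


def is_crash_consistent(all_writes, captured_names):
    """True if a cut-off time T exists for this backup.

    Every write acknowledged before T must be captured, and no write
    issued after T may be captured. Writes in flight at T may go either way.
    """
    captured = [w for w in all_writes if w.name in captured_names]
    missing = [w for w in all_writes if w.name not in captured_names]
    latest_captured_issue = max((w.issued_at for w in captured), default=float("-inf"))
    earliest_missing_ack = min((w.acked_at for w in missing), default=float("inf"))
    # A valid T must lie between these two values.
    return latest_captured_issue <= earliest_missing_ack


writes = [
    Write("a", issued_at=0.0, acked_at=1.0),
    Write("b", issued_at=2.0, acked_at=3.0),
    Write("c", issued_at=2.5, acked_at=5.0),  # still in flight at T = 4
    Write("d", issued_at=6.0, acked_at=7.0),
]

print(is_crash_consistent(writes, {"a", "b"}))       # True
print(is_crash_consistent(writes, {"a", "b", "c"}))  # True
print(is_crash_consistent(writes, {"a", "d"}))       # False
```

The last case fails because write "d" is present while the earlier, already acknowledged write "b" is missing, so no single cut-off time explains the backup.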
In cases where data dependencies exist across multiple disks on a VM instance, cross-disk consistency may also be required to ensure that backups are captured at the same point in time across all disks. Machine images provide this kind of consistency, where the state of all attached disks is backed up with crash consistency at a given point in time that is automatically synchronized within the VM being protected. If your VM has multiple attached disks, machine images will protect those disks atomically.
Some pause-sensitive customers choose to take only crash-consistent snapshots and machine images. A workload restored from crash-consistent data may need to replay application-level journals before it can resume service. Workload operators can run offline validation on crash-consistent snapshots or machine images to determine whether they will be usable before attempting an online restore.
Until now, application consistency was available on Windows VMs for Persistent Disk Snapshots and Google Machine Images using the Windows Volume Shadow Copy Service (VSS). Application consistency could be achieved for a single disk using snapshots, or across all attached disks on a VM using machine images, by invoking a guestFlush flag. This would invoke VSS in conjunction with VSS-integrated Windows applications. However, no built-in capability for application consistency has existed on Linux.
Application Consistency on Linux Workloads
As announced above, application-consistent Persistent Disk snapshot hooks are now available for Linux, for both snapshots and machine images. Custom scripts run on the Linux instance before and after the snapshot or machine image is taken, triggered by the same guestFlush flag that the API, console, and gcloud CLI already use for Windows workloads. For documentation, read Creating a Linux application consistent snapshot.
In the past, Linux workloads requiring application consistency had to poll the state of a snapshot during its creation, and wait for the snapshot to reach the UPLOADING state before restarting the workload. By migrating to guestFlush and in-guest snapshot hooks, you can avoid the need to monitor snapshot resource state, and operate directly on application state from within the guest OS.
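For comparison, the older polling pattern looks roughly like the sketch below. It makes several assumptions: `wait_until_uploading` and its parameters are hypothetical names, the status lookup is passed in as a plain callable rather than tied to any client library, and READY is accepted alongside UPLOADING on the assumption that it is a later state. The demo uses a stand-in status source so it runs without cloud access.

```python
import time


def wait_until_uploading(get_status, poll_interval=2.0, timeout=600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the snapshot leaves the CREATING phase."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("UPLOADING", "READY"):
            return status  # safe to resume the workload
        if status == "FAILED":
            raise RuntimeError("snapshot creation failed")
        if clock() >= deadline:
            raise TimeoutError(f"snapshot still {status} after {timeout} seconds")
        sleep(poll_interval)


# Stand-in for a real API call, so the sketch runs anywhere.
fake_statuses = iter(["CREATING", "CREATING", "UPLOADING"])
result = wait_until_uploading(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)  # UPLOADING
```

The workload stays paused for the whole loop, which is the coupling that in-guest hooks remove.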
Script hooks provide flexibility and control within the Linux guest for your automation. There is no need to stop the VM before taking snapshots or machine images in order to achieve application consistency. First, you configure scripts that will run before and after the snapshot is captured. Then, using the gcloud CLI, console, or API, you create a snapshot with the guestFlush option enabled, which configures the system to invoke the scripts at the appropriate time. Snapshot hooks for Linux can be used to provide custom automation before and after a snapshot is taken, to quiesce and unquiesce a workload or perform other housekeeping and workload inspection.
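As a rough illustration of the second step, the sketch below assembles a gcloud invocation with guest flush enabled. The helper name `snapshot_command` and the disk, zone, and snapshot names are made up for this example, and the flag spellings reflect the `gcloud compute disks snapshot` command as we understand it, so check the gcloud reference before relying on them. The command is only built and printed, not executed.

```python
import shlex


def snapshot_command(disk, zone, snapshot_name, guest_flush=True):
    """Build (but do not run) a gcloud command that snapshots one disk."""
    command = [
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={zone}",
        f"--snapshot-names={snapshot_name}",
    ]
    if guest_flush:
        # Asks the guest to run the pre and post snapshot scripts.
        command.append("--guest-flush")
    return command


# Hypothetical resource names, for illustration only.
command = snapshot_command("db-data-disk", "us-central1-a", "db-data-nightly")
print(shlex.join(command))
```

Passing `guest_flush=False` yields an ordinary crash-consistent snapshot request, which makes the difference between the two modes a single flag in your automation.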
The payoff of the guestFlush option comes at restore time, when a snapshot is restored in a usable, uncorrupted application state. Note that application consistency is guaranteed only by the behavior of your custom scripts, and not by the snapshot operation itself.
You can integrate your own scripts, or reach out to your favorite data protection vendor for a solution. Third-party products can integrate with application consistency hooks to provide added value in their solutions. Anand Venkatesh, Product Manager at Commvault, says “Commvault now enables our customers to perform application-consistent backups for SAP HANA, PostgreSQL, MySQL, and other Linux-based workloads running on Compute Engine. Configuration is as simple as enabling a setting in Commvault Command Center™ UI, and customers can choose to utilize the built-in automation or run their own custom scripts prior to the backup. This is just another example of how Commvault integrates tightly with Google to enable simple, fast, reliable protection of workloads running on Compute Engine.”
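Because consistency rests on what the scripts do, it helps to see the shape of a typical pre and post pair. The sketch below is one possible approach, assuming the workload's data lives on a single mounted filesystem and that freezing it with the standard Linux `fsfreeze` utility is acceptable. The function names and the `/mnt/data` path are hypothetical, application-specific steps (such as asking a database to flush) are omitted, and how scripts are registered with the guest is covered in the documentation rather than here. The demo records the commands instead of running them.

```python
import subprocess


def pre_snapshot_steps(mount_point):
    """Commands to quiesce the filesystem before the snapshot."""
    return [["sync"], ["fsfreeze", "--freeze", mount_point]]


def post_snapshot_steps(mount_point):
    """Commands to resume normal I/O after the snapshot."""
    return [["fsfreeze", "--unfreeze", mount_point]]


def run_steps(steps, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for step in steps:
        run(step, check=True)


# Record the commands instead of executing them, so nothing is frozen here.
recorded = []
run_steps(
    pre_snapshot_steps("/mnt/data") + post_snapshot_steps("/mnt/data"),
    run=lambda step, check: recorded.append(step),
)
print(recorded)
```

Keeping the freeze and unfreeze commands as data makes it easy to review exactly what will run while the workload is paused.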
Learn More About Protecting Compute Engine Workloads
When managing and protecting your Compute Engine workloads, you have several options to consider. You can now choose application consistency for snapshots and machine images on both Windows and Linux, or you can stick with crash consistency, which is always guaranteed and requires no pause to quiesce the workload. Check out additional best practices for protecting data as well as our Backup & DR solutions page.