Instant snapshots: protect Compute Engine workloads from errors and corruption
Karthik Satish
Senior Product Manager
David Seidman
Group Product Manager
Imagine you’re an application administrator performing an application upgrade within a one-hour maintenance window, when all of a sudden the upgrade fails. You now want to initiate your rollback procedures. The normal course of action in the rollback is to recover from regular snapshots; this takes several minutes to complete, depending on how much data you have to restore. Or you could look to restore from a pre-upgrade replica of the application — which is expensive to maintain.
What if you had a cost-effective and extremely fast recovery solution that’s independent of disk size, allowing you to use more of the maintenance window and a predictable rollback time?
Today, we are introducing instant snapshots for Compute Engine, which provide near-instantaneous, high-frequency, point-in-time checkpoints of a disk that can be rapidly restored as needed.
Instant snapshots provide a recovery point objective (RPO) of seconds, and a recovery time objective (RTO) in the tens of seconds. Google Cloud is the only hyperscaler to offer high-performance checkpointing that allows you to recover in seconds. Once an instant snapshot is restored, your Compute Engine workload immediately runs with full disk performance, while competing hyperscalers can take tens of minutes or hours before the workload recovers with full performance.
With instant snapshots in Compute Engine, you get recovery in situations where the core infrastructure is intact, but you need to roll back the data to an earlier state. Common use cases include:
-
Enabling rapid recovery from user error, application software failures, and file system corruption.
-
Backup verification workflows, such as for database workloads, that create periodic snapshots and immediately restore them to run data consistency checks.
-
Taking restore points before an application upgrade to enable rapid rollback in the event that planned maintenance was unsuccessful.
In addition to the above, using instant snapshots can also:
-
Improve developer productivity: In rapid development cycles with long build times or complex code, accidental errors and build failures increase time to complete work. With instant snapshots, you can enjoy fast restores without having to resort to off-site backups.
-
Verify state before backing up: Ensure the backups you take are usable and in a desired state. Use instant snapshots to checkpoint and clone disks to verify on secondary machines before initiating long-term backup.
-
Increase backup frequency: With instant snapshots, you can take frequent backups of high-volume, business-critical databases that can’t afford large backup windows.
Key facets of instant snapshots
Compute Engine instant snapshots provide a number of benefits over traditional snapshots.
-
In-place backups: Instant snapshots are taken on zonal or regional disks, and reside in the same location as their source disk and media type. That means that if a disk is of type SDD (or HDD), an instant snapshot created from that disk will be stored on the same disk . Instant snapshots are metered with a fixed charge on creation, as well as for any incremental storage they retain.
-
Fast and incremental: Instant snapshots are created in seconds, and each instant snapshot only stores changed, incremental data blocks since the previous instant snapshot. This allows for high-frequency snapshotting, so you can snapshot much more often compared to backup snapshots.
-
Fast restore to disks: Each instant snapshot can be restored to a new disk in seconds. These new disks are in the same zone as the snapshot and inherit its disk type.
-
Convertible to backup or archive: You can move instant snapshots off to secondary point of presence for long-term, geo-redundant storage.
How instant snapshots work
Because they only store the data that changed compared to an initial checkpoint, instant snapshots can be created in seconds.
So, in the above example, instant snapshots creates a checkpoint upon creation, but only actually saves data when the third block of this disk is overwritten with new data. Note that even appending data, as we do here, by writing to blocks 6 and 7, doesn’t require additional storage on the checkpoint.
This leads to very high storage efficiency, but also efficient performance: There is no performance impact to the underlying disk when you create an instant snapshot, and as already mentioned, creating and restoring from an instant snapshot can happen in seconds.
With their storage, performance and cost-efficiency, you can incorporate instant snapshots into your standard change-management and maintenance flows, making it easy and efficient to protect your workloads.
Comparing snapshot types
Here is an overview of the different performance characteristics of instant snapshots compared with standard, backup snapshots.
An example use case
Maya is an application administrator who needs to upgrade software hosted on persistent disk volumes. She has a one-hour maintenance window within which to accomplish this task. At the end of the window, the volumes must contain the new software, which will be deemed a successful upgrade; or, if the upgrade failed, they must be in exactly the same state as they were just before the maintenance window kicked off — a rollback.
Consider Maya’s workflow using regular vs. instant snapshots:
Use it today
Instant snapshots provide near-instantaneous on-device point-in-time backups for rapid restores and high-frequency backup. They help optimize application upgrade windows, reduce time for off-site backup, and enable fast restores in the event of failed upgrades or user errors.
To leverage the Persistent Disk instant snapshots please visit the instant snapshots documentation page.