Snapshot-based data management

Last reviewed 2024-04-25 UTC

Cloud Volumes Service (CVS) includes snapshot capabilities that you can use to improve your data management. Unlike Google Persistent Disk snapshots, Cloud Volumes Service snapshots aren't a physical copy of your data in Cloud Storage. Snapshots are an alternative, logical view of your data blocks within the volume, giving you instant point-in-time versions of your data. When you write new data or overwrite existing data, the service doesn't modify the existing data blocks. Instead, it writes to new blocks and updates internal metadata to give you a new view of your data.

The following diagram illustrates how Cloud Volumes Service snapshots work.

[Diagram: snapshots]

In this diagram, the file points to data blocks A, B, and C. The snapshot is a copy of the pointers to data blocks A, B, and C. When block C changes to C1, the snapshot continues to point to blocks A, B, and C. C remains stored until the snapshot is deleted.
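
The pointer behavior in the diagram can be sketched as a toy in-memory model. This is only an illustration of the idea, not the service's implementation: the `Volume` class and its method names are hypothetical, and real volumes manage blocks at a much lower level.

```python
class Volume:
    """Toy model of a volume whose snapshots are copies of block pointers."""

    def __init__(self):
        self.blocks = {}      # block id -> data
        self.active = {}      # file name -> list of block ids (live view)
        self.snapshots = {}   # snapshot name -> {file name -> list of block ids}
        self._next_id = 0

    def _allocate(self, data):
        # Every write lands in a newly allocated block.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write_file(self, name, chunks):
        self.active[name] = [self._allocate(c) for c in chunks]

    def overwrite_block(self, name, index, data):
        # Redirect the write to a new block; the old block is left untouched.
        pointers = list(self.active[name])
        pointers[index] = self._allocate(data)
        self.active[name] = pointers

    def create_snapshot(self, snap_name):
        # Copy the pointers only, never the data.
        self.snapshots[snap_name] = {f: list(p) for f, p in self.active.items()}

    def read(self, name, snapshot=None):
        view = self.snapshots[snapshot] if snapshot else self.active
        return [self.blocks[b] for b in view[name]]


vol = Volume()
vol.write_file("file", ["A", "B", "C"])
vol.create_snapshot("snap1")
vol.overwrite_block("file", 2, "C1")

live = vol.read("file")                    # live view sees the new block C1
snap = vol.read("file", snapshot="snap1")  # snapshot still sees block C
```

In this model, creating the snapshot costs only a copy of three pointers, and the overwrite adds exactly one block (C1) while C stays in place for the snapshot.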

Snapshot attributes

  • Snapshots are instant and atomic. Snapshots instantly capture data within a volume at an exact point in time. This forms the foundation of creating instant, consistent application backups.

  • Snapshots are performance-neutral. Using snapshots doesn't affect the performance of the volume.

  • Snapshots are space-efficient. A fresh snapshot doesn't consume any additional capacity except for a small amount of metadata. When existing data is overwritten, old data blocks are retained for as long as any snapshot pointing to these data blocks exists. In other words, deleted or modified data still referenced by a snapshot consumes extra space in the volume. When the last snapshot referencing a data block is deleted, the block becomes available.

  • Snapshots are read-only. Snapshots can be accessed by the client through standard file system interfaces. The client can access all snapshots of different point-in-time versions of the volume and read their content.

  • Snapshots can be used for fast clones. For volumes of the CVS-Performance service type, a new volume can be created from any snapshot stored in the same region. Creating a new volume from a snapshot takes the same amount of time as creating a new empty volume independent of the volume or snapshot size. For example, a 100 TiB volume can be cloned within a few seconds. The clone is a new volume and is charged for its capacity.

  • Volumes can quickly revert to a snapshot. Within seconds, a volume can be restored to a snapshot version, regardless of the volume size. Changes made to the volume after the snapshot was created are undone. This includes any snapshots that are newer than the one you revert to.

  • Snapshots are cost-efficient. Snapshots offer a complete view of multiple point-in-time versions of the volume while only requiring extra capacity for the changed data. Snapshot capacity is counted towards the used space of a volume.
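
The space accounting and revert behavior described in these attributes can be illustrated with a small sketch. It assumes a simplified model in which the live volume and each snapshot are just lists of block names; the helper names `referenced_blocks` and `revert` are hypothetical and are not part of any CVS API.

```python
def referenced_blocks(active, snapshots):
    """Return the set of blocks that the live view or any snapshot still points to."""
    used = set(active)
    for pointers in snapshots.values():
        used |= set(pointers)
    return used


def revert(snapshots, name):
    """Return (new live pointers, remaining snapshots) after reverting to `name`.

    Snapshots are kept in creation order (dicts preserve insertion order),
    so every snapshot created after `name` is dropped.
    """
    names = list(snapshots)
    keep = names[: names.index(name) + 1]
    return list(snapshots[name]), {n: snapshots[n] for n in keep}


# Live volume after two overwrites; each snapshot holds the pointers it captured.
active = ["A", "B2", "C1"]
snapshots = {
    "snap1": ["A", "B", "C"],
    "snap2": ["A", "B", "C1"],
}

used_with_snapshots = referenced_blocks(active, snapshots)
# Blocks that consume space only because a snapshot still references them.
extra_for_snapshots = used_with_snapshots - set(active)

# Deleting snap1 frees C (only snap1 referenced it); B is still held by snap2.
remaining = {k: v for k, v in snapshots.items() if k != "snap1"}
freed = used_with_snapshots - referenced_blocks(active, remaining)

# Reverting to snap1 restores its pointers and drops the newer snap2.
reverted_active, reverted_snapshots = revert(snapshots, "snap1")
```

The sketch shows why used space depends on how much data changed rather than on how many snapshots exist: unchanged block A is counted once no matter how many snapshots point to it.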

Use cases

This section describes scenarios in which you can use snapshots to address data management challenges.

Fast application cloning

Cloning stateful applications is a common use case across many workloads. Many workloads are separated into development, test, staging, and production environments to successfully transition new application versions from development into production. Ideally, data from production would be used in staging to simulate the upgrade of production reliably. Creating data copies from production can be very time-consuming, which results in fewer test iterations.

The same is true for CI/CD pipelines that automate testing of new code. If a test is modifying complex data structures (for example, a database), then resetting to a clean state for individual tests can be time-consuming, resulting in longer test cycles and fewer iterations per day.

Volumes of the CVS-Performance service type offer fast clones, which can be used to create copies of entire volumes in a few seconds, independent of their size and data structure. Because clones can be created through a REST API call, you can integrate cloning into automated processes.

Volume clones are independent new volumes and are charged like normal volumes.

Volume backup and recovery

Data backups serve two purposes:

  • Quick recovery of individual files or directories if data is corrupted or deleted
  • Restoration of a full backup of the latest data if a volume is lost

You can use snapshots to quickly restore lost data. Snapshots are instant and only consume space for modified data. Because data changes over time, snapshots usually consume more space the older they get. The number of snapshots taken plays a minor role in environments with normal rates of change. For most datasets, adding 20% additional capacity is enough to keep snapshots for four weeks. Users generally restore the latest data. The older the data becomes, the less likely it is to be used for restorations.
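
To relate the capacity guideline to a rate of change, the following sketch uses a deliberately simplified model. It assumes that each day overwrites a fixed fraction of the data and that no block is overwritten twice during the retention period, which makes the result an upper bound. The function names are hypothetical, and real datasets need to be measured rather than estimated this way.

```python
def snapshot_overhead_fraction(daily_change_fraction, retention_days):
    """Upper-bound estimate of extra capacity held by snapshots, as a fraction
    of the volume's data, assuming every day overwrites different blocks."""
    return min(1.0, daily_change_fraction * retention_days)


def max_daily_change_fraction(overhead_fraction, retention_days):
    """Largest daily change rate that fits within the given overhead."""
    return overhead_fraction / retention_days


# 20% extra capacity over four weeks (28 days) corresponds to roughly
# 0.7% of the data changing per day under this simplified model.
budget_rate = max_daily_change_fraction(0.20, 28)

# A dataset that changes 1% per day would need up to 28% extra capacity.
overhead_at_1_percent = snapshot_overhead_fraction(0.01, 28)
```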

The following is a typical snapshot schedule:

  • 48 hourly snapshots
  • 30 daily snapshots
  • Optional weekly snapshots

Hourly snapshots satisfy a recovery point objective (RPO) of one hour, which is a substantial improvement over the common target of 24 hours. Because clients can read snapshots directly, users can restore data immediately, which also substantially improves the recovery time objective (RTO).
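
One way to express the hourly and daily parts of such a schedule as a retention rule is sketched below. It is an illustration only and doesn't describe how the service applies schedules: it assumes a single stream of hourly snapshots that is thinned afterward, and the function name `snapshots_to_keep` is hypothetical.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(timestamps, hourly=48, daily=30):
    """Select snapshots to retain: the newest `hourly` snapshots plus the
    newest snapshot of each of the most recent `daily` calendar days."""
    ordered = sorted(timestamps, reverse=True)   # newest first
    keep = set(ordered[:hourly])

    newest_per_day = {}
    for ts in ordered:
        # The first timestamp seen for a day is the newest one of that day.
        newest_per_day.setdefault(ts.date(), ts)
    recent_days = sorted(newest_per_day, reverse=True)[:daily]
    keep.update(newest_per_day[d] for d in recent_days)
    return keep


# 40 days of hourly snapshots ending at a fixed reference time.
now = datetime(2024, 4, 25, 12, 0)
all_snaps = [now - timedelta(hours=h) for h in range(40 * 24)]

kept = snapshots_to_keep(all_snaps)
expired = set(all_snaps) - kept
```

With this rule, the number of retained snapshots stays bounded no matter how long the schedule runs, while recent data remains recoverable at hourly granularity and older data at daily granularity.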

Because snapshots exist within the volume, they don't offer complete protection from lost volumes. To create a second data copy in a different location, use volume replication for volumes of the CVS-Performance service type and use the integrated CVS backup feature for volumes of the CVS service type. Both features employ snapshots internally to transfer only changed data on an ongoing basis. Alternatively, you can use third-party file-based (NFS or SMB) backup software.

Data versioning

Some workloads, such as machine learning, require keeping multiple versions or generations of the same dataset accessible. Instead of keeping multiple copies of data generations, you can use snapshots to do the versioning. This capability helps to save capacity because only changed data consumes extra capacity. Unchanged data is re-used for every version.

Application and data upgrades

Upgrading applications can be risky. If the upgrade has flaws, it may leave you with a broken app and modified data with no way back (for example, in the case of a database schema upgrade). Taking an instant snapshot before starting an upgrade is a way to back up the current data. If the upgrade succeeds, you can delete the snapshot. If the upgrade fails, you can use the snapshot to quickly recover individual files from before the upgrade, or revert the whole volume to the state before the upgrade.
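
The snapshot-before-upgrade workflow can be wrapped in a small guard, sketched below. The three callables passed to `snapshot_guard` are placeholders for whatever mechanism you use to create, revert to, and delete snapshots; they are assumptions for illustration and not CVS API calls. An in-memory dictionary stands in for the volume so that the example runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def snapshot_guard(create_snapshot, revert_to_snapshot, delete_snapshot):
    """Take a snapshot, run the guarded block, then delete the snapshot on
    success or revert the volume on failure (re-raising the error)."""
    snapshot_id = create_snapshot()
    try:
        yield snapshot_id
    except Exception:
        revert_to_snapshot(snapshot_id)
        raise
    else:
        delete_snapshot(snapshot_id)


# In-memory stand-in for a volume, used only to make the sketch runnable.
state = {"data": "schema v1", "snapshots": {}}


def create_snapshot():
    state["snapshots"]["pre-upgrade"] = state["data"]
    return "pre-upgrade"


def revert_to_snapshot(snapshot_id):
    state["data"] = state["snapshots"][snapshot_id]


def delete_snapshot(snapshot_id):
    del state["snapshots"][snapshot_id]


# Simulate an upgrade that fails halfway through.
try:
    with snapshot_guard(create_snapshot, revert_to_snapshot, delete_snapshot):
        state["data"] = "schema v2 (half migrated)"
        raise RuntimeError("upgrade failed")
except RuntimeError:
    pass
```

In this sketch the snapshot is kept after a failed upgrade, so individual files could still be recovered from it after the revert. Whether to delete it at that point is a design choice.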

Ransomware protection

Ransomware that encrypts all of your data is a serious threat. Snapshots can't stop malicious software from running, but they can help you avoid losing data in a ransomware attack. Snapshots are read-only and cannot be encrypted. Snapshots can be accessed instantly and used to restore encrypted files. If a large number of files is affected, you can use a snapshot to revert a whole volume to an older state in a few seconds. Another option is to create a volume clone from a snapshot to start working on the older version of your data again, while maintaining access to the latest (but potentially encrypted) version of your data for further investigation. Whichever option you choose, snapshots can make all of your data usable within minutes.