Best practices for persistent disk snapshots

You can create persistent disk snapshots at any time, but you can create snapshots more quickly and with greater reliability if you use the following best practices.

Preparing for consistent snapshots

If you create a snapshot of your persistent disk while your application is running, the snapshot might not capture pending writes that are in transit from memory to disk. Because of these inconsistencies, the snapshot might not reflect the exact state of your application at the time you captured the snapshot. In this scenario, the snapshot is considered crash consistent because it captures the state of the application as if the machine crashed at the time the snapshot was taken.

Optionally, you can pause the application, so that all application transactions complete and the system can flush all pending writes from memory to disk before the snapshot is captured. In this scenario, the snapshot is considered application consistent.

Creating crash consistent snapshots

When you take a snapshot of a persistent disk, you don't need to take any additional steps to make your snapshot crash consistent. In particular, you do not need to pause your workload.

If your workload cannot tolerate a temporary pause, consider the following process for creating crash consistent snapshots:

  1. Capture a snapshot while applications are running, assuming there will be some application data inconsistencies.
  2. Verify that you can restore your workload to an acceptable application state from the snapshot.
  3. Based on the previous step, either retain or delete the snapshot.
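
The three steps above could be scripted roughly as follows. This is a minimal sketch, not a prescribed implementation: `take_snapshot`, `restore_and_check`, and `delete_snapshot` are hypothetical hooks that you would implement with your own tooling, and none of them is part of a Google Cloud library.

```python
from typing import Callable, Optional


def snapshot_and_validate(
    take_snapshot: Callable[[], str],
    restore_and_check: Callable[[str], bool],
    delete_snapshot: Callable[[str], None],
) -> Optional[str]:
    """Capture a crash consistent snapshot and keep it only if it restores cleanly.

    All three callables are hypothetical hooks supplied by the caller.
    """
    # Step 1: take the snapshot while the application keeps running.
    name = take_snapshot()
    # Step 2: restore the snapshot and check the resulting application state.
    if restore_and_check(name):
        # Step 3 (retain): the restored state is acceptable.
        return name
    # Step 3 (delete): the restored state is not acceptable.
    delete_snapshot(name)
    return None
```

The function returns the snapshot name when the snapshot is retained and `None` when it was deleted, so a caller can retry or raise an alert.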

Crash consistent snapshots will likely require replaying file system and application-level journals before use. The quality of your snapshot therefore depends on how quickly your application can recover from a crash consistent state and return to serving.

Creating application consistent snapshots

  • Windows Server users: For persistent disks that are attached to Windows Server instances, use VSS snapshots.
  • Linux users: To achieve application consistency for snapshots of disks attached to Linux instances, create pre and post snapshot shell scripts to prepare your system for application consistency. Then create a snapshot with the guest-flush option enabled. This runs the pre and post scripts before and after the snapshot is captured. For instructions, see Creating Linux application consistent snapshots.

Manually creating application consistent snapshots

In some scenarios, you might need to manually pause your applications to achieve application consistent snapshots.

For example, use this option if you require application consistency between multiple persistent disks. In this case, you must freeze all of the file systems on each disk and complete all of the snapshots for those disks before you resume your apps.

You do not need to stop your VM instances. The application pause can involve, for example, freezing and unmounting your file system. After you manually pause your applications, resume your workloads only after the snapshot resource reaches the UPLOADING status.

When you request a snapshot, check the status of the operation by calling the globalOperations.get method. The following table shows the relationship between the status of the snapshot operation and the status of the snapshot resource.

Operation status    Snapshot resource status
PENDING             No snapshot resource exists yet.
                    CREATING: Snapshot creation is not yet complete.
                    UPLOADING: Snapshot has been created but is not yet saved to Cloud Storage.
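
The manual procedure for multiple disks (freeze every file system, start every snapshot, and resume only after each snapshot resource reaches UPLOADING) might be orchestrated as in the following sketch. The `freeze`, `unfreeze`, `start_snapshot`, and `get_snapshot_status` callables are hypothetical hooks that you supply; the sketch assumes that `get_snapshot_status` returns the snapshot resource status as a string, and that READY (the status after UPLOADING) and FAILED are also possible values.

```python
import time
from typing import Callable, Iterable, List

# Snapshot resource statuses at which workloads can be resumed.
SAFE_TO_RESUME = {"UPLOADING", "READY"}


def snapshot_disks_consistently(
    mount_points: Iterable[str],
    disks: Iterable[str],
    freeze: Callable[[str], None],
    unfreeze: Callable[[str], None],
    start_snapshot: Callable[[str], str],
    get_snapshot_status: Callable[[str], str],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Freeze all file systems, snapshot all disks, then unfreeze.

    Every callable is a hypothetical hook provided by the caller.
    """
    frozen: List[str] = []
    try:
        # Freeze every file system before the first snapshot starts.
        for mount_point in mount_points:
            freeze(mount_point)
            frozen.append(mount_point)
        snapshots = [start_snapshot(disk) for disk in disks]
        # Poll until every snapshot has reached UPLOADING or later.
        pending = list(snapshots)
        while pending:
            still_pending = []
            for snap in pending:
                status = get_snapshot_status(snap)
                if status == "FAILED":
                    raise RuntimeError(f"snapshot {snap} failed")
                if status not in SAFE_TO_RESUME:
                    still_pending.append(snap)
            pending = still_pending
            if pending:
                sleep(poll_seconds)
        return snapshots
    finally:
        # Always unfreeze, in reverse order, even if a snapshot failed.
        for mount_point in reversed(frozen):
            unfreeze(mount_point)
```

On Linux, one option is for the `freeze` and `unfreeze` hooks to call the util-linux `fsfreeze` command with `--freeze` and `--unfreeze`. The `finally` block ensures that a failed snapshot never leaves a file system frozen.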

Snapshot frequency limits

Creating snapshots from persistent disks

You can snapshot your disks at most once every 10 minutes. If you want to issue a burst of requests to snapshot your disks, you can issue at most 6 requests in 60 minutes.

If the limit is exceeded, the operation fails and returns the following error:

"message": "Operation rate exceeded for resource 'projects/project-id/zones/zone-id/disks/disk-name'. Too frequent operations from the source resource."

This limit applies to the following operations:

This limit does not apply to the following operations:

As a best practice, take a snapshot of the disk once per hour. Avoid taking snapshots more often than that. The easiest way to achieve this is to set up a snapshot schedule.
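
If you issue snapshot requests from your own automation, a client-side guard can help you stay under the limit before the service rejects a request. The following is a minimal sketch that interprets the limit as a sliding window of 6 requests per 60 minutes per disk; that interpretation and the `SnapshotRateGuard` class are assumptions made for illustration, and the service remains the authority on whether a request is accepted.

```python
from collections import deque
from typing import Deque, Dict


class SnapshotRateGuard:
    """Client-side sliding-window guard (hypothetical helper, not a Google API)."""

    def __init__(self, max_requests: int = 6, window_seconds: float = 3600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Disk name -> timestamps of requests still inside the window.
        self._sent: Dict[str, Deque[float]] = {}

    def try_acquire(self, disk: str, now: float) -> bool:
        """Return True and record the request if the disk is under the limit."""
        sent = self._sent.setdefault(disk, deque())
        # Forget requests that have left the sliding window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) >= self.max_requests:
            return False
        sent.append(now)
        return True
```

Pass `time.monotonic()` as `now` in real use. Taking the timestamp as a parameter keeps the guard easy to test.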

Creating new zonal persistent disks from snapshots

You can create a new zonal persistent disk from a given snapshot per target zone at most once every ten minutes. The target zone refers to the storage location of the new persistent disk created from the snapshot. Google doesn't guarantee that you will be able to create disks from a snapshot at a rate faster than that, though you might be able to create disks more frequently if you haven't created any disks from the snapshot in the past hour.

Note that multiple snapshots of the same persistent disks are considered distinct snapshots with respect to this frequency limit.

If this limit is exceeded, the operation fails and returns the following error:

"message": "Operation rate exceeded for resource 'projects/project-id/zones/zone-id/disks/disk-name'. Too frequent operations from the source resource."

This limit applies to the following operations:

This limit does not apply to the following operations:

To create multiple disks from a snapshot, use the snapshot to create an image, and then create your disks from the image:

  1. Create an image from the snapshot.
  2. Create persistent disks from the image. In the Google Cloud Console, select Image as the disk Source type. With the gcloud tool, use the --image flag. In the API, use the sourceImage parameter.
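
As an illustration of the two steps above, the following sketch builds the gcloud command lines without running them. The snapshot, image, disk, and zone names are placeholders, and the helper function is hypothetical; verify the flags against the gcloud reference for your installed version before you run the commands.

```python
from typing import List


def image_then_disks_commands(
    snapshot: str, image: str, disk_names: List[str], zone: str
) -> List[List[str]]:
    """Return argv lists: one image creation, then one disk creation per name."""
    # Step 1: create an image from the snapshot.
    create_image = [
        "gcloud", "compute", "images", "create", image,
        f"--source-snapshot={snapshot}",
    ]
    # Step 2: create each disk from the image instead of from the snapshot.
    create_disks = [
        ["gcloud", "compute", "disks", "create", disk,
         f"--image={image}", f"--zone={zone}"]
        for disk in disk_names
    ]
    return [create_image] + create_disks
```

You can pass each returned list to `subprocess.run(cmd, check=True)` to execute the commands in order.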

Use existing snapshots as a baseline for subsequent snapshots

If you have existing snapshots of a persistent disk, the system automatically uses them as a baseline for any subsequent snapshots that you create from that same disk.

  • Create a new snapshot from a persistent disk before you delete the previous snapshot from that same persistent disk. The system can create the new snapshot more quickly if it can use the previous snapshot and reads only the new or changed data from the persistent disk.
  • Wait for new snapshots to finish before you take subsequent snapshots from the same persistent disk. If you run two snapshots simultaneously on the same persistent disk, they both start from the same baseline and duplicate effort. If you wait for the new snapshot to finish, any subsequent snapshots run more quickly because they only obtain the data that has changed since the last snapshot finished.
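
One way to follow both recommendations in automation is to serialize snapshots per disk and to delete the previous snapshot only after the new one finishes. The sketch below assumes two hypothetical hooks: `create_and_wait`, which blocks until the new snapshot is complete and returns its name, and `delete`, which removes a snapshot by name.

```python
import threading
from typing import Callable, Optional


class SnapshotChain:
    """Serializes snapshots of one disk (hypothetical helper, not a Google API)."""

    def __init__(
        self,
        create_and_wait: Callable[[], str],
        delete: Callable[[str], None],
    ):
        self._create_and_wait = create_and_wait
        self._delete = delete
        self._lock = threading.Lock()
        self.latest: Optional[str] = None

    def take_next(self) -> Optional[str]:
        """Take the next snapshot, or return None if one is still running."""
        # Skip rather than run two snapshots of the same disk at once.
        if not self._lock.acquire(blocking=False):
            return None
        try:
            # The previous snapshot still exists here, so it can act as baseline.
            new = self._create_and_wait()
            previous, self.latest = self.latest, new
            # Delete the older snapshot only after the new one has finished.
            if previous is not None:
                self._delete(previous)
            return new
        finally:
            self._lock.release()
```

Deleting the previous snapshot is optional. If you retain older snapshots, pass a `delete` hook that does nothing.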

Schedule snapshots during off-peak hours

If you schedule regular snapshots for your persistent disks, you can reduce the time that it takes to complete each snapshot by creating them during off-peak hours when possible.

  • Schedule automated snapshots during the business day in the zone where your persistent disk is located, rather than at the end of the business day. Snapshot creation typically peaks at the end of the business day.
  • Schedule automated snapshots early in the morning in the zone where your persistent disk is located rather than immediately at midnight. Snapshot creation typically peaks at midnight.
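
When you translate an off-peak local hour into a schedule, keep time zones in mind. The sketch below assumes that the snapshot schedule start time is given in UTC and on the hour, and that the zone's UTC offset is a whole number of hours; confirm both points in the gcloud reference. The helper functions and all names passed to them are hypothetical, and the command is only built, not run.

```python
from typing import List


def utc_start_time(local_hour: int, utc_offset_hours: int) -> str:
    """Convert a local hour (0-23) to an HH:00 string in UTC."""
    if not 0 <= local_hour <= 23:
        raise ValueError("local_hour must be between 0 and 23")
    return f"{(local_hour - utc_offset_hours) % 24:02d}:00"


def snapshot_schedule_command(
    name: str,
    region: str,
    local_hour: int,
    utc_offset_hours: int,
    retention_days: int,
) -> List[str]:
    """Build (but do not run) a gcloud command for a daily snapshot schedule."""
    return [
        "gcloud", "compute", "resource-policies", "create",
        "snapshot-schedule", name,
        f"--region={region}",
        "--daily-schedule",
        f"--start-time={utc_start_time(local_hour, utc_offset_hours)}",
        f"--max-retention-days={retention_days}",
    ]
```

For example, 03:00 local time in a location at UTC-8 corresponds to a start time of 11:00 UTC.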

Organize your data on separate persistent disks

If you create a snapshot of a persistent disk, any data that you store on the disk is included in the snapshot. Larger amounts of data create larger snapshots, which cost more and take longer to create. To ensure that you create a snapshot of only the data you need, organize your data on separate persistent disks.

  • Store critical data on a secondary persistent disk rather than your boot disk. This lets you create a snapshot of your boot disks only when necessary or on a less frequent schedule.
  • If you do create snapshots of your boot disks, store swap partitions, pagefiles, cache files, and non-critical logs on a separate persistent disk. These files and partitions change frequently, and the snapshot process is likely to identify them as changed data that must be included in an incremental snapshot.
  • Reduce the number of snapshots that you need to create by keeping similar data together on one persistent disk. Keep your operating system and volatile data separate from the data that you want to snapshot, but you don't need to distribute your critical data across multiple persistent disks like you would for a physical machine. One large persistent disk is able to achieve the same performance as multiple smaller persistent disks of the same total size.

Enable the discard option or run fstrim on your persistent disk

On Linux instances, if you didn't format and mount your persistent disk with the discard option, run the fstrim command on the instance before you create a snapshot. The command removes blocks that the file system no longer needs, so that the system can create the snapshot more quickly and with a smaller size. To learn how to configure the discard option on your persistent disks, see formatting and mounting a persistent disk.
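
A pre-snapshot script could decide whether to run fstrim by checking the mount options. The following sketch parses text in the format of /proc/mounts (device, mount point, file system type, and comma-separated options) and returns the commands to run before the snapshot. The function names and the use of sudo are assumptions made for illustration.

```python
from typing import List


def mounted_with_discard(mounts_text: str, mount_point: str) -> bool:
    """Return True if the mount point's options include discard."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match; later entries shadow earlier ones.
            result = "discard" in fields[3].split(",")
    if result is None:
        raise LookupError(f"{mount_point} is not mounted")
    return result


def pre_snapshot_commands(mounts_text: str, mount_point: str) -> List[List[str]]:
    """Return the fstrim command to run first, or nothing if discard is on."""
    if mounted_with_discard(mounts_text, mount_point):
        return []
    return [["sudo", "fstrim", mount_point]]
```

In real use, read the input with `open("/proc/mounts").read()` on the instance that has the disk mounted.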

Create an image of a frequently used snapshot

If you are repeatedly using a snapshot in the same zone to create a persistent disk, save networking costs by using the snapshot once and creating an image of that snapshot. Store this image and use it to create your disk and start a VM instance. For instructions, see Creating a custom image.

Other best practices

  • Use journaling file systems like ext4 to reduce the risk that data is cached without actually being written to the persistent disk.
  • Create a snapshot of your data on a regular schedule to minimize data loss due to unexpected failure.
