About backup and restore

Overview

Cloud Spanner backup and restore features let you create backups of Cloud Spanner databases on demand, and restore them to provide protection against operator and application errors, which can result in logical data corruption. Backups are highly available, encrypted, and can be retained for up to a year from the time they are created. If you need longer retention times, we recommend exporting your database.

You can perform backup and restore in the following ways:

For logical data corruption, Cloud Spanner also offers point-in-time recovery.

Key features

  • Data consistency: Backups are a transactionally and externally consistent copy of a Cloud Spanner database at the version_time of the backup.

  • Replication: Backups reside in the same instance as their source database and are replicated in the same geographic locations. For regional instances, a copy of the backup is stored in each of the three read-write zones. For multi-regional instances, a copy is stored in all zones that contain either a read-write or read-only replica.

  • Automatic expiration: Every backup has a user-specified expiration date that determines when it is automatically deleted. Cloud Spanner deletes expired backups asynchronously, so there can be a lag between when a backup expires and when it is actually deleted.

Choose between backup and restore or import and export

Cloud Spanner Import and Export serve use cases similar to those of Backup and Restore. The following comparison describes the similarities and differences between them to help you decide which one to use.

  • Data consistency
    • Backup and Restore, Import and Export: Both backups and exported databases are transactionally and externally consistent.

  • Performance impact
    • Backup and Restore: Backups have no impact on an instance's performance. Cloud Spanner performs backups using dedicated jobs that do not draw upon an instance's server resources.
    • Import and Export: Export runs as a medium-priority task to minimize impact on database performance. For more information, see task priority.

  • Storage format
    • Backup and Restore: Uses a proprietary, encrypted format designed for fast restore.
    • Import and Export: Supports both CSV and Avro file formats.

  • Portability
    • Backup and Restore: Backups reside in the same instance as their source database and cannot be moved. You can restore a database to any instance in the project with the same instance configuration as the backup.
    • Import and Export: Exported databases reside in Google Cloud Storage and the data can be migrated to any system that supports CSV or Avro.

  • Retention
    • Backup and Restore: Backups can be retained for up to 1 year.
    • Import and Export: Exported databases are stored in Cloud Storage where, by default, they are retained until they are deleted. You can customize lifecycle and retention policies.

  • Billing
    • Backup and Restore: Backups are billed to your Cloud Spanner project based on the storage used per unit time. For more details, see the Billing section.
    • Import and Export: Billing for import and export is more complicated due to its use of Google Cloud Storage and Dataflow. For more information, see Database export and import pricing.

  • Restore time
    • Backup and Restore: Restore happens in two operations: restore and optimize. The restore operation offers fast time-to-first-byte because the database directly mounts the backup without copying the data. After the restore operation completes, the database is ready for use, though read latency might be slightly higher while it is optimizing. For more information, see How restore works.
    • Import and Export: Import is slower. You need to wait for all the data to be written into the database.

How backup works

Contents

Users can create a backup of any Cloud Spanner database. These backups are complete, in the sense that they contain all of the data in the database (including the schema and secondary indexes) at the version_time of the backup. Any modifications to the data or schema after the version_time will not be included in the backup.

Backups include all database options that are set with the ALTER DATABASE SET OPTIONS command, but do not include Identity and Access Management (IAM) policies.

Backups also include the schema of a database's change streams, but not any existing change records. Change stream data is meant to be streamed out and consumed near-simultaneously with the changes it describes. As such, Spanner excludes this data from backups.

Encryption

Cloud Spanner backups, like databases, are encrypted by either Google-managed or customer-managed (CMEK) encryption. By default, a backup uses the same encryption config as its database, but you can override this behavior by specifying a different encryption config when creating the backup. If the backup is CMEK-enabled, it is encrypted using the primary version of the KMS key at the time of backup creation. Once the backup is created, its key and key version cannot be modified, even if the KMS key is rotated. For more information, see create a CMEK-enabled backup.

Creation process

When you create a backup, you must specify a source database, a name for the backup resource, and an expiration date (up to 1 year from backup creation time). You can also optionally specify a version_time, which lets you back up your database as of an earlier point in time. The version_time field is typically used to either synchronize the backups of multiple databases or recover data using point-in-time recovery. If version_time is not specified, then it is set to the create_time of the backup. The system creates a backup resource and a long-running backup operation to track the progress of the backup.
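The rules above for expiration and version_time can be illustrated with a small, self-contained sketch. It does not call the Cloud Spanner API: BackupRequest and resolve_backup_times are hypothetical names, "up to 1 year" is modeled as 365 days, and the checks that version_time is not later than create_time and that the expiration lies in the future are assumptions rather than documented limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: "up to 1 year" is modeled as 365 days; the service's exact limit may differ.
MAX_RETENTION = timedelta(days=365)


@dataclass(frozen=True)
class BackupRequest:
    """Hypothetical container for the fields named in the text."""
    source_database: str
    backup_name: str
    expire_time: datetime
    version_time: Optional[datetime] = None  # optional, per the text


def resolve_backup_times(request: BackupRequest,
                         create_time: datetime) -> Tuple[datetime, datetime]:
    """Return (version_time, expire_time) after applying the rules in the text."""
    # If version_time is not specified, it is set to the create_time of the backup.
    version_time = request.version_time or create_time
    # Assumption: version_time refers to an earlier point in time, never a later one.
    if version_time > create_time:
        raise ValueError("version_time cannot be later than create_time")
    # Assumption: an expiration in the past is rejected.
    if request.expire_time <= create_time:
        raise ValueError("expire_time must be after create_time")
    if request.expire_time > create_time + MAX_RETENTION:
        raise ValueError("expire_time must be within 1 year of create_time")
    return version_time, request.expire_time


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    req = BackupRequest("example-db", "example-backup", now + timedelta(days=30))
    print(resolve_backup_times(req, now))
```

Passing the same version_time to requests for several databases is one way to picture the "synchronize the backups of multiple databases" use case mentioned above.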

To ensure external consistency of the backup, Cloud Spanner pins the contents of the database at create time. This prevents the garbage collection system from removing the relevant data values for the duration of the backup operation. Then, every read/write and read-only zone in the instance begins copying the data in parallel. If any zone is temporarily unavailable, the backup is not complete until the zone comes back online and finishes. Backups are restorable as soon as the operation is done. For multi-region instances, all read/write and read-only zones in all regions must complete their backup replicas before the backup is marked as restorable.

Resource hierarchy

Backups are resources in Cloud Spanner. Each backup resource is organized under the same instance as its source database in the resource hierarchy and has a resource path in the form projects/<project>/instances/<instance>/backups/<backup>. A backup continues to exist even after its source database has been deleted, but cannot outlive its parent instance. To prevent accidental deletion of backups, you cannot delete a Cloud Spanner instance if there are backups. For users who want to delete the instance, we recommend restoring the backup and then exporting the restored database, before deleting the backup and the instance.
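As an illustration of the resource path format, the following sketch builds and parses backup paths and models the rule that an instance with backups cannot be deleted. The helper names are hypothetical and the code works on plain strings only; it is not the client library's own path handling.

```python
import re
from typing import Iterable, NamedTuple


class BackupPath(NamedTuple):
    project: str
    instance: str
    backup: str


# Matches projects/<project>/instances/<instance>/backups/<backup>.
_BACKUP_PATH = re.compile(r"^projects/([^/]+)/instances/([^/]+)/backups/([^/]+)$")


def format_backup_path(project: str, instance: str, backup: str) -> str:
    """Build a backup resource path in the form given in the text."""
    return f"projects/{project}/instances/{instance}/backups/{backup}"


def parse_backup_path(path: str) -> BackupPath:
    """Split a backup resource path into its three components."""
    match = _BACKUP_PATH.match(path)
    if match is None:
        raise ValueError(f"not a backup resource path: {path!r}")
    return BackupPath(*match.groups())


def instance_can_be_deleted(backup_paths: Iterable[str], project: str, instance: str) -> bool:
    """An instance cannot be deleted while any backup is organized under it."""
    for path in backup_paths:
        parsed = parse_backup_path(path)
        if (parsed.project, parsed.instance) == (project, instance):
            return False
    return True


if __name__ == "__main__":
    path = format_backup_path("my-project", "my-instance", "nightly")
    print(path)
    print(parse_backup_path(path))
```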

Backup time and performance

When performing a backup, Cloud Spanner creates a backup job to copy data directly from the database to backup storage, and sizes this job based on the size of the database. This backup job does not use CPU resources allocated to the database's instance and so does not affect the instance's performance. Moreover, compute load on the database's instance does not affect the speed of the backup operation.

To track progress and completion of a backup operation, see Show backup progress.

If a backup is taking longer than usual when no other factors have changed, it might be due to a delay in scheduling the backup task in a zone. This can sometimes take up to 30 minutes. We recommend that you do not cancel and restart the backup, as it's likely you'll encounter the same scheduling delay with the new backup as well.

How restore works

When you restore a Cloud Spanner database, you must specify a source backup and a new target database. You cannot restore to an existing database. The new database must be in the same project as the backup and be in an instance with the same instance configuration as the backup. For example, if a backup is in an instance configured us-west3, it can be restored to any instance in the project that is also configured us-west3. The compute capacity of the instances does not need to be the same.
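The restore preconditions in this paragraph can be expressed as a simple pre-flight check. This is a sketch under stated assumptions: InstanceInfo and restore_problems are hypothetical names, and the inputs are plain values you would gather yourself rather than anything fetched from the service.

```python
from dataclasses import dataclass
from typing import Collection, List


@dataclass(frozen=True)
class InstanceInfo:
    """Hypothetical summary of the instance attributes that matter for restore."""
    project: str
    name: str
    instance_config: str  # for example "us-west3"


def restore_problems(backup_instance: InstanceInfo,
                     target_instance: InstanceInfo,
                     target_database: str,
                     existing_databases: Collection[str]) -> List[str]:
    """Return the reasons a restore request would violate the rules in the text."""
    problems = []
    # You cannot restore to an existing database.
    if target_database in existing_databases:
        problems.append("target database already exists")
    # The new database must be in the same project as the backup.
    if target_instance.project != backup_instance.project:
        problems.append("target instance is in a different project")
    # The target instance must have the same instance configuration as the backup.
    if target_instance.instance_config != backup_instance.instance_config:
        problems.append("instance configurations differ")
    # Compute capacity is deliberately not compared: it does not need to match.
    return problems


if __name__ == "__main__":
    source = InstanceInfo("my-project", "prod", "us-west3")
    target = InstanceInfo("my-project", "staging", "us-west3")
    print(restore_problems(source, target, "restored-db", {"orders"}))
```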

The restored database has all the data and schema from the original database at the version_time of the backup, including all database options that are set with the ALTER DATABASE SET OPTIONS command, and all change stream configurations. It does not have any IAM permissions (except for those inherited from the instance containing the restored database), so you must apply appropriate IAM permissions after the restore completes. It also does not include the internal data of any change streams.

The restore process is designed for high availability. The database can be restored provided that the majority quorum of the regions and zones in the instance is available.

To restore a CMEK-enabled backup, both the key and key version must be available to Cloud Spanner. The restored database, by default, uses the same encryption config as the backup. You can override this behavior by specifying a different encryption config when restoring the database. For more information, see restore from a CMEK-enabled backup.

A restored database transitions through three states, tracked by two long-running operations.

  • CREATING: Cloud Spanner begins the restoration by creating a new database and mounting files from the backup. This typically takes ten minutes or less to complete. During this initial CREATING state, the restored database is not yet ready for use.

    To track the progress of this state, you can query the long-running restore operation that Cloud Spanner makes available during this process. It returns a RestoreDatabaseMetadata object.

    Please note the following caveats regarding the CREATING state:

    • If you are restoring to a different instance, the restore operation belongs to the instance containing the restored database, not the instance containing the backup.
    • Cloud Spanner will not allow you to delete the backup while it is being restored. You can delete it after the restore completes and the database enters the READY state.
    • An instance can have at most one database in the CREATING state due to a restoration from backup. You will not be able to restore another backup to the instance until the restored database transitions to the READY_OPTIMIZING or READY state, described below.
  • READY_OPTIMIZING: After Cloud Spanner mounts the backup, it starts to copy the backup's data into the new database while optimizing its stored size. Your database is ready for use during this process. Depending on the amount of data involved, this phase of the restore might take days to complete.

    While you can use your database as usual during READY_OPTIMIZING, the following caveats apply:

    • Read latencies might be slightly higher than usual.
    • Storage metrics display the size of the new database, not the backup. Therefore, with the data transfer still in progress, Cloud Spanner storage metrics might show results that do not reflect the total size of all your data.
    • As with the CREATING state, Cloud Spanner will not allow you to delete the mounted backup.

    Cloud Spanner makes another long-running operation available during this state, this time returning an OptimizeRestoredDatabaseMetadata object.

  • READY: Once the copy-and-optimize operation completes, the database transitions to the READY state. The database is fully restored, and no longer references or requires the backup.
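The three states and their caveats can be summarized as a small state model. The sketch below only encodes what the list above states; the RestoreState enum and helper functions are hypothetical and do not come from the Cloud Spanner client library.

```python
from enum import Enum
from typing import Iterable, Optional


class RestoreState(Enum):
    CREATING = "CREATING"
    READY_OPTIMIZING = "READY_OPTIMIZING"
    READY = "READY"


# Order of transitions described in the text.
_NEXT = {
    RestoreState.CREATING: RestoreState.READY_OPTIMIZING,
    RestoreState.READY_OPTIMIZING: RestoreState.READY,
}

# Metadata type returned by the long-running operation that tracks each state.
_METADATA = {
    RestoreState.CREATING: "RestoreDatabaseMetadata",
    RestoreState.READY_OPTIMIZING: "OptimizeRestoredDatabaseMetadata",
}


def next_state(state: RestoreState) -> Optional[RestoreState]:
    """Return the following state, or None once the database is READY."""
    return _NEXT.get(state)


def operation_metadata(state: RestoreState) -> Optional[str]:
    """Name the metadata type for the operation tracking this state, if any."""
    return _METADATA.get(state)


def database_is_usable(state: RestoreState) -> bool:
    """The database is ready for use in every state except CREATING."""
    return state is not RestoreState.CREATING


def backup_can_be_deleted(state: RestoreState) -> bool:
    """The backup stays referenced until the database reaches READY."""
    return state is RestoreState.READY


def can_start_restore(states_in_instance: Iterable[RestoreState]) -> bool:
    """An instance can have at most one restoring database in CREATING."""
    return RestoreState.CREATING not in set(states_in_instance)


if __name__ == "__main__":
    for state in RestoreState:
        print(state.value, database_is_usable(state), backup_can_be_deleted(state))
```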

Billing

You are billed based on the amount of storage used by your backups per unit time. Billing begins once the backup operation is complete and will continue until the backup has been deleted. There is no charge for restoring from a backup.

Backups are stored and billed separately. Backup storage does not affect billing for database storage or database storage limits.

A completed backup is billed for a minimum of 24 hours. If you create a backup, then delete it a minute after it finishes, you are still billed for 24 hours.
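The 24-hour minimum can be worked through with a short calculation. The function name is hypothetical, and the sketch covers only the billed duration, not rates or storage size, which are on the pricing page.

```python
from datetime import datetime, timedelta

# A completed backup is billed for a minimum of 24 hours.
MINIMUM_BILLED = timedelta(hours=24)


def billed_duration(completed_at: datetime, deleted_at: datetime) -> timedelta:
    """Return the billed storage duration for a completed backup.

    Billing begins when the backup operation completes and continues
    until the backup is deleted, subject to the 24-hour minimum.
    """
    if deleted_at < completed_at:
        raise ValueError("deleted_at cannot be earlier than completed_at")
    return max(deleted_at - completed_at, MINIMUM_BILLED)


if __name__ == "__main__":
    done = datetime(2024, 1, 1, 12, 0)
    # Deleted one minute after completion: still billed for 24 hours.
    print(billed_duration(done, done + timedelta(minutes=1)))
    # Kept for three days: billed for the full three days.
    print(billed_duration(done, done + timedelta(days=3)))
```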

For more complete information on backup costs, see the Cloud Spanner Pricing page.

Access control with IAM

IAM lets you control access to Cloud Spanner resources, which include backups and restored databases. If you are new to IAM, roles, and permissions, see IAM Overview for an introduction.

Backup resources are organized under instances in the Cloud Spanner resource hierarchy. We recommend applying IAM policies at the project level or instance level. If you need finer-grained control, you can also apply IAM policies at the backup and database level, but we do not recommend this because of the added complexity. Remember that backups do not contain database metadata such as IAM policies, so when you restore a database, the database initially inherits policies from its parent instance.

This section describes the predefined roles that have access to backup and restore.

The following roles are designed specifically for backup and restore:

  • spanner.backupAdmin: has access to create, view, update, and delete backups. This role can also view and manage a backup's IAM policy. This role cannot restore a database from a backup.
  • spanner.restoreAdmin: has access to restore databases from backups. If you need to restore a backup to a different instance, apply this role at the project level or to both instances. This role cannot create backups.
  • spanner.backupWriter: has access to create backups, but cannot update or delete them. This role is intended to be used by scripts that automate backup creation.

The following roles also have access to backup and restore:

  • spanner.admin: has full access to backup and restore. This role has complete access to all Cloud Spanner resources.
  • owner: has full access to backup and restore.
  • editor: has full access to backup and restore.
  • viewer: has access to view backups, backup operations, and restore operations. This role cannot create, update, delete, or restore a backup.
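The role descriptions above can be condensed into a lookup table, which is handy when deciding which role to grant for a given task. This sketch is a simplification: the action names are hypothetical labels, the table records only what the descriptions above state, and the real roles are defined by granular IAM permissions that the IAM documentation lists in full.

```python
from typing import Dict, FrozenSet, List

# Coarse-grained actions taken from the role descriptions; labels are illustrative.
ALL_ACTIONS: FrozenSet[str] = frozenset({"create", "view", "update", "delete", "restore"})

# Only capabilities the text states explicitly are recorded here.
ROLE_ACTIONS: Dict[str, FrozenSet[str]] = {
    "spanner.backupAdmin": frozenset({"create", "view", "update", "delete"}),
    "spanner.restoreAdmin": frozenset({"restore"}),
    "spanner.backupWriter": frozenset({"create"}),
    "spanner.admin": ALL_ACTIONS,
    "owner": ALL_ACTIONS,
    "editor": ALL_ACTIONS,
    "viewer": frozenset({"view"}),
}


def role_allows(role: str, action: str) -> bool:
    """Check whether the simplified table grants an action to a role."""
    if action not in ALL_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return action in ROLE_ACTIONS[role]


def roles_allowing(action: str) -> List[str]:
    """List, in sorted order, the roles that the table says can perform an action."""
    return sorted(role for role in ROLE_ACTIONS if role_allows(role, action))


if __name__ == "__main__":
    print(roles_allowing("restore"))
    print(role_allows("spanner.backupWriter", "delete"))
```

For an automation script that only creates backups, the table points to spanner.backupWriter as the narrowest of the listed roles.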

For more information, see Cloud Spanner IAM.