This page provides an overview of Cloud Bigtable backups.
Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time. Before you read this page, you should be familiar with the Overview of Bigtable and Managing tables.
What backups are for
Backups can help you recover from application-level data corruption or from operator errors such as accidentally deleting a table. You can restore from a backup to a new table in either the instance the backup was created in or a different instance.
What backups are not for
Backups are not intended for protection against regional or zonal failures. Use replication if you need the ability to fail over to different regions or zones.
Backups are not readable, so they are not useful for offline analytics.
Benefits
- Fully integrated: Backups are handled entirely by the Bigtable service, with no need to import or export.
- Cost effective: Using Bigtable backups lets you avoid the costs associated with exporting, storing, and importing data using other services.
- Automatic expiration: Each backup has a user-defined expiration date that can be up to 30 days after the backup is created.
- Flexible restore options: You can restore from a backup to a table in a different instance from where the backup was created.
Reasons why you might restore to a different instance include the following:
- You want to query or run tests against a table that has been restored from a backup, but you don't want your testing to affect the instance where the backup was created.
- You want to restore to an instance that has different access settings than the source instance. For example, you can grant developers access to perform testing, debugging, or development using a table created from a backup of a production table, while you maintain restricted access to the production table.
- You want to move data to a different region without changing the replication configuration of the instance that contains the backup. You can restore the backup to an instance that has a cluster in the region where you want your data.
- You want to copy some data from a table restored from a backup and write it back to the source table. For example, you can restore to a different instance, then write an application using a Bigtable client library or Dataflow that reads from the new table and writes the data back to the source table. This can be helpful when only some data has become corrupt, or when you have another reason to restore only part of a table.
- You want a copy of your data on a lower-cost instance than the one you use in production. For example, suppose you have a 700 TB production table in an instance that has three 300-node SSD clusters (300 nodes × 2.5 TB of storage per node). If you don't need replication or low latency for the copy, you can restore a new table from the backup on a single-cluster HDD instance that has 88 nodes (700 TB ÷ 8 TB of storage per node). In contrast, if you restore a copy of this 700 TB table to the same instance as the source table, you need to scale up to 1,800 nodes to accommodate the copy, increasing the cost of the production instance.
- You want to switch the storage type that your data is stored on. You can use backups to move your data from SSD to HDD storage or the other way around: create a backup of the table that you want to move, then restore it to an instance that uses the storage type that you want.
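The sizing arithmetic in the lower-cost-instance example can be checked with a quick calculation. The per-node storage figures (2.5 TB for SSD, 8 TB for HDD) come from the example itself and might not match current Bigtable limits:

```python
import math

def nodes_for_storage(table_tb: float, tb_per_node: float) -> int:
    """Minimum number of nodes whose combined storage holds table_tb."""
    return math.ceil(table_tb / tb_per_node)

# 700 TB restored to a single-cluster HDD instance at 8 TB per node:
print(nodes_for_storage(700, 8))    # 88
# The same 700 TB on SSD at 2.5 TB per node needs at least:
print(nodes_for_storage(700, 2.5))  # 280
```

The example's 300-node SSD clusters include headroom beyond this 280-node minimum, which is consistent with the usual practice of not running clusters at full storage capacity.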
Working with backups
See Managing backups for step-by-step instructions on backing up and restoring a table, as well as operations such as updating and deleting backups.
Use the following to work with Bigtable backups:
- The Cloud Console
- Bigtable client libraries
You can also access the API directly, but we strongly recommend that you do so only if you cannot use a Bigtable client library that makes backup calls to the API.
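For illustration, here is a minimal sketch of the client-library route using the Python client (google-cloud-bigtable). All IDs are placeholders, and the API calls are shown in comments because they require credentials and a live instance; the helper functions only build the request inputs:

```python
import datetime

def backup_resource_name(project: str, instance: str,
                         cluster: str, backup_id: str) -> str:
    """Fully qualified name of a backup, a cluster-level resource."""
    return (f"projects/{project}/instances/{instance}"
            f"/clusters/{cluster}/backups/{backup_id}")

def expire_time(days: int) -> datetime.datetime:
    """Backup expiration: user-defined, at most 30 days after creation."""
    if not 0 < days <= 30:
        raise ValueError("expiration must be within 30 days of creation")
    return datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=days)

# With the google-cloud-bigtable package installed and credentials
# configured, creating a backup looks roughly like this:
#
#   from google.cloud import bigtable
#   client = bigtable.Client(project="my-project", admin=True)
#   table = client.instance("my-instance").table("my-table")
#   backup = table.backup("my-backup", cluster_id="my-cluster",
#                         expire_time=expire_time(7))
#   backup.create().result()  # blocks until the backup is created

print(backup_resource_name("my-project", "my-instance",
                           "my-cluster", "my-backup"))
```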
How backups work
A table backup is a cluster-level resource. Even if a table is in an instance with multiple clusters (meaning the instance uses replication), a backup is created and stored on only one cluster in that instance.
A backup of a table includes all the data that was in the table at the time the backup was created, on the cluster where the backup is created. A backup is never larger than the size of the source table at the time the backup is created. You can create up to 50 backups per table per cluster.
You can delete a table that has a backup. To protect your backups, you cannot delete a cluster that contains a backup, and you cannot delete an instance that has one or more backups in any cluster.
A backup still exists after it is restored to a new table. You can delete it or let it expire when you no longer need it. Backup storage does not count toward a project's node storage limit.
Data in backups is encrypted and stored using a proprietary format.
Pricing
There is no charge to create or restore a backup.
To store a backup, you are charged the standard backup storage rate for the region that the cluster containing the backup is in.
A backup is a complete logical copy of a table. Behind the scenes, Bigtable optimizes backup storage utilization, which means that a backup shares physical storage with the original table or with other backups of the table whenever possible. Because of these built-in storage optimizations, the cost of a backup can be less than the cost of a full physical copy of the table.
If you restore a table in an instance that uses replication, you are charged a one-time replication cost for the data to be copied to all clusters in the instance.
If you restore to a different instance than where the backup was created, and the backup's instance and the destination instance do not have at least one cluster in the same region, you are charged a one-time cost for the initial data copy to the destination cluster at the standard network rates.
CMEK-protected backups
When you create a backup in an instance that is protected by a customer-managed encryption key (CMEK), the backup is pinned to the primary version of the table's CMEK at the time it is taken. Once the backup is created, its key and key version cannot be modified, even if the KMS key is rotated.
When you restore a table from a backup, the key version that the backup is pinned to must be enabled for the backup decryption process to succeed. The new table is protected with the latest primary version of the destination instance's CMEK key. This means that if you want to restore from a CMEK-protected backup to a different instance, the destination instance must use the same CMEK key as the source instance.
Backups and replication
This section describes additional concepts to understand when backing up and restoring a table in an instance that uses replication.
When you take a backup of a table in a replicated instance, you choose the cluster where you want to create and store the backup. There's no need to stop writing to the cluster that contains the backup, but you should be aware of how replicated writes to the cluster are handled.
A backup is a copy of the table in its state on the cluster where the backup is stored, at the time the backup is created. Table data that has not yet been replicated from another cluster in the instance is not included in the backup.
Each backup has a start and end time. Writes that are sent to the cluster shortly before or during the backup operation might not be included in the backup. Two factors contribute to this uncertainty:
- A write might be sent to a section of the table that the backup has already copied.
- A write to another cluster might not have replicated to the cluster that contains the backup.
In other words, there's a chance that some writes with a timestamp before the time of the backup might not be included in the backup. If this is unacceptable for your business requirements, you can use a consistency token with your write requests to ensure that all replicated writes are included in a backup.
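The consistency-token workflow amounts to a polling loop: generate a token after the last write, call the Bigtable Admin API's CheckConsistency RPC until it reports that replication has caught up, and only then create the backup. Here is a sketch of that loop, with the consistency check injected as a callable so the logic can be exercised without a live instance:

```python
import time
from typing import Callable

def wait_for_replication(check_consistent: Callable[[], bool],
                         timeout_s: float = 600.0,
                         poll_interval_s: float = 5.0) -> bool:
    """Poll until writes sent before the token was generated have
    replicated to every cluster; returns False on timeout.

    check_consistent should wrap a CheckConsistency call for a token
    obtained from GenerateConsistencyToken on the Bigtable Admin API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check_consistent():
            return True
        if time.monotonic() + poll_interval_s > deadline:
            return False
        time.sleep(poll_interval_s)
```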
When you restore a backup to a new table, replication to and from the other clusters in the instance starts immediately after the restore operation has completed on the cluster where the backup was stored.
Performance
Creating a backup usually takes less than a minute, although it can take up to one hour. Under normal circumstances, backup creation does not affect serving performance.
For optimal performance, do not create a backup of a single table more than once every five minutes. Creating backups more frequently can potentially lead to an observable increase in serving latency.
Restoring a backup to a table in a single-cluster instance takes a few minutes. In multi-cluster instances, restoration takes longer because the data has to be copied to all the clusters. Bigtable always chooses the most efficient route to copy data.
If you restore to a different instance from where the backup was created, the restore operation takes longer than if you restore to the same instance. This is especially true if the destination instance does not have a cluster in the same zone as the cluster where the backup was created.
A bigger table takes longer to restore than a smaller table.
If you store your tables in SSD clusters, you might initially experience higher read latency, even after a restore is complete, while the table is being optimized. You can check the status at any time during the restore operation to see whether optimization is still in progress.
If you restore to a different instance from where the backup was created, the destination instance can use HDD or SSD storage. It does not need to use the same storage type as the source instance.
Access control
IAM permissions control access to backup and restore operations. Backup permissions are at the instance level and apply to all backups in the instance.
The account that you use to create a backup of a table must have permission to read the table and create backups in the instance that the table is in (the source instance).
The account that you use to restore a new table from a backup must have permission to create a table in the instance that you are restoring to.
| Action | Required IAM permission |
|---|---|
| Create a backup | bigtable.tables.readRows, bigtable.backups.create |
| Get a backup | bigtable.backups.get |
| Delete a backup | bigtable.backups.delete |
| Update a backup | bigtable.backups.update |
| Restore a backup to a table | bigtable.tables.create, bigtable.backups.restore |
| Get an operation | bigtable.instances.get |
Best practices
- Don't back up a table more frequently than once every five minutes.
- When you back up a table that uses replication, choose the cluster to store the backup after considering the following factors:
- Cost. One cluster in your instance may be in a lower-cost region than the others.
- Proximity to your application server. You might want to store the backup as close to your serving application as possible.
- Storage utilization. You need enough storage space to keep your backup as it grows in size. Depending on your workload, you could have clusters of different sizes or with different disk usage. This may factor into which cluster you choose.
- If you need to ensure that all replicated writes are included in a backup when you back up a table in an instance that uses replication, use a consistency token with your write requests.
- Plan ahead what you will name the new table if you need to restore from a backup. Deciding on a name in advance means you don't have to come up with one while you're dealing with a problem.
- If you are restoring a table for a reason other than accidental deletion, make sure all reads and writes are going to the new table before you delete the original table.
- If you plan to restore to a different instance, create the destination instance before you initiate the backup restore operation.
Quotas and limits
Backup and restore requests and backup storage are subject to Bigtable quotas and limits.
The following limitations apply to Bigtable backups:
- You cannot read directly from a backup.
- You cannot restore from a backup to an existing table.
- You can only restore to an instance that already exists. Bigtable does not create a new instance when restoring from a backup. If the destination instance specified in a restore request does not exist, the restore operation fails.
- If you restore from a backup to a table in an SSD cluster and then delete the newly restored table, the table deletion might take a while to complete because Bigtable waits for table optimization to finish.
- Backups are zonal and share the same availability guarantees as the cluster where the backup is created. Backups do not protect against regional outages.
- A backup is a version of a table on a single cluster at a specific time, so a backup does not represent a consistent state across clusters, and backups of the same table taken on different clusters are not guaranteed to be consistent with each other.
- You cannot back up more than one table in a single operation.
- You cannot export, copy, or move a Bigtable backup to another service, such as Cloud Storage.