Availability and durability

This page describes BigQuery's disaster resilience for datasets and its disaster recovery system. The information describes the system as designed and does not provide any guarantees.

Failure domains

Failures that could occur in Google Cloud data centers fall into the following failure domains.

Machine-level: Failures that affect one or a few machines, but not all machines, within a Google Cloud zone. An example of a machine-level failure is a hardware failure on a single machine.

Zonal: Failures that render a single Google Cloud zone unavailable while other zones in the same Google Cloud region are still available. Google Cloud zones have different failure domains, but multiple zones can be co-located in the same geographical location. Examples are a building fire, a power outage, a cut fiber-optic cable, and a network partition.

Regional: Failures affecting an entire Google Cloud region that consists of multiple zones. Examples are hurricanes and large-scale earthquakes.

Types of failures

There are two types of failures: soft failures and hard failures.

A soft failure is an operational deficiency in which hardware is not destroyed. Examples include a power failure, a network partition, and a machine crash. In general, BigQuery should never lose data in a soft failure, even if the failure damages some hardware.

A hard failure is an operational deficiency in which hardware is destroyed. Hard failures are more severe than soft failures. Examples include damage from floods, terrorist attacks, earthquakes, and hurricanes.

Availability and durability for single regions

A region is a specific geographical location, such as Iowa (us-central1) or Montréal (northamerica-northeast1), where you can host your data.

In a single region, data is stored only in that region. There is no Google Cloud–provided backup or replication to another region. If you want to use a single region for your datasets but consider the lack of backup or replication too risky, you can create cross-region dataset copies to improve your disaster recovery posture.
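As an illustration of the cross-region copy option, the following is a minimal sketch, not a definitive procedure. It assumes the `google-cloud-bigquery-datatransfer` Python client library is installed and that a scheduled copy through the BigQuery Data Transfer Service is the mechanism you want. The project names, dataset names, display name, and schedule are hypothetical placeholders, and the helper functions are invented for this example.

```python
def build_copy_config(source_project, source_dataset, dest_dataset,
                      schedule="every 24 hours"):
    """Return settings for a scheduled cross-region dataset copy as a dict.

    All argument values are placeholders chosen by the caller; the
    destination dataset is assumed to already exist in another region.
    """
    return {
        "destination_dataset_id": dest_dataset,
        "display_name": f"DR copy of {source_dataset}",
        "data_source_id": "cross_region_copy",
        "params": {
            "source_project_id": source_project,
            "source_dataset_id": source_dataset,
        },
        "schedule": schedule,
    }


def create_copy(dest_project, config):
    """Submit the copy configuration (requires credentials and network access)."""
    # Imported lazily so that building a config does not require the library.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    transfer_config = bigquery_datatransfer.TransferConfig(**config)
    return client.create_transfer_config(
        parent=client.common_project_path(dest_project),
        transfer_config=transfer_config,
    )


if __name__ == "__main__":
    # Hypothetical names; replace them with your own before calling create_copy().
    config = build_copy_config("my-source-project", "sales", "sales_dr_copy")
    print(config)
```

A copy made this way is only as fresh as its last scheduled run, so the schedule you choose bounds how much recent data the copy could be missing after a regional failure.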

To learn more about BigQuery single regions, independent of resilience, see Location considerations.

In the event of a machine-level failure, BigQuery will continue running with a delay of no more than a few milliseconds. All queries should still succeed.

In the event of a zonal failure, there may be some data loss depending on the type of failure. If a hard failure destroys the zone, any unreplicated data may be lost. Data is usually replicated within about 90 seconds, but replication can take up to 1 hour. A soft failure, such as from a power outage, destroyed transformer, or network partition, would not be expected to cause any data loss. Soft zonal failover is a well-tested path.

If a hard regional failure occurs, for example, if a disaster destroys the region, all data in that region will be lost. A soft regional failure will result in loss of availability until the region is brought back online, but it will not result in lost data.
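The single-region behavior described above can be summarized as a small lookup table. This is only a sketch that transcribes the text; the enum and function names are invented for illustration. The text does not distinguish soft from hard machine-level failures or state their data-loss outcome explicitly, so mapping both to "no data loss expected" is an assumption of this example.

```python
from enum import Enum


class Domain(Enum):
    MACHINE = "machine-level"
    ZONAL = "zonal"
    REGIONAL = "regional"


class Severity(Enum):
    SOFT = "soft"  # hardware is not destroyed
    HARD = "hard"  # hardware is destroyed


class DataLoss(Enum):
    NONE_EXPECTED = "no data loss expected"
    UNREPLICATED_DATA = "unreplicated data may be lost"
    ALL_REGION_DATA = "all data in the region is lost"


# Expected data-loss outcome for a single-region dataset, as designed.
# These are design descriptions, not guarantees.
SINGLE_REGION = {
    # Assumption: machine-level failures are treated alike for both severities.
    (Domain.MACHINE, Severity.SOFT): DataLoss.NONE_EXPECTED,
    (Domain.MACHINE, Severity.HARD): DataLoss.NONE_EXPECTED,
    (Domain.ZONAL, Severity.SOFT): DataLoss.NONE_EXPECTED,
    (Domain.ZONAL, Severity.HARD): DataLoss.UNREPLICATED_DATA,
    (Domain.REGIONAL, Severity.SOFT): DataLoss.NONE_EXPECTED,
    (Domain.REGIONAL, Severity.HARD): DataLoss.ALL_REGION_DATA,
}


def data_loss(domain, severity):
    """Look up the expected data-loss outcome for a single-region dataset."""
    return SINGLE_REGION[(domain, severity)]


if __name__ == "__main__":
    for (domain, severity), outcome in SINGLE_REGION.items():
        print(f"{domain.value:14} {severity.value:5} {outcome.value}")
```

Note that a soft regional failure still causes a loss of availability until the region is back online; the table tracks data loss only.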

Availability and durability for multi-regions

A multi-region is a large geographic area, such as the United States (US) or Europe (EU), that contains two or more geographic places. In a multi-region, data is stored in a single region but is backed up in a geographically separated region to provide resilience to a regional disaster. The recovery and failover process is managed by BigQuery.

To learn more about BigQuery multi-regions, independent of resilience, see Location considerations.

In the event of a machine-level failure, BigQuery will continue running with a delay of no more than a few milliseconds. All queries should still succeed.

In the event of a zonal failure, there may be some data loss depending on the type of failure. If a hard failure destroys the zone, any unreplicated data may be lost. Data is usually replicated within about 90 seconds, but replication can take up to 1 hour. A soft failure, such as from a power outage, destroyed transformer, or network partition, would not be expected to cause any data loss.

If a hard regional failure occurs, for example, if a disaster destroys a region, recent data may be lost. Specifically, this would be any data that has not yet been backed up offsite to a different region. Offsite data backups may be up to 48 hours stale. Additionally, it may take some time to rebuild the data from the backup. When backups are used for disaster recovery, data is recovered in priority order, with data for platinum customers recovered at highest priority. The recovery time may be 7 to 30 days.
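To make the time windows above concrete, the following sketch estimates how much recently written data could be exposed in each hard-failure scenario. The durations come from the text; the steady ingest rate of 1,000 rows per second is a hypothetical figure, and the calculation assumes the worst case in which everything written during the window has not yet been replicated or backed up.

```python
from datetime import timedelta

# Windows quoted in the text (design descriptions, not guarantees).
TYPICAL_REPLICATION = timedelta(seconds=90)  # usual replication delay
MAX_REPLICATION = timedelta(hours=1)         # replication can take this long
MAX_BACKUP_STALENESS = timedelta(hours=48)   # multi-region offsite backup age


def rows_at_risk(rows_per_second, window):
    """Worst-case rows written during `window` at a constant ingest rate."""
    return int(rows_per_second * window.total_seconds())


if __name__ == "__main__":
    rate = 1_000  # hypothetical ingest rate, rows per second
    scenarios = [
        ("hard zonal failure, typical replication", TYPICAL_REPLICATION),
        ("hard zonal failure, slowest replication", MAX_REPLICATION),
        ("hard regional failure, multi-region backup", MAX_BACKUP_STALENESS),
    ]
    for label, window in scenarios:
        print(f"{label}: up to {rows_at_risk(rate, window):,} rows")
```

At the assumed rate, the three scenarios work out to 90,000 rows, 3,600,000 rows, and 172,800,000 rows respectively. The 7 to 30 day recovery time is a separate consideration: it affects how long backed-up data stays unavailable, not how much data is lost.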
