Availability and durability

This page describes BigQuery disaster resilience for datasets and the BigQuery disaster recovery system. This information describes the system as designed and does not constitute a guarantee.

Failure domains

Failures that could occur in Google Cloud data centers fall into the following failure domains.

Machine-level: Failures that affect one or a few, but not all, machines within a Google Cloud zone. An example of a machine-level failure is a hardware failure on a single machine.

Zonal: Failures that render a single Google Cloud zone unavailable while other zones in the same Google Cloud region are still available. Google Cloud zones are separate failure domains, but multiple zones can be co-located in the same geographical location. Examples include a building fire, a power outage, a cut fiber-optic cable, and a network partition.

Regional: Failures affecting an entire Google Cloud region that consists of multiple zones. Examples are hurricanes and large-scale earthquakes.

Types of failures

There are two types of failures: soft failures and hard failures.

A soft failure is an operational deficiency in which hardware is not destroyed. Examples include a power failure, a network partition, or a machine crash. In general, BigQuery should never lose data as a result of a soft failure, even if the failure damages some hardware.

A hard failure is an operational deficiency in which hardware is destroyed. Hard failures are more severe than soft failures. Examples of hard failures include damage from floods, terrorist attacks, earthquakes, and hurricanes.

Availability and durability for single regions

A region is a specific geographical location, such as Iowa (us-central1) or Montréal (northamerica-northeast1), where you can host your data.

In a single region, data is stored only within that region. Google Cloud does not back up or replicate the data to another region. If you want to use a single region for your datasets but consider the lack of backup or replication too risky, you can create cross-region dataset copies to strengthen your disaster recovery.

To learn more about BigQuery single regions beyond resilience, see Location considerations.

In the event of a machine-level failure, BigQuery continues running with no more than a few milliseconds of delay. All queries should still succeed.

In the event of a zonal failure, no data loss is expected. Soft zonal failures, such as those resulting from a power outage, a destroyed transformer, or a network partition, are a well-tested recovery path.

If a hard regional failure occurs (for example, if a disaster destroys the region), data in that region might be lost. A soft regional failure results in loss of availability until the region is brought back online, but it does not result in data loss.
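The single-region behavior described above can be summarized as a lookup from failure domain and failure type to expected outcome. The following Python sketch does that. The names `FailureDomain`, `FailureKind`, and `single_region_outcome` are hypothetical and exist only for illustration; they are not part of any BigQuery API, and the returned strings restate this page's design description rather than a guarantee.

```python
from enum import Enum


class FailureDomain(Enum):
    MACHINE = "machine-level"
    ZONAL = "zonal"
    REGIONAL = "regional"


class FailureKind(Enum):
    SOFT = "soft"  # hardware is not destroyed
    HARD = "hard"  # hardware is destroyed


def single_region_outcome(domain: FailureDomain, kind: FailureKind) -> str:
    """Restate the designed single-region behavior (hypothetical helper, not a guarantee)."""
    if domain is FailureDomain.MACHINE:
        # Machine-level failures: queries should still succeed.
        return "continues running; delay of at most a few milliseconds"
    if domain is FailureDomain.ZONAL:
        # Zonal failures: no data loss is expected.
        return "no data loss expected"
    # Regional failures: a hard failure is the only case where data might be lost.
    if kind is FailureKind.HARD:
        return "data in the region might be lost"
    return "unavailable until the region is back online; no data lost"


if __name__ == "__main__":
    for domain in FailureDomain:
        for kind in FailureKind:
            print(f"{domain.value} / {kind.value}: {single_region_outcome(domain, kind)}")
```

The table makes the key design point visible: in a single region, only a hard regional failure puts data at risk, which is the case that cross-region dataset copies address.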

Availability and durability for multi-regions

A multi-region is a large geographic area, such as the United States (US) or Europe (EU), that contains two or more geographic places. In a multi-region, data is stored in a single region but is backed up in a geographically separated region to provide resilience to a regional disaster. BigQuery manages the recovery and failover process.

To learn more about BigQuery multi-regions beyond resilience, see Location considerations.

In the event of a machine-level failure, BigQuery continues running with no more than a few milliseconds of delay. All queries should still succeed.

In the event of a zonal failure, no data loss is expected. Soft zonal failures, such as those resulting from a power outage, a destroyed transformer, or a network partition, are a well-tested recovery path.

If a hard regional failure occurs (for example, if a disaster destroys a region), recent data might be lost. Specifically, any data that has not yet been backed up offsite to a different region might be lost. Offsite backups are typically up to an hour stale, although there is no SLA or guarantee for this. Rebuilding the data from the backup might also take some time.
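To make the multi-region data-loss window concrete: in a hard regional failure, the writes at risk are those committed after the most recent completed offsite backup. The following Python sketch models that rule. It assumes you know the time of the last completed backup (`last_backup_at`, a hypothetical input; this page does not describe a way to obtain it) and that a backup covers every write committed at or before that time. The one-hour constant reflects the typical staleness mentioned above and is not an SLA. All function names are hypothetical.

```python
from datetime import datetime, timedelta

# Typical offsite-backup staleness described on this page; NOT an SLA or guarantee.
TYPICAL_MAX_BACKUP_LAG = timedelta(hours=1)


def writes_at_risk(write_times: list[datetime], last_backup_at: datetime) -> list[datetime]:
    """Return writes committed after the last completed offsite backup (hypothetical helper).

    Assumes the backup includes every write committed at or before last_backup_at,
    so only strictly later writes exist solely in the failed region.
    """
    return [t for t in write_times if t > last_backup_at]


def typical_loss_window(failure_at: datetime) -> tuple[datetime, datetime]:
    """Interval of writes typically exposed if backups lag by up to an hour."""
    return failure_at - TYPICAL_MAX_BACKUP_LAG, failure_at


if __name__ == "__main__":
    failure = datetime(2024, 1, 1, 12, 0)
    last_backup = datetime(2024, 1, 1, 11, 20)  # hypothetical value for illustration
    writes = [datetime(2024, 1, 1, 11, m) for m in (5, 15, 25, 45)]
    print(writes_at_risk(writes, last_backup))  # the 11:25 and 11:45 writes
    print(typical_loss_window(failure))         # 11:00 to 12:00
```

Because the lag is not guaranteed, the actual window can be larger than one hour; the sketch only illustrates which data the backup rule leaves exposed.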