Availability and durability

This page provides information about BigQuery disaster resilience for datasets and the disaster recovery system.

Failure domains

The following are the types of failure domains for failures that could occur in Google Cloud data centers.

Machine-level: Failures that affect one or a few machines, but not all machines, within a Google Cloud zone. An example of a machine-level failure is a hardware failure on a single machine.

Zonal: Failures that render a single Google Cloud zone unavailable while other zones in the same Google Cloud region remain available. Google Cloud zones have different failure domains, but multiple zones can be colocated in the same geographical location. Examples include a building fire, a power outage, a cut fiber-optic cable, and a network partition.

Regional: Failures affecting an entire Google Cloud region that consists of multiple zones. Examples are hurricanes and large-scale earthquakes.

Types of failures

There are two types of failures: soft failures and hard failures.

A soft failure is an operational deficiency where hardware is not destroyed. Examples include a power failure, a network partition, and a machine crash. In general, BigQuery should never lose data in the event of a soft failure.

A hard failure is an operational deficiency where hardware is destroyed. Hard failures are more severe than soft failures. Examples include damage from floods, terrorist attacks, earthquakes, and hurricanes.

Availability and durability

When you create a BigQuery dataset, you select a location in which to store your data. This location is one of the following:

  • A region: a specific geographical location, such as Iowa (us-central1) or Montréal (northamerica-northeast1).
  • A multi-region: a large geographic area that contains two or more geographic places, such as the United States (US) or Europe (EU).

In either case, BigQuery automatically stores copies of your data in two different Google Cloud zones within the selected location.

In addition to storage redundancy, BigQuery also maintains redundant compute capacity across multiple zones. By combining redundant storage and compute across multiple availability zones, BigQuery provides both high availability and durability.

In the event of a machine-level failure, BigQuery continues to run with no more than a few milliseconds of delay, and all currently running queries continue processing. In the event of either a soft or hard zonal failure, no data loss is expected. However, currently running queries might fail and need to be resubmitted. A soft zonal failure, such as one resulting from a power outage, destroyed transformer, or network partition, is a well-tested path and is automatically mitigated within a few minutes.

A soft regional failure, such as a region-wide loss of network connectivity, results in a loss of availability until the region is brought back online, but it doesn't result in lost data. A hard regional failure, such as a disaster that destroys the entire region, could result in the loss of data stored in that region. BigQuery does not automatically provide a backup or replica of your data in another geographic region. To enhance your disaster recovery strategy, you can create cross-region dataset copies.
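The behavior described in this section can be summarized as a small lookup, which may help when writing a disaster recovery runbook. The following Python sketch is illustrative only: the names Domain, Severity, Outcome, and expected_outcome are hypothetical and are not part of any BigQuery API, and the function encodes only what this page states. It assumes no data loss for machine-level failures (implied by the text but not stated outright) and treats a regional outage as interrupting running queries.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """Failure domains named on this page."""
    MACHINE = "machine-level"
    ZONAL = "zonal"
    REGIONAL = "regional"


class Severity(Enum):
    """Failure types named on this page."""
    SOFT = "soft"   # hardware is not destroyed
    HARD = "hard"   # hardware is destroyed


@dataclass(frozen=True)
class Outcome:
    data_loss_possible: bool
    running_queries_may_fail: bool
    note: str


def expected_outcome(domain: Domain, severity: Severity) -> Outcome:
    """Return the outcome this page describes for a given failure.

    Hypothetical helper for planning purposes; it does not query BigQuery.
    """
    if domain is Domain.MACHINE:
        # Assumption: the page does not distinguish soft from hard here.
        return Outcome(False, False,
                       "Continues to run with a few milliseconds of delay.")
    if domain is Domain.ZONAL:
        if severity is Severity.SOFT:
            note = "Automatically mitigated within a few minutes."
        else:
            note = "No data loss expected; resubmit failed queries."
        return Outcome(False, True, note)
    # Regional failures.
    if severity is Severity.SOFT:
        return Outcome(False, True,
                       "Unavailable until the region is back online.")
    return Outcome(True, True,
                   "Data in the region could be lost; consider "
                   "cross-region dataset copies.")


if __name__ == "__main__":
    for domain in Domain:
        for severity in Severity:
            result = expected_outcome(domain, severity)
            print(f"{severity.value} {domain.value}: {result.note}")
```

In this model, the only combination with possible data loss is a hard regional failure, which is the case that cross-region dataset copies are meant to address.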

To learn more about BigQuery locations, see Location considerations.

Dataset security

To control access to datasets in BigQuery, see Controlling access to datasets. For information about data encryption, see Encryption at rest.