Data availability and durability

This page discusses concepts related to data availability and durability in Cloud Storage, including how Cloud Storage redundantly stores data, the default replication behavior for dual-regions and multi-regions, and the turbo replication feature for dual-regions.

Key concepts

  • Cloud Storage is designed for 99.999999999% (11 9's) annual durability.

    • To achieve this, Cloud Storage uses erasure coding and stores data pieces redundantly across multiple devices located in multiple availability zones.

    • Cloud Storage redundantly stores objects that are written to it in at least two different availability zones before considering the write to be successful.

    • Checksums are stored and regularly revalidated to proactively verify the integrity of all data at rest as well as to detect corruption of data in transit. If required, corrections are automatically made using redundant data.

  • The monthly availability of data stored in Cloud Storage depends on the storage class of the data and the location type of the bucket. For more information, see available storage classes.

  • Objects stored in a dual-region or multi-region bucket are stored redundantly in at least two separate geographic places.

    • For dual-regions, you select the specific regions in which your objects are stored.

    • For multi-regions, the specific data centers used for storing your data are determined by Cloud Storage as needed, but are located within the geographic boundary of the multi-region and are separated by at least 100 miles. This provides redundancy across regions at a lower storage cost than dual-regions.

    • In the unlikely event of a region-wide outage, such as one caused by a natural disaster, dual-region and multi-region buckets remain available, with no need to change storage paths.

  • Objects stored in dual-region and multi-region buckets are typically replicated across geographic places using default replication.

    • If one of the places where an object is stored becomes unavailable after the object is successfully uploaded but before it is replicated to the second location, Cloud Storage's strong consistency ensures that stale versions of the object won't be served and that subsequent overwrites aren't reverted when that location becomes available again.

    • Objects stored in dual-regions can optionally use turbo replication to achieve faster, more predictable replication across regions.

  • To achieve redundancy between a pair of regions that isn't available as a dual-region, consider creating a separate bucket in each region and using Storage Transfer Service event-driven transfers to keep the buckets in sync.
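To put the 11 9's durability figure from the first bullet in perspective, the following sketch computes an expected number of lost objects per year. It assumes, purely for illustration, that the design target can be read as an independent per-object annual loss probability of 1 in 10^11; that reading, the function name, and the object count are hypothetical and are not part of any Cloud Storage API or SLA.

```python
from fractions import Fraction

# Assumption: treat the 11 9's design target as a per-object,
# per-year loss probability. This is an illustrative reading, not an SLA.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost in one year under the assumption above."""
    return object_count * ANNUAL_LOSS_PROBABILITY


losses = expected_annual_losses(1_000_000_000)  # one billion objects
print(float(losses))      # 0.01 objects per year
print(float(1 / losses))  # 100.0 years per expected lost object
```

Exact fractions are used instead of floats because subtracting 0.99999999999 from 1 in floating point introduces rounding error at this scale.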

Redundancy across regions

While traditional storage models often rely on an active-passive approach with "primary" and "secondary" geographic locations, Cloud Storage provides an active-active architecture based on a single bucket with redundancy across regions. This simplifies the disaster recovery process by eliminating the need for users to replicate data from one bucket to another or manually fail over to a secondary bucket in the case of primary region downtime.

Cloud Storage always understands the current state of a bucket and transparently serves objects from an available region as required. As a result, dual-region and multi-region buckets are designed to have a recovery time objective (RTO) of zero, and temporary regional failures are normally invisible to users; in the case of a regional outage, dual-region and multi-region buckets automatically continue serving all data that has been replicated across regions.

However, redundancy across regions occurs asynchronously, and any data that does not finish replicating across regions prior to a region becoming unavailable is inaccessible until the downed region comes back online. Data could potentially be lost in the very unlikely case of physical destruction of the region.

Default replication in Cloud Storage is designed to provide redundancy across regions for 99.9% of newly written objects within a target of one hour and 100% of newly written objects within a target of 12 hours. Newly written objects include uploads, rewrites, copies, and compositions.
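As a rough illustration of how the two default replication targets combine, the sketch below checks a sample of replication delays against them. The function name and the sample data are hypothetical; how you would obtain per-object replication delays in practice is outside the scope of this example.

```python
from datetime import timedelta

# Targets quoted in the text for default replication.
ONE_HOUR = timedelta(hours=1)
TWELVE_HOURS = timedelta(hours=12)


def meets_default_targets(delays: list[timedelta]) -> bool:
    """Return True if the sample satisfies both default replication targets."""
    if not delays:
        return True  # nothing written, nothing to violate
    within_hour = sum(1 for d in delays if d <= ONE_HOUR)
    # Integer comparison avoids floating-point error at the 99.9% boundary.
    hour_ok = within_hour * 1000 >= len(delays) * 999
    all_ok = all(d <= TWELVE_HOURS for d in delays)
    return hour_ok and all_ok


# 999 objects replicate in 5 minutes, one takes 3 hours: both targets are met.
sample = [timedelta(minutes=5)] * 999 + [timedelta(hours=3)]
print(meets_default_targets(sample))  # True
```

A single object that takes longer than 12 hours fails the check regardless of how quickly the rest replicate, because the second target applies to 100% of newly written objects.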

Turbo replication

Turbo replication provides faster redundancy across regions for data in your dual-region buckets, which reduces the risk of data loss exposure and helps support uninterrupted service following a regional outage.

  • When enabled, turbo replication is designed to replicate 100% of newly written objects to both regions that constitute the dual-region within the recovery point objective of 15 minutes, regardless of object size.

Note that even with default replication, most objects finish replicating within minutes.

While redundancy across regions and turbo replication help support business continuity and disaster recovery (BCDR) efforts, administrators should plan and implement a full BCDR architecture that's appropriate for their workload.

For more information, see the Step-by-step guide to designing disaster recovery for applications in Google Cloud.

Limitations

  • Turbo replication is only available for buckets in dual-regions.

  • Turbo replication cannot be managed through the XML API. This includes creating a new bucket with turbo replication enabled.

  • When turbo replication is enabled on a bucket, it can take up to 10 seconds before it begins to apply to newly written objects.

  • Object writes that began prior to enabling turbo replication on a bucket replicate across regions at the default replication rate.

    • Object composition that uses any source objects written using default replication in the last 12 hours creates a composite object that also uses default replication.
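The timing rules in the list above can be summarized in a small model. This is only a simplified sketch of the stated limitations, not a description of how Cloud Storage is implemented; the class, the function names, and the "default", "turbo", and "either" labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ENABLE_DELAY = timedelta(seconds=10)    # "up to 10 seconds" from the text
COMPOSE_LOOKBACK = timedelta(hours=12)  # lookback window for source objects


@dataclass
class SourceObject:
    written_at: datetime
    replication: str  # "default" or "turbo"


def write_replication(write_started: datetime, turbo_enabled_at: datetime) -> str:
    """Classify a write relative to the moment turbo replication was enabled."""
    if write_started < turbo_enabled_at:
        return "default"
    if write_started < turbo_enabled_at + ENABLE_DELAY:
        return "either"  # turbo may not have taken effect yet
    return "turbo"


def compose_replication(sources: list[SourceObject], now: datetime) -> str:
    """Replication used by a composite object in a turbo-enabled bucket."""
    for src in sources:
        if src.replication == "default" and now - src.written_at <= COMPOSE_LOOKBACK:
            return "default"
    return "turbo"


enabled = datetime(2024, 1, 1, 9, 0, 0)
print(write_replication(enabled - timedelta(seconds=1), enabled))  # default
print(write_replication(enabled + timedelta(seconds=5), enabled))  # either
```

The "either" label reflects that the text gives only an upper bound of 10 seconds, so a write inside that window cannot be classified with certainty.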

Performance monitoring

Cloud Storage monitors the oldest unreplicated objects. If an object remains unreplicated for longer than its RPO (Recovery Point Objective) time, it's considered to be out of RPO. Each minute in which one or more objects are out of RPO is counted as a "bad" minute.

For example, if one object yielded 20 bad minutes from 9:00-9:20 AM, and another object yielded 10 bad minutes from 9:15-9:25 AM, then there are two objects for the month that are out of RPO. The total number of bad minutes for the month is 25 minutes, because from 9:00 AM to 9:25 AM there was at least one object that was missing its RPO.
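The counting rule in this example amounts to taking the union of the out-of-RPO intervals, so that overlapping minutes are counted once. A minimal sketch, assuming each interval is given as a hypothetical (start, end) pair in minutes since midnight:

```python
def total_bad_minutes(intervals: list[tuple[int, int]]) -> int:
    """Count minutes covered by at least one out-of-RPO interval.

    Each interval is (start, end) in minutes since midnight, end exclusive.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# 9:00-9:20 AM and 9:15-9:25 AM, expressed as minutes since midnight.
intervals = [(540, 560), (555, 565)]
print(total_bad_minutes(intervals))  # 25
```

Summing the two intervals independently would give 30 minutes; merging them first yields the 25 bad minutes described above.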

  • For buckets using turbo replication, the RPO for objects is 15 minutes.

Within the Google Cloud console, the Number of minutes missing RPO graph lets you monitor the bad minutes during the past 30 days for your bucket. This service level indicator can be used to monitor your bucket's Monthly Replication Time Conformance. Similarly, the Object replications with turbo graph tracks object replications that occur within the RPO. This service level indicator can be used to monitor your bucket's Monthly Replication Volume Conformance. For more information, see Cloud Storage monitoring and Cloud Storage SLA.
