Replication

This page describes how data is replicated in Spanner, the different types of Spanner replicas and their roles in reads and writes, and the benefits of replication.

Overview

Spanner automatically replicates at the byte level. As described in Life of Spanner Reads and Writes, it takes advantage of this capability in the underlying file system that it's built on. Spanner writes database mutations to files in this file system, and the file system takes care of replicating and recovering the files when a machine or disk fails.

Even though the underlying distributed file system that Spanner is built on already provides byte-level replication, Spanner also replicates data to provide the additional benefits of data availability and geographic locality. At a high level, all data in Spanner is organized into rows. Spanner creates multiple copies, or "replicas," of these rows, then stores these replicas in different geographic areas. Spanner uses a synchronous, Paxos-based replication scheme, in which voting replicas take a vote on every write request before the write is committed. This property of globally synchronous replication gives you the ability to read the most up-to-date data from any Spanner read-write or read-only replica.

Spanner creates replicas of each database split. A split holds a range of contiguous rows, where the rows are ordered by primary key. All of the data in a split is physically stored together in the replica, and Spanner serves each replica out of an independent failure zone. For more information, see About schemas.

A set of splits is stored and replicated using Paxos. Within each Paxos replica set, one replica is elected to act as the leader. Leader replicas are responsible for handling writes, while any read-write or read-only replica can serve a read request without communicating with the leader (though if a strong read is requested, the leader will typically be consulted to ensure that the read-only replica has received all recent mutations).

Benefits of Spanner replication

The benefits of Spanner replication include:

Data availability: Having more copies of your data makes the data more available to clients that want to read it. Also, Spanner can still serve writes even if some of the replicas are unavailable, because only a majority of voting replicas are required in order to commit a write.
Geographic locality: Having the ability to place data across different regions and continents with Spanner means data can be geographically closer, and hence faster to access, to the users and services that need it.
Single database experience: Spanner can deliver a single database experience because of its synchronous replication and global strong consistency.
Easier application development: Because Spanner is ACID-compliant and offers global strong consistency, developers working with Spanner don't have to add extra logic in their applications to deal with eventual consistency, making application development and subsequent maintenance faster and easier.

Replica types

Spanner has three types of replicas: read-write replicas, read-only replicas, and witness replicas. The regions and replication topologies that form base instance configurations are fixed. Base regional instance configurations only use read-write replicas, while base multi-region instance configurations use a combination of all three replica types. You can create custom instance configurations and add additional read-only replicas for both regional and multi-region instance configurations.

The following table summarizes the types of Spanner replicas and their properties:.

Replica type	Can vote	Can become leader	Can serve reads	Can configure replica manually
Read-write	yes	yes	yes	no
Read-only	no	no	yes	yes^*
Witness	yes	no	no	no

^* For more information, see how to create an instance with a custom instance configuration.

Read-write replicas

Read-write replicas support both reads and writes. These replicas:

Maintain a full copy of your data.
Serve reads.
Can vote whether to commit a write.
Participate in leadership election.
Are eligible to become a leader.
Are the only type used in single-region instances.

Read-only replicas

Read-only replicas only support reads, but not writes. These replicas don't vote for leaders or for committing writes, so they allow you to scale your read capacity without increasing the quorum size needed for writes. Read-only replicas:

Maintain a full copy of your data, which is replicated from read-write replicas.
Serve reads.
Don't participate in voting to commit writes. Hence, the location of the read-only replicas never contributes to write latency.
If it is the nearest replica to your application, the read-only replica can usually serve stale reads without needing a round trip to the default leader region, assuming staleness is at least 15 seconds. You can also use directed reads to route read-only transactions and single reads to a specific replica type or region in a multi-region instance configuration. For more information, see Directed reads.

Strong reads might require a round trip to the leader replica. The round trip is just for negotiating the timestamp, not shipping the actual data from the leader. The timestamp negotiation is a CPU efficient operation at the leader, and typically the data is already en-route. This communication is handled automatically by the system.

For more information about stale and strong reads, see the In reads section.
Are not eligible to become a leader.

You can create a custom regional or multi-region instance configuration and add optional read-only replicas to scale reads and support low latency stale reads. You can add locations listed under Optional Region as an optional read-only replica. If you don't see your chosen read-only replica location, you can request a new optional read-only replica region. Note that you cannot change the replication topology of base instance configurations, which are fixed.

All optional read-only replicas are subject to compute capacity and storage costs. Furthermore, adding read-only replicas to a custom instance configuration doesn't change the Spanner SLAs of the instance configuration. If you choose to add a read-only replica to a continent that is in a different continent than the leader region, we recommend adding a minimum of two read-only replicas. This helps maintain low read latency in the event one of the read-only replicas becomes unavailable.

When you add read-only replica(s), the leader replica experiences more replication load, which might affect performance. As a best practice, test performance workloads in non-production instances in the custom instance configuration first. You can refer to the Inter-Region Latency and Throughput benchmark dashboard for median inter-region latency data. For example, if you create a custom instance configuration with the eur6 multi-region base configuration and an optional read-only replica in us-east1, the expected strong read latency for a client in us-east1 is about 100 milliseconds due to the round trip time to the leader region in europe-west4. Stale reads with sufficient staleness don't incur the round trip and are therefore much faster. You can also use the Latency by transaction type metric to view latency data for read-write and read-only type transactions.

For instructions, see Create a custom instance configuration.

Witness replicas

Witness replicas don't support reads but do participate in voting to commit writes. These replicas make it easier to achieve quorums for writes without the storage and compute resources that are required by read-write replicas to store a full copy of data and serve reads. Witness replicas:

Are only used in multi-region instances.
Don't maintain a full copy of data.
Don't serve reads.
Vote whether to commit writes.
Participate in leader election but are not eligible to become leader.

The role of replicas in writes and reads

This section describes the role of replicas in Spanner writes and reads, which is helpful in understanding why Spanner uses witness replicas in multi-region configurations.

In writes

Client write requests are always processed at the leader replica first, even if there is a non-leader replica that's closer to the client, or if the leader replica is geographically distant from the client. If you use a multi-region instance configuration and your client application is located in a non-leader region, Spanner uses leader-aware routing to route read-write transactions dynamically to reduce latency in your database. For more information, see Leader-aware routing.

The leader replica logs the incoming write, and forwards it, in parallel, to the other replicas that are eligible to vote on that write. Each eligible replica completes its write, and then responds back to the leader with a vote on whether the write should be committed. The write is committed when a majority of voting replicas (or "write quorum") agree to commit the write. In the background, all remaining (non-witness) replicas log the write. If a read-write or read-only replica falls behind on logging writes, it can request the missing data from another replica to have a full, up-to-date copy of the data.

In reads

Client read requests might be executed at or require communicating with the leader replica, depending on the concurrency mode of the read request.

Reads that are part of a read-write transaction are served from the leader replica, because the leader replica maintains the locks required to enforce serializability.
Single read methods (a read outside the context of a transaction) and reads in read-only transactions might require communicating with the leader, depending on the concurrency mode of the read. (Learn more about these concurrency modes in Read types.)
- Strong read requests can go to any read-write or read-only replica. If the request goes to a non-leader replica, that replica must communicate with the leader in order to execute the read.
- Stale read requests go to the closest available read-only or read-write replica that is caught up to the timestamp of the request. This can be the leader replica if the leader is the closest replica to the client that issued the read request.

Aside: Why read-only and witness replicas?

Base multi-region configurations use a combination of read-write, read-only, and witness replicas, whereas base regional configurations use only read-write replicas. The reasons for this difference have to do with the varying roles of replicas in writes and reads. For writes, Spanner needs a majority of voting replicas to agree on a commit in order to commit a mutation. In other words, every write to a Spanner database requires communication between voting replicas. To minimize the latency of this communication, it is desirable to use the fewest number of voting replicas, and to place these replicas as close together as possible. That's why base regional configurations contain exactly three read-write replicas, each of which is in its own availability zone, contains a full copy of your data, and is able to vote. If one replica fails, the other two can still form a write quorum, and because replicas in this configuration are in the same geographic region, network latencies are minimal.

Base multi-region configurations contain more replicas by design, and these replicas are in different data centers (so that clients can read their data quickly from more locations). What characteristics should these additional replicas have? They could all be read-write replicas, but that would be undesirable because adding more read-write replicas to a configuration increases the size of the write quorum (which means potentially higher network latencies due to more replicas communicating with each other, especially if the replicas are in geographically distributed locations) and also increases the amount of storage needed (because read-write replicas contain a full copy of data). Instead of using more read-write replicas, base multi-region configurations contain two additional types of replicas that have fewer responsibilities than read-write replicas.

Read-only replicas don't vote for leaders or for committing writes, so they allow you to scale your read capacity without increasing the quorum size needed for writes.
Witness replicas vote for leaders and for committing writes, but don't store a full copy of the data, can't become the leader, and can't serve reads. They make it easier to achieve quorums for writes without the storage and compute resources that are required by read-write replicas to store a full copy of data and serve reads.