Replication

This page describes how data is replicated in Spanner, the different types of Spanner replicas and their roles in reads and writes, and the benefits of replication.

Overview

Spanner automatically replicates at the byte level. As described in Life of Spanner Reads and Writes, it takes advantage of this capability in the underlying file system that it's built on. Spanner writes database mutations to files in this file system, and the file system takes care of replicating and recovering the files when a machine or disk fails.

Even though the underlying distributed file system that Spanner is built on already provides byte-level replication, Spanner also replicates data to provide the additional benefits of data availability and geographic locality. At a high level, all data in Spanner is organized into rows. Spanner creates multiple copies, or replicas, of these rows, then stores these replicas in different geographic areas. Spanner uses a synchronous, Paxos-based replication scheme, in which voting replicas take a vote on every write request before the write is committed. This property of globally synchronous replication lets you read the most up-to-date data from any Spanner read-write or read-only replica.

Spanner creates replicas of each database split. A split holds a range of contiguous rows, where the rows are ordered by primary key. All of the data in a split is physically stored together in the replica, and Spanner serves each replica out of an independent failure zone. For more information, see the Schemas overview.

A set of splits is stored and replicated using Paxos. Within each Paxos replica set, one replica is elected to act as the leader. Leader replicas handle writes, while read-write or read-only replicas can serve a read request without communicating with the leader. If a strong read is requested, the leader typically is consulted to ensure that the read-only replica has received all recent mutations. To monitor the rate of change and amount of data that is replicated from your leader replica to the cross region replicas in your instance configuration, see Monitor data replication.

Benefits of Spanner replication

The benefits of Spanner replication include:

  • Data availability: Having more copies of your data makes the data more available to clients that want to read it. Also, Spanner can still serve writes even if some of the replicas are unavailable, because only a majority of voting replicas are required in order to commit a write.

  • Geographic locality: Having the ability to place data across different regions and continents with Spanner means data can be geographically closer, and hence faster to access, to the users and services that need it.

  • Single database experience: Spanner can deliver a single database experience because of its synchronous replication and global strong consistency.

  • Easier application development: Because Spanner is ACID-compliant and offers global strong consistency, developers working with Spanner don't have to add extra logic in their applications to deal with eventual consistency, making application development and subsequent maintenance faster and easier.

Replica types

Spanner has three types of replicas: read-write replicas, read-only replicas, and witness replicas. The regions and replication topologies that form base instance configurations are fixed:

You can create custom instance configurations and add additional read-only replicas for regional and multi-region instance configurations.

The following table summarizes the types of Spanner replicas and their properties:

Replica type Can vote Can become leader Can serve reads Can configure replica manually
Read-write yes yes yes no
Read-only no no yes yes*
Witness yes no no no

* For more information, see how to create an instance with a custom instance configuration.

Read-write replicas

Read-write replicas support both reads and writes. These replicas:

  • Maintain a full copy of your data.
  • Serve reads.
  • Can vote whether to commit a write.
  • Participate in leadership election.
  • Are eligible to become a leader.
  • Are the only replica type used in regional instances.

Read-only replicas

Read-only replicas only support reads, but not writes. These replicas don't vote for leaders or for committing writes, so they allow you to scale your read capacity without increasing the quorum size needed for writes. Read-only replicas:

  • Maintain a full copy of your data, which is replicated from the leader read-write replica.
  • Don't participate in voting to commit writes. Hence, the location of the read-only replicas never contributes to write latency.
  • Aren't eligible to become a leader.
  • Serve reads.
  • If it is the nearest replica to your application, the read-only replica can usually serve stale reads without needing a round trip to the leader region, assuming staleness is at least 15 seconds. You can also use directed reads to route read-only transactions and single reads to a specific replica type or a region in a multi-region instance configuration. For more information, see Directed reads.

    Strong reads might require a round trip to the leader replica. The round trip is just for negotiating the timestamp, not shipping the actual data from the leader. The timestamp negotiation is a CPU efficient operation at the leader, and typically the data is already en-route. This communication is handled automatically by the system.

    For more information about stale and strong reads, see the In reads section.

Optional read-only replicas

You can create a custom regional or multi-region instance configuration and add optional read-only replicas to scale reads and support low latency stale reads. You can add locations listed under Optional Region as an optional read-only replica. If you don't see your chosen read-only replica location, you can request a new optional read-only replica region.

All optional read-only replicas are subject to compute capacity, storage, and replication costs. Furthermore, adding read-only replicas to a custom instance configuration doesn't change the Spanner SLAs of the instance configuration. If you choose to add a read-only replica to a continent that's in a different continent than the leader region, we recommend adding a minimum of two read-only replicas. This helps maintain low read latency in the event that one of the read-only replicas becomes unavailable.

As a best practice, test performance workloads in non-production instances in the custom instance configuration first. You can refer to the Inter-Region Latency and Throughput benchmark dashboard for median inter-region latency data. For example, if you create a custom instance configuration with the eur6 multi-region base configuration and an optional read-only replica in us-east1, the expected strong read latency for a client in us-east1 is about 100 milliseconds due to the round trip time to the leader region in europe-west4. Stale reads with sufficient staleness don't incur the round trip and are therefore much faster. You can also use the lock insights and transaction insights to identify transactions that lead to high latencies.

For instructions on how to add optional read-only replicas, see Create a custom instance configuration.

Witness replicas

Witness replicas don't support reads but do participate in voting to commit writes. These replicas make it easier to achieve quorums for writes without the storage and compute resources that are required by read-write replicas to store a full copy of data and serve reads. Witness replicas:

  • Are used in dual-region and multi-region instances.
  • Don't maintain a full copy of data.
  • Don't serve reads.
  • Vote whether to commit writes.
  • Participate in leader election but aren't eligible to become a leader replica.

The role of replicas in writes and reads

This section describes the role of replicas in Spanner writes and reads, which is helpful in understanding why Spanner uses witness replicas in dual-region and multi-region configurations.

In writes

Client write requests are always processed at the leader replica first, even if there is a non-leader replica that's closer to the client, or if the leader replica is geographically distant from the client. If you use a dual-region or multi-region instance configuration and your client application is located in a non-leader region, Spanner uses leader-aware routing to route read-write transactions dynamically to reduce latency in your database. For more information, see Leader-aware routing.

The leader replica logs the incoming write, and forwards it, in parallel, to the other replicas that are eligible to vote on that write. Each eligible replica completes its write, and then responds back to the leader with a vote on whether the write should be committed. The write is committed when a majority of voting replicas (or write quorum) agree to commit the write. In the background, all remaining (non-witness) replicas log the write. If a read-write or read-only replica falls behind on logging writes, it can request the missing data from another replica to have a full, up-to-date copy of the data.

In reads

Client read requests might be executed at or require communicating with the leader replica, depending on the concurrency mode of the read request.

  • Reads that are part of a read-write transaction are served from the leader replica, because the leader replica maintains the locks required to enforce serializability.

  • Single read methods (a read outside the context of a transaction) and reads in read-only transactions might require communicating with the leader, depending on the concurrency mode of the read. For more information about concurrency modes, see Read types.

    • Strong read requests can go to any read-write or read-only replica. If the request goes to a non-leader replica, that replica must communicate with the leader in order to execute the read.

    • Stale read requests go to the closest available read-only or read-write replica that's caught up to the timestamp of the request. This can be the leader replica if the leader is the closest replica to the client that issued the read request.

Monitor data replication

You can monitor the rate of change and the amount of data that's replicated from your leader replica to the cross region replicas in your instance configuration. The rate of change is in bytes per second and the amount of data is in bytes. To do this, use the Cross region replicated bytes (instance/cross_region_replicated_bytes_count) monitoring metric.

To view this metric in the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to Monitoring:

    Go to Monitoring

  2. In the navigation menu, select Metrics explorer.

  3. In the Metric field, click the Select a metric drop-down.

  4. In the Filter by resource or metric name field, select Cloud Spanner Instance > Instance > Cross region replicated bytes, and then click Apply.

    This metric is available only under Active metrics if there is cross region replication activity in your instance. Otherwise, it appears under Inactive metrics. By default, the UI filters and shows only Active metrics. Clear the Active checkmark to view both active and inactive metrics.

    The chart shows the rate of change (in bytes per second) of replicated data across all Spanner instances within the specified time range.

  5. Optional: To show the amount of data (in bytes) that's replicated instead of the rate of change:

    1. In the Aggregation field, click the Sum drop-down, and select Configure aligner.
    2. In the Alignment function field, click the Rate drop-down, and select Delta.
    3. Select Table or Both as the table type instead of Chart.

      The table shows the amount of data (in bytes) that was replicated within the specified time range.

  6. Optional: To view usage for a particular instance or attribute:

    1. Use the Filter field to add filters, such as an instance ID, database ID, source region, destination region, or a tag.
    2. Click Add filter to add multiple filters.

To see a full list of Google Cloud metrics, see Google Cloud metrics.

What's next