About change streams


A change stream watches and streams out a Spanner database's data changes—inserts, updates, and deletes—in near real time.

This page offers a high-level overview of Spanner change streams: what they do and how they work. To learn how to create and manage change streams in your database and connect them with other services, follow the links in What's next.

Purpose of change streams

Change streams provide a flexible, scalable way to stream data changes to other services. Common use cases include:

  • Replicating Spanner data changes to a data warehouse, such as BigQuery, for analytics.

  • Triggering application logic based on data changes sent to a message queue, such as Pub/Sub.

  • Storing data changes in Cloud Storage, for compliance or archival purposes.

Change stream configuration

You can add change streams to any Google Standard SQL-dialect database.

Spanner treats change streams as schema objects, much like tables and indexes. As such, you create, modify, and delete change streams using DDL statements, and you can view a database's change streams just like other DDL-managed schema objects.

You can configure a change stream to watch data changes across an entire database, or limit its scope to specific tables and columns. A database can have multiple change streams, and a particular table or column can have multiple streams watching it, within limits.

Issuing the DDL that creates a change stream starts a long-running operation. When it completes, the new change stream immediately begins to watch the tables and columns assigned to it.
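
For example, the following sketch uses the Java client library to issue a CREATE CHANGE STREAM statement and wait for the resulting long-running operation. The project, instance, database, stream, table, and column names are hypothetical placeholders, and the column list assumes Status and Amount are non-key columns of Orders.

    import com.google.cloud.spanner.DatabaseAdminClient;
    import com.google.cloud.spanner.Spanner;
    import com.google.cloud.spanner.SpannerOptions;
    import java.util.Arrays;

    public class CreateChangeStreamExample {
      public static void main(String[] args) throws Exception {
        Spanner spanner =
            SpannerOptions.newBuilder().setProjectId("my-project").build().getService();
        DatabaseAdminClient adminClient = spanner.getDatabaseAdminClient();

        // CREATE CHANGE STREAM starts a long-running schema-update operation.
        // OrderStream watches two non-key columns of Orders and every column of Customers.
        adminClient
            .updateDatabaseDdl(
                "my-instance",
                "my-database",
                Arrays.asList(
                    "CREATE CHANGE STREAM OrderStream FOR Orders(Status, Amount), Customers"),
                null)
            .get(); // Block until the operation completes; the stream then begins watching.

        spanner.close();
      }
    }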

Implicitly watching tables and columns

Change streams that watch an entire table implicitly watch all of that table's columns, even when the table's definition changes. For example, when you add new columns to the table, the change stream automatically begins to watch them, without requiring any modification to its configuration. Similarly, the change stream automatically stops watching any columns that are dropped from the table.

Whole-database change streams work the same way. They implicitly watch every column in every table, automatically watching any tables or columns added after the change stream's creation, and ceasing to watch any tables or columns dropped.

Explicitly watching tables and columns

If you configure a change stream to watch only particular columns in a table, and you later add columns to that table, the change stream will not begin to watch those columns unless you reconfigure that change stream to do so.

The database's schema treats change streams as dependent objects of any columns or tables that they explicitly watch. Before you can drop any such column or table, you must manually remove it from the configuration of any change stream explicitly watching it.
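
As a sketch, reusing the admin client pattern from the earlier example (the stream, table, and column names remain hypothetical), an ALTER CHANGE STREAM ... SET FOR statement replaces the stream's watched set. This is how you both pick up a newly added column and stop watching a column before dropping it.

    // Replace OrderStream's watched set: start watching the newly added
    // TrackingId column. Omitting a previously watched column here would
    // stop watching it, which must happen before that column can be dropped.
    adminClient
        .updateDatabaseDdl(
            "my-instance",
            "my-database",
            Arrays.asList(
                "ALTER CHANGE STREAM OrderStream SET FOR Orders(Status, Amount, TrackingId)"),
            null)
        .get();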

Types of data changes that change streams watch

The data changes that a change stream watches include all inserts, updates, and deletes made to the tables and columns that it watches.

Change streams can watch data changes only in user-created columns and tables. They do not watch generated columns, indexes, views, other change streams, or system tables such as the information schema or statistics tables.

Furthermore, change streams do not watch schema changes or any data changes that directly result from schema changes. For example, a change stream watching a whole database would not treat dropping a table as a data change, even though this action removes all of that table's data from the database.

How Spanner writes and stores change streams

Every time Spanner detects a data change in a column watched by a change stream, it writes a data change record to its internal storage. It does so synchronously with that data change, within the same transaction. Spanner co-locates both of these writes so they are processed by the same server, minimizing write processing.

Content of a data change record

Every data change record written by a change stream includes the following information about the data change:

  • The name of the affected table

  • The names, values, and data types of the primary keys identifying the changed row

  • The names and data types of the row's modified columns

  • The old values of the row's modified columns (for updates and deletes)

  • The new values of the row's modified columns (for updates and inserts)

  • The modification type (insert, update, or delete)

  • The commit timestamp

  • The transaction ID

  • The record sequence number

  • The data change record's value capture type—always OLD_AND_NEW_VALUES

To see all of a row's values at the moment a change took place, perform a stale read of that row, based on the change record's commit timestamp. The change stream-to-BigQuery Dataflow template provides an example of including this behavior in a data processing pipeline.
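
A minimal sketch of that stale read, assuming a hypothetical Orders table, an existing DatabaseClient, and a commit timestamp taken from the data change record being processed:

    import com.google.cloud.Timestamp;
    import com.google.cloud.spanner.DatabaseClient;
    import com.google.cloud.spanner.Key;
    import com.google.cloud.spanner.Struct;
    import com.google.cloud.spanner.TimestampBound;
    import java.util.Arrays;

    public class SnapshotAtCommitTimestamp {
      /** Reads an Orders row exactly as it looked when the watched change was committed. */
      static Struct readRowAtCommitTimestamp(
          DatabaseClient dbClient, String orderId, Timestamp commitTimestamp) {
        return dbClient
            // A stale read bounded to the change record's commit timestamp.
            .singleUse(TimestampBound.ofReadTimestamp(commitTimestamp))
            .readRow("Orders", Key.of(orderId), Arrays.asList("Status", "Amount"));
      }
    }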

For a deeper look at the structure of data change records, see Data change records.

Data retention

A change stream retains its data change records for a period of time between one and seven days. You can use DDL to specify a data-retention limit other than the one-day default when initially creating a change stream, or adjust it at any future time. Note that reducing a change stream's data retention limit will make all historical change data older than the new limit immediately and permanently unavailable to that change stream's readers.
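
The retention period is an option in the change stream's DDL. As a sketch, reusing the earlier hypothetical OrderStream and admin client, the retention_period option can be set when the stream is created or changed later:

    // Raise the hypothetical OrderStream's retention from the one-day default to seven days.
    adminClient
        .updateDatabaseDdl(
            "my-instance",
            "my-database",
            Arrays.asList(
                "ALTER CHANGE STREAM OrderStream SET OPTIONS (retention_period = '7d')"),
            null)
        .get();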

This retention period presents a trade-off: a longer period gives change stream readers more time to consume records, but it places greater storage demands on the stream's database.

Reading change streams

Spanner offers two ways to read a change stream's data:

  • Through Dataflow, using the Apache Beam SpannerIO connector. This is our recommended solution for most change stream applications. Google also provides Dataflow templates for common use cases.

  • Directly, using the Spanner API. This trades away the abstraction and capabilities of Dataflow pipelines for maximum speed and flexibility.

Using Dataflow

Use the Apache Beam SpannerIO connector to build Dataflow pipelines that read from change streams. After you configure the connector with details about a particular change stream, it automatically outputs new data change records into a single unbounded PCollection, ready for further processing by subsequent transforms in the Dataflow pipeline.
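
The following sketch shows the shape of such a pipeline. The project, instance, database, metadata database, and change stream names are placeholders, and the downstream transform is only an example:

    import com.google.cloud.Timestamp;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.spanner.SpannerConfig;
    import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
    import org.apache.beam.sdk.io.gcp.spanner.changestreams.model.DataChangeRecord;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class ChangeStreamPipeline {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // The connector emits every new data change record into one unbounded PCollection.
        PCollection<DataChangeRecord> records =
            pipeline.apply(
                SpannerIO.readChangeStream()
                    .withSpannerConfig(
                        SpannerConfig.create()
                            .withProjectId("my-project")
                            .withInstanceId("my-instance")
                            .withDatabaseId("my-database"))
                    .withChangeStreamName("OrderStream")
                    // Database the connector uses to track its partition metadata.
                    .withMetadataInstance("my-instance")
                    .withMetadataDatabase("change-stream-metadata")
                    // Stream changes committed from now onward.
                    .withInclusiveStartAt(Timestamp.now()));

        // Example downstream transform: extract the name of each changed table.
        records.apply(
            MapElements.into(TypeDescriptors.strings())
                .via((DataChangeRecord record) -> record.getTableName()));

        pipeline.run();
      }
    }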

Google also provides templates that let you rapidly build Dataflow pipelines for common change stream use cases, including sending all of a stream's data changes to a BigQuery dataset, or copying them to a Cloud Storage bucket.

For a more detailed overview of how change streams and Dataflow work together, see Build change streams connections with Dataflow.

Using the API

As an alternative to using Dataflow to build change stream pipelines, you can instead write code that uses the Spanner API to read a change stream's records directly. This allows you to read data change records in the same way that the SpannerIO connector does, trading away the abstraction it provides in exchange for the lowest possible latencies when reading change stream data.
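
For illustration, here is a minimal sketch that queries a hypothetical OrderStream's read function through the Java client; a production reader also has to discover and follow child partitions, as described in the pages linked below:

    import com.google.cloud.Timestamp;
    import com.google.cloud.spanner.DatabaseClient;
    import com.google.cloud.spanner.ResultSet;
    import com.google.cloud.spanner.Statement;

    public class QueryChangeStreamExample {
      /** Queries the root (null) partition of a hypothetical OrderStream change stream. */
      static void queryChangeStream(DatabaseClient dbClient, Timestamp startTimestamp) {
        Statement statement =
            Statement.newBuilder(
                    "SELECT ChangeRecord FROM READ_OrderStream("
                        + "  start_timestamp => @start,"
                        + "  end_timestamp => NULL,"
                        + "  partition_token => NULL,"
                        + "  heartbeat_milliseconds => 10000)")
                .bind("start")
                .to(startTimestamp)
                .build();

        // The read function returns a stream of ChangeRecord structs: data change
        // records, heartbeat records, and child partition records.
        try (ResultSet resultSet = dbClient.singleUse().executeQuery(statement)) {
          while (resultSet.next()) {
            System.out.println(resultSet.getCurrentRowAsStruct());
          }
        }
      }
    }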

To learn more, see Query change streams. For a more detailed discussion on how to query change streams and interpret the records returned, see Change streams partitions, records, and queries.

Limits

There are several limits on change streams, including the maximum number of change streams a database can have, and the maximum number of streams that can watch a single column. For a full list, see Change stream limits.

Permissions

Change streams use two of Spanner's standard IAM permissions:

  • Creating, updating, or dropping change streams requires spanner.databases.updateDdl.

  • Reading a change stream's data requires spanner.databases.select.

If you use the SpannerIO connector, the owner of the Dataflow job that reads change stream data requires additional IAM permissions, either on your application database or on a separate metadata database; see Create a metadata database.

Best practices

Benchmark new change streams, and resize if needed

Before adding new change streams to your production instance, consider benchmarking a realistic workload on a staging instance with change streams enabled. This lets you determine whether you need to add nodes to the instance to increase its compute and storage capacity.

Run these tests until CPU and storage metrics stabilize. Ideally, the instance's CPU utilization should remain under the recommended maximums, and its storage usage should not exceed the instance's storage limit.

Use different regions to load-balance

When using change streams in a multi-region instance, consider running their processing pipelines in a different region than the default leader region. This helps spread the streaming load among non-leader replicas. However, if you need to prioritize the lowest possible streaming delay over load balancing, run the streaming load in the leader region.

What's next