Setting up logical replication and decoding

You can use logical replication and decoding features in Cloud SQL for PostgreSQL. These features enable logical replication workflows and change data capture (CDC) workflows.

For general information about replication, see Replication in Cloud SQL.

Introduction

When PostgreSQL performs logical replication, the changes that are streamed to replicas are extracted from the WAL logs using logical decoding. The decoded changes are independent of the underlying physical storage format. The changes reflect only the changes in data from a SQL level, in terms of INSERTs, UPDATEs, and DELETEs. This independence from the storage layer provides great flexibility and enables a wide array of functionality by consumers of the change streams.

Logical replication is the flagship feature built on logical decoding.

Unlike PostgreSQL's physical replication feature, which requires both the source and destination databases to be the same version, logical replication enables replication across PostgreSQL major versions. Logical replication in Cloud SQL is supported by the pglogical extension, available in all PostgreSQL versions, and PostgreSQL's native logical replication, added in PostgreSQL 10.

The format in which changes are streamed can be configured with different plugins. This allows for flexible change data capture (CDC) architectures. For example, the wal2json extension allows streaming all changes in a database to a consumer, formatted as JSON. Cloud SQL supports the built-in pgoutput decoder, the test_decoding contrib module, and wal2json. These plugins enable replication to non-PostgreSQL databases.

Configuring your PostgreSQL instance

PostgreSQL supports logical decoding by writing additional information to its write-ahead log (WAL).

In Cloud SQL, you enable this feature by setting the cloudsql.logical_decoding flag to on. This setting is different from the setting used in standard PostgreSQL. If you change an external PostgreSQL instance, you enable this feature by setting the wal_level configuration parameter to logical.

If you plan to use the pglogical extension, pglogical must be added to shared_preload_libraries. Since Cloud SQL does not allow direct modification of this flag, pglogical is enabled by setting cloudsql.enable_pglogical to on.

If you are using pglogical to replicate between two PostgreSQL instances, logical decoding only needs to be enabled on the primary instance, and not on the replica instance (unless that instance itself is a primary for other replicas). However, the pglogical extension must be enabled on both instances. For examples of how the terms "primary" and "replica" are used and their meanings, see Replication in Cloud SQL.

Enable network connectivity

Ensure that your primary instances accept connections from the replica instance.

Get outgoing IP address of a replica instance

If the replica instance is a Cloud SQL instance, you can get its outgoing IP as described below.

gcloud

You can use the following gcloud command:

gcloud sql instances describe [REPLICA_NAME] --format="default(ipAddresses)"

Console

  1. Open the Cloud SQL Instances page.

  2. Next to the Cloud SQL replica's public IP address, hover over the More info tool tip and retrieve the outgoing IP address. Note that outgoing IP address is not the IP address displayed in the main listing for the replica in the Cloud Console.

If the replica instance is not a Cloud SQL instance, please consult the provider-specific documentation.

For more information about how to get an instance's public IP, see Get the Cloud SQL replica's outgoing IP address.

Allowing connections

If the primary instance is a Cloud SQL instance, you can allow access from the replica's outgoing IP address by adding it as an authorized network.

Enabling replication connections for PostgreSQL 9.6 and earlier

If your primary instance isn't running in Cloud SQL, and is running PostgreSQL 9.6 or earlier, you must ensure that the instance's pg_hba.conf file is set to accept replication connections. Add the following line to that file, using all all for initial testing only. For more security, limit users and IP addresses to just those required, as in this example:

host replication all all md5

For additional information, see The pg_hba.conf File.

Create a replication user

To use logical decoding features, you create a PostgreSQL user with the REPLICATION attribute.

Examples

CREATE USER replication_user WITH REPLICATION
IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'secret';

Alternatively, you can set this attribute on an existing user:

ALTER USER existing_user WITH REPLICATION;

PostgreSQL resources

When logical decoding is used, a background process on the primary PostgreSQL instance transforms WAL changes into logical changes, using the selected decoding plugin, and relays them to a consumer (which even could be a non-PostgreSQL instance). This background process is called a WAL sender. The number of concurrent WAL senders that can be active in a PostgreSQL instance is limited by the max_wal_senders flag. This flag defaults to 10 and its limit grows linearly with the memory of your Cloud SQL instance, allowing 8 WAL senders per GB of memory.

To ensure that WAL segments are not discarded before being sent to all consumers, PostgreSQL uses replication slots to track which data was sent to which consumer. The number of replication slots that can be created in a PostgreSQL instance is limited by the max_replication_slots flag. This flag defaults to 10 and its limit grows linearly with the memory of your Cloud SQL instance, allowing 8 replication slots per GB of memory.

There generally is one replication slot and WAL sender per consumer, so these flags should be set to roughly equal values. However, PostgreSQL recommends providing a small buffer for max_wal_senders to handle when connections unexpectedly die and new connections are made. Physical replication, as used by Cloud SQL read replicas, also uses a replication slot and WAL sender, so count those when calculating how many of each resource you need.

PostgreSQL native logical replication and pglogical require additional background processes to run, both on the primary and replica instances. The number of background processes that can run is limited by the max_worker_processes flag. The default is eight and its limit grows linearly with the memory of your Cloud SQL instance, allowing two additional processes per GB of memory. The exact number of worker processes used with these approaches is explained in their respective sections.

If this flag is set too low, and replication fails with the error message worker registration failed in your logs, you likely need to increase the max_worker_processes setting.

Note that WAL senders do not count as worker processes. Workers spawned for parallel query execution do count, so if the value of max_worker_processes is set too low, you may experience poor performance because PostgreSQL cannot leverage parallel query execution.

Setting up logical replication with pglogical

To set up logical replication with pglogical, logical decoding must be enabled on the primary instance. Set cloudsql.logical_decoding=on on the Cloud SQL instance, or wal_level=logical on an external instance. Additionally, pglogical must be enabled on both the primary and replica instance; set cloudsql.enable_pglogical=on on a Cloud SQL instance, or add pglogical to shared_preload_libraries on an external instance. Note that changing these flags requires a restart of both the primary and replica instances.

If you encounter issues with these steps, see Troubleshooting pglogical.

Create a user with replication privileges

You need a user with replication privileges and the cloudsqlsuperuser role on both the primary and replica instances when using pglogical. Any commands described below should be executed by that user.

Install the pglogical extension

You need to install the pglogical extension on both the primary and replica instances. On the primary, the replication user (that is, the user connecting to the database) must install it.

CREATE EXTENSION pglogical;

Create a pglogical node on each instance

A pglogical node represents a physical PostgreSQL instance, and stores connection details for that instance. Both the primary and the replica instance must register themselves as nodes:

source-instance$ SELECT pglogical.create_node(
    node_name := 'primary',
    dsn := 'host=<primary-public-ip> port=5432 dbname=postgres user=replication_user password=secret'
);

dest-instance$ SELECT pglogical.create_node(
    node_name := 'replica',
    dsn := 'host=<replica-public-ip> port=5432 dbname=postgres user=replication_user password=secret'
);

Create a table with data to replicate

The pglogical extension allows for replicating only a subset of tables to a destination. As an example, we'll create a dummy table on the primary instance and populate it with some data to test:

CREATE TABLE replica_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO replica_test (data) VALUES ('apple'), ('banana'), ('cherry');

The table must also be created on the replica instance.

Add the table to a replication set

To support replicating different sets of data to different destinations, pglogical has the concept of a replication set. We can add our test table to the default replication set.

SELECT pglogical.replication_set_add_table('default', 'replica_test', true);

Create the pglogical subscription

Create the pglogical subscription on the destination instance by providing connection details to the primary instance.

SELECT pglogical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=<primary-public-ip> port=5432 dbname=postgres user=replication_user password=replicapassword'
);

SELECT * FROM pglogical.show_subscription_status('test_sub');

If the status appears as "replicating", then the setup is successful. Query the replica_test table to ensure that data has been replicated. Insert and modify records on the primary instance and verify that they then appear on the replica instance.

On the primary, query the pg_replication_slots table to see the replication slot created by the subscription.

Cleanup

After your test is successful, drop the subscription on the replica using pglogical.drop_subscription('test_sub'). Verify that the replication slot is also dropped on the primary. Otherwise, WAL segments continue to accumulate on the replica instance.

For more about replication sets, partial data replication, DDL replication, other advanced configuration and limitations, see the pglogical documentation.

Resource usage

The pglogical extension runs multiple background processes that count towards the max_worker_processes limit.

In the steady state, it runs one "supervisor" process when enabled, one "manager" process per PostgreSQL database that has installed the extension (for example, there could be D of these), and one "apply" process per pglogical subscription on the replica instance (for example, there could be S of these). The extension, however, may spawn additional worker processes when performing an initial sync, and actually does spawn "manager" processes for every database in the instance, but if the database doesn't have the extension installed, it exits immediately.

Therefore, allocate a handful more worker processes than needed in the steady state. Worker processes are used by PostgreSQL for other purposes, such as parallel query processing. If max_worker_processes is set too low, replication may silently fail, or PostgreSQL may be unable to perform parallel query processing.

In summary, these settings are recommended:

max_worker_processes
  >= 1 + D + 8 (on the source instance)
  >= 1 + D + S + 8 (on the destination instance)
max_wal_senders >= S + 2 (on the source instance)
max_replication_slots >= S (on the source instance)

Troubleshooting pglogical

Unable to create the pglogical extension

When trying to install the pglogical extension, you may see the error:

ERROR:  pglogical is not in shared_preload_libraries

When you install pglogical on a Cloud SQL instance, ensure that you have set cloudsql.enable_pglogical=on. If using an external instance, add it directly to the shared_preload_libraries flag, for example, shared_preload_libraries=pg_stat_statements,pglogical. These modifications require a restart of the primary instance.

Unable to create pglogical subscription

When creating a subscription, pglogical first checks that it can use the connection details to connect to the instance. It first tries to create a regular connection, and if this fails, an error occurs: ERROR: could not connect to the postgresql server.

If this error occurs, ensure that the primary instance is configured to allow connections from the replica instance, and ensure that the connection details you provided are correct. Additional details are provided about why PostgreSQL was unable to establish a connection.

After creating a regular connection, pglogical tries to make a special replication connection. In PostgreSQL 9.6 and earlier, this type of connection could have a different authentication configuration. You need to update the pg_hba.conf file on the source instance if you see this error: ERROR: could not connect to the postgresql server in replication mode.

The pg_hba.conf file used by Cloud SQL already has the necessary changes; this error only occurs when connecting to an external instance that isn't managed by Cloud SQL.

Alternatively, the replication mode connection may fail if the source instance does not allow enough WAL senders. If you see FATAL: number of requested standby connections exceeds max_wal_senders, then increase max_wal_senders on the primary instance.

pglogical subscription is down

A pglogical subscription may fail to replicate. To address this issue, first ensure that a background process is running on the replica instance. Query pg_stat_activity to verify that a pglogical apply process is running. If not, check the logs on the destination node. If you see the message worker registration failed, you can increase the max_worker_processes setting.

Then, ensure a replication slot was created on the primary instance. On the replica instance, the row in pglogical.subscription contains the name of the slot the subscription tries to create, and on the primary instance you can query pg_replication_slots to verify that the slot was successfully created.

If no replication slot was created, check the logs on the primary instance.

An error of ERROR: logical decoding requires wal_level >= logical implies that the wal_level flag wasn't set to logical. Resolve this by setting cloudsql.logical_decoding=on, on the primary instance if it is a Cloud SQL instance.

Alternatively, if the instance is an external instance, set wal_level=logical.

Otherwise, you may see ERROR: all replication slots are in use, along with the helpful HINT: Free one or increase max_replication_slots.

Setting up native PostgreSQL logical replication

Since PostgreSQL 10, PostgreSQL supports native built-in logical replication. To set up native logical replication, logical decoding must be enabled on the primary instance, by setting cloudsql.logical_decoding=on on a Cloud SQL instance, or wal_level=logical on an external instance. Note that modification of these flags requires a restart of the primary instance.

Ensure that your instances are properly configured by completing the steps in Configuring your PostgreSQL instance to use logical decoding. If you encounter any issues when following those steps, see the Troubleshooting section below. The official PostgreSQL documentation is available here, and this document provides steps for a proof-of-concept.

Create a table with data to replicate

Native PostgreSQL logical replication supports an entire database or just individual tables. As an example, we'll create a dummy table on the primary instance and populate it with data to test.

CREATE TABLE native_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO native_test (data) VALUES ('apple'), ('banana'), ('cherry');

The table must also be created on the replica instance.

Create a publication on the primary instance

Native PostgreSQL logical replication deals with publishers and subscribers. Create a publication of the data in native_test:

CREATE PUBLICATION pub FOR TABLE native_test;

Create a subscription on the replica instance

Here is an example of creating a subscription on the replica instance:

CREATE SUBSCRIPTION sub
    CONNECTION 'host=<primary-public-ip> port=5432 dbname=postgres user=replication_user password=replicapassword'
    PUBLICATION pub;

Creating the subscription on the replica instance requires the cloudsqlsuperuser role. After creating the subscription, query the native_test table to verify that the data has appeared in the replica instance.

On the primary, you can query the pg_replication_slots table to see the replication slot created by the subscription.

Cleanup

Once your test is successful, drop the subscription on the replica using DROP SUBSCRIPTION sub;. Verify that the replication slot is also dropped on the primary. Otherwise WAL segments continue to accumulate on the primary instance.

Limitations on native PostgreSQL logical replication

Access to the subconninfo column of the pg_subscription system table is unavailable.

Running pg_dump cannot dump information about subscriptions because it checks whether the connecting user has superuser permissions.

Receiving decoded WAL changes for change data capture (CDC)

As an alternative use case for CDC, logical decoding can stream changes from a PostgreSQL instance. The standard tool used for this is pg_recvlogical.

You can use the pg_recvlogical tool to create a replication slot and to stream changes tracked by that slot. The format of the changes is determined by your choice of the decoding plugin. You can use:

  • wal2json, to stream changes formatted as JSON, or

  • test_decoding, to stream changes formatted with a barebones text format

Create replication slot

To create a replication slot, run:

pg_recvlogical
  -h <instance_ip> \
  -U <replication_user> \
  -p 5432 \
  -d postgres \
  --slot test_slot \
  --create-slot \
  -P <decoder_plugin>

Stream changes

In one terminal, run:

pg_recvlogical
  -h <instance_ip> \
  -U <replication_user> \
  -p 5432 \
  -d postgres \
  --slot test_slot \
  --start \
  -f -

While in another terminal, connect to your database and run a series of commands:

CREATE TABLE cdc_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO cdc_test (data) VALUES ('apple', 'banana');
UPDATE cdc_test SET data = 'cherry' WHERE id = 2;
DELETE FROM cdc_test WHERE id = 1;
DROP TABLE cdc_test;

If you are using the wal2json decoder plugin, output similar to the following is displayed in the first terminal:

{"change":[]}
{"change":[{"kind":"insert","schema":"public","table":"cdc_test","columnnames":["id","data"],"columntypes":["integer","text"],"columnvalues":[1,"apple"]},{"kind":"insert","schema":"public","table":"cdc_test","columnnames":["id","data"],"columntypes":["integer","text"],"columnvalues":[2,"banana"]}]}
{"change":[{"kind":"update","schema":"public","table":"cdc_test","columnnames":["id","data"],"columntypes":["integer","text"],"columnvalues":[2,"cherry"],"oldkeys":{"keynames":["id"],"keytypes":["integer"],"keyvalues":[2]}}]}
{"change":[{"kind":"delete","schema":"public","table":"cdc_test","oldkeys":{"keynames":["id"],"keytypes":["integer"],"keyvalues":[1]}}]}
{"change":[]}

If you are using the test_decoding decoder plugin, output similar to the following is displayed in the first terminal:

BEGIN 19460
COMMIT 19460
BEGIN 19461
table public.cdc_test: INSERT: id[integer]:1 data[text]:'apple'
table public.cdc_test: INSERT: id[integer]:2 data[text]:'banana'
COMMIT 19461
BEGIN 19462
table public.cdc_test: UPDATE: id[integer]:2 data[text]:'cherry'
COMMIT 19462
BEGIN 19463
table public.cdc_test: DELETE: id[integer]:1
COMMIT 19463
BEGIN 19464
COMMIT 19464

(Your transaction IDs may differ.)

Cleanup

After you complete your testing, drop the replication slot you created by running:

pg_recvlogical
  -h <instance_ip> \
  -U <replication_user> \
  -p 5432 \
  -d postgres \
  --slot test_slot \
  --drop-slot

General limitations and alternatives

The limitations in this section apply to the logical replication and decoding features of Cloud SQL for PostgreSQL.

  • When you restore an instance that acts as a publisher for logical replication, the replication state is dropped. You must restart replication manually.

  • Cloud SQL read replica instances cannot act as publishers for logical replication because standard PostgreSQL does not support this architecture.

  • You should disable all active logical replication when you are restoring a backup into an instance. Otherwise, restoring into the instance (with ongoing active replication) fails with an error, but currently the details of the error are not visible.

  • When you restore a backup of an instance that has cloudsql.logical_decoding or cloudsql.enable_pglogical enabled, and you are restoring it into a new instance, these flags must also be enabled to the target instance before starting the restore. Otherwise, the restore will fail.