Set up logical replication and decoding

You can use logical replication and decoding features in Cloud SQL for PostgreSQL. These features enable logical replication workflows and change data capture (CDC) workflows.

For general information about replication, see About replication in Cloud SQL.

Introduction

When PostgreSQL performs logical replication, the changes that are streamed to replicas are extracted from the WAL logs using logical decoding. The decoded changes are independent of the underlying physical storage format. The changes reflect only the changes in data from a SQL level, in terms of INSERTs, UPDATEs, and DELETEs. This independence from the storage layer provides great flexibility and enables a wide array of functionality by consumers of the change streams.

Logical replication is the flagship feature built on logical decoding.

Unlike PostgreSQL's physical replication feature, which requires both the source and destination databases to be the same version, logical replication enables replication across PostgreSQL major versions. Logical replication in Cloud SQL is supported by the pglogical extension, available in all PostgreSQL versions, and PostgreSQL's native logical replication, added in PostgreSQL 10.

The format in which changes are streamed can be configured with different plugins. This allows for flexible change data capture (CDC) architectures. For example, the wal2json extension allows streaming all changes in a database to a consumer, formatted as JSON. Cloud SQL supports the built-in pgoutput decoder, the test_decoding contrib module, and wal2json. Cloud SQL currently supports both wal2json variants of JSON output - format-version 1 which encodes the whole transaction as a single JSON object and format-version 2 which outputs one JSON object per command. These plugins enable replication to non-PostgreSQL databases.

Configure your PostgreSQL instance

PostgreSQL supports logical decoding by writing additional information to its write-ahead log (WAL).

In Cloud SQL, you enable this feature by setting the cloudsql.logical_decoding flag to on. This setting is different from the setting used in standard PostgreSQL. If you change an external PostgreSQL instance, you enable this feature by setting the wal_level configuration parameter to logical.

If you plan to use the pglogical extension, pglogical must be added to shared_preload_libraries. Since Cloud SQL does not allow direct modification of this flag, pglogical is enabled by setting cloudsql.enable_pglogical to on. (On a VM, sudo apt-get install postgresql-13-pglogical ) and restart the database.

If you are using pglogical to replicate between two PostgreSQL instances, logical decoding only needs to be enabled on the primary instance, and not on the replica instance (unless that instance itself is a primary for other replicas). However, the pglogical extension must be enabled on both instances. For examples of how the terms "primary" and "replica" are used and their meanings, see About replication in Cloud SQL.

Enable network connectivity

Ensure that your primary instances accept connections from the replica instance.

Primary Replica Configuration
Cloud SQL (public IP) Cloud SQL (public IP) Add the replica's outgoing IP address to the primary's authorized networks.
Cloud SQL (private IP) Cloud SQL (private IP) If both instances are in the same Google Cloud project, add the allocated IP range of the replica's VPC network to the both instance's authorized networks.

To find the allocated IP range in the Google Cloud console:

  1. Navigate to the VPC networks page.
  2. Select the VPC network you're using.
  3. Select the Private service connection tab.
  4. Select the Allocated IP ranges tab.
External Cloud SQL You can use the Database Migration Service.
Cloud SQL External See Configure external replicas for more information.

Get the outgoing IP address of a replica instance

If the replica instance is a Cloud SQL instance and has a public IP address, perform the following steps to get its outgoing IP address.

Console

  1. Open the Cloud SQL Instances page.

  2. Next to the Cloud SQL replica's public IP address, hover over the More info tool tip and retrieve the outgoing IP address. Note that the outgoing IP address is not the IP address displayed in the main listing for the replica in the Cloud console.

If the replica instance isn't a Cloud SQL instance, refer to the relevant documentation.

For more information about how to get an instance's public IP address, see Get the Cloud SQL replica's outgoing IP address.

gcloud

You can use the following gcloud command:

gcloud sql instances describe [REPLICA_NAME] --format="default(ipAddresses)"

Allow connections

If the primary instance is a Cloud SQL instance, you can allow access from the replica's outgoing IP address by adding it as an authorized network.

Enable replication connections for PostgreSQL 9.6 and earlier

If your primary instance isn't running in Cloud SQL, and is running PostgreSQL 9.6 or earlier, you must ensure that the instance's pg_hba.conf file is set to accept replication connections. Add the following line to that file, using all all for initial testing only. For more security, limit users and IP addresses to just those required, as in this example:

host replication all all md5

For additional information, see The pg_hba.conf File.

Create a replication user

To use logical decoding features, you create a PostgreSQL user with the REPLICATION attribute.

Examples

CREATE USER replication_user WITH REPLICATION
IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'secret';

Alternatively, you can set this attribute on an existing user:

ALTER USER existing_user WITH REPLICATION;

PostgreSQL resources

When logical decoding is used, a background process on the primary PostgreSQL instance transforms WAL changes into logical changes, using the selected decoding plugin, and relays them to a consumer (which could even be a non-PostgreSQL instance). This background process is called a WAL sender. The number of concurrent WAL senders that can be active in a PostgreSQL instance is limited by the max_wal_senders flag. This flag defaults to 10 and its limit grows linearly with the memory of your Cloud SQL instance, allowing 8 WAL senders per GB of memory.

To ensure that WAL segments are not discarded before being sent to all consumers, PostgreSQL uses logical replication slots to track which data was sent to which consumer (and physical replication slots for read replicas). The number of replication slots that you can create for a PostgreSQL instance is limited by the max_replication_slots flag. This flag defaults to 10 and its limit grows with the memory of your Cloud SQL instance, allowing between 2 and 8 replication slots per GB of memory.

The following table shows the relationship between the maximum memory of a Cloud SQL instance and the maximum replication slots for the instance.

Maximum memory (GB)
Maximum replication slots
4
10
16
32
32
128
64
256
128
512
256
1024
512
2048
512+
4096

There generally is one replication slot and WAL sender per consumer, so these flags should be set to roughly equal values. However, PostgreSQL recommends providing a small buffer for max_wal_senders to handle when connections unexpectedly die and new connections are made. Physical replication, as used by Cloud SQL read replicas, also uses a replication slot and WAL sender, so count those when calculating how many of each resource you need.

PostgreSQL native logical replication and pglogical require additional background processes to run, both on the primary and replica instances. The number of background processes that can run is limited by the max_worker_processes flag. The default is eight and its limit grows linearly with the memory of your Cloud SQL instance, allowing two additional processes per GB of memory. The exact number of worker processes used with these approaches is explained in their respective sections.

If this flag is set too low, and replication fails with the error message worker registration failed in your logs, you likely need to increase the max_worker_processes setting.

Note that WAL senders do not count as worker processes. Workers spawned for parallel query execution do count, so if the value of max_worker_processes is set too low, you may experience poor performance because PostgreSQL cannot leverage parallel query execution.

Using the pg_ls_waldir () function, you can determine the WAL disk usage. This function is restricted to cloudsqlsuperuser users such as the default admin user postgres. This function is only available in PostgreSQL version 10 and above.

To calculate the total WAL disk usage:

postgres=> select * from pg_ls_waldir();
name size modification
00000001000000000000000A 16777216 2021-08-11 15:16:49+00
000000010000000000000009 16777216 2021-08-12 06:23:24+00

(2 rows)

postgres=> select pg_size_pretty(sum(size)) as "Total WAL disk usage" from pg_ls_waldir();
Total WAL disk usage
32 MB

(1 row)

Set up logical replication with an external replica

See Configuring external replicas for a complete example using pglogical and logical decoding.

Set up logical replication with pglogical

To set up logical replication with pglogical, logical decoding must be enabled on the primary instance. Set cloudsql.logical_decoding=on on the Cloud SQL instance, or wal_level=logical on an external instance. Additionally, pglogical must be enabled on both the primary and replica instance; set cloudsql.enable_pglogical=on on a Cloud SQL instance, or add pglogical to shared_preload_libraries on an external instance. Note that changing these flags requires a restart of both the primary and replica instances.

If you encounter issues with these steps, see Troubleshoot pglogical.

Create a user with replication privileges

You need a user with replication privileges and the cloudsqlsuperuser role on both the primary and replica instances when using pglogical. Any commands described below should be executed by that user.

Install the pglogical extension

You need to install the pglogical extension on both the primary and replica instances. On the primary, the replication user (that is, the user connecting to the database) must install it.

CREATE EXTENSION pglogical;

Create a pglogical node on each instance

A pglogical node represents a physical PostgreSQL instance, and stores connection details for that instance. Both the primary and the replica instance must register themselves as nodes:

source-instance$ SELECT pglogical.create_node(
    node_name := 'primary',
    dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=secret'
);

dest-instance$ SELECT pglogical.create_node(
    node_name := 'replica',
    dsn := 'host=<replica-ip> port=5432 dbname=postgres user=replication_user password=secret'
);

Create a table with data to replicate

The pglogical extension allows for replicating only a subset of tables to a destination. As an example, we'll create a dummy table on the primary instance and populate it with some data to test:

CREATE TABLE replica_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO replica_test (data) VALUES ('apple'), ('banana'), ('cherry');

The table must also be created on the replica instance.

Add the table to a replication set

To support replicating different sets of data to different destinations, pglogical has the concept of a replication set. We can add our test table to the default replication set.

SELECT pglogical.replication_set_add_table('default', 'replica_test', true);

Create the pglogical subscription

Create the pglogical subscription on the destination instance by providing connection details to the primary instance.

SELECT pglogical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=replicapassword'
);

SELECT * FROM pglogical.show_subscription_status('test_sub');

If the status appears as "replicating", then the setup is successful. Query the replica_test table to ensure that data has been replicated. Insert and modify records on the primary instance and verify that they then appear on the replica instance.

On the primary, query the pg_replication_slots table to see the replication slot created by the subscription.

Cleanup

After your test is successful, drop the subscription on the replica using pglogical.drop_subscription('test_sub'). Verify that the replication slot is also dropped on the primary. Otherwise, WAL segments continue to accumulate on the replica instance.

For more about replication sets, partial data replication, DDL replication, other advanced configuration and limitations, see the pglogical documentation.

Resource usage

The pglogical extension runs multiple background processes that count towards the max_worker_processes limit.

In the steady state, it runs one "supervisor" process when enabled, one "manager" process per PostgreSQL database that has installed the extension (for example, there could be D of these), and one "apply" process per pglogical subscription on the replica instance (for example, there could be S of these). The extension, however, may spawn additional worker processes when performing an initial sync, and actually does spawn "manager" processes for every database in the instance, but if the database doesn't have the extension installed, it exits immediately.

Therefore, allocate a handful more worker processes than needed in the steady state. Worker processes are used by PostgreSQL for other purposes, such as parallel query processing. If max_worker_processes is set too low, replication may silently fail, or PostgreSQL may be unable to perform parallel query processing.

In summary, these settings are recommended:

max_worker_processes
  >= 1 + D + 8 (on the source instance)
  >= 1 + D + S + 8 (on the destination instance)
max_wal_senders >= S + 2 (on the source instance)
max_replication_slots >= S (on the source instance)

Troubleshoot pglogical

Unable to create the pglogical extension

When trying to install the pglogical extension, you may see the error:

ERROR:  pglogical is not in shared_preload_libraries

When you install pglogical on a Cloud SQL instance, ensure that you have set cloudsql.enable_pglogical=on. If using an external instance, add it directly to the shared_preload_libraries flag, for example, shared_preload_libraries=pg_stat_statements,pglogical. These modifications require a restart of the primary instance.

Unable to create pglogical subscription

When creating a subscription, pglogical first checks that it can use the connection details to connect to the instance. It first tries to create a regular connection, and if this fails, an error occurs: ERROR: could not connect to the postgresql server.

If this error occurs, ensure that the primary instance is configured to allow connections from the replica instance, and ensure that the connection details you provided are correct. Additional details are provided about why PostgreSQL was unable to establish a connection.

After creating a regular connection, pglogical tries to make a special replication connection. In PostgreSQL 9.6 and earlier, this type of connection could have a different authentication configuration. You need to update the pg_hba.conf file on the source instance if you see this error: ERROR: could not connect to the postgresql server in replication mode.

The pg_hba.conf file used by Cloud SQL already has the necessary changes; this error only occurs when connecting to an external instance that isn't managed by Cloud SQL.

Alternatively, the replication mode connection may fail if the source instance does not allow enough WAL senders. If you see FATAL: number of requested standby connections exceeds max_wal_senders, then increase max_wal_senders on the primary instance.

pglogical subscription is down

A pglogical subscription may fail to replicate. To address this issue, first ensure that a background process is running on the replica instance. Query pg_stat_activity to verify that a pglogical apply process is running. If not, check the logs on the destination node. If you see the message worker registration failed, you can increase the max_worker_processes setting.

Then, ensure a replication slot was created on the primary instance. On the replica instance, the row in pglogical.subscription contains the name of the slot the subscription tries to create, and on the primary instance you can query pg_replication_slots to verify that the slot was successfully created.

If no replication slot was created, check the logs on the primary instance.

An error of ERROR: logical decoding requires wal_level >= logical implies that the wal_level flag wasn't set to logical. Resolve this by setting cloudsql.logical_decoding=on, on the primary instance if it is a Cloud SQL instance.

Alternatively, if the instance is an external instance, set wal_level=logical.

Otherwise, you may see ERROR: all replication slots are in use, along with the helpful HINT: Free one or increase max_replication_slots.

Set up native PostgreSQL logical replication

Since PostgreSQL 10, PostgreSQL supports native built-in logical replication. To set up native logical replication, logical decoding must be enabled on the primary instance, by setting cloudsql.logical_decoding=on on a Cloud SQL instance, or wal_level=logical on an external instance. Note that modification of these flags requires a restart of the primary instance.

Ensure that your instances are properly configured (for network connectivity, etc.) by reviewing the sections in Configure your PostgreSQL instance. This page provides steps for a proof-of-concept. If you encounter any issues while following steps in those sections, see Troubleshoot pglogical. For more information, see Logical Replication in the PostgreSQL documentation.

Create a table with data to replicate

Native PostgreSQL logical replication supports an entire database or just individual tables. As an example, we'll create a dummy table on the primary instance and populate it with data to test.

CREATE TABLE native_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO native_test (data) VALUES ('apple'), ('banana'), ('cherry');

The table must also be created on the replica instance.

Create a publication on the primary instance

Native PostgreSQL logical replication deals with publishers and subscribers. Create a publication of the data in native_test:

CREATE PUBLICATION pub FOR TABLE native_test;

Create a subscription on the replica instance

Here is an example of creating a subscription on the replica instance:

CREATE SUBSCRIPTION sub
    CONNECTION 'host=<primary-ip> port=5432 dbname=postgres user=replication_user password=replicapassword'
    PUBLICATION pub;

Creating the subscription on the replica instance requires the cloudsqlsuperuser role. After creating the subscription, query the native_test table to verify that the data has appeared in the replica instance.

On the primary, you can query the pg_replication_slots table to see the replication slot created by the subscription.

Cleanup

Once your test is successful, drop the subscription on the replica using DROP SUBSCRIPTION sub;. Verify that the replication slot is also dropped on the primary. Otherwise WAL segments continue to accumulate on the primary instance.

Limitations on native PostgreSQL logical replication

Access to the subconninfo column of the pg_subscription system table is unavailable.

Running pg_dump cannot dump information about subscriptions because it checks whether the connecting user has superuser permissions.

Receive decoded WAL changes for change data capture (CDC)

As an alternative use case for CDC, logical decoding can stream changes from a PostgreSQL instance. The standard tool used for this is pg_recvlogical.

You can use the pg_recvlogical tool to create a replication slot and to stream changes tracked by that slot. The format of the changes is determined by your choice of the decoding plugin. You can use:

  • wal2json, to stream changes formatted as JSON, or

  • test_decoding, to stream changes formatted with a barebones text format

Create replication slot

To create a replication slot, run:

pg_recvlogical
  -h <instance_ip> \
  -U <replication_user> \
  -p 5432 \
  -d postgres \
  --slot test_slot \
  --create-slot \
  -P <decoder_plugin>

Stream changes

In one Cloud Shell terminal, run:

pg_recvlogical
  -h <instance_ip> \
  -U <replication_user> \
  -p 5432 \
  -d postgres \
  --slot test_slot \
  --start \
  -f -

While in another Cloud Shell terminal, connect to your database and run the following commands:

CREATE TABLE cdc_test (id SERIAL PRIMARY KEY, data text);
INSERT INTO cdc_test (data) VALUES ('apple', 'banana');
UPDATE cdc_test SET data = 'cherry' WHERE id = 2;
DELETE FROM cdc_test WHERE id = 1;
DROP TABLE cdc_test;

If you are using the wal2json decoder plugin, output similar to the following is displayed in the first Cloud Shell terminal:

{"change":[]}
{"change":[{"kind":"insert","schema":"public","table":"cdc_test","columnnames":["id","data"],"columntypes":["integer","text"],"columnvalues":[1,"apple"]},{"kind":"insert","schema":"public","table":"cdc_test","columnnames":["id","data"],"columntypes":["integer","text"],"columnvalues":[2,"banana"]}]}
{"change":[{"kind":"update","schema":"public","table":"cdc_test","columnnames":["id","data"],"columntypes":["integer","text"],"columnvalues":[2,"cherry"],"oldkeys":{"keynames":["id"],"keytypes":["integer"],"keyvalues":[2]}}]}
{"change":[{"kind":"delete","schema":"public","table":"cdc_test","oldkeys":{"keynames":["id"],"keytypes":["integer"],"keyvalues":[1]}}]}
{"change":[]}

If you are using the test_decoding decoder plugin, output similar to the following is displayed in the first Cloud Shell terminal:

BEGIN 19460
COMMIT 19460
BEGIN 19461
table public.cdc_test: INSERT: id[integer]:1 data[text]:'apple'
table public.cdc_test: INSERT: id[integer]:2 data[text]:'banana'
COMMIT 19461
BEGIN 19462
table public.cdc_test: UPDATE: id[integer]:2 data[text]:'cherry'
COMMIT 19462
BEGIN 19463
table public.cdc_test: DELETE: id[integer]:1
COMMIT 19463
BEGIN 19464
COMMIT 19464

(Your transaction IDs may differ.)

Cleanup

After you complete your testing, drop the replication slot you created by running:

pg_recvlogical
  -h <instance_ip> \
  -U <replication_user> \
  -p 5432 \
  -d postgres \
  --slot test_slot \
  --drop-slot

Notes and limitations

The notes and limitations in this section apply to the logical replication and decoding features of Cloud SQL for PostgreSQL.

  • When you restore an instance that has cloudsql.logical_decoding or cloudsql.enable_pglogical enabled, and is currently acting as a publisher for logical replication, you must disable replication to all target instances first. Otherwise, restoring to the instance fails with an error, but currently the details of the error are not visible.

  • When you restore a backup of an instance that had cloudsql.logical_decoding or cloudsql.enable_pglogical enabled (at the time of the backup), and you are restoring it to a new instance, the replication state is not restored to the new instance. You must reconfigure replication manually afterwards.

  • On a Cloud SQL instance with one or more Cloud SQL read replicas (using physical replication), if you enable cloudsql.logical_decoding or cloudsql.enable_pglogical, those flags are also enabled on the read replica.

    • Cloud SQL read replica instances can't act as publishers for logical replication because PostgreSQL doesn't support logical decoding in read replicas. However, the flags are enabled on the read replica instance to ensure that it can serve as a replacement for the primary if and when it is promoted.

    • Enabling cloudsql.logical_decoding or cloudsql.enable_pglogical on the primary instance causes the flags to be enabled on all read replicas, and this causes the primary and read replicas to be restarted in close succession. To avoid this situation and control when each instance is restarted, you can (1) set the flags on each read replica in turn, and only then (2) set the flags on the primary instance.

    • Disabling cloudsql.logical_decoding or cloudsql.enable_pglogical on the primary instance does not cause the flags to be disabled on all read replicas. To disable the flags across the instances, you must perform the inverse of the steps described above: (1) disable the flags on the primary instance and then (2) disable the flags on each read replica in turn.