This section contains information about:
- The behavior of how Datastream handles data that's being pulled from a source PostgreSQL database
- The versions of PostgreSQL database that Datastream supports
- An overview of how to setup a source PostgreSQL database so that data can be streamed from it to a destination
- Known limitations for using PostgreSQL database as a source
Behavior
The source PostgreSQL database relies upon its logical decoding feature. Logical decoding exposes all changes committed to the database and allows consuming and processing these changes in a user-friendly format using an output plugin. Datastream uses the pgoutput
plugin, which is the standard PostgreSQL logical decoding plugin for PostgreSQL 10 and later.
- All schemas or specific schemas from a given PostgreSQL source, as well as all tables from the schema or specific tables, can be selected.
- All historical data is replicated.
- All data manipulation language (DML) changes, such as inserts, updates, and deletes from the specified databases and tables, are replicated.
- Only committed changes are replicated.
- If you define a REPLICA IDENTITY on a table, Datastream treats the specified columns as primary keys.
Versions
Datastream supports PostgreSQL version 10 and later.
Datastream supports the following types of PostgreSQL database:
- Self-hosted PostgreSQL
- Cloud SQL for PostgreSQL
- AlloyDB for PostgreSQL
- AlloyDB Omni
- Amazon RDS for PostgreSQL
- Amazon Aurora PostgreSQL
Known limitations
Known limitations for using Datastream with a PostgreSQL database as a source include:
- Streams are limited to 10,000 tables.
- A table that has more than 500 million rows can't be backfilled unless the following conditions are met:
- The table has a unique B-tree index.
- The index doesn't include columns of the following types:
DOUBLE
,FLOAT
,MONEY
,REAL
,JSON
,JSONB
,BYTEA
,TXID
,XML
or geometric data types. - None of the columns of the index are nullable.
- All columns of the index are in ascending order, or all columns of the index are in descending order.
- All columns of the index are included in the stream.
- Tables without primary keys must have a REPLICA IDENTITY. Otherwise, only
INSERT
events are replicated to the destination. - Datastream can't replicate from a read replica instance for PostgreSQL versions earlier than PostgreSQL 16.
- Not all changes to the source schema can be detected automatically, in which case data corruption may occur. The following schema changes may cause data corruption or failure to process the events downstream:
- Dropping columns.
- Adding columns to the middle of a table.
- Changing the data type of a column.
- Reordering columns.
- Dropping tables (relevant if the same table is then recreated with new data added).
- Datastream doesn't support columns of the
geometric
data types. - Datastream doesn't support columns of the
range
data types. - Datastream doesn't support arrays of unsupported data types, arrays of user-defined data types (including
ENUM
) or arrays ofDATE
,TIMESTAMP
orTIMESTAMP WITH TIME ZONE
data types. Such columns are ignored. - Datastream doesn't support replicating
UPDATE
events for rows which includeTOAST
values in columns that are part of the table's replica identity. Such events are discarded. - Datastream doesn't support replicating rows which include
JSON
orJSONB
values with more than 2950 nested objects. Events containing suchJSON
orJSONB
values aren't replicated to the destination database. - Datastream doesn't support replicating rows which include
NaN
values inNUMERIC (precision, scale)
columns. The values in such columns are replaced withNULL
values. - Datastream doesn't support replicating columns of the hstore data type. The values in such columns are replaced with
NULL
values. - Datastream doesn't support SSL/TLS connections which require client certificates.
- Datastream doesn't support replicating non-ASCII records from a SQL_ASCII encoded source database. Such records are discarded.
- Datastream doesn't support replicating tables with Row-Level Security (RLS) policies defined. For information about how to bypass this limitation, see PostgreSQL source behavior and limitations.
- Datastream doesn't capture changes made to generated columns.
What's next
- Learn how to configure a PostgreSQL source for use with Datastream.