What's new in PostgreSQL 15
Hannu Krosing
Database Engineer
Cloud SQL is excited to announce support for PostgreSQL 15. PostgreSQL 15 has many new and valuable enhancements and we thought it would be great to deep dive into these for our users. One can broadly categorize the improvements into five areas:
Security
Developer experience
Performance
Tooling
Observability
Let’s deep dive.
Security
Schema `public` is now private by default
While historically PostgreSQL has had good security defaults when it comes to connecting to the database, the default for new databases was to have the schema 'public' open to everybody. Also the PostgreSQL superuser who created the cluster was the owner of the schema 'public', so if the database owner was not a superuser they were unable to change the schema access grants or drop the schema.
Starting PostgreSQL 15 this default access is set to "database owner only" and also the schema is owned by the database owner. This is done using the role pg_database_owner
which always resolves to the owner of the current database and not through changing the owner of the schema "public" during database creation.
If the database is upgraded from the older version via pg_upgrade or loaded from a pg_dump archive, the access rights stay as they were in the previous version of PostgreSQL.
Also, nothing has changed for the pseudo-role 'public' which still means "all users".
SECURITY INVOKER Views
Another security-related improvement is a new option for a view to *not* act as a security provider where any user who has SELECT rights on the view will automatically have the rights of the owner of the view for any data accessed through this view.
This option is activated using option 'security_invoker' when creating the view.CREATE VIEW … WITH ( security_invoker ) AS <query>;
Conceptually this is similar to SECURITY INVOKER
functions, except that the default for functions in PostgreSQL has always been SECURITY INVOKER
and SECURITY DEFINER
had to be explicitly specified. For views the default is reversed.
Another difference from functions is that in the function call of a SECURITY DEFINER function, the system fully switches to the security context of the definer / owner of the function and anything accessed inside the function will have privileges of the definer.
For the views the behavior is different — even if the main view is not defined as 'security_invoker' any sub-views used by that view can still be and so any object access in such views will be checked against callers privileges.
More flexible security in logical replication
Logical replication can now also run as the subscription owner, in which case some extra checks are performed against this user's privileges. When running this way the replication user must be either a superuser, table owner, or have bypassrls
set to be able to replicate into a table with row-level security activated. The user also must have the SELECT
privilege for a table in which it tries to replicate UPDATE
or DELETE
events.
New role to run CHECKPOINT manually
Before PostgreSQL 15, only superusers could run the CHECKPOINT
; command. Now any user granted the pre-defined role pg_checkpoint
can too.
This is part of an on-going push for more fine-grained control of who can do what.
Other similar predefined roles added over last few versions granting previously superuser-only abilities are pg_read_all_settings
, pg_read_all_stats
, pg_stat_scan_tables
and pg_signal_backend
.
Developer experience
MERGE
PostgreSQL now provides the SQL standard command MERGE for delegating to the database the decision of whether to INSERT a new or UPDATE an existing row.
As an example let’s do the famous FizzBuzz exercise using MERGE.
1. Create a Table with all numbers from 0 to 15 which are multiples of 3 with line number in field "i" and string 'fizz' as value of field "say"
2. Next let's MERGE in new lines with line number and 'buzz' in the "say" field
And let's remove any lines where i <= 0
PostgreSQL already has a way to do some of this in the form of INSERT … ON CONFLICT DO INSTEAD … but this is a non-standard PostgreSQL-specific extension and it also has some "interesting" handling of transaction isolation levels where the behavior there does not follow exactly any of the SQL-standard isolation levels.
Also it depends on Unique Key violations, and it can not do multiple conditional actions.
So even after adding the PRIMARY KEY to the table, we still can’t remove the row 0 :
Conclusion: MERGE is much more versatile and powerful than the old way!
Multirange improvements
One of the big improvements in PostgreSQL 14 was support for multirange types. Multirange is a set of non-overlapping ranges and they were needed to make support for range types complete.
For example before multiranges you could not add range(1,3) + range(4,7) as the result was not a single range. With multirange you can. And you can then add range(2,5) to the previous range to get back to a single range(1,7).
When multiranges were added to PostgreSQL 14, somehow the support for the range union aggregator function rang_agg() was left out.
This was fixed in PostgreSQL 15, so now you can:
ICU collations can be set as the default for clusters and databases
Previously, only libc-based collations could be selected at the cluster and database levels. ICU collations could only be used via explicit COLLATE clauses.
Now you can do the following to have new database use a specified ICU locale:
New implementation of CREATE DATABASE
The CREATE DATABASE
command was rewritten to WAL-log all the writes it does when it makes a new database as a copy of the template database.
It does much more WAL writing than the old version but as it avoids the CHECKPOINT at the start and end of the command, it is in most cases faster and has less impact on concurrent workloads.
This can be slower than the old version in case of a very large template database — for example in a multi-tenant cluster where the template has lot of schemas, tables and initial data — so the old way of doing it is still available and can be selected by specifying STRATEGY = FILE_COPY
in the CREATE DATABASE
command. The default of STRATEGY = WAL_LOG
is the better one to use in most cases.
Performance
There are more but these are the most interesting new features.
Faster sorting
First, the handling of cases where the sorted data did not fit in work_mem is improved by switching to disk-based sorting with more sort streams.
More cases where sorting can be avoided
Second, improvement for sorting is the ability to allow ordered scans of partitions to avoid sorting in more cases than before so sorting can be replaced by already pre-ordered index scans.
Previously, a partitioned table with a DEFAULT partition or a LIST partition containing multiple values could not be used for ordered partition scans. Now they can be used if such partitions are pruned during planning.
Smarter postgres_fdw
Postgres_fdw is a "foreign data wrapper" which allows exposing tables from other PostgreSQL databases as local tables.
In PostgreSQL 15 there are a few new options:
First, now the query optimizer can send CASE
expressions to be executed in the foreign database, lowering the need to fetch more data or even more rows for local processing.
There already was support for pushdown of simpler filters and joins when the wrapper could prove that it was possible to process them fully on the remote side. This, together with the ability to have foreign tables as partitions of local partitioned tables, opens up more ways to use PostgreSQL with distributed data.
Another new feature related to above is the ability to do commits in all foreign servers involved in a transaction in parallel. This will be really helpful in cases of large numbers of foreign tables, which can easily happen in the case of partitioned tables with foreign partitions. This is enabled with the CREATE SERVER
option parallel_commit
.
Yet another new option, this time not performance related, for foreign tables is postgres_fdw.application_name
, which allows setting the application_name used when establishing connections to foreign servers. This lets DBAs and users easily see which connections are opened by postgres_fdw. There are even escape sequences available for customization of the application_name used. Previously the remote session's application_name
could only be set on the remote server or via a postgres_fdw connection specification.
New options in logical replication
Native logical replication has been improved in multiple ways.
First, it now has support for row filtering and column lists.
While row filtering has a set of rules you have to follow for different replication strategies, at a high level, it is specified the same way as one would do for a query:
CREATE PUBLICATION pub2 FOR TABLE table1 WHERE (name like 'TX%');
And just rows who have TX in their name will be replicated.
Column lists work in a similar way, allowing one to specify a subset of table columns that are replicated:
CREATE PUBLICATION pub1 FOR TABLE table1 (id, a, c);
Also new is the option FOR TABLES IN SCHEMA
, which publishes all current and future tables in a specified schema. Earlier the ALL
option was available only database-wide.
And we now have support for proper two-phase commits. For this the replication slot needs to be created with an option called TWO_PHASE.
One sample user of this is pg_recvlogical
, which has added a --two-phase option to be used during slot creation.
Logical replication also no longer sends empty transactions. When it finds that there are no DML statements in a decoded transaction for a certain slot, it sends nothing and moves directly on to the next transaction.
It also now detects the case of a partially streamed transaction which has crashed on source and sends info about this to the subscriber. Before, this case caused subscriber to keep such transactions open until the subscriber restarted.
There are now functions to monitor the directory contents of logical replication slots:
pg_ls_logicalsnapdir()
, pg_ls_logicalmapdir()
, and pg_ls_replslotdir()
.
They can be run by members of the predefined pg_monitor
role.
And although partitioned tables can have foreign tables as partitions, replicating into such a partition isn't currently supported. The logical replication worker used to crash if it was attempted. Now, an error is thrown.
Comparison with the pglogical extension
While there have been lots of improvements, there are still cases where the pglogical extension is needed.
Native replication has no support for filtering by replication origin, meaning that setting up a bi-directional replication will fail, either resulting in an infinite loop for UPDATE, or in case of INSERT into a table with Primary Key, the replication stops with key violation when trying to replicate the same insert back. In case of insert-only publication, replication will keep inserting the new row over and over again, resulting in unlimited table growth.
(See Setting bi-directional replication for Cloud SQL for PostgreSQL | Google Cloud Blog for how you can do this in Cloud SQL with pglogical)
You can’t define primary to have multiple IP addresses (pg_logical has the concept of "interfaces" for this).
Bi-directional replication support is the most useful of the three. The others are for really rare use cases but perhaps worth mentioning in case you happen to have one of them.
And of course if you need some of the new options when replicating *into* PostgreSQL 15 from an older version, you also still need to use pg_logical, as PostgreSQL core only gets a new feature in latest version.This is different from extensions, where you can often use the latest extension version on many PostgreSQL versions.
Tooling
Improvements to pgbench
The bundled performance testing tool pgbench
can now retry serializability errors, including deadlocks. This is good news if you want to test workloads, which occasionally do deadlock or have other serialization violations that could be fixed by re-running the transaction.
For example the standard TPC-C tests define that 10% of transactions are aborted.
Now this should be possible to be tested using pgbench
with custom scripts.
Improved psql experience
While psql is already quite amazing, PostgreSQL 15 managed to add even more features for advanced users.
Multi-statement commands
Now psql will return results for all statements in a multi-statement query string.
Pre-15 versions of psql
emulated the behavior of sending the whole string to the server and returned only the result of the last statement, even though psql
does parse the strings given to it and sends them as separate statements. Now results for each individual statement are returned. To get old behavior, set SHOW_ALL_RESULTS psql variable to off.
(The only way to ask psql to send "select 1; select 2; select 3;" as a single string is to escape the ;, so "select 1\; select 2\; select 3;" will be sent as a single string)
Faster \copy
Now the \copy command in psql uses larger chunks to send data thus improving the speed of the copy.
Easier way to show a set of server variables
A new command \dconfig
is added to show server variables.
This can also handle wildcards, so now \dconfig *log*
shows all variables with 'log' in their names.
Earlier you had to manually run
SELECT name, settings unit FROM pg_settings WHERE name like '%log%'
to get this.
Observability
New statistics collection subsystem
The Cumulative Statistics System was rewritten to use shared memory.
In earlier versions there was a special statistics collector process that got the stats from individual back-ends via UDP packets. And the collected stats became available to back-ends after transferring them via file system.
The new system should:
be faster
need less configuration
not randomly lose some collected statistics in case of high workloads (UDP is by design lossy), so counts in pg_stats_*
views should be more trustworthy
Monitoring and new monitoring roles
A new statistics view pg_stat_subscription_stats
is added for monitoring subscriptions.
Also a view pg_stat_recovery_prefetch
which tracks pre-fetching in recovery.
Now pg_stat_statements
has new fields for temporary file I/O and JIT counters.
And lastly there are two new server variables:
shared_memory_size
to check the size of allocated shared memory
shared_memory_size_in_huge_pages
for the number of huge memory pages required
Preparing for larger data volumes
As an interesting feature, the functions to pg_size_pretty()
and pg_size_bytes()
were updated to be able to convert to Petabytes. Before version 15 the largest unit they knew about was Terabytes :
Summary
PostgreSQL continues to innovate and deliver much needed features for the most demanding applications. PostgreSQL 15 is no different and we’re very excited to support it in Cloud SQL.
If you want to learn more about Cloud SQL go to Cloud SQL For PostgreSQL Managed Database.
Links
PostgreSQL 15 release notes - this has the main info for new functionality.
PostgreSQL 15.1 release notes - this and the next link for 15.2 list bug fixes and smaller improvements, like allowing more than 100 columns in column list for logical replications.