This page describes what you can do when your database runs into Transaction
ID Wraparound protection in PostgreSQL. It manifests as an ERROR
message, as
follows:
database is not accepting commands to avoid wraparound data loss in database dbname. Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
Alternatively, a WARNING
message as follows might appear:
database dbname must be vacuumed within 10985967 transactions. To avoid a database shutdown, execute a database-wide VACUUM in that database.
Overview of steps
- Find out which database and which tables are causing the wraparound.
- Check if there's anything holding back (AUTO)VACUUM (for example, a stuck transaction ID).
- Measure the speed of AUTOVACUUM. If it is slow, optionally, you can try to speed it up.
- If needed, run a few more VACUUM commands manually.
- Investigate other ways to speed up the vacuum. Sometimes the fastest way is to drop the table or some indexes.
Many of the recommendations for values of flags are purposefully not exact because they depend on many database parameters. Read the documents linked at the end of this page for a deeper analysis on this topic.
Find the database and table causing the wraparound
Finding the database
To find out which database or databases contain the tables that are causing the wraparound, run the following query:
SELECT datname,
age(datfrozenxid),
2^31-1000000-age(datfrozenxid) as remaining
FROM pg_database
ORDER BY 3
The database with the remaining
value close to 0 is the one causing the problem.
Finding the table
Connect to that database and run the following query:
SELECT c.relnamespace::regnamespace as schema_name,
c.relname as table_name,
greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age,
2^31-1000000-greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as remaining
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm')
ORDER BY 4;
This query returns the table or tables causing the problem.
For TEMPORARY tables
If the schema_name
starts with pg_temp_
, then the only way to resolve the problem
is to drop the table because PostgreSQL doesn't let you VACUUM temporary tables created in other
sessions. Sometimes if that session is open and accessible, you can vacuum the
table there, but this is often not the case.
Use the following SQL statements to drop the temp table:
SET cloudsql.enable_maintenance_mode = 'on'; /* get extra transaction ids */
DROP TABLE pg_temp_<N>.<tablename>;
If this was the only blocker, then in about a minute, the autovacuum picks
up this change and moves the datfrozenxid
forward in pg_database
. This resolves
the wraparound protection read-only state.
Normal tables
For normal (that is non-temporary) tables, continue with the next steps below here to see if anything is blocking the clean-up, if the VACUUM is running fast enough, and it the most important table is being vacuumed.
Check for a stuck transaction ID
One possible reason why the system can run out of transaction IDs is that
PostgreSQL can't freeze (that is, mark as visible to all transactions)
any transaction IDs created after the oldest currently running transaction
started. This is because of multiversion concurrency control (MVCC) rules. In extreme
cases, such transactions can become so old that they make it impossible for
VACUUM to clean up any old transactions for the entire 2 billion
transaction ID wraparound limit and cause the whole system to stop accepting new
DML. You typically also see warnings in the log file, saying WARNING: oldest
xmin is far in the past
.
You should move on to optimization only after the stuck transaction ID has been remediated.
Here are four potential reasons why there might be a stuck transaction ID, with information on how to mitigate each of them:
- Long running transactions: Identify them and cancel or terminate the backend to unblock the vacuum.
- Orphaned prepare transactions: Roll back these transactions.
- Abandoned replication slots: Drop the abandoned slots.
- Long running transaction on replica, with
hot_standby_feedback = on
: Identify them and cancel or terminate the backend to unblock the vacuum.
For these scenarios, the following query returns the age of the oldest transaction and the number of transactions left until wraparound:
WITH q AS ( SELECT (SELECT max(age(backend_xmin)) FROM pg_stat_activity WHERE state != 'idle' ) AS oldest_running_xact_age, (SELECT max(age(transaction)) FROM pg_prepared_xacts) AS oldest_prepared_xact_age, (SELECT max(greatest(age(catalog_xmin),age(xmin))) FROM pg_replication_slots) AS oldest_replication_slot_age, (SELECT max(age(backend_xmin)) FROM pg_stat_replication) AS oldest_replica_xact_age ) SELECT *, 2^31 - oldest_running_xact_age AS oldest_running_xact_left, 2^31 - oldest_prepared_xact_age AS oldest_prepared_xact_left, 2^31 - oldest_replication_slot_age AS oldest_replication_slot_left, 2^31 - oldest_replica_xact_age AS oldest_replica_xact_left FROM q;
This query might return any of the *_left values reported close to or less than 1 million away from wraparound. This value is the wraparound protection limit when PostgreSQL stops accepting new write commands. In this case, see either Remove VACUUM blockers or Tune VACUUM.
For example, the preceding query might return:
┌─[ RECORD 1 ]─────────────────┬────────────┐ │ oldest_running_xact_age │ 2146483655 │ │ oldest_prepared_xact_age │ 2146483655 │ │ oldest_replication_slot_age │ ¤ │ │ oldest_replica_xact_age │ ¤ │ │ oldest_running_xact_left │ 999993 │ │ oldest_prepared_xact_left │ 999993 │ │ oldest_replication_slot_left │ ¤ │ │ oldest_replica_xact_left │ ¤ │ └──────────────────────────────┴────────────┘
where oldest_running_xact_left
and oldest_prepared_xact_left
are within the 1
million wraparound protection limit. In this case, you must first remove the blockers for the
VACUUM to be able to proceed.
Remove VACUUM blockers
Long-running transactions
In the preceding query, if oldest_running_xact
is equal to
oldest_prepared_xact
, then go to the
Orphaned prepare transaction section, because the latest running
value includes also the prepared transactions.
You might first need to run the following command as the postgres
user:
GRANT pg_signal_backend TO postgres;
If the offending transaction belongs to any of the system users (starting with
cloudsql...
), you can't cancel it directly. You must restart the
database to cancel it.
To identify a long-running query, and cancel or terminate it to unblock the
vacuum, first select a few of the oldest queries. The LIMIT 10
line helps
fit the result on the screen. You might need to repeat this after resolving
the oldest running queries.
SELECT pid, age(backend_xid) AS age_in_xids, now() - xact_start AS xact_age, now() - query_start AS query_age, state, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY 2 DESC LIMIT 10;
If age_in_xids
comes back as NULL
, this means the
transaction has not been allocated a permanent transaction ID and can be safely
ignored.
Cancel the queries where the xids_left_to_wraparound
is
approaching 1M.
If state
is active
, then the query can be cancelled
using SELECT pg_cancel_backend(pid);
. Otherwise, you need
to terminate the whole connection using SELECT pg_terminate_backend(pid);
,
where pid is the pid
from the previous query
Orphaned prepare transactions
List all prepared transactions:
DB_NAME=> SELECT age(transaction),* FROM pg_prepared_xacts ; ┌─[ RECORD 1 ]┬───────────────────────────────┐ │ age │ 2146483656 │ │ transaction │ 2455493932 │ │ gid │ trx_id_pin │ │ prepared │ 2021-03-03 16:54:07.923158+00 │ │ owner │ postgres │ │ database │ DB_NAME │ └─────────────┴───────────────────────────────┘
Roll back the oldest orphaned prepared transaction(s) by
using the gid
from the last query (in this case, trx_id_pin
) as
the transaction ID:
ROLLBACK PREPARED trx_id_pin;
Alternatively, commit it:
COMMIT PREPARED trx_id_pin;
See the SQL ROLLBACK PREPARED documentation for a full explanation.
Abandoned replication slots
In case the replication slot is abandoned because the existing
replica is either stopped, paused, or has some other issue, you can
delete the replica from gcloud
or Google Cloud console.
First, check that the replica is not disabled as described in Managing read replicas. If the replica is disabled, enable it again. If the lag still stays high, delete the replica,
The replication slots are visible in the pg_replication_slots
system view.
The following query fetches the relevant info:
SELECT *, age(xmin) AS age FROM pg_replication_slots; ┌─[ RECORD 1 ]────────┬─────────────────────────────────────────────────┐ │ slot_name │ cloudsql_1_355b114b_9ff4_4bc3_ac88_6a80199bd738 │ │ plugin │ ¤ │ │ slot_type │ physical │ │ datoid │ ¤ │ │ database │ ¤ │ │ active │ t │ │ active_pid │ 1126 │ │ xmin │ 2453745071 │ │ catalog_xmin │ ¤ │ │ restart_lsn │ C0/BEF7C2D0 │ │ confirmed_flush_lsn │ ¤ │ │ age │ 59 │ └─────────────────────┴─────────────────────────────────────────────────┘
In this example, the pg_replication_slots
value is healthy (age == 59).
If the age was near 2 billion, you would want to delete the
slot. There is no easy way to know which replica is which in case the query
returns multiple records. So, check them all in case there is a long-running
transaction on any replica.
Long-running transactions on replicas
Check replicas for the oldest running transaction with hot_standby_feedback
set to on
and disable it on the replica.
The backend_xmin
column in the pg_stat_replication
view has the oldest TXID
needed on the replica.
To move it forward, stop the query that holds it back on the replica. To discover which query is holding it back, use the query in Long running transactions, but this time, run it on the replica.
Another option is to restart the replica.
Configure VACUUM
Set the following two flags:
- autovacuum_vacuum_cost_delay = 0
- autovacuum_work_mem = 1048576
The first disables any disk throttling for vacuuming by PostgreSQL so VACUUM can run at full speed. By default, autovacuum is throttled so it does not use up all disk IO on the slowest servers.
The second flag, autovacuum_work_mem
, decreases the number of index
cleanup passes. If possible, it should be large enough to store all IDs of dead
rows in a table that VACUUM is going to clean up. When setting this value,
consider that this is the maximum amount of local memory each running VACUUM
can allocate. Make sure that you're not allowing more than is available, with
some left in reserve. If you leave the database running in read-only mode, then
also consider the local memory used for read-only queries.
On most systems, use the maximum value (1 GB or 1048576 kB, as shown in the sample). This value fits up to about 178 million dead tuples. Any more still causes multiple index scan passes.
These and other flags are explained in more detail in Optimizing, monitoring, and troubleshooting VACUUM operations in PostgreSQL.
After setting these flags, restart the database so that autovacuum starts with the new values.
You can use the pg_stat_progress_vacuum
view to monitor the progress of
autovacuum-started VACUUMs. This view shows VACUUMs running in all
databases, and for tables (relations) from other databases that you can't look up
the table name using the view column relid
.
To identify the databases and tables that need vacuuming next, use queries from Optimizing, monitoring, and troubleshooting VACUUM operations in PostgreSQL. If the server VM is powerful enough and has the bandwidth for more parallel VACUUM processes than started by autovacuum, you can start some manual vacuums.
Check VACUUM speed
This section describes how to check VACUUM speed and how to accelerate it, if needed.
Check running autovacuums
All backends running VACUUM are visible in the system view pg_stat_progress_vacuum.
If the current phase is scanning heap
, then you can monitor
progress by watching changes in the column heap_blks_scanned
.
Unfortunately, there is no easy way to determine scan speed in other phases.
Estimate the VACUUM scan speed
To estimate the scan speed, you need to first store the base values and then
calculate the change over time to estimate the completion time. First, you need
to save a snapshot of heap_blks_scanned
together with a timestamp
by using the following snapshot query:
SELECT set_config('save.ts', clock_timestamp()::text, false), set_config('save.heap_blks_scanned', heap_blks_scanned::text, false) FROM pg_stat_progress_vacuum WHERE datname = 'DB_NAME';
Since we can't save anything in tables that are already in wraparound, use
set_config(flag, value)
to set two user-defined flags - save.ts
and save.heap_blks_scanned
- to the current values from
pg_stat_progress_vacuum
.
In the next query, we use these two as the comparison base to determine speed and estimate completion time.
NOTE: WHERE datname = DB_NAME
restricts the investigation to one
database at a time. This number is enough if there is only one autovacuum running in
this database, with more than one row per database. Extra filter conditions
('AND relid= …'')
need to be added to WHERE to indicate a single
autovacuum row. This is also true for the next query.
Once you've saved the base values, you can run the following query:
with q as ( SELECT datname, phase, heap_blks_total, heap_blks_scanned, clock_timestamp() - current_setting('save.ts')::timestamp AS ts_delta, heap_blks_scanned - current_setting('save.heap_blks_scanned')::bigint AS scanned_delta FROM pg_stat_progress_vacuum WHERE datname = DB_NAME ), q2 AS ( SELECT *, scanned_delta / extract('epoch' FROM ts_delta) AS pages_per_second FROM q ) SELECT *, (heap_blks_total - heap_blks_scanned) / pages_per_second AS remaining_time FROM q2 ;
┌─[ RECORD 1 ]──────┬──────────────────┐ │ datname │ DB_NAME │ │ phase │ scanning heap │ │ heap_blks_total │ 9497174 │ │ heap_blks_scanned │ 18016 │ │ ts_delta │ 00:00:40.30126 │ │ as_scanned_delta │ 11642 │ │ pages_per_second │ 288.87434288655 │ │ remaining_time │ 32814.1222418038 │ └───────────────────┴──────────────────┘
This query compares the current values to the save base values and calculates
pages_per_second
and remaining_time
, which lets us decide if
VACUUM is running fast enough or if we want to speed it up. The
remaining_time
value is only for the scanning heap
phase.
Other phases also take time, sometimes even more. You can read more on vacuuming
and view blog posts on the internet discussing some of the complex
aspects of vacuum.
Speed up VACUUM
The easiest and fastest way to make VACUUM scan faster is setting
autovacuum_vacuum_cost_delay=0
. This can be done from the
Google Cloud console.
Unfortunately, the already running VACUUM does not pick up this value and you might need to restart the database.
After a restart, you might see a result similar to the following:
┌─[ RECORD 1 ]──────┬──────────────────┐ │ datname │ DB_NAME │ │ phase │ scanning heap │ │ heap_blks_total │ 9497174 │ │ heap_blks_scanned │ 222382 │ │ ts_delta │ 00:00:21.422615 │ │ as_scanned_delta │ 138235 │ │ pages_per_second │ 6452.76031894332 │ │ remaining_time │ 1437.33713040171 │ └───────────────────┴──────────────────┘
In this sample, the speed increased from <300 pages/sec to ~6500 pages/sec, and the expected remaining time for the heap scanning phase decreased from 9 hours to 23 minutes.
The scan speed of the other phases is not as easy to measure, but they should show a similar speedup.
Also consider making autovacuum_work_mem
as large as possible to
avoid multiple passes over indexes. An index pass happens each time the memory
is filled with dead tuple pointers.
If the database is not being used otherwise, set autovacuum_work_mem
to have ~80% of memory free after allowing the required amount for shared_buffers
.
This is the upper limit for each of the autovacuum-started VACUUM processes. If you
want to continue running read-only workloads, use less memory.
Other ways to improve speed
Avoid vacuuming indexes
For huge tables, VACUUM spends most of the time cleaning up indexes.
PostgreSQL 14 has special optimizations for avoiding index cleanup if the system is in danger of wraparound.
In PostgreSQL 12 and 13, you can manually run the following statement:
VACUUM (INDEX_CLEANUP OFF, TRUNCATE OFF) <tablename>;
In versions 11 and older, you can DROP
the index before running vacuum and recreate it later.
Dropping the index when an autovacuum is already running on that table requires cancelling the running vacuum and then immediately executing the drop index command before the autovacuum manages to start vacuum on that table again.
First, run the following statement to find the PID of the autovacuum process you need to terminate:
SELECT pid, query
FROM pg_stat_activity
WHERE state != 'idle'
AND query ilike '%vacuum%';
Then run the following statements to terminate the running vacuum and drop one or more indexes:
SET cloudsql.enable_maintenance_mode = 'on'; /* get extra transaction ids */
SELECT pg_terminate_backend(<pid>);DROP INDEX <index1>;DROP INDEX <index2>; ...
Drop the offending table
In some rare cases, you can drop the table. For example, if it's a table that's easy to restore from another source like a backup or other database.
You still need to use cloudsql.enable_maintenance_mode = 'on'
and likely
also terminate the VACUUM on that table as shown in the previous section.
VACUUM FULL
In rare cases, it's faster to run VACUUM FULL FREEZE
,
usually when the table has only a small proportion of live tuples.
This can be checked from the pg_stat_user_tables
view (unless there has been a
crash which has wiped out the statistics).
The VACUUM FULL
command copies live tuples to a new file, so enough space
has to be available for the new file and its indexes.
What's next
- Learn more about VACUUM for wraparound
- Learn more about routine vacuuming.
- Learn more about automatic vacuuming
- Learn more about Optimizing, monitoring, and troubleshooting VACUUM operations in PostgreSQL