This document describes the fundamentals of the
VACUUM operation in
databases. It also describes the mechanisms to monitor and tune the database
engine that maintains the health of database instances.
PostgreSQL uses a snapshot-based concurrency protocol that creates multiple versions of data rows while modifying the data. These data row versions are used to read a visible version of the data using a computed snapshot without acquiring read-lock on the data row. PostgreSQL maintains transaction IDs (inserted and deleted transaction IDs) for every row of data and uses the transaction IDs along with the computed snapshot to determine the visibility of the row. As the data keeps growing due to old versions of data, the time taken to scan the data (table scan or index scan) increases. To optimize the response time of the scan operation and to use space efficiently, you need to reclaim the versions and the metadata (for example, transaction ID) that is used to maintain the versions.
VACUUM operation reclaims the deleted versions (garbage collection) and
transaction IDs (freeze transaction ID). The
VACUUM operation operates on data
in different modes with different levels of data availability. Freezing
transaction IDs is crucial to the health of the database system because the
system blocks writers whenever the used transaction ID space enters reserved
autovacuum jobs that you configure constantly try to reclaim the
transaction ID, but they can fail. This failure is either due to insufficient
configuration or because the creation rate for transaction IDs is so high that
autovacuum job cannot catch up with workload. The purpose of this document
is to show how to use the
VACUUM operations along with the mechanisms to tune
and monitor different aspects of
This document covers the following:
- Freezing transaction IDs.
- Monitoring transaction IDs.
- Reclaiming storage space.
- Configuring automated Cloud Monitoring alerts.
To read the full white paper, click the button: