AlloyDB for PostgreSQL under the hood: adaptive autovacuum
Sridhar Ranganathan
Product Manager, Google Cloud Databases
Yingjie He
Software Engineer, Google Cloud Databases
Google Cloud's AlloyDB is the next-generation managed PostgreSQL service that is designed to handle cloud-scale operational and analytical workloads. It has several autopilot features that allow it to self-update and self-tune, including automatic memory management and adaptive autovacuum. This blog focuses on how AlloyDB's adaptive autovacuum feature mitigates PostgreSQL VACUUM challenges and runs the process in the most efficient and uninterrupted manner.
PostgreSQL's Multi Version Concurrency Control (MVCC) enables multiple transactions to occur simultaneously without blocking each other by creating multiple versions of each row (tuples). It achieves this by creating multiple versions of each row, where each version corresponds to a different point in time. In MVCC, each transaction is assigned a unique Transaction ID (XID), which represents the order of the transaction's execution. PostgreSQL’s MVCC causes two issues.
1. Transaction ID wraparound: PostgreSQL transaction IDs or XIDs are 32-bit unsigned integers that are assigned to each transaction and also get incremented. When they reach their maximum value, it would wrap around to zero (similar to a ring buffer) and can lead to data corruption.
2. Bloat: The large accumulation of obsolete data in tables, indexes, and system catalogs leads to bloat. This over time reduces database performance as the query planner’s accuracy is impacted and also the read operations have to go through more pages.
To address these issues, PostgreSQL has a VACUUM process. When invoked manually, it scans the tables and eliminates inactive tuples and updates table statistics. Users usually configure AUTOVACUUM, an automated background process in PostgreSQL that triggers the VACUUM process based on parameters such as autovacuum_vacuum_cost_limit, autovacuum_vacuum_cost_delay, vacuum_cost_page_hit, vacuum_cost_page_miss, vacuum_cost_page_dirty.
However, there are some of the challenges of PostgreSQL AUTOVACUUM:
The default autovacuum settings may not be sufficient for all workloads. It can be challenging to determine the optimal settings for a particular database and even specific tables.
The autovacuum process can be a source of contention, especially on busy systems. If not properly configured or managed, the autovacuum process can have a negative impact on performance.
The autovacuum has fixed resource budgets (cost limit, number of vacuum worker processes, and memory). It cannot automatically adjust vacuum workload based on the customer workload and available resources.
For databases with very high transaction rates where the VACUUM process is not able to keep up, that can lead to increase in open XIDs, which will eventually lead to a XID wrap-around situation and cause long system downtime.
VACUUM that lags behind can also cause table space bloat with a lot of dead tuples and index entries. This increases unnecessary storage usage, which in turn impacts backup and restore times. This also affects query performance.
Autovacuum settings need to be adjusted carefully to suit the workload to avoid availability and performance issues. But it can be difficult to adjust vacuum settings, if the workload changes all the time.
AlloyDB’s adaptive autovacuum
AlloyDB for PostgreSQL is designed to handle mission-critical operational and analytical workloads. Large and dynamic workloads make it difficult to manually tune the autovacuum settings. AlloyDB’s adaptive autovacuum is a feature that automatically adjusts the frequency of vacuuming and analyzes operations based on the workload of the database. This helps to ensure that the database is always running at peak performance, even as the workload changes, without any interruption from the vacuum process. AlloyDB’s adaptive autovacuum goals are to:
Ensure reliable and consistent application transactional performance
Maintain high availability of the system by avoiding XID wraparound problems
Enable a hands-off approach to vacuum tuning, freeing the DBAs from having to manually tune the settings for each workload
Honor any autovacuum settings updated/tuned by users and adjust adaptive settings accordingly
AlloyDB’s adaptive autovacuum process monitors and updates the autovacuum-related PostgreSQL parameter values in real time. For example, multiple autovacuum workers can run concurrently on different tables, controlled by the autovacuum_max_workers parameter. The maintenance_work_mem parameter value defines the work memory used by each autovacuum worker. AlloyDB adjusts these parameters dynamically.
How does adaptive autovacuum work?
AlloyDB Adaptive autovacuum uses a number of factors to determine the frequency of vacuuming and analyze operations, including:
The size of the database
The number of dead tuples in the database
The age of the data in the database
The number of transactions per second vs estimated vacuum speed (for XID throttling, see #2 below)
The following are the adaptive autovacuum improvements and automatically adjusted settings in AlloyDB:
1. Dynamic vacuum resource management: Instead of using a fixed cost limit, AlloyDB uses real-time resource statistics to adjust the vacuum workers. When the system is busy, the vacuum process and resources are throttled. If enough memory is available, additional memory is allocated for vacuum workers to accelerate index vacuum.
2. Dynamic XID Throttling: AlloyDB automatically and continuously monitors the progress of vacuuming and the speed of XID consumption. If a risk of XID wraparound is detected, AlloyDB will gently begin to throttle XID consumption by slowing down transactions. It also allocates more resources to vacuuming so that vacuuming can catch up and return to the safe zone. During this process, the overall transactions per second will be reduced until the XIDs are in the safe zone. When the XID age increases, the vacuum workers are dynamically increased.
3. Efficient vacuuming for larger tables: The default vacuum is based on table-specific statistics stored in pg_stat_all_tables which has the dead tuple ratio. That works for small tables and may not work efficiently for larger, frequently updated tables. AlloyDB has an updated scan mechanism that helps trigger the autovacuum more often that scans chunks of large tables and helps remove dead tuples more efficiently.
4. Log warning messages: In AlloyDB, the vacuum blockers such as long-running transactions, orphaned prepared transactions, orphaned replication slots etc., are detected and warnings are registered in the PostgreSQL logs so that users can handle the cases in a timely manner.
- “Found a backend process XXX with a long running transaction whose transaction id age XXX is larger than or equal to the transaction age threshold XXX.”
- "Found an old prepared transaction XXX whose transaction id age XXX is larger than or equal to the transaction age threshold XXX, database oid: XXX, owner oid: XXX"
- "Found a replication slot XXX whose min transaction id age XXX is larger than or equal to the transaction id age threshold XXX."
Enabling AlloyDB adaptive autovacuum
Adaptive Autovacuum is enabled by default, but can be disabled (or enabled again in the future) using the flag enable_google_adaptive_autovacuum.
Benefits of AlloyDB adaptive autovacuum
AlloyDB adaptive autovacuum is designed to adjust the vacuum process based on the workload’s real-time resource utilization and without requiring users to tune the vacuum parameters. However, users can still tune the autovacuum-related parameters, and will AlloyDB honor the their settings.
There are a number of benefits to using adaptive autovacuum, including:
Increased performance: Adaptive autovacuum helps to ensure that the database is always running at peak performance by removing the bloat, even as the workload changes.
Reduced maintenance: Adaptive autovacuum automatically adjusts the frequency of vacuuming and analyze operations, so you don't have to worry about it.
Improved availability: Adaptive autovacuum helps to prevent XID wraparound, and thus avoids database outages and increases availability.
In our internal AlloyDB benchmarking tests with 100% cached TPC-C runs on 64vCPU SKU, by enabling Adaptive Autovacuum, we observed on average the Frozen XID age reduced from ~1.5B to ~1B and dead tuples count dropped from ~1.2B to 0.7B, allowing the database to perform optimally over a longer period of time.
Conclusion
Adaptive autovacuum is a powerful autopilot feature that manages the vacuum process efficiently without manual intervention and can help to improve the performance and availability of your AlloyDB database. If you're not already using adaptive autovacuum, we encourage you to enable it today.
To learn more about Adaptive Autovacuum, read Configure Adaptive Autovacuum documentation.
To learn about AlloyDB, read AlloyDB for PostgreSQL intelligent scalable storage | Google Cloud Blog
Start building on Google Cloud with $300 in free credits and 20+ always free products. https://cloud.google.com/free