High CPU utilization adversely affects the performance of your instance. Because any activity performed on the instance uses CPU, when you notice high CPU utilization you should first identify the root cause of the issue, whether that's poorly written queries, long-running transactions, or other database activity.
This document describes how to identify CPU bottlenecks in an instance and mitigate CPU utilization issues.
Identify the CPU bottlenecks
Use query insights to identify queries that have high CPU consumption
Query insights helps you detect, diagnose, and prevent query performance problems for Cloud SQL databases.
Use the pg_proctab extension to get operating system outputs that provide per-process CPU utilization information.
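For example, assuming the pg_proctab extension is available on the instance, you can query its functions directly; the exact functions and output columns can vary by extension version:

CREATE EXTENSION IF NOT EXISTS pg_proctab;
-- Per-process operating system statistics, including CPU time
SELECT * FROM pg_proctab();
-- Host-level CPU time and load average
SELECT * FROM pg_cputime();
SELECT * FROM pg_loadavg();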
Identify the active connections by state
Each active connection to the database consumes some CPU, so if the instance has a high number of connections, cumulative utilization can be high. Use the following query to get the number of connections by state.
SELECT state, usename, count(1)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY state, usename
ORDER BY 1;
The output looks similar to the following:
        state        |    usename    | count
---------------------+---------------+-------
 active              | ltest         |   318
 active              | sbtest        |    95
 active              |               |     2
 idle                | cloudsqladmin |     2
 idle in transaction | ltest         |    32
 idle in transaction | sbtest        |     5
                     | cloudsqladmin |     3
                     |               |     4
(8 rows)
If the active connections count is high, check for long running queries or wait events that are blocking the queries from executing.
If the idle connection count is high, execute the following query to terminate the idle connections, after obtaining the necessary approvals.
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE usename = 'sbtest' and pid <> pg_backend_pid() and state in ('idle');
You can also terminate connections individually with pg_terminate_backend using the following query:
SELECT pg_terminate_backend(<pid>);
Here, you can obtain the PID from the pg_stat_activity view.
Identify the long running connections
Here's an example of a query that returns long-running queries; in this case, queries that have been active for more than five minutes.
SELECT pid, query_start, xact_start,
       now() - pg_stat_activity.query_start AS duration,
       query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
ORDER BY 4 DESC;
Review the explain plan to identify poorly written queries
Use EXPLAIN (or EXPLAIN ANALYZE) to investigate a poorly written query and rewrite the query, if necessary. Optionally, after obtaining the necessary approvals, consider cancelling the long-running query with the pg_cancel_backend function:
SELECT pg_cancel_backend(<pid>);
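As an illustration, the following sketch inspects a query plan on a hypothetical orders table; the table and column names are assumptions, not part of this document:

-- Hypothetical example: show the plan without running the query
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Or run the query and report actual timings and buffer usage
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;

A sequential scan over a large table in the plan output typically indicates a missing index or a predicate the planner can't use.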
Monitor VACUUM activity
The AUTOVACUUM activity that clears dead tuples is a CPU-intensive operation. If your instance uses PostgreSQL version 11 or later, use the following query to check whether any AUTOVACUUM or VACUUM activity is in progress.
SELECT relid::regclass, pid, phase, heap_blks_total, heap_blks_scanned,
       heap_blks_vacuumed, index_vacuum_count, max_dead_tuples, num_dead_tuples
FROM pg_stat_progress_vacuum;
Check if there's an ongoing VACUUM activity in an instance using the following query:
SELECT pid, datname, usename, query FROM pg_stat_activity WHERE query like '%vacuum%';
Additionally, you can optimize and troubleshoot VACUUM operations in PostgreSQL.
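For example, you can tune autovacuum per table so that frequently updated tables are vacuumed before dead tuples accumulate; the table name and threshold below are hypothetical:

-- Hypothetical: vacuum after ~5% of rows become dead
-- (the default autovacuum_vacuum_scale_factor is 0.2)
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);

Lowering the scale factor makes vacuuming more frequent but smaller, which spreads the CPU cost over time instead of concentrating it in large runs.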
Add pg_stat_statements extension
Set up the pg_stat_statements extension to get detailed statistics about the SQL statements executed on the instance.
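Once the extension is enabled, a query like the following lists the statements that consumed the most execution time. This is a sketch; on PostgreSQL versions earlier than 13 the column is named total_time rather than total_exec_time:

SELECT query, calls, total_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;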
Avoid frequent checkpoints
Frequent checkpoints degrade performance. Consider adjusting the checkpoint_timeout flag if the PostgreSQL log reports the "checkpoints are occurring too frequently" warning.
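Before changing anything, you can inspect the current settings. Note that max_wal_size also influences how often checkpoints occur, because a checkpoint is triggered when that much WAL has been written:

SHOW checkpoint_timeout;
SHOW max_wal_size;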
Ensure that the query planner has the latest statistics about tables so that it can choose the best plan for queries. The ANALYZE operation collects statistics about the contents of tables in the database and stores the results in the pg_statistic system catalog. The query planner then uses these statistics to help determine the most efficient execution plans for queries. The AUTOVACUUM process analyzes tables automatically and periodically. Run the following command to check whether all tables have been analyzed and have the latest metadata available to the planner.
SELECT relname, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables;
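If a table shows no recent last_autoanalyze timestamp, you can refresh its statistics manually; the table name below is hypothetical:

-- Hypothetical table name; VERBOSE prints progress information
ANALYZE VERBOSE orders;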
Inadequate system settings
Other flag settings and system factors can also influence the performance of your queries. Run the following query to check wait events and wait event types for insight into what the active backends are waiting on.
SELECT datname, usename,
       (CASE WHEN usename IS NOT NULL THEN state ELSE query END) AS what,
       wait_event_type, wait_event, backend_type, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY 1, 2, 3, 4 NULLS FIRST, 5, 6;
The output looks similar to this:
 .. | .. |  what  | wait_event_type |   wait_event    | .. | count
-..-+-..-+--------+-----------------+-----------------+-..-+-------
 .. | .. | active | IO              | CommitWaitFlush | .. |   750
 .. | .. | idle   | IO              | CommitWaitFlush | .. |   360
 .. | .. | active | LWLock          | BufferMapping   | .. |   191
Monitor sequential scans
Frequent sequential scans over tables larger than a few tens of rows usually indicate a missing index. When these scans touch thousands or even hundreds of thousands of rows, they can cause excessive CPU usage. Avoid sequential scans on such tables by creating the necessary indexes.
Run the following query to check the number of times sequential scans are initiated on any table.
SELECT relname, idx_scan, seq_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY n_live_tup DESC;
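If a large table shows a high seq_scan count and a low idx_scan count, an index on the commonly filtered column usually helps; the table and column names below are hypothetical:

-- Hypothetical names; CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);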
Lastly, if CPU utilization remains high and you determine that the queries represent legitimate traffic, consider increasing the CPU resources in your instance to avoid a database crash or downtime.