High CPU utilization adversely affects the performance of your instance. Because any activity performed on the instance uses CPU, if you notice high CPU utilization, first identify the root cause of the issue, whether that's poorly written queries, long-running transactions, or other database activity.
This document describes ways to identify CPU bottlenecks in an instance and mitigate CPU utilization issues.
Identify the CPU bottlenecks
Use query insights to identify queries that have high CPU consumption
Query insights helps you detect, diagnose, and prevent query performance problems for Cloud SQL databases.
Use the pg_proctab extension
Use the pg_proctab extension in combination with the pg_top utility to get operating system outputs that show per-process CPU utilization.
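If the extension is enabled on your instance, you can also query it directly from SQL. The following is a minimal sketch; the utime and stime columns follow the extension's /proc-based output and can vary between pg_proctab versions.
CREATE EXTENSION IF NOT EXISTS pg_proctab;
-- Per-backend CPU times as reported by the operating system.
-- utime/stime column names follow pg_proctab's /proc-based output.
SELECT
pid,
comm,
utime,
stime
FROM
pg_proctab()
ORDER BY
utime + stime DESC
LIMIT
10;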
Use queries
Identify the active connections by state
Each active connection to the database consumes some CPU, so if the instance has a high number of connections, the cumulative utilization can be high. Use the following query to get the number of connections by state.
SELECT
state,
usename,
count(1)
FROM
pg_stat_activity
WHERE
pid <> pg_backend_pid()
GROUP BY
state,
usename
ORDER BY
1;
The output looks similar to the following:
state | usename | count
---------------------+---------------+-------
active | ltest | 318
active | sbtest | 95
active | | 2
idle | cloudsqladmin | 2
idle in transaction | ltest | 32
idle in transaction | sbtest | 5
| cloudsqladmin | 3
| | 4
(8 rows)
If the active connections count is high, check for long running queries or wait events that are blocking the queries from executing.
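For example, the following sketch uses pg_blocking_pids() (available in PostgreSQL 9.6 and later) to list the sessions that are blocked by another backend, along with the events they're waiting on:
SELECT
pid,
pg_blocking_pids(pid) AS blocked_by,
wait_event_type,
wait_event,
state,
query
FROM
pg_stat_activity
WHERE
cardinality(pg_blocking_pids(pid)) > 0;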
If the idle connection count is high, execute the following query to terminate the connections, after taking the necessary approvals.
SELECT
pg_terminate_backend(pid)
FROM
pg_stat_activity
WHERE
usename = 'sbtest'
AND pid <> pg_backend_pid()
AND state IN ('idle');
You can also terminate connections individually with pg_terminate_backend using the following query:
SELECT pg_terminate_backend(<pid>);
Here, you can obtain the PID from pg_stat_activity.
Identify the long-running queries
Here's an example of a query that returns long-running queries. In this case, it identifies queries that have been active for more than 5 minutes.
SELECT
pid,
query_start,
xact_start,
now() - pg_stat_activity.query_start AS duration,
query,
state
FROM
pg_stat_activity
WHERE
(
now() - pg_stat_activity.query_start
) > interval '5 minutes'
ORDER BY
4 DESC;
Review the explain plan to identify poorly written queries
Use EXPLAIN to review the execution plan of a poorly written query, and rewrite the query if necessary. Optionally, consider cancelling the long-running query with the following command, after taking the necessary approvals.
SELECT pg_cancel_backend(<pid>);
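For example, the following is a minimal sketch against a hypothetical orders table. Note that EXPLAIN ANALYZE actually executes the query, so use it carefully with expensive statements on a production instance.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
*
FROM
orders
WHERE
customer_id = 42;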
Monitor VACUUM activity
The AUTOVACUUM activity that removes dead tuples is a CPU-intensive operation. If your instance uses PostgreSQL version 11 or later, use the following query to check whether any AUTOVACUUM or VACUUM activity is in progress.
SELECT
relid::regclass,
pid,
phase,
heap_blks_total,
heap_blks_scanned,
heap_blks_vacuumed,
index_vacuum_count,
max_dead_tuples,
num_dead_tuples
FROM
pg_stat_progress_vacuum;
Check if there's an ongoing VACUUM activity in an instance using the following query:
SELECT
pid,
datname,
usename,
query
FROM
pg_stat_activity
WHERE
query ILIKE '%vacuum%';
Additionally, you can optimize and troubleshoot VACUUM operations in PostgreSQL.
Add pg_stat_statements extension
Set up the pg_stat_statements extension to get detailed statistics about the SQL statements executed on the instance.
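After the extension is active (depending on your configuration, you might first need to add pg_stat_statements to the shared_preload_libraries flag), the following sketch lists the statements that consume the most cumulative execution time. The total_exec_time and mean_exec_time columns assume PostgreSQL 13 or later; earlier versions name them total_time and mean_time.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 statements by cumulative execution time (PostgreSQL 13+ column names).
SELECT
queryid,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
query
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT
10;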
Frequent checkpoints
Frequent checkpoints degrade performance. Consider adjusting the checkpoint_timeout flag if the PostgreSQL log reports the "checkpoints are occurring too frequently" warning.
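To see how checkpoints are being triggered, you can compare the timed and requested checkpoint counters. The following sketch assumes PostgreSQL 16 or earlier, where these counters live in pg_stat_bgwriter; PostgreSQL 17 moves them to pg_stat_checkpointer.
SELECT
checkpoints_timed, -- checkpoints triggered by checkpoint_timeout
checkpoints_req -- checkpoints requested early, typically by WAL volume
FROM
pg_stat_bgwriter;
A checkpoints_req value that is high relative to checkpoints_timed suggests that checkpoints are driven by write volume rather than the timer.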
Gather statistics
Ensure that the query planner has the latest statistics about tables so it can choose the best plan for queries. The ANALYZE operation collects statistics about the contents of tables in the database and stores the results in the pg_statistic system catalog. The query planner then uses these statistics to help determine the most efficient execution plans for queries. The AUTOVACUUM process analyzes tables automatically and periodically; run the following query to check whether all the tables have been analyzed and have the latest metadata available to the planner.
SELECT
relname,
last_autovacuum,
last_autoanalyze
FROM
pg_stat_user_tables;
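If a table shows a stale or empty last_autoanalyze value, you can refresh its statistics manually. The following is a sketch with a hypothetical table name:
-- Collect planner statistics for a single table.
ANALYZE VERBOSE my_schema.my_table;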
Inadequate system settings
Other flag settings and system factors can also influence the performance of your queries. Run the following query to check wait events and wait event types, which provide insight into what the system is waiting on.
SELECT
datname,
usename,
(
CASE WHEN usename IS NOT NULL THEN state ELSE query END
) AS what,
wait_event_type,
wait_event,
backend_type,
count(*)
FROM
pg_stat_activity
GROUP BY
1,
2,
3,
4,
5,
6
ORDER BY
1,
2,
3,
4 nulls first,
5,
6;
The output looks similar to this:
.. | .. | what | wait_event_type | wait_event | .. | count
-..--+-..-+----------------+-----------------+----------------------+-..----+------
..
.. | .. | active | IO | CommitWaitFlush | .. | 750
.. | .. | idle | IO | CommitWaitFlush | .. | 360
.. | .. | active | LWLock | BufferMapping | .. | 191
Monitor sequential scans
Frequent sequential scans over tables with more than a few tens of rows usually indicate a missing index. When those scans touch thousands or even hundreds of thousands of rows, they can cause excessive CPU usage. Avoid sequential scans on such tables by creating the necessary indexes.
Run the following query to check the number of times sequential scans are initiated on any table.
SELECT
relname,
idx_scan,
seq_scan,
n_live_tup
FROM
pg_stat_user_tables
WHERE
seq_scan > 0
ORDER BY
n_live_tup DESC;
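For example, if the output shows a large table with a high seq_scan count and your workload filters that table on a specific column, you can add an index without blocking writes. The table and column names below are hypothetical:
CREATE INDEX CONCURRENTLY idx_orders_customer_id
ON orders (customer_id);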
Lastly, if CPU utilization remains high and you determine that the queries are legitimate traffic, consider increasing the CPU resources of your instance to avoid a database crash or downtime.