Optimize high memory usage in instances

MySQL | PostgreSQL | SQL Server

This document covers how to identify high memory usage for Cloud SQL instances and provides recommendations on how to solve memory-related issues.

To learn how to configure memory usage for a Cloud SQL instance, see Best practices for managing memory usage.

Identify high memory usage

The following sections discuss high memory usage scenarios.

Use Metrics Explorer to identify the memory usage

You can review memory usage of the instance with the database/memory/components.usage metric in Metrics Explorer.

Use Query insights to analyse explain plan for queries that are consuming high resources

Query insights helps you detect, diagnose, and prevent query performance problems for Cloud SQL databases. Query insights gives you a list of long running queries along with their explain plan (PostgreSQL documentation). Review the explain plan and identify the part of the query that has a high memory usage scan method. Regardless of the query run time, query insights gives you the explain plan for all queries. Identify the complex queries that are taking more time so that you know which queries are blocking the memory for longer durations.

Common PostgreSQL scan methods that use high memory include the following:

Bitmap heap scan
Quick sort
Hash join or Hash

High memory usage and relevant logs for Gemini-enabled instances

If you have Gemini enabled, instead of an out-of-memory (OOM) failure which leads to database downtime, a connection executing a query with high memory usage is terminated thereby preventing database downtime. To identify the defaulting query, you can check the database logs for the following entries:

  (...timestamp….) db=postgres, user=customer FATAL: terminating connection due to administrator command

The following Cloud SQL for PostgreSQL database log is displayed which captures the high memory usage query that was terminated to prevent OOM. The query is a normalized version of the original query:

  db=postgres,user=customer LOG:  postgres process with PID 1734 for the query "SELECT COUNT(product) AS item_count FROM test_table WHERE product_type = $1 AND product LIKE $2 AND c6_2 IN ($3,$4,$5,$6,$7)" has been cancelled.

Notifications are also displayed on the Cloud SQL Instances page for the following events:

Memory utilization of the instance over the last 24 hours.
List of normalized queries that have been canceled within the past 24 hours.
A link to Google documentation about optimizing memory usage.

High Memory Usage - Recommendations

The following recommendations address the common memory-related problems. If the instance continues to use a high amount of memory, chances are high that it eventually gets an out of memory issue. If the memory demands of either PostgreSQL or another process cause the system to run out of memory, you see an Out of Memory kernel message in PostgreSQL logs and the PostgreSQL instance is eventually stopped. For example:

Out of Memory: Killed process 12345 (postgres)

The most common instance where you see an OOM issue is with a higher value of work_mem with a high number of active connections. Therefore, if you are getting frequent OOMs or to avoid OOMs in your Cloud SQL for PostgreSQL instance, you should consider following these recommendations:

Set work_mem

Queries that use quick sort are faster than the ones using external merge sort. However, the former may lead to memory exhaustion. To resolve this issue, set the work_mem value reasonable enough such that it balances both, the sort operations happening in the memory and disk. You can also consider setting work_mem on a session level rather than setting it for an entire instance.
Monitor the active sessions

Each connection uses a certain amount of memory. Use the following query to check the active connections count:
```
  SELECT
    state,
    usename,
    count(1)
  FROM
    pg_stat_activity
  WHERE
    pid <> pg_backend_pid()
  GROUP BY
    state,
    usename
  ORDER BY
    1;
```
If you have a large number of active sessions, analyze the root cause for a high number of active sessions; for example, transaction locks.
Set shared_buffers

If shared_buffers is set to a higher value, consider decreasing the shared_buffers value so that the memory can be used for other operations, such as work_mem, or for establishing new connections.

Cache hit ratio

PostgreSQL generally tries to keep the data you access most often in the cache. When the data is requested by a client, if it's already cached in shared buffers, it's directly given to the client. This is called a cache hit. If the data is not present in shared buffers, the data is first fetched to shared buffers from a disk and then given to the client. This is called a cache miss. Cache hit ratio measures how many content requests the cache has handled compared to the requests received. Run the following query to check the cache hit ratio for the table requests in the PostgreSQL instance:
```
SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
  pg_statio_user_tables;
```
Run the following query to check the cache hit ratio for the index requests in the PostgreSQL instance:
```
  SELECT
    sum(idx_blks_read) as idx_read,
    sum(idx_blks_hit)  as idx_hit,
    (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit) as ratio
  FROM
    pg_statio_user_indexes;
```
Generally, 95 to 99% of cache hit ratio is considered to be a good value.
In Cloud SQL for PostgreSQL, the huge_pages flag is enabled by default for better memory management. To learn more about huge_pages, see PostreSQL documentation.
Set max_locks_per_transaction

The max_locks_per_transaction value indicates the number of database objects that can be locked simultaneously. In most cases, the default value of 64 is sufficient. However, if you are dealing with a large dataset, you may end up with OOMs. Consider increasing the value of max_locks_per_transaction high enough to avoid OOMs.

Note: The object-level locking happens in memory on an ongoing transaction, Increasing the max_prepared_transactions value unreasonably high can cause the database instance to request more shared memory.

The max_locks_per_transaction value should be max_locks_per_transaction * (max_connections + max_prepared_transactions) objects. This means that if you have 300 thousand objects, and if the value of max_connections is 200, then max_locks_per_transaction should be 1500.
Set max_pred_locks_per_transaction

The transaction might fail if you have clients that touch many different tables in a single serializable transaction. In that scenario, consider increasing max_pred_locks_per_transaction to a reasonably high value. Like max_locks_per_transaction, max_pred_locks_per_transaction also uses shared memory so don't set an unreasonable high value.
If the memory usage is still high and you feel those queries are legitimate traffic, then consider increasing the number of memory resources in your instance to avoid database crash or downtime.

What's next

Google Cloud recommenders