Best practices for managing memory usage

This page describes how to configure memory usage for a Cloud SQL instance.

Introduction

When you create a Cloud SQL instance, you select an amount of memory for the instance. As a PostgreSQL database's workload increases, the instance's memory usage increases. Instances that consume a large amount of memory can create a performance bottleneck and can sometimes lead to out-of-memory issues.

When a Cloud SQL instance runs out of memory because of increased demand, it can cause database downtime. Therefore, it's important to configure the instance memory and the memory-related database flags properly, and to monitor memory usage so that the instance operates in a healthy state.

PostgreSQL memory components are broadly divided into two categories:

  • Global memory: this is shared across all processes to execute queries; for example, shared_buffers and max_connections.
  • Local memory: this is dedicated memory assigned to each connection; for example, work_mem, maintenance_work_mem, and temp_buffers.
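
One way to check how these flags are currently set on an instance is to query the pg_settings view from any PostgreSQL client. The view reports each value along with its unit (for example, shared_buffers is reported in 8 kB blocks):

-- Current values of the main global and local memory settings.
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('shared_buffers', 'max_connections', 'work_mem',
               'maintenance_work_mem', 'temp_buffers');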

For other configuration considerations, see General best practices and Operational guidelines.

Memory usage and flags

Whenever there's high memory usage by Cloud SQL instances, the following questions might arise:

  • Which query or process is using high memory?
  • Are the memory settings adequate for the database activity?
  • How do you change the memory settings?

When a PostgreSQL database operates, most memory usage occurs in a few areas:

  • Shared buffer: this is the shared memory area that PostgreSQL allocates to hold table data for read and write operations. For a read, any data that's requested from disk is first fetched into RAM and then given to the client. Similarly, in PostgreSQL, when data is requested (for example, SELECT * FROM emp), it's first fetched from disk into shared_buffers for caching, and then it's given to the client. Writes work the same way: pages are modified in shared_buffers first and written back to disk later.

    Shared buffer is also the shared memory area that all processes and connections use for database activities such as data caching, connection caching, and Data Manipulation Language (DML) operations. The size of this area is set by the shared_buffers flag, and the default is 33% of the instance's memory. The higher the value of shared_buffers, the more data can be cached in memory. An example that reports per-query buffer usage appears after this list.

  • Query work memory: as a query runs, PostgreSQL allocates local memory for each sort and hash operation. The maximum amount that a single operation can use before writing to temporary disk files is configured by the work_mem flag, and the default value is 4 MB. The higher the value of work_mem, the more data each operation can sort or hash in memory; however, because the limit applies per operation and per connection, many concurrent queries can use several multiples of work_mem at once. The example after this list shows a sort that spills to temporary files when it exceeds work_mem.
  • Maintenance work memory: some maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY, require separate local memory that PostgreSQL allocates for the back-end process running them. The maximum amount that these operations can use is configured by the maintenance_work_mem flag, and the default value is 64 MB. Note that autovacuum workers also use maintenance work memory, and their maximum can be overridden by the autovacuum_work_mem flag. The higher the value of maintenance_work_mem, the faster operations such as VACUUM can complete; a session-level override is shown in the example after this list.
  • Temporary buffers: when a temporary table is used in a database session, PostgreSQL allocates temporary buffers in that session's local memory to hold the temporary table. The maximum amount can be specified by the temp_buffers flag, and the default value is 8 MB.
  • Database connections: when a client connects to the database, PostgreSQL creates a back-end process to serve the client session. Besides the memory needed to run queries, PostgreSQL allocates additional memory for each connection to hold information such as the system catalog cache and prepared query plans. The maximum number of concurrent connections that the database server allows is configured by the max_connections flag. Each idle connection uses approximately 2 MB to 3 MB of memory. The higher the value of max_connections, the more connections the instance can accept, but at the cost of additional memory. The example after this list shows how to count connections by state.
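
To see how the shared buffer and query work memory behave for an individual query, run the query with EXPLAIN (ANALYZE, BUFFERS). The following sketch reuses the emp table from the earlier example and assumes a hypothetical ename column; the buffer counters show how many blocks were served from shared_buffers versus read from disk, and the sort method shows whether the sort fit within work_mem or spilled to temporary files.

-- "shared hit" blocks came from shared_buffers; "read" blocks came from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM emp ORDER BY ename;

-- A line such as "Sort Method: external merge  Disk: ..." means the sort
-- exceeded work_mem and spilled to temporary files. Raising work_mem for the
-- current session only (the instance flag is unchanged) keeps such sorts in memory.
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM emp ORDER BY ename;
RESET work_mem;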
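
Similarly, maintenance_work_mem can be raised for a single session before running a manual maintenance command, instead of changing the instance-wide flag. A minimal sketch, again using the hypothetical emp table, ename column, and index name:

-- Applies only to this session; the instance-wide flag is unchanged.
SET maintenance_work_mem = '256MB';
VACUUM VERBOSE emp;
CREATE INDEX emp_ename_idx ON emp (ename);
RESET maintenance_work_mem;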
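
To see how much of max_connections is actually in use, and how many of those connections are idle, you can query the pg_stat_activity view:

-- Connections grouped by state; idle sessions still consume backend memory.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;

-- The current connection limit for the instance.
SHOW max_connections;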

For the complete list of memory components in PostgreSQL, see the PostgreSQL documentation. To change or modify the flags listed in this section, see Configure database flags.

Monitor memory usage

Monitor your instance's memory usage in Cloud Monitoring regularly and keep it below the memory limit. A good practice is to set an alerting policy in Cloud Monitoring that fires when memory usage exceeds 90% of the limit for 6 hours. This alert warns you when your memory usage is consistently close to the limit.

Additionally, monitor for out-of-memory incidents. To do this, set up a log-based metric in Cloud Monitoring that matches the server process .* was terminated by signal 9: Killed message to count out-of-memory events, and then alert each time such an event occurs.

If your instance consistently operates above 90% of its memory limit, or if an out-of-memory event occurs, increase the instance's memory. Alternatively, you can reduce memory usage by limiting the number of database connections or by lowering database flags such as shared_buffers, work_mem, or max_connections. However, lowering these flags can limit the performance of your instance.

Out of memory

When there's insufficient memory to handle the database workload, as a last resort the underlying Linux operating system invokes the out-of-memory (OOM) killer to end a process and release memory. Cloud SQL is configured so that the OOM killer targets only PostgreSQL worker processes. The postmaster process is preserved in this situation, so it only has to end all existing database connections and run crash recovery to protect the integrity of the database. When this happens, the database experiences a brief period of service disruption and downtime. In the PostgreSQL database log, messages like the following appear:

2021-10-24 23:34:22.265 UTC [7]: [663-1] db=,user= LOG: server process (PID 1255039) was terminated by signal 9: Killed
2021-10-24 23:34:22.265 UTC [7]: [664-1] db=,user= DETAIL: Failed process was running: SELECT * FROM tab ORDER BY col
2021-10-24 23:34:22.277 UTC [7]: [665-1] db=,user= LOG: terminating any other active server processes
2021-10-24 23:34:22.278 UTC [1255458]: [1-1] db=postgres,user=postgres WARNING: terminating connection because of crash of another server process
2021-10-24 23:34:22.278 UTC [1255458]: [2-1] db=postgres,user=postgres DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-10-24 23:34:22.278 UTC [1255458]: [3-1] db=postgres,user=postgres HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-10-24 23:34:22.278 UTC [1255458]: [4-1] db=postgres,user=postgres CONTEXT: while updating tuple (27,18) in relation "tab"
...
2021-10-24 23:34:22.558 UTC [1255477]: [1-1] db=postgres,user=postgres FATAL: the database system is in recovery mode
...
2021-10-24 23:34:25.579 UTC [7]: [666-1] db=,user= LOG: all server processes terminated; reinitializing
...
2021-10-24 23:34:25.691 UTC [1255482]: [1-1] db=,user= LOG: database system was interrupted; last known up at 2021-10-24 23:31:53 UTC
2021-10-24 23:34:25.776 UTC [1255482]: [2-1] db=,user= LOG: database system was not properly shut down; automatic recovery in progress
2021-10-24 23:34:25.789 UTC [1255482]: [3-1] db=,user= LOG: redo starts at 227/AB359400
2021-10-24 23:34:38.957 UTC [1255482]: [4-1] db=,user= LOG: redo done at 229/4621F508
2021-10-24 23:34:38.959 UTC [1255482]: [5-1] db=,user= LOG: last completed transaction was at log time 2021-10-24 23:34:18.5535+00
2021-10-24 23:34:39.290 UTC [7]: [667-1] db=,user= LOG: database system is ready to accept connections

What's next