General best practices

This page provides best practices for getting the best performance, durability, and availability from Cloud SQL.

If issues occur with your Cloud SQL instance, review the following during troubleshooting:

Instance configuration and administration

Best practice More information
Read and follow the operational guidelines to ensure that your instances are covered by the Cloud SQL SLA.
Configure a maintenance window for your primary instance to control when disruptive updates can occur. See Maintenance window.
For read-heavy workloads, add read replicas to offload traffic from the primary instance. Optionally, you can use a load balancer such as HAProxy to manage traffic to the replicas.
If you delete and recreate instances regularly, use a timestamp in the instance ID to increase the likelihood that new instance IDs are usable.
Don't start an administrative operation before the previous operation has completed.

Cloud SQL instances do not accept new operation requests until they have completed the previous operation. If you attempt to start a new operation prematurely, the operation request fails. This includes instance restarts.

The instance status in the console does not reflect whether an operation is running. The green check mark denotes only that the instance is in the RUNNABLE state. To see whether an operation is running, go to the Operations tab and check the status of the most recent operation.

Data architecture

Best practice More information
Split your large instances into smaller instances, where possible. When possible, using many smaller Cloud SQL instances is better than one large instance. Managing a large, monolithic instance presents challenges not posed by a group of smaller instances.
Don't use too many database tables.

Keep your instance's table count to fewer than 10,000. Too many database tables can impact database upgrade time.

Application implementation

Best practice More information
Use good connection management practices, such as connection pooling and exponential backoff. Using these techniques improves your application's use of resources and helps you stay within Cloud SQL connection limits. For more information and code samples, see Managing database connections.
Test your application's response to maintenance updates, which can happen at any time during the maintenance window. Try self-service maintenance to simulate a maintenance update. During maintenance, your instance becomes unavailable for a brief period, and existing connections are dropped. Testing maintenance rollouts gives you a better understanding of how your application handles scheduled maintenance and how quickly the system can recover.
Test your application's response to failovers, which can happen at any time. You can manually initiate a failover using the console, the gcloud, or the API. See Initiating failover.
Avoid large transactions. Keep transactions small and short. If a large database update is needed, do it in several smaller transactions rather than one large transaction.
If you are using the Cloud SQL Auth proxy, make sure you are using the most up-to-date version. See Keeping the Cloud SQL Auth proxy up-to-date.

Data import and export

Best practice More information
Use serverless exports. Serverless exports offload the export operation to a temporary instance, allowing the primary instance to continue to serve queries and perform operations at its usual performance. When the data export is complete, the temporary instance is automatically deleted.
Speed up imports for small instance sizes. For small instances, you can temporarily increase the CPU and RAM of an instance to improve performance when importing large datasets.
If you are exporting data for import into Cloud SQL, be sure to use the proper procedure. See Exporting data from an externally managed database server.

Backup and recovery

Best practice More information
Protect your data with the appropriate Cloud SQL functionality.

Backups, point-in-time recovery, and exports are ways to provide data redundancy and protection. They each protect against different scenarios and complement each other in a robust data protection strategy.

Backups are lightweight; they provide a way to restore the data on your instance to its state at the time you took the backup. However, backups have some limitations. If you delete the instance, the backups are also deleted. You can't back up a single database or table. And if the region where the instance is located is unavailable, you cannot restore the instance from that backup, even in an available region.

Point-in-time recovery helps you recover an instance to a specific point in time. For example, if an error causes a loss of data, you can recover a database to its state before the error occurred. A point-in-time recovery always creates a new instance; you cannot perform a point-in-time recovery to an existing instance.

Exports take longer to create, because an external file is created in Cloud Storage that can be used to recreate your data. Exports are unaffected if you delete the instance. In addition, you can export only a single database or even table, depending on the export format you choose.

Size instances to account for transaction (binary) log retention. High write activity to the database can generate a large volume of transaction (binary) logs, which can consume significant disk space, and lead to disk growth for instances enabled to increase storage automatically. We recommend sizing instance storage to account for transaction log retention.

Tune and monitor

Tuning and monitoring database instances can help in reducing or avoiding vacuum-related problems.

The VACUUM operation has different variants with different levels of locking: standard VACUUM and VACUUM FULL. The VACUUM FULL option can reclaim more disk space but exclusively locks the table and runs slowly. Instead, use the standard form of VACUUM, which can run in parallel with production database operations. When you use the VACUUM operation, commands such as SELECT, INSERT, UPDATE, and DELETE will continue to function normally. You will not be able to modify the definition of a table with commands such as ALTER TABLE while it is being vacuumed.

Here are some recommendations that might help in reducing the time it takes to complete the VACUUM operation:

  • Increase system memory and maintenance_work_mem. This batches more tuples in each iteration and completes the work as quickly as possible. Note that when autovacuum runs, up to autovacuum_max_workers times this memory may be allocated, so be careful not to set the default value too high.
  • The VACUUM operation generates a lot of write-ahead log (WAL) records. If it's possible to reduce the number of WAL records, such as by having no replicas configured for this instance, the operation completes more quickly.
  • If the table has reached the two billion transaction IDs limit because none of the tuples is frozen, then try to reduce the amount of work done in single-user mode. One possible option is to set vacuum_freeze_min_age=1,000,000,000 (the maximum allowed value, up from the default of 50,000,000). This new value reduces the number of tuples frozen up to two times.
  • PostgreSQL version 12.0 and later versions support cleanup and VACUUM operations without cleaning the index entries. This is crucial, as cleaning the index requires a complete index scan, and if there are multiple indexes, then the total time depends on index size.
  • Larger indexes consume a significant amount of time for the index scan, therefore INDEX_CLEANUP OFF is preferred to quickly clean up and freeze the table data. PostgreSQL versions before 12.0 need to tune the number of indexes. That is, if there are non-critical indexes, then it might be helpful to drop the NON-CRITICAL index to speed up the vacuum operation.