Best practices for improving AlloyDB performance and availability

This page introduces general best practices to help you improve AlloyDB for PostgreSQL performance, durability, and availability. This page is for database administrators and developers who are already familiar with AlloyDB and PostgreSQL.

Instance configuration and administration

Use AlloyDB tools to monitor database usage and status

Use the following table to learn about AlloyDB tools that help you monitor your database usage, status, and performance.

- Performance snapshot report: Compares snapshots of system metrics between two different points in time.
- Query insights: Helps you detect, diagnose, and prevent query performance problems for AlloyDB databases. It provides self-service, intuitive monitoring and diagnostic information that goes beyond detection to help you identify the root cause of performance problems.
- System insights: Lets you monitor database resources and metrics, including active node count, CPU usage, peak connections, log errors, transactions per second, and maximum replication lag.

Manage replica lag

AlloyDB includes several enhancements that reduce replication lag. However, you might encounter scenarios in which log replay is blocked or can't keep up, which can cause increased replication lag.

For example, if your primary VM size is much larger than your read pool node size, under heavy write workloads the primary VM might generate log records faster than the read nodes can replay them, especially if there is also a heavy read workload running concurrently on the read nodes. In this scenario, it might help to increase the read node size to give it more resources.

Depending on your application needs, you might also want to tune the database flags that control replication behavior.
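To see how far behind a read node is, you can compare write-ahead log (WAL) positions: `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the read node, both standard PostgreSQL functions. The following sketch, using hypothetical LSN values, converts PostgreSQL's `X/Y` hexadecimal LSN format to byte offsets so you can express lag in bytes:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte offset.

    The part before the slash is the high 32 bits of the WAL position;
    the part after is the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)


# Hypothetical example: primary is at 16/B374D848,
# the read node has replayed up to 16/B3740000.
print(replication_lag_bytes("16/B374D848", "16/B3740000"))  # 55368 bytes behind
```

If you run the comparison on the server itself, the built-in `pg_wal_lsn_diff()` function computes the same byte difference directly.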

Follow operational guidelines

To ensure that your instances are covered by the AlloyDB for PostgreSQL SLA, follow the operational guidelines.

Configure a maintenance window for your primary instance

Configure a maintenance window for your primary instance to plan when disruptive updates can occur. For more information, see View and set maintenance times.

Add read pool instances to offload read traffic

For read-heavy workloads, add read pool instances to offload read traffic from the primary instance.

Configure one or more read pools for each database in the instance to help improve caching.

Consider adding two or more nodes per pool to enable automatic load balancing and high availability.

Don't start an administrative operation before the previous operation completes

AlloyDB instances can't accept new operation requests until the previous operation completes. If you attempt to start a new operation before the previous operation completes, the operation request fails. This includes instance restarts.

The instance status in the Google Cloud console doesn't reflect whether an operation is running. The green check mark only indicates whether the instance is in the RUNNABLE state. To see whether an operation is running, click the Operations tab and check the status of the most recent operation.

Configure enough storage quota to accommodate critical database maintenance

By default, you can use up to 16 TB of storage per cluster. If you need more storage, consider increasing your storage quota.

Prevent over-utilization of your CPU

You can view the percentage of available CPU that your instance is using on the instance details page in the Google Cloud console. For more information, see Monitor instances. You can also monitor your CPU usage and receive alerts at a specified threshold using Create metric-threshold alerting policies.

To avoid over-utilization, you can scale your instance to a higher number of CPUs. Changing the number of CPUs requires an instance restart. If your instance is already at the maximum number of CPUs, we recommend that you shard your database across multiple instances.

Avoid memory exhaustion

AlloyDB has automatic memory management to prevent out-of-memory issues. However, constant memory pressure can lead to performance problems. To look for signs of memory exhaustion, primarily use the usage metric. We recommend keeping this metric under 90% for optimal performance.

You can also use the total_usage metric to observe the percentage of available memory that your AlloyDB instance is using, including memory used by the database container and memory allocated by the operating system cache.

By observing the difference between the usage and total_usage metrics, you can identify how much memory is used by database processes versus how much is used by the operating system cache. The memory in this cache can be reclaimed and repurposed when processes need it.
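As a sketch of that arithmetic, assuming both metrics are expressed as percentages of instance memory (the function names and the 90% threshold check are illustrations, not AlloyDB APIs):

```python
def os_cache_share(usage_pct: float, total_usage_pct: float) -> float:
    """Percentage points of instance memory held by the OS page cache.

    `usage_pct` covers memory used by database processes; `total_usage_pct`
    additionally includes memory the operating system has allocated as cache.
    The difference is cache that can be reclaimed under memory pressure.
    """
    return total_usage_pct - usage_pct


def memory_pressure_warning(usage_pct: float, threshold_pct: float = 90.0) -> bool:
    """True when the usage metric exceeds the recommended ceiling."""
    return usage_pct > threshold_pct


# Example: processes use 55% of memory; total including OS cache is 82%.
print(os_cache_share(55.0, 82.0))      # 27.0 percentage points of reclaimable cache
print(memory_pressure_warning(55.0))   # False: under the recommended 90% ceiling
```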

Scale your AlloyDB instance to increase the size of its memory. Changing the instance's memory size requires you to restart the instance. If your instance is already at the maximum memory size, you must shard your database across multiple instances.

For more information about monitoring usage and total usage metrics in the Google Cloud console, see Monitor instances.

Make sure your instance has optimal transaction IDs

You can view the transaction ID usage of your instance on the Metrics Explorer page in the Google Cloud console by setting the Resource Type to AlloyDB for PostgreSQL Database and by setting Metric to Percentage of instance's transaction IDs consumed. For more information, see Create charts with Metrics Explorer.

AlloyDB has built-in adaptive autovacuum that helps mitigate vacuum-related problems.
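As a rough illustration of how a percentage-consumed metric relates to PostgreSQL's 32-bit transaction IDs, the sketch below assumes the conventional 2^31 wraparound horizon; the age value would come from the standard query shown in the comment:

```python
# PostgreSQL transaction IDs are 32-bit; vacuum must freeze old rows well
# before their age approaches the ~2.1 billion (2**31) wraparound horizon.
XID_WRAPAROUND_LIMIT = 2**31

# On the database, the oldest unfrozen transaction age is available with:
#   SELECT max(age(datfrozenxid)) FROM pg_database;


def xid_consumed_pct(max_datfrozenxid_age: int) -> float:
    """Fraction of the transaction ID space consumed, as a percentage."""
    return 100.0 * max_datfrozenxid_age / XID_WRAPAROUND_LIMIT


# Hypothetical example: the oldest database has an XID age of ~429 million.
print(round(xid_consumed_pct(429_496_730), 1))  # 20.0 (about a fifth consumed)
```

A steadily climbing value suggests that vacuum isn't keeping up and is worth investigating before it approaches 100%.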

Data architecture

Split your large instances into smaller instances where possible

When possible, use many smaller AlloyDB clusters rather than using one large instance. Managing a large, monolithic instance presents challenges that aren't created by a group of smaller instances.

Don't use too many database tables

Keep your instance's table count to fewer than 10,000. Too many database tables can impact database upgrade time.

Query performance

Enable Columnar Engine if you run analytical queries

Start by reading an overview of the AlloyDB columnar engine, and check which types of queries benefit from enabling it.

You can monitor columnar engine usage.

If you are new to the columnar engine, start by familiarizing yourself with auto columnization. Then, you can choose to manage the columns manually.

Scale up your instance to improve query performance

If you're experiencing slow query performance, consider scaling up your instance.

Each SKU provides a fixed vCPU and memory configuration, along with a fixed amount of fast cache. If your data size is large and query performance is poor, consider scaling up to a larger instance.

Deploy read pools and offload read queries to the read pool

If your application performs heavy writes as well as reads, consider deploying read pools and offloading read queries to them.

For read-heavy workloads, read pool instances take read traffic off the primary instance.

Application implementation

Use good connection management practices

Use good connection management practices, such as connection pooling and exponential backoff.

Using good connection management techniques improves your application's use of resources and helps you stay within AlloyDB connection limits.
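The retry side of these practices can be sketched as exponential backoff with jitter. Here, `connect` stands for whatever callable opens your database connection (for example, a wrapper around your driver's connect function); it is an assumption of this sketch, not an AlloyDB API:

```python
import random
import time


def connect_with_backoff(connect, max_attempts=6, base_delay=0.5, max_delay=30.0):
    """Call `connect` until it succeeds, sleeping between failed attempts.

    The delay doubles each attempt, is capped at `max_delay`, and is
    randomized ("full jitter") so that many clients reconnecting at once
    don't retry in lockstep and overwhelm the instance.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Pair this with a connection pooler so that the retries apply to a small number of pooled connections rather than to every request.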

Test your application's response to maintenance updates

Test your application's response to maintenance updates, which can happen at any time during the maintenance window.

You can simulate a maintenance update by performing a compute scale operation or by updating a static PostgreSQL flag, either of which triggers low downtime maintenance (LDTM).

During LDTM, your instance becomes unavailable for a brief period, and existing connections are dropped. Testing LDTM gives you a better understanding of how your application handles scheduled maintenance and how fast the system can recover.

Test your application's response to failovers

Test your application's response to failovers, which can happen at any time.

You can manually initiate a failover using the Google Cloud console, the Google Cloud CLI, or the API. For more information, see Initiating failover.

Avoid large transactions

Keep transactions small and short. If a large database update is needed, perform the update in several smaller transactions rather than executing one large transaction.
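One way to break a large update into smaller transactions is to batch rows by key and commit each batch on its own. In this sketch, the `orders` table, the `status` column, and the DB-API style connection are hypothetical stand-ins for your own schema and driver:

```python
def batched(ids, batch_size):
    """Yield successive slices of `ids`, each at most `batch_size` long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def update_in_batches(conn, ids, batch_size=1_000):
    """Apply a large update as many small, quickly committed transactions.

    `conn` is assumed to be a DB-API connection (for example, psycopg).
    Committing per batch keeps each transaction short, so no single
    transaction holds locks or bloats WAL for long.
    """
    with conn.cursor() as cur:
        for batch in batched(ids, batch_size):
            cur.execute(
                "UPDATE orders SET status = 'archived' WHERE id = ANY(%s)",
                (list(batch),),
            )
            conn.commit()  # one short transaction per batch
```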

Avoid large numbers of subtransactions

Avoid a large number of subtransactions within a single transaction, especially when long-running transactions are present.

In AlloyDB, executing statements inside a PL/pgSQL exception block creates subtransactions of the enclosing transaction. Overall system performance degrades if a transaction accumulates more than 64 subtransactions while long-running transactions are present.

Use the most recent version of Auth Proxy

If you're using the AlloyDB Auth Proxy, make sure that you use the most recent version. For more information, see Keep the Auth Proxy client up-to-date.

Data import and export

Restore from Cloud SQL for PostgreSQL backups for migration

To facilitate migration, see Migrate from Cloud SQL for PostgreSQL to AlloyDB.

To learn how to migrate your data from Cloud SQL for PostgreSQL to AlloyDB using continuous data replication, see Database Migration Service for PostgreSQL to AlloyDB.

Speed up imports for small instances

When you import a large dataset into a small instance, you can temporarily increase the instance's CPU and RAM to improve import performance.

Backup and recovery

Protect your data using the appropriate AlloyDB capabilities

Use backups, point-in-time recovery (PITR), and exports for redundancy and protection. They each protect against different scenarios and complement each other in a robust data protection strategy.

Backups are lightweight; they provide a way to restore the data on your instance to its state at the time you took the backup. However, AlloyDB's backup feature has some limitations: if you delete the instance, its backups are also deleted; you can't back up a single database or table; and if the region where the instance is located becomes unavailable, you can't restore from its backups, even into a region that is available.

Point-in-time recovery helps you recover an instance to a specific point in time. For example, if an error causes a loss of data, you can recover a database to its state before the error occurred. A point-in-time recovery always creates a new instance; you cannot perform a point-in-time recovery to an existing instance.

Exports take longer to create, because an external file is created in Cloud Storage that can be used to recreate your data. Exports are unaffected if you delete the instance. In addition, you can export only a single database or table, depending on your export format.

Protect your instance and backups from accidental deletion

Accidental deletion prevention is enabled by default when you create an AlloyDB instance using the Google Cloud console or Terraform.

Use the export feature in AlloyDB to export your data for additional protection. Use Cloud Scheduler with the Cloud Scheduler API to automate export management.

For more advanced scenarios, use Cloud Scheduler with Cloud Run functions for automation.