Best practices for achieving high availability and scalability in Cloud SQL

March 3, 2025
Arkapravo Banerjee

Data Management Specialist

Cloud SQL, Google Cloud's fully managed database service for PostgreSQL, MySQL, and SQL Server workloads, offers availability SLAs that depend on the edition you choose: 99.95%, excluding maintenance, for Enterprise edition, and 99.99%, including maintenance, for Enterprise Plus edition. Cloud SQL also offers high availability and scalability features that help maintain business continuity and minimize downtime, especially for mission-critical databases.
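
To put those percentages in perspective, here is a minimal sketch that converts an availability percentage into a downtime budget. It assumes a 30-day month and simple proportional arithmetic; the function name is illustrative, and how downtime is actually measured is defined by the SLA itself, not by this calculation.

```python
from fractions import Fraction

# Assumption: a 30-day month (43,200 minutes) as the measurement period.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def downtime_budget_minutes(sla_percent: str,
                            period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the longest downtime, in minutes, that still meets the SLA percentage."""
    # Fraction avoids floating-point noise in values such as 1 - 0.9995.
    unavailable_fraction = 1 - Fraction(sla_percent) / 100
    return float(unavailable_fraction * period_minutes)


if __name__ == "__main__":
    for edition, sla in [("Enterprise", "99.95"), ("Enterprise Plus", "99.99")]:
        budget = downtime_budget_minutes(sla)
        print(f"{edition} ({sla}%): {budget:.2f} minutes per 30-day month")
```

Under these assumptions, 99.95% allows about 21.6 minutes of downtime per month and 99.99% about 4.3 minutes. Because the Enterprise edition SLA excludes maintenance, maintenance downtime on that edition comes on top of its budget.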

These features can help address some common database deployment challenges:

  1. Combined read/write instances: Using a single instance for both reads and writes creates a single point of failure. If the primary instance goes down, both read and write operations are impacted. And if storage fills up while automatic storage increase is disabled, even a failover does not help.

  2. Downtime during maintenance: Planned maintenance can disrupt business operations.

  3. Time-consuming scaling: Manually scaling instance size for planned workload spikes is a lengthy process that requires significant planning.

  4. Complex cross-region disaster recovery: Setting up and managing cross-region DR requires manual configuration and connection string updates after a failover.

In this blog post, we show you how to strengthen business continuity with Cloud SQL's high availability and scalability features, and how to use Cloud SQL Enterprise Plus features to build resilient database architectures that can handle workload spikes, unexpected outages, and read scaling needs.

Architecting a highly available and robust database 

The Cloud SQL high availability feature, which automatically fails over to a standby instance, is a good starting point, but it is not sufficient on its own: full storage, regional outages, or failover problems can still cause disruptions. For a more robust architecture, it is essential to separate read workloads from write workloads.

A best-practice approach is to implement Cloud SQL read replicas alongside high availability. Direct read traffic to dedicated read-replica instances, and let the primary instance handle write operations. You can enable high availability on the primary, on the read replicas, or on both, depending on your requirements. This separation helps ensure that the primary can serve production traffic predictably, and that read operations can continue uninterrupted on the read replicas even when the primary is down.
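
On the application side, this separation can be as simple as choosing a host per operation. The sketch below is one possible way to do it, not a Cloud SQL API: the `EndpointRouter` class and the host names are hypothetical, and it only selects an endpoint rather than opening a connection.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class EndpointRouter:
    """Sends writes to the primary and spreads reads across read replicas."""

    primary: str
    replicas: List[str]
    _cycle: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # With no replicas configured, reads fall back to the primary.
        self._cycle = itertools.cycle(self.replicas or [self.primary])

    def endpoint_for(self, is_write: bool) -> str:
        """Return the host that should serve this operation."""
        if is_write:
            return self.primary
        return next(self._cycle)  # simple round-robin over the replicas


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    router = EndpointRouter(
        primary="primary.db.example.internal",
        replicas=["replica-1.db.example.internal", "replica-2.db.example.internal"],
    )
    print(router.endpoint_for(is_write=True))
    print(router.endpoint_for(is_write=False))
    print(router.endpoint_for(is_write=False))
```

Read replicas are updated asynchronously, so a read that must see the result of a write that just completed may still need to go to the primary.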

Below is a sample regional architecture with high availability and a read replica enabled.

https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_EAskL0U.max-1500x1500.png

You can deploy this architecture regionally across multiple zones or extend it across regions for disaster recovery and geographically distributed read access. A regional deployment with a highly available primary and a highly available read replica that spans three availability zones provides resilience against zonal failures: even if two zones fail, the database remains accessible for both read and write operations after failover. Cross-region read replicas enhance this further, providing regional DR capabilities.

Cloud SQL Enterprise Plus features

Cloud SQL Enterprise Plus offers significant advantages for performance and availability:

  1. Enhanced hardware: Run databases on high-performance hardware with up to 128 vCPUs and 824GB of RAM.

  2. Data cache: Enable data caching for faster read performance.

  3. Near-zero downtime operations: Experience near-zero downtime maintenance and sub-second (<1s) downtime for instance scaling.

  4. Advanced disaster recovery: Streamline disaster recovery with failover to a cross-region DR replica and automatic reinstatement of the old primary. The application can keep connecting through the same write endpoint, which is automatically assigned to the new primary after failover.

Enterprise Plus edition addresses the previously mentioned challenges:

  1. Improved performance: Benefit from higher memory-to-core ratios for better database performance.

  2. Faster reads: Data caching improves read performance for read-heavy workloads. The data cache can be enabled on the primary, the read replica, or both, as needed.

  3. Easy scaling: Scale instances quickly with minimal downtime (sub-second) to handle traffic spikes or planned events. Scale the instance down when traffic is low with sub-second downtime. 

  4. Minimized maintenance downtime: Reduce downtime during maintenance to less than a second and provide better business continuity. 

  5. Handle regional failures: Easily fail over to a cross-region DR replica, and Cloud SQL automatically rebuilds your architecture as the original region recovers. This lessens the hassle of DR drills and helps ensure application availability. 

  6. Automatic IP address re-pointing: Connect through the write endpoint, which automatically points to the current primary after a switchover or failover, so you don't need to change any IP addresses in the application.
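
Even when the write endpoint follows the primary, connections that were open during a switchover or failover typically need to be re-established by the application. The following is a minimal sketch of reconnecting with exponential backoff. The `connect_with_retry` helper is hypothetical, and the built-in `ConnectionError` stands in for whatever exception your database driver actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:  # assumption: swap in your driver's error type
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect() -> str:
        # Simulates an endpoint that is unavailable for the first two attempts.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("primary is failing over")
        return "connected"

    print(connect_with_retry(flaky_connect, base_delay_s=0.1))
```

Because the `connect` callable always targets the same endpoint, the retry loop needs no knowledge of which instance is currently the primary.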

To test out these benefits quickly, there’s an easy, near-zero downtime upgrade option from Cloud SQL Enterprise edition to Enterprise Plus edition.

Planned maintenance best practices

While Cloud SQL Enterprise Plus offers near-zero downtime planned maintenance, following these best practices can further enhance business continuity: 

  1. Staging environment testing: To identify potential issues, use the maintenance timing feature to deploy maintenance to test/staging environments at least a week before production.

  2. Read-replica maintenance: Apply self-service maintenance to one of the read replicas before the primary instance to avoid simultaneous downtime for read and write operations. Update the primary and the other replicas shortly afterwards; we recommend keeping the primary and all replicas on the same maintenance version.

  3. Maintenance window: Always configure a maintenance window during off-peak hours to control when maintenance is performed. 

  4. Maintenance notifications: Opt in to maintenance notifications to make sure you receive an email at least one week before scheduled maintenance. 

  5. Reschedule maintenance: Use the reschedule maintenance feature if a maintenance activity conflicts with a critical business period.

  6. Deny maintenance period: Use the deny maintenance period feature to postpone maintenance for up to 90 days during sensitive periods. 
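
Two of the limits above, the one-week staging lead and the 90-day cap on a deny maintenance period, are easy to check before you commit to a schedule. The sketch below is a planning aid with hypothetical function names, not a Cloud SQL API, and it assumes the deny period counts both its start and end dates.

```python
from datetime import date, timedelta

MAX_DENY_PERIOD_DAYS = 90              # upper bound stated in the list above
MIN_STAGING_LEAD = timedelta(days=7)   # "at least a week before production"


def deny_period_is_valid(start: date, end: date) -> bool:
    """True if the deny period is non-empty and at most 90 days long."""
    # Assumption: both the start and the end date count toward the length.
    length_days = (end - start).days + 1
    return 1 <= length_days <= MAX_DENY_PERIOD_DAYS


def staging_leads_production(staging: date, production: date) -> bool:
    """True if staging maintenance runs at least a week before production."""
    return production - staging >= MIN_STAGING_LEAD


if __name__ == "__main__":
    # Example: block maintenance over a year-end sales period.
    print(deny_period_is_valid(date(2025, 11, 15), date(2026, 1, 5)))
    print(staging_leads_production(date(2025, 3, 3), date(2025, 3, 10)))
```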

By combining these strategies, you can build highly available and scalable database solutions in Cloud SQL, helping to ensure your business continuity and minimize downtime. Refer to the maintenance FAQ for more detailed information.
