Building a resilient architecture with Cloud SQL
Subhash Guddad
Director, Technology Practice Lead, Data Management
Customers build and deploy many applications with varied availability requirements. The databases that store and manage the data these applications create and use play a key role in determining their overall availability. Some applications can tolerate a longer recovery time, expressed as the Recovery Time Objective (RTO), and can cope with some amount of data loss, expressed as the Recovery Point Objective (RPO). Other, critical applications require no data loss (an RPO of zero) and a quick return to service (a short RTO). The databases supporting these applications should be able to meet the RPO and RTO requirements of each application.
Cloud SQL is Google Cloud’s fully managed relational database service for MySQL, PostgreSQL, and SQL Server. It provides full compatibility with the source database engines while reducing operational costs by automating database provisioning, storage capacity management, and other time-consuming tasks. Cloud SQL has built-in features to ensure business continuity with reliable and secure services, backed by a 24/7 SRE team providing a 99.95% SLA for the service.
This guide discusses the features in Cloud SQL that can be used to build a resilient database architecture. We list the planned and unplanned events that can affect the availability of a Cloud SQL instance. Planned events include configuration updates and patching activities that are needed to keep the database instance in optimal health, and we discuss the unique capabilities of Cloud SQL that control and limit the downtime such maintenance causes.
We look at the various types of unplanned events that can cause an outage and discuss features that can be used by customers to reduce the RPO and RTO. The features include database backup and recovery capabilities that form the foundation of an availability strategy and can protect against failures and human errors and reduce the data loss exposure to a minimum.
For environments where the RPO needs to be zero, we discuss the Cloud SQL High Availability configuration, which provides an RPO of zero. The guide also covers the replication capabilities of Cloud SQL and how replicas can be used in an availability architecture, both within the same region and, with cross-region replicas, as a building block for disaster recovery.
Finally, the guide briefly discusses best practices for applications: managing connections to the database, using observability to monitor load on the database, and handling failures gracefully.
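As a rough illustration of what handling failures gracefully can look like on the application side, here is a minimal sketch of retrying a database call with exponential backoff and jitter. It is not taken from the guide: the helper name `call_with_backoff`, its parameters, and the use of Python's built-in `ConnectionError` as a stand-in for a driver's transient error are all assumptions made for the example.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      retryable=(ConnectionError,), sleep=time.sleep):
    """Run operation(), retrying transient failures with backoff and jitter.

    All names and defaults here are hypothetical. ConnectionError stands in
    for whatever transient error the database driver in use actually raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Double the delay cap on each attempt, bounded by max_delay, and
            # pick a random delay below the cap so that many clients do not
            # all reconnect at the same moment after a failover.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

An application would wrap a short, idempotent unit of work, for example `call_with_backoff(lambda: run_query(...))`, where `run_query` is a placeholder for the application's own function that opens a connection and executes a statement. Retrying non-idempotent writes this way needs more care, since a failed call may have partially succeeded.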