Understanding Cloud SQL Maintenance: why is it needed?
Since I joined the Cloud SQL team, customers have asked me one question about our service more than any other: “What happens during Cloud SQL maintenance?” It’s a fair question. I’d want to know too if something was going to impact my database’s availability!
In this blog series, I’ll take you through the ins and outs of Cloud SQL maintenance. In Part 1, I will share how maintenance and other system updates make database operations a whole lot simpler for our users. In Part 2, I’ll take you step-by-step through the maintenance process and offer a behind-the-scenes look at the engineering that has gone into minimizing database downtime. In Part 3, I will finish with an overview of how users can use Cloud SQL maintenance settings and design their applications to optimize their scheduled maintenance experiences.
Let’s get started!
What comprises a Cloud SQL instance?
We first need to cover the system components that comprise a Cloud SQL instance. Each Cloud SQL instance is powered by a virtual machine (VM) running on a host Google Cloud server. Each VM operates the database engine, such as MySQL, PostgreSQL, or SQL Server, as well as service agents that provide supporting services like logging and monitoring. For users of our high availability option, we set up a standby VM in another zone in the same region with an identical configuration to the primary VM. Database data is stored on a scalable, durable network storage device called a persistent disk that attaches to the VM. Finally, a static IP address sits in front of each VM, which ensures that the IP address that an application connects to persists throughout the lifetime of the Cloud SQL instance, including through maintenance or automatic failover.
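If it helps to picture these components as data, here is a minimal Python sketch of how they relate. It is only a mental model, not how Cloud SQL is implemented: the class names (`VM`, `CloudSQLInstance`), the field names, and the `enable_high_availability` helper are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class VM:
    """Hypothetical model of the VM that runs the database engine."""
    zone: str
    database_engine: str  # e.g. "MySQL", "PostgreSQL", or "SQL Server"
    service_agents: List[str] = field(
        default_factory=lambda: ["logging", "monitoring"]
    )


@dataclass(frozen=True)
class CloudSQLInstance:
    """Hypothetical model of the components described above."""
    static_ip: str            # stays the same for the life of the instance
    primary: VM
    persistent_disk_gb: int   # network storage attached to the VM
    standby: Optional[VM] = None  # only present with high availability

    @property
    def is_highly_available(self) -> bool:
        return self.standby is not None


def enable_high_availability(
    instance: CloudSQLInstance, standby_zone: str
) -> CloudSQLInstance:
    """Return a copy with a standby VM that mirrors the primary's config."""
    if standby_zone == instance.primary.zone:
        raise ValueError("the standby VM must live in a different zone")
    standby = VM(
        zone=standby_zone,
        database_engine=instance.primary.database_engine,
        service_agents=list(instance.primary.service_agents),
    )
    # The static IP is carried over unchanged, so applications keep
    # connecting to the same address.
    return replace(instance, standby=standby)
```

For example, calling `enable_high_availability(instance, "us-central1-b")` on an instance whose primary VM runs in `us-central1-a` returns a copy with an identically configured standby and the same `static_ip`.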
What are the database updates that happen on a Cloud SQL instance?
Over the life of a Cloud SQL instance, there are two types of updates: configuration updates, which users perform, and system updates, which Cloud SQL performs.
As a database’s usage grows and new workloads are added, users may want to update their database configuration accordingly. These configuration updates include increasing compute resources, modifying a database flag, and enabling high availability. Although Cloud SQL makes these updates possible with the click of a button, configuration updates can require downtime. When thinking holistically about application availability, users need to plan ahead for these configuration updates.
Keeping the database instance up and running requires operational effort beyond configuration updates. Servers and disks need to be replaced and upgraded. Operating systems need to be patched as new vulnerabilities are discovered. Database engines need to be upgraded as the database software provider releases new features and fixes new issues. Normally, a database administrator would need to perform each of these updates regularly in order to ensure their system stays reliable, protected, and up-to-date. Cloud SQL takes care of these system updates on behalf of our users, so that they can spend fewer cycles managing their database and more cycles developing great applications. In fact, managed system updates attract many users to our managed service.
How does maintenance fit into system updates?
In general, Cloud SQL system updates are divided into three categories: hardware updates, online updates, and maintenance.
Hardware updates improve underlying physical infrastructure. These include swapping out a defective machine host or replacing an old disk. Google Cloud performs hardware updates without interruption to a user’s application. For example, when updating a database server, Google Cloud uses live migration, an advanced technology that reliably migrates a VM from the original host to a new one while the VM stays running.
Online updates enhance the software of the supporting service agents that sit adjacent to the database engine. These updates are performed while the database is up and running, serving traffic. Online updates do not cause downtime for a user’s application.
Maintenance updates the operating system and the database engine. Since these updates require that the instance be restarted, they incur some downtime. For this reason, Cloud SQL allows users to schedule maintenance to occur at the time that is least disruptive to a user’s application.
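To keep the three categories straight, here is a small Python sketch that encodes what each one touches and whether it requires a restart, exactly as described above. The names (`SystemUpdate`, `UPDATE_PROFILE`, `causes_downtime`) are hypothetical and exist only for this illustration; they are not part of any Cloud SQL API.

```python
from enum import Enum


class SystemUpdate(Enum):
    HARDWARE = "hardware update"
    ONLINE = "online update"
    MAINTENANCE = "maintenance"


# What each category updates and whether the instance must restart,
# taken from the descriptions in this post.
UPDATE_PROFILE = {
    SystemUpdate.HARDWARE: {
        "targets": ("machine host", "disk"),
        "requires_restart": False,  # handled with live migration
    },
    SystemUpdate.ONLINE: {
        "targets": ("service agents",),
        "requires_restart": False,  # applied while serving traffic
    },
    SystemUpdate.MAINTENANCE: {
        "targets": ("operating system", "database engine"),
        "requires_restart": True,
    },
}


def causes_downtime(update: SystemUpdate) -> bool:
    """An update incurs downtime only if the instance must restart."""
    return UPDATE_PROFILE[update]["requires_restart"]


def disruptive_updates() -> list:
    """Categories a user would want to schedule around."""
    return [u for u in SystemUpdate if causes_downtime(u)]
```

In this model, `disruptive_updates()` returns only `SystemUpdate.MAINTENANCE`, which is why maintenance is the one category users can schedule.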
As you can see, Cloud SQL performs most system updates without any application impact. We take care to only schedule maintenance when we need to update a part of the system that cannot be updated without interrupting the service. To moderate application impacts, we bundle critical updates together into maintenance events that are scheduled once every few months. We’ve gone further to design the maintenance workflow to complete quickly so that our users’ applications can get back up and running. We’ll discuss this further in Part 2. To make maintenance more manageable, we equip users with settings such as maintenance windows and deny periods, which we will cover in more detail in Part 3.
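As a rough preview of how maintenance windows and deny periods might interact, here is a minimal Python sketch. It is an assumption-laden illustration, not Cloud SQL's actual scheduling logic: the `MaintenanceWindow` and `DenyPeriod` classes, the one-hour default duration, and the `maintenance_allowed` function are all hypothetical. Part 3 will cover the real settings.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical weekly window in which maintenance may start."""
    weekday: int             # 0 = Monday ... 6 = Sunday (Python convention)
    hour: int                # start hour, 0-23
    duration_hours: int = 1  # assumed length, for illustration only


@dataclass(frozen=True)
class DenyPeriod:
    """Hypothetical date range during which maintenance is blocked."""
    start: date
    end: date  # inclusive


def maintenance_allowed(
    when: datetime,
    window: MaintenanceWindow,
    deny_periods: Iterable[DenyPeriod],
) -> bool:
    """True if `when` is inside the window and outside every deny period.

    Simplification: windows that cross midnight are not handled.
    """
    in_window = (
        when.weekday() == window.weekday
        and window.hour <= when.hour < window.hour + window.duration_hours
    )
    denied = any(p.start <= when.date() <= p.end for p in deny_periods)
    return in_window and not denied
```

With a Sunday 03:00 window and a deny period covering a busy sales week, a Sunday 03:30 slot before the deny period would be allowed, while the same slot inside the deny period would not.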
If you’re interested in learning more about how maintenance fits together with all of the other benefits of Cloud SQL, read our blog about the value of managed database services.
Stay tuned for Part 2, where we will talk more specifically about how long maintenance lasts, what kinds of updates come with maintenance, and how Cloud SQL conducts maintenance to ensure minimum impact to our users’ instances.