Maintenance on Cloud SQL instances

This page explains how maintenance updates occur on Cloud SQL instances, and how you can control the timing of these updates. To get started, see Finding and setting maintenance windows.

Overview

As a managed service, Cloud SQL automatically updates instances to ensure that the underlying hardware, operating system, and database engine are reliable, performant, secure, and up-to-date. Most of these updates are performed while your Cloud SQL instance is up and running. However, certain system updates require a brief service interruption. These updates are called maintenance.

Maintenance updates the operating system and the database engine. Because these updates require the instance to be restarted, they incur some downtime. Maintenance updates deliver the following benefits:

  • Cloud SQL features. To launch new features, the database engine is updated and new plugins to the database are installed.

  • Database version upgrades. The developers of MySQL release new minor versions several times a year. Each new version brings bug fixes, security patches, performance enhancements, and new database features. You can find the latest minor version that Cloud SQL for MySQL supports by reviewing the release notes or Database versions and version policies. Cloud SQL instances are upgraded to the latest database version shortly after release, so that you benefit from running the latest database software.

  • Operating system patches. We continuously monitor for newly identified security vulnerabilities in the operating system. Upon discovery, we patch the operating system to protect you from new risks.

Maintenance impact

During a maintenance event, a Cloud SQL for MySQL instance loses connectivity for less than 60 seconds on average.

Downtime might be higher for instances that have high activity at the beginning of maintenance or have very large datasets. Cloud SQL typically schedules maintenance once every few months.

You can take steps to ensure that maintenance has as little impact as possible on your operations by using our maintenance settings and by making your systems resilient to transient errors.

Maintenance settings

Cloud SQL offers you the ability to configure maintenance updates through a set of maintenance settings.

You can configure maintenance to be scheduled at times when brief downtime causes the lowest impact to your applications. For each Cloud SQL instance, you can configure the following:

  • Maintenance window. The day of the week and the hour in which Cloud SQL schedules maintenance. Maintenance windows last for one hour. Learn how to configure a maintenance window.

  • Order of update. Sets the order in which the Cloud SQL instance is updated relative to other instances in the same region. Order of update can be set to Any, Earlier, or Later. Later instances are updated one week after Earlier instances with the same maintenance window in the same region. You set the order of update when you configure a maintenance window.

  • Deny maintenance period. A block of days in which Cloud SQL does not schedule maintenance. Deny maintenance periods can be up to 90 days long. Learn how to configure a deny maintenance period.
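
The three settings can be modeled as a small validated record. The following Python sketch is illustrative only: the class and field names (`MaintenanceSettings`, `window_day`, `window_hour`, and so on) are hypothetical and are not part of any Cloud SQL API or client library. It encodes only the constraints stated above, and it assumes that the 90-day limit counts both the first and the last day of the period.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
ORDERS = ("Any", "Earlier", "Later")
MAX_DENY_DAYS = 90  # deny maintenance periods can be up to 90 days long


@dataclass(frozen=True)
class MaintenanceSettings:
    """Hypothetical model of the per-instance maintenance settings."""
    window_day: Optional[str] = None    # day of the week for the window
    window_hour: Optional[int] = None   # start hour (0-23); the window lasts one hour
    order_of_update: str = "Any"
    deny_start: Optional[date] = None   # first day of the deny period (inclusive)
    deny_end: Optional[date] = None     # last day of the deny period (inclusive, assumed)

    def __post_init__(self):
        # The window is optional, but day and hour only make sense together.
        if (self.window_day is None) != (self.window_hour is None):
            raise ValueError("set both window_day and window_hour, or neither")
        if self.window_day is not None:
            if self.window_day not in DAYS:
                raise ValueError(f"unknown day: {self.window_day!r}")
            if not 0 <= self.window_hour <= 23:
                raise ValueError("window_hour must be between 0 and 23")
        if self.order_of_update not in ORDERS:
            raise ValueError(f"order_of_update must be one of {ORDERS}")
        if (self.deny_start is None) != (self.deny_end is None):
            raise ValueError("set both deny_start and deny_end, or neither")
        if self.deny_start is not None:
            length = (self.deny_end - self.deny_start).days + 1
            if not 1 <= length <= MAX_DENY_DAYS:
                raise ValueError(f"deny period must span 1 to {MAX_DENY_DAYS} days")


# Example values only; the years are arbitrary.
production = MaintenanceSettings("Sunday", 0, "Later", date(2024, 11, 1), date(2025, 1, 15))
```

Validating the settings in one place makes an invalid combination, such as a deny period longer than 90 days, fail early instead of at configuration time.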

Maintenance example

Assume you are a developer at a retailer managing a shopping cart service. You have one Cloud SQL instance for a production environment and a second for a staging environment. You want maintenance to occur at the time when your instance handles the lowest amount of traffic, which is around midnight on Sundays. You also want to skip maintenance during your busy end-of-year holiday shopping season.

In this case, you set your production instance's maintenance settings to:

  • Maintenance window: Sundays between 12:00AM and 1:00AM ET
  • Order of update: Later
  • Deny maintenance period: November 1 through January 15

The maintenance settings for your staging environment would be identical, except the order of update would be set to Earlier. This ensures you can run operational acceptance tests for a maintenance release in staging at least seven days before maintenance rolls out to production. If something goes wrong in the staging environment, you have time to diagnose and fix the issue so that your production environment is unaffected.
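
The reasoning in this example can be checked with a few lines of Python. This is a minimal sketch, not Cloud SQL's scheduling logic: the function names are hypothetical, the deny period is treated as recurring every year (an assumption), and both endpoints are treated as inclusive (also an assumption).

```python
from datetime import date, timedelta


def in_recurring_deny_period(day, start=(11, 1), end=(1, 15)):
    """Return True if `day` falls in a yearly deny period.

    `start` and `end` are (month, day) pairs. The defaults match the
    example above: November 1 through January 15.
    """
    month_day = (day.month, day.day)
    if start <= end:
        # The period lies within a single calendar year.
        return start <= month_day <= end
    # The period wraps around New Year, as in the example.
    return month_day >= start or month_day <= end


def later_date(earlier_date):
    """Date on which a Later instance is updated, one week after an Earlier one."""
    return earlier_date + timedelta(days=7)


staging = date(2024, 10, 20)          # a Sunday, chosen for illustration
production = later_date(staging)      # 2024-10-27, the following Sunday
```

With these dates, neither `staging` nor `production` falls inside the deny period, so both updates could proceed and staging has a full week of lead time.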

Upcoming maintenance notifications

You can have a notification about upcoming maintenance sent to your email at least one week before maintenance is scheduled. If you want to set an email filter for notifications, the email title is "Upcoming maintenance for your Cloud SQL instance instancename", where instancename is the name of your instance.

Notifications for maintenance are not sent out by default. You need to opt in to maintenance notifications. You must also select a maintenance window before you can receive notifications.

Notifications are sent to the email address associated with your Google account. It's not possible to configure a custom email alias (for instance, a team email alias).

You opt into maintenance notifications for all Cloud SQL instances that have maintenance windows in a given project. You receive one notification per instance. Upcoming maintenance notifications are not sent out for read replicas.

You can also view upcoming maintenance information in the Google Cloud Console.

  • In the Instances list, in the Maintenance column. If maintenance is scheduled, you see the date and time for when it is scheduled to start. You can filter the instances list using the term Maintenance to find all the instances scheduled for maintenance. The Maintenance column only displays when maintenance is scheduled on one or more instances in the project. If no maintenance is scheduled, the column is hidden.
  • On the Instance details page in the Maintenance pane. If maintenance is scheduled, under Upcoming, you see a date and time for when it is scheduled to start.
  • On the ACTIVITY page in the Google Cloud Console, you can view a list of instances scheduled for maintenance. If maintenance is scheduled, instances have the message SQL Maintenance and the date and time for when it is scheduled to start.

Rescheduling maintenance

If you have a maintenance window for your instance, you can reschedule a maintenance event before it is due to start. For example, if a new service launches during your currently scheduled maintenance time, you might want to move the maintenance to a few days after your launch.

You can reschedule maintenance multiple times, as long as the new time is no more than one week after the originally scheduled time.

You have a few scheduling options for the new maintenance window:

  • Apply updates immediately. You can apply the update to your instance immediately instead of waiting for the scheduled maintenance window. In this case, maintenance generally starts within five minutes.
  • Reschedule to another time. You can postpone a scheduled maintenance event in two ways:

    • Next available window. This defers maintenance by one week.
    • Specific time. This lets you choose any specific time within one week after the originally scheduled maintenance time.

To reschedule maintenance, see rescheduling planned maintenance.

How maintenance works

To keep maintenance brief, Cloud SQL uses a maintenance failover workflow that largely resembles our automatic failover workflow for highly available instances.

In short, these are the steps:

  1. Set up an updated VM with the new software.
  2. Stop the original VM.
  3. Switch over the disk and static IP to the updated VM.
  4. Start up the updated VM.

The following sections describe each stage of the workflow in detail, including pre- and post-maintenance.

Pre-maintenance

Before maintenance, the client communicates with the original VM through a static IP address. The data is stored on a persistent disk that is attached to the original VM. In this example, the Cloud SQL instance has high availability configured, which means that another VM is on standby to take over in the event of an unplanned outage. The Cloud SQL instance is serving traffic to the application.

Diagram showing the pre-maintenance state

Step 1

Set up the new VM.

A new virtual machine (VM) is set up with the latest database software and VM operating system (OS). The updated VM OS is started. At this point, the database engine is not yet started. For highly available instances, a new standby VM is also set up.

The total downtime is substantially shortened by installing the software update on another VM while the original Cloud SQL instance is still serving traffic.

Diagram showing setting up the VM

Step 2

Shut down the original VM.

The database engine is shut down so that the disk can be detached from the original VM and attached to the updated VM. Before shutting down, the database engine waits for a few seconds for ongoing transactions to be committed and requests from existing connections to drain. After that, any open or long-running transactions are rolled back. The database stops accepting new connections, and existing connections are dropped. The instance becomes unavailable and maintenance downtime begins.

Diagram of instance after failover

Step 3

Switch over to the updated VM.

The disk is detached from the original VM and attached to the updated VM. The static IP address is reconfigured to point to the updated VM. This ensures that the application uses the same IP address after maintenance as before. The database cache is cycled out with the original VM, meaning that the database cache is effectively cleared during maintenance.

Diagram of switching over to updated VM

Step 4

Start the updated VM.

The updated database engine is started on the data disk. Using a common data disk ensures that all transactions written to the original instance prior to maintenance are still present on the updated database after maintenance. If any incomplete transactions didn't finish rolling back during database shutdown, the database automatically goes through crash recovery to ensure that the database is restored to a usable state.

Diagram of starting up the updated VM

Post-maintenance

After Step 4, the Cloud SQL instance is available to accept connections and it returns to serving traffic to the application.

To the application, apart from the updated software, the Cloud SQL instance looks the same. The application still connects to the Cloud SQL instance using the same static IP address, and the updated VM runs in the same zone as the original VM. All data written to the original database is preserved.

Diagram of post-maintenance
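
As a way to reason about where downtime comes from, the workflow above can be written down as an ordered list of phases, each marked by whether the instance is unavailable while it runs. This is only an illustrative sketch: the names `WORKFLOW` and `downtime_seconds` are hypothetical, and the availability flags simply restate the description above (downtime begins in Step 2 and ends after Step 4).

```python
# (phase description, is the instance unavailable during this phase?)
WORKFLOW = [
    ("Step 1: set up the updated VM", False),
    ("Step 2: shut down the database engine on the original VM", True),
    ("Step 3: switch the disk and static IP address to the updated VM", True),
    ("Step 4: start the database engine on the updated VM", True),
]


def downtime_seconds(durations):
    """Sum the durations (in seconds) of the phases in which the instance is unavailable.

    `durations` maps each phase description in WORKFLOW to how long it took.
    """
    return sum(durations[name] for name, unavailable in WORKFLOW if unavailable)
```

For example, with made-up timings of 300, 10, 15, and 20 seconds for the four phases, `downtime_seconds` returns 45: the time spent preparing the updated VM in Step 1 does not count, because the original VM is still serving traffic.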

Maintenance FAQ

How does maintenance affect Legacy HA failover instances?

Legacy HA failover instances are taken down for maintenance updates. They receive maintenance updates right before the primary instance. You can't set a maintenance window directly on a Legacy HA failover instance, because it shares the maintenance window of the primary instance.

How does maintenance affect read replicas?

Read replicas don't observe maintenance windows and can receive maintenance updates at any time. There is no guarantee as to when the updates occur; they might overlap with, or occur very close to, the primary instance update.

Can I cancel scheduled maintenance?

You can't cancel a scheduled maintenance window, but you can reschedule it.

Rescheduling limitations

There are a few things you need to know about rescheduling:

  • You must reschedule maintenance at least 24 hours before the originally scheduled maintenance event happens.

  • You can reschedule maintenance on one or multiple instances in your project. However, you can only reschedule one instance at a time (bulk rescheduling is not available).

  • You can reschedule maintenance to a time that falls within a deny maintenance period, or even outside the maintenance window, as long as the time falls within the one week rescheduling limitation.

  • If a maintenance operation is in progress, rescheduling is delayed until the operation is complete.
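
The timing rules for postponing maintenance to a specific time can be expressed as a small check. The following is a sketch under stated assumptions, not the service's actual validation: `check_reschedule` is a hypothetical helper, it covers only postponement (not the "apply updates immediately" option), and it treats both the 24-hour and the one-week limits as inclusive.

```python
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(hours=24)   # reschedule at least 24 hours ahead
MAX_DEFERRAL = timedelta(weeks=1)  # new time at most one week after the original


def check_reschedule(now, originally_scheduled, new_time):
    """Return a list of rule violations; an empty list means the request is allowed."""
    problems = []
    if originally_scheduled - now < MIN_NOTICE:
        problems.append("request is less than 24 hours before the originally scheduled time")
    if not originally_scheduled <= new_time <= originally_scheduled + MAX_DEFERRAL:
        problems.append("new time is not within one week after the originally scheduled time")
    return problems
```

Note that the check deliberately ignores maintenance windows and deny maintenance periods, because a rescheduled time can fall outside the window or inside a deny period as long as it stays within the one-week limit.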

What happens if the maintenance event is cancelled?

If Cloud SQL cancels a maintenance event, you receive a cancellation notification in advance, when possible.

You receive a new notification of upcoming maintenance when the maintenance event is rescheduled.

Deny maintenance period limitations

There are a few things you need to know about deny maintenance periods:

  • You can have a deny maintenance period even if you don't have maintenance windows configured for your instance. Deny maintenance periods can span from 1 to 90 days.

  • The deny maintenance period takes precedence over any scheduled maintenance window. If there is a conflict between the timing of a maintenance window and the deny maintenance period, the deny maintenance period overrides the maintenance window.

  • Deny maintenance periods and relative scheduling are independent features. A deny maintenance period specified on an Earlier instance has no impact in determining the schedule for the Later instance. Notifications are not sent if the maintenance schedule falls within the deny maintenance period for Earlier or Later instances.

  • When a deny period is set on a primary instance, maintenance for all replicas associated with the primary instance is also denied. As an example, a primary instance located in region A has three read replicas: two in region A and one in region B. When a deny period is set on the primary instance, none of the replicas, including the replica in region B, receives maintenance until the deny period on the primary instance expires.

  • If a deny maintenance period is set after maintenance is scheduled such that the deny maintenance period overlaps with the scheduled maintenance time, the update is skipped.

  • You can set the deny maintenance period to recur every year by not including the year in the start and end date parameters. If the year is specified, the deny maintenance period is set for only that year.

  • You can set multiple deny maintenance periods in a year. We recommend that you avoid chaining deny periods together to skip consecutive scheduled maintenance events. Staying current on Cloud SQL maintenance is important to ensure that your instance operates reliably. Typically, Cloud SQL maintenance is scheduled once every few months.

  • In order to ensure service reliability, Cloud SQL may notify users with instances running maintenance releases that are more than 12 months old that the next maintenance rollout is required.

  • When a deny maintenance period ends, regular maintenance behavior resumes.

Minimizing the impact of maintenance

In general, Google Cloud recommends that users running applications in the cloud make their systems resilient to transient errors, which are momentary inter-service communication issues caused by temporary unavailability. Occasional transient errors are unavoidable in the cloud.

Some of the transient errors that occur during maintenance are dropped connections and failed in-flight transactions. If you design your systems and tune your applications to be resilient to transient errors, you're also positioned to minimize impacts due to database maintenance.

To minimize the impact of dropped connections, you can use connection pools. While connections between the pooler and the database are dropped during maintenance, the connections between the application and the pooler are preserved. That way, the work of reestablishing the connections is transparent to the application and offloaded to the connection pooler instead.

To reduce transaction failures, you can limit the number of long-running transactions. Rewriting queries to be smaller and more efficient not only reduces maintenance downtime, but also improves database performance and reliability.

To recover efficiently from connection drops and transaction failures, manage your database connections carefully. You can build connection and query retry logic with exponential backoff into your applications and connection poolers. When a query fails or a connection is dropped, the system waits before retrying, and the wait period increases with each subsequent retry. For example, the system might wait just a few seconds for the first retry, but up to a minute for the fourth retry. Following this pattern lets the system recover from these failures without overloading the service.
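
A retry loop with exponential backoff might look like the following Python sketch. It is a generic illustration, not code from Cloud SQL or any database driver: `TransientError` is a hypothetical exception standing in for whatever your driver raises on a dropped connection or failed transaction, and the delay values are arbitrary. The optional jitter (randomizing each wait) is an addition that is not described above; it is commonly used so that many clients don't retry at the same moment.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as a dropped connection."""


def retry_with_backoff(operation, max_attempts=5, base_delay=2.0,
                       max_delay=60.0, jitter=True, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failed attempt (2s, 4s, 8s, ...) and is
    capped at `max_delay`. `sleep` is injectable so the logic can be tested
    without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            if jitter:
                # Wait between 50% and 100% of the computed delay.
                delay *= random.uniform(0.5, 1.0)
            sleep(delay)
```

To use it, wrap the database call in a function that opens a connection (or borrows one from the pool) and runs the query, translate the driver's connection errors into `TransientError`, and pass that function as `operation`. Only retry operations that are safe to repeat, because a transaction that failed mid-flight during maintenance is rolled back and must be run again from the start.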

Other creative solutions can minimize maintenance impacts as well, from using scripts to warm the database cache after maintenance to streamlining the number of tables in databases. We recommend following database management best practices and operational guidelines to ensure that maintenance goes smoothly.

Time-sensitive maintenance

In very rare cases, Cloud SQL might need to schedule maintenance outside of your maintenance settings to patch severe stability issues or vulnerabilities that are time-sensitive. These updates roll out rapidly, and Cloud SQL counts them as downtime against the SLA.
