About maintenance on Cloud SQL instances

This page explains how maintenance updates occur on Cloud SQL instances, and how you can control the timing of these updates. To get started, see Finding and setting maintenance windows.

Overview

As a managed service, Cloud SQL automatically updates instances to ensure that the underlying hardware, operating system, and database engine are reliable, performant, secure, and up-to-date. Most of these updates are performed while your Cloud SQL instance is up and running. However, certain system updates require a brief service interruption to be performed. These updates are called maintenance.

Maintenance updates the operating system and the database engine. Because these updates require the instance to be restarted, they incur some downtime. Maintenance updates deliver the following benefits:

  • Cloud SQL features. To launch new features, the database engine is updated and new plugins to the database are installed.

  • Database version upgrades. The database software provider that develops MySQL releases new minor versions several times a year. With each new version come bug fixes, security patches, performance enhancements, and new database features. You can find the latest minor version that Cloud SQL for MySQL supports by reviewing release notes or Database versions and version policies. Cloud SQL instances are upgraded to the latest database version shortly after release, so that you benefit from running the latest database software.

    You need to perform MySQL 8.0 minor version upgrades manually. See Set the MySQL minor version for an instance.

  • Operating system patches. We continuously monitor for newly identified security vulnerabilities in the operating system. Upon discovery, we patch the operating system to protect you from new risks.

Maintenance impact

During a maintenance event, a Cloud SQL for MySQL instance loses connectivity for less than 60 seconds on average.

For Cloud SQL Enterprise Plus edition, Cloud SQL offers near-zero downtime planned maintenance.

Downtime might be higher for instances that have high activity during maintenance or have very large datasets. Cloud SQL typically schedules maintenance once every few months.

You can take steps to ensure that maintenance has as little impact as possible on your operations by using our maintenance settings and by making your systems resilient to transient errors.

Near-zero downtime planned maintenance

With near-zero downtime planned maintenance, Cloud SQL Enterprise Plus edition instances with high availability typically lose connectivity for less than 10 seconds during planned maintenance.

The downtime might be higher for instances that have high activity during the maintenance.

Prerequisites and constraints

  • If you're using any of the following flags, they must be set to default values:

    • sync_binlog, value: 1
    • innodb_flush_log_at_trx_commit, value 1
    • replica_skip_errors, value OFF
    • binlog_order_commit, value ON

    For more information, see Configure database flags.

  • If you are using Cloud SQL Auth Proxy or Cloud SQL Language Connectors, ensure that they are updated to their latest version.

  • If you have external replicas with GTID enabled, then you must configure those replicas to use GTID-based auto-positioning or the replication will break after maintenance. For more information, see GTID auto-positioning.

  • Your MySQL server UID will change during maintenance.

  • During maintenance, the database logs will have messages from two different VMs.
  • If a DDL is issued during the planned maintenance, the changes might have a creation or modification timestamp that is after the maintenance timestamp.

Maintenance settings

Cloud SQL offers you the ability to configure maintenance updates through a set of maintenance settings.

You can configure maintenance to be scheduled at times when brief downtime causes the lowest impact to your applications. For each Cloud SQL instance, you can configure the following:

  • Maintenance window. The day of the week and the hour in which Cloud SQL schedules maintenance. Maintenance windows last for one hour. Learn how to configure a maintenance window.

  • Order of update. Sets the order in which the Cloud SQL instance is updated relative to other instances in the same region. Order of update can be set to Any, Earlier, or Later. Later instances are updated one week after Earlier instances with the same maintenance window in the same region. You set the order of update when you configure a maintenance window.

  • Deny maintenance period. A block of days in which Cloud SQL does not schedule maintenance. Deny maintenance periods can be up to 90 days long. Learn how to configure a deny maintenance period.

Default maintenance windows

If you don't set a maintenance window, then Cloud SQL updates your instance in the following default windows according to your instance's time zone:

  • Weekday window (Monday to Friday): 10 PM to 6 AM
  • Weekend window: Friday, 10 PM to Monday, 6 AM

Maintenance example

Assume you are a developer at a retailer managing a shopping cart service. You have one Cloud SQL instance for a production environment and a second for a staging environment. You want maintenance to occur at the time when your instance handles the lowest amount of traffic, which is around midnight on Sundays. You also want to skip maintenance during your busy end-of-year holiday shopping season.

In this case, you set your production instance's maintenance settings to:

  • Maintenance window: Sundays between 12:00AM and 1:00AM ET
  • Order of update: Later
  • Deny maintenance period: November 1 through January 15.

The maintenance settings for your staging environment would be identical, except the order of update would be set to Earlier. This ensures you can run operational acceptance tests for a maintenance release in staging at least seven days before maintenance rolls out to production. If something goes wrong in the staging environment, you have time to diagnose and fix the issue so that your production environment is unaffected.

Upcoming maintenance notifications

You can have a notification about upcoming maintenance sent to your email at least one week before maintenance is scheduled. If you want to set an email filter for notifications, the email title is Upcoming maintenance for your Cloud SQL instance instancename.

Notifications for maintenance aren't sent out by default. You need to opt in to maintenance notifications. Before you can receive notifications, you must also select a maintenance window.

Notifications are sent to the email address associated with your Google account. It's not possible to configure a custom email alias (for instance, a team email alias).

You opt into maintenance notifications for all Cloud SQL instances that have maintenance windows in a given project. You receive one notification per instance. Upcoming maintenance notifications are not sent out for read replicas.

You can also view upcoming maintenance information in the Google Cloud console.

  • In the Instances list, in the Maintenance column. If maintenance is scheduled, you see the date and time for when it is scheduled to start. You can filter the instances list using the term Maintenance to find all the instances scheduled for maintenance. The Maintenance column only displays when maintenance is scheduled on one or more instances in the project. If no maintenance is scheduled, the column is hidden.
  • On the Instance details page in the Maintenance pane. If maintenance is scheduled, under Upcoming, you see a date and time for when it is scheduled to start.
  • On the ACTIVITY page in the Google Cloud console, you can view a list of instances scheduled for maintenance. If maintenance is scheduled, instances have the message SQL Maintenance, and the date and time for when it's scheduled to start.

Reschedule maintenance

If you have a maintenance window for your instance, you can reschedule maintenance at any time before maintenance is currently scheduled. For example, if you have a new service launching during your currently scheduled maintenance time, you might want to reschedule the maintenance window to a few days after your launch.

You can reschedule maintenance multiple times so long as it's not more than 28 days after the originally scheduled time.

You have a few scheduling options for the new maintenance window:

  • Apply updates immediately. You can apply the update to your instance immediately instead of waiting for the scheduled maintenance window. In this case, maintenance generally starts within five minutes.
  • Reschedule to another time. You can postpone a scheduled maintenance event in two ways:

    • Next available window. This option defers maintenance to the next available maintenance window following the current scheduled maintenance time, which is typically one week later.
    • Specific time. This option lets you choose any specific time within 28 days after the originally scheduled maintenance time.

To reschedule maintenance, see rescheduling planned maintenance.

How maintenance works

To keep maintenance brief, Cloud SQL uses a maintenance failover workflow that largely resembles our automatic failover workflow for highly available instances.

In short, these are the steps:

  1. Set up an updated VM with the new software.
  2. Stop the original VM.
  3. Switch over the disk and static IP to the updated VM.
  4. Start up the updated VM.

Step through the tabs below to see details of the workflow, including pre- and post-maintenance.

Pre-maintenance

Before maintenance, the client communicates with the original VM through a static IP address. The data is stored on a persistent disk that is attached to the original VM. In this example, the Cloud SQL instance has high availability configured, which means that another VM is on standby to take over in the event of an unplanned outage. The Cloud SQL instance is serving traffic to the application.

Diagram showing the pre-maintenance state

Step 1

Set up the new VM.

a new Virtual Machine (VM) is set up with the latest database software and VM operating system (OS). The updated VM OS is started. At this point, the database engine is not yet started. For highly available instances, a new standby VM is also set up.

The total downtime is substantially shortened by installing the software update on another VM while the original Cloud SQL instance is still serving traffic.

Diagram showing setting up the VM

Step 2

Shut down the original VM.

The database engine is shut down so that the disk can be detached from the original VM and attached to the updated VM. Before shutting down, the database engine waits for a few seconds for ongoing transactions to be committed and requests from existing connections to drain. After that, any open or long-running transactions are rolled back. The database stops accepting new connections, and existing connections are dropped. The instance becomes unavailable and maintenance downtime begins.

Diagram of instance after failover

Step 3

Switch over to the updated VM.

The disk is detached from the original VM and attached to the updated VM. The static IP address is reconfigured to point to the updated VM. This ensures that the application uses the same IP address after maintenance as before. The database cache is cycled out with the original VM, meaning that the database cache is effectively cleared during maintenance.

Diagram of switching over to updated VM

Step 4

Start the updated VM.

The updated database engine is started on the data disk. Using a common data disk ensures that all transactions written to the original instance prior to maintenance are still present on the updated database after maintenance. If any incomplete transactions didn't finish rolling back during database shutdown, the database automatically goes through crash recovery to ensure that the database is restored to a usable state.

Diagram of starting up the updated VM

Post-maintenance

After Step 4, the Cloud SQL instance is available to accept connections and it returns to serving traffic to the application.

To the application, apart from the updated software, the Cloud SQL instance looks the same. The application still connects to the Cloud SQL instance using the same static IP address, and the updated VM runs in the same zone as the original VM. All data written to the original database is preserved.

Diagram of post-maintenance

Minimize the impact of maintenance

In general, Google Cloud recommends that users running applications in the cloud make their systems resilient to transient errors, which are momentary inter-service communication issues caused by temporary unavailability. Occasional transient errors are unavoidable in the cloud.

Some of the transient errors that occur during maintenance are dropped connections and failed in-flight transactions. If you design your systems and tune your applications to be resilient to transient errors, you're also positioned to minimize impacts due to database maintenance.

To minimize the impact of dropped connections, you can use connection pools. While connections between the pooler and the database are dropped during maintenance, the connections between the application and the pooler are preserved. That way, the work of reestablishing the connections is transparent to the application and offloaded to the connection pooler instead.

To reduce the transaction failures, you can limit the number of long-running transactions. Rewriting queries to be smaller and more efficient not only reduces maintenance downtime, but also improves database performance and reliability.

To recover efficiently from connection drops and transaction failures, you can efficiently manage your database connections. You can build connection and query retry logic with exponential back-off into your applications and connection poolers. In the event that a query fails or a connection is dropped, the system institutes a wait period before retrying, which increases for each subsequent retry. For example, the system might wait just a few seconds for the first retry, but up to a minute for the fourth retry. Following this pattern ensures that these failures are corrected, without overloading the service.

Other creative solutions can minimize maintenance impacts as well, from using scripts to warm the database cache after maintenance to streamlining the number of tables in databases. We recommend following database management best practices and operational guidelines to ensure that maintenance goes smoothly.

Time-sensitive maintenance

In very rare cases, Cloud SQL might need to schedule maintenance outside of your maintenance settings to patch severe stability issues or vulnerabilities that are time-sensitive. These updates are delivered rapidly, and Cloud SQL counts them as downtime against the SLA.

Self-service maintenance

Cloud SQL regularly releases software improvements and patches to security vulnerabilities through new maintenance versions that you can install on your instances. Cloud SQL maintains a Cloud SQL maintenance changelog for each database engine major version. To learn more, see Cloud SQL maintenance changelogs.

While Cloud SQL schedules maintenance updates once every few months to ensure you have the latest software, you can use self-service maintenance to keep your instance up-to-date if:

  • You need an update sooner than your next scheduled maintenance event.
  • You want to catch up to the latest software after skipping your most recent maintenance update.

If you use read replicas, then you can use self-service maintenance to update all of your read replicas. You specify the primary instance, and the maintenance request updates all of the read replicas of the primary instance to the specified maintenance version. Then the primary instance is updated to the maintenance version.

Maintenance limitations

This section describes the limitations of Cloud SQL maintenance.

Reschedule limitations

There are a few things you need to know about rescheduling:

  • You must reschedule maintenance at least 24 hours before the originally scheduled maintenance event happens.

  • You can reschedule maintenance on one or multiple instances in your project. However, you can only reschedule one instance at a time (bulk rescheduling is not available).

  • You can reschedule maintenance to a time that falls within a deny maintenance period, or even outside the maintenance window, as long as the time falls within the 28 days rescheduling limitation.

  • If a maintenance operation is in progress, rescheduling is delayed until the operation is complete.

Deny maintenance period limitations

There are a few things you need to know about deny maintenance periods:

  • You can have a deny maintenance period even if you don't have maintenance windows configured for your instance. Deny maintenance periods can span from 1 to 90 days.

  • The deny maintenance period takes precedence over any scheduled maintenance window. If there is a conflict between the timing of a maintenance window and the deny maintenance period, the deny maintenance period overrides the maintenance window.

  • Deny maintenance periods and relative scheduling are independent features. A deny maintenance period specified on an Earlier instance has no impact in determining the schedule for the Later instance. Notifications are not sent if the maintenance schedule falls within the deny maintenance period for Earlier or Later instances.

  • When a deny period is set on a primary instance, maintenance for all replicas associated with the primary instance is also denied. As an example, a primary instance located in region A has three read replicas: two in region A and one in region B. When a deny period is set on the primary instance, maintenance to each of the replicas, including the replica in region B, does not receive maintenance until the deny period on the primary instance expires.

  • If a deny maintenance period is set after maintenance is scheduled such that the deny maintenance period overlaps with the scheduled maintenance time, the update is skipped.

  • You can set the deny maintenance period to recur every year by not including the year in the start and end date parameters. If the year is specified, the deny maintenance period is set for only that year.

  • You can set multiple deny maintenance periods in a year. We recommend that you avoid chaining deny periods together to skip consecutive scheduled maintenance events. Staying current on Cloud SQL maintenance is important to ensure that your instance operates reliably. Typically, Cloud SQL maintenance is scheduled once every few months.

  • In order to ensure service reliability, Cloud SQL may notify users with instances running maintenance releases that are more than 12 months old that the next maintenance rollout is required.

  • When a deny maintenance period ends, regular maintenance behavior resumes.

  • Deny maintenance periods don't affect user-triggered operations, such has self-service maintenance.

Maintenance FAQ

How does maintenance affect Legacy HA failover instances?

Legacy HA failover instances are taken down for maintenance updates. They receive maintenance updates right before the primary instance. You can't set a maintenance window directly on a Legacy HA failover instance, because it shares the maintenance window of the primary instance.

Does maintenance downtime count toward the SLA?

Downtime from normal maintenance does not count towards the SLA. However, Cloud SQL counts time-sensitive maintenance downtime against the SLA.

How does maintenance affect read replicas?

  • Cloud SQL always maintains read replicas before the primary instance. If the primary instance has a maintenance window, read replicas observe the same maintenance window.
  • If your primary instance has multiple read replicas, Cloud SQL might update some of the replicas simultaneously.
  • Read replicas observe the deny maintenance period set for the primary instance.

Can I cancel scheduled maintenance?

You can't cancel a scheduled maintenance window, but you can reschedule it. You can also configure a deny maintenance period that overlaps with the scheduled maintenance time to effectively skip maintenance.

What happens if the maintenance event is canceled?

If Cloud SQL cancels a maintenance event, you receive a notification that maintenance is canceled in advance, when possible.

You receive a new notification of upcoming maintenance when the maintenance event is rescheduled.

Is Cloud SQL maintenance cumulative?

Maintenance updates are cumulative. There's no need to apply each maintenance update that you might have missed. The latest maintenance version is applied in the next scheduled maintenance update. Or, you can apply the latest maintenance update using self-service maintenance.

How long does self-service maintenance take for all the read replicas of a primary instance?

The amount of time that a self-service maintenance update takes depends on the total number of read replicas of your primary instance. To reduce the amount of time that the self-service maintenance update might take, you can update a few read replicas individually and then perform the update on the primary instance to update the rest of the read replicas.

The second update skips any replicas that already have the target maintenance version.

If I have multiple read replicas of my primary instance, can I do self-service maintenance on a single read replica?

Yes, you can perform self-service maintenance on an individual read replica instance. However, we recommend that you update the rest of the read replicas and primary instance to the same maintenance version soon afterwards. We recommend that you operate all the read replicas and primary instance with an identical maintenance version.

What's next