This page is an overview of the high availability (HA) configuration for Cloud SQL instances. To configure a new instance for HA, or to enable HA on an existing instance, see Enabling and disabling high availability on an instance.
HA configuration overview
The HA configuration, sometimes called a cluster, provides data redundancy. A Cloud SQL instance configured for HA is also called a regional instance and is located in a primary zone and a secondary zone within the configured region. A regional instance consists of a primary instance and a standby instance. Through synchronous replication to each zone's persistent disk, all writes made to the primary instance are also made to the standby instance. If an instance or zone fails, this configuration reduces downtime, and your data remains available to client applications.
Note: The standby instance cannot be used for read queries. This differs from the Cloud SQL for MySQL legacy HA configuration.
Regional persistent disk (PD) support for Cloud SQL and the Cloud SQL HA configuration are generally available (GA) with full SLA coverage. An HA-configured instance is charged at double the price of a standalone instance, including CPU, RAM, and storage. For more information, see the pricing page.
If an HA-configured instance becomes unresponsive, Cloud SQL automatically switches to serving data from the standby instance. This is called a failover. To see if failover has occurred, check your operation log's failover history.
The following process occurs:
- The primary instance or zone fails.
- Each second, the primary instance writes a heartbeat signal to a system database. If multiple heartbeats aren't detected, failover is initiated. This occurs if the primary instance is unresponsive for approximately 60 seconds or if the zone containing the primary instance experiences an outage.
- The standby instance shares a static IP address with the primary instance, so when clients reconnect, the standby instance serves data from the secondary zone.
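The heartbeat check in the steps above can be sketched as a simple timeout monitor. This is a minimal model under stated assumptions, not Cloud SQL's implementation: `HeartbeatMonitor`, its methods, and both constants are hypothetical, and the 60-second threshold is only the approximate figure given above. It models the heartbeat timeout alone, not separate zone-outage detection.

```python
from dataclasses import dataclass

# Hypothetical values mirroring the description: one heartbeat per second,
# failover after roughly 60 seconds without one. Real thresholds are internal.
HEARTBEAT_INTERVAL_S = 1.0
FAILOVER_THRESHOLD_S = 60.0


@dataclass
class HeartbeatMonitor:
    last_heartbeat_s: float  # timestamp of the most recent heartbeat observed

    def record_heartbeat(self, now_s: float) -> None:
        """Call whenever a new heartbeat write is seen in the system database."""
        self.last_heartbeat_s = now_s

    def missed_heartbeats(self, now_s: float) -> int:
        """Whole heartbeat intervals that have elapsed since the last heartbeat."""
        return int((now_s - self.last_heartbeat_s) // HEARTBEAT_INTERVAL_S)

    def should_fail_over(self, now_s: float) -> bool:
        """True once the primary has been silent for the failover threshold."""
        return now_s - self.last_heartbeat_s >= FAILOVER_THRESHOLD_S


monitor = HeartbeatMonitor(last_heartbeat_s=0.0)
monitor.record_heartbeat(10.0)
print(monitor.should_fail_over(30.0))  # False: only 20 s of silence
print(monitor.should_fail_over(70.0))  # True: 60 s of silence
```

A single late heartbeat does not trigger anything here; only a sustained gap does, which matches the "multiple heartbeats" wording above.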
For Cloud SQL to allow a failover, the configuration must meet the following requirements:
- The primary instance must be in a normal operating state: not stopped, not undergoing maintenance, and not performing a long-running Cloud SQL instance operation such as a backup, import, or export.
- The secondary zone and the standby instance must both be healthy. If the standby instance is unresponsive or replication to the secondary zone is interrupted, failover operations are blocked. After Cloud SQL repairs the standby instance and the secondary zone is available, replication resumes and Cloud SQL allows failover again.
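As an illustration, the two requirements above can be expressed as a function that lists whatever is blocking a failover. The `PrimaryState` values, the `HaStatus` fields, and `failover_blockers` are all hypothetical; Cloud SQL evaluates these conditions internally, and this sketch only mirrors the rules as written.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class PrimaryState(Enum):
    # Hypothetical states; not Cloud SQL's actual state model.
    RUNNING = "running"
    STOPPED = "stopped"
    MAINTENANCE = "maintenance"
    LONG_RUNNING_OPERATION = "long_running_operation"  # e.g. backup, import, export


@dataclass
class HaStatus:
    primary_state: PrimaryState
    standby_healthy: bool
    secondary_zone_available: bool
    replication_active: bool


def failover_blockers(status: HaStatus) -> list[str]:
    """Return the reasons failover is blocked; an empty list means it is allowed."""
    reasons = []
    if status.primary_state is not PrimaryState.RUNNING:
        reasons.append(
            f"primary is not in a normal operating state ({status.primary_state.value})"
        )
    if not status.standby_healthy:
        reasons.append("standby instance is unresponsive")
    if not status.secondary_zone_available:
        reasons.append("secondary zone is unavailable")
    if not status.replication_active:
        reasons.append("replication to the secondary zone is interrupted")
    return reasons
```

Returning every reason, rather than a single boolean, makes it clear that a failover can be blocked by several conditions at once.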
Backup and restore
Automated backups and point-in-time recovery must be enabled for high availability (point-in-time recovery uses binary logging).
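A pre-flight check before enabling HA might look like the following sketch. `BackupSettings`, its field names, and `missing_ha_prerequisites` are hypothetical and do not reflect the Cloud SQL Admin API schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BackupSettings:
    # Hypothetical field names, for illustration only.
    automated_backups_enabled: bool
    point_in_time_recovery_enabled: bool  # relies on binary logging


def missing_ha_prerequisites(settings: BackupSettings) -> list[str]:
    """List the backup features that must be turned on before enabling HA."""
    missing = []
    if not settings.automated_backups_enabled:
        missing.append("automated backups")
    if not settings.point_in_time_recovery_enabled:
        missing.append("point-in-time recovery (binary logging)")
    return missing
```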
Applications and instances
Working with an HA instance is no different from working with a non-HA instance, so your application doesn't need any special configuration. When failover occurs, any existing connections to the primary instance and read replicas are closed, and reconnecting to the primary instance takes approximately 2 to 3 minutes. Reconnecting to read replicas can take longer. Your application reconnects using the same connection string or IP address, so you don't need to update it after failover.
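Because connections are closed during failover and take a few minutes to be reestablished, client code usually retries with backoff instead of failing on the first error. The following is a minimal sketch, assuming your database driver exposes some zero-argument way to open a connection (passed in as the hypothetical `connect` callable) and that a 300-second retry budget is enough to cover the reconnection window described above. Set `retry_on` to the exception types your driver actually raises.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    max_total_wait_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off exponentially between tries.

    Re-raises the last error once the total time spent sleeping would exceed
    `max_total_wait_s`. Time spent inside `connect` itself is not counted.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        try:
            return connect()
        except retry_on:
            if waited + delay > max_total_wait_s:
                raise
            # Random jitter keeps many clients from reconnecting in lockstep.
            pause = delay * random.uniform(0.5, 1.0)
            sleep(pause)
            waited += pause
            delay = min(delay * 2, max_delay_s)
```

For example, `connect_with_retry(lambda: driver.connect(host=INSTANCE_IP), retry_on=(driver.OperationalError,))`, where `driver` and `INSTANCE_IP` are placeholders for your own driver module and instance address. The address stays the same across failover, so the retried call needs no changes.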
To see exactly how failover affects your applications, initiate a failover manually.
Maintenance affects HA primary instances in the same way as any other instance: expect the primary instance to be unavailable during maintenance. To minimize the impact on your service, set a maintenance window to control when this downtime occurs.
An instance undergoing maintenance does not fail over to its standby instance. Maintenance updates are applied to the standby instance at the same time as to the primary instance.
Regional persistent disk performance depends on many factors, in particular the VM instance type size and your workload's input and output. Latency for regional persistent disk with solid-state drives (SSD) is higher than for persistent disk with local SSD, because the redundancy needed to write two copies increases tail latency. As a result, a workload that is latency sensitive and not a streaming workload may not be able to reach the input/output operations per second (IOPS) limit.
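The link between latency and achievable IOPS follows from Little's law: sustained throughput equals the number of I/Os in flight divided by the average latency. The sketch below applies it with hypothetical latency and limit values (not measured Cloud SQL or persistent disk figures) to show why a workload with few outstanding I/Os falls short of the IOPS limit when latency rises, while a deeply queued streaming workload can still reach it.

```python
def achievable_iops(queue_depth: int, latency_ms: float, iops_limit: float) -> float:
    """Estimate sustained IOPS with Little's law, capped at the disk's IOPS limit.

    Little's law: throughput = I/Os in flight / average latency. A workload
    that keeps only a few I/Os outstanding is bounded by latency long before
    it reaches the limit.
    """
    if queue_depth <= 0 or latency_ms <= 0:
        raise ValueError("queue_depth and latency_ms must be positive")
    per_second = queue_depth * 1000.0 / latency_ms  # = queue_depth / (latency_ms / 1000)
    return min(per_second, iops_limit)


# Hypothetical latencies and limit, for illustration only.
print(achievable_iops(4, 1.0, 15_000))   # 4000.0
print(achievable_iops(4, 1.5, 15_000))   # ~2666.7: higher latency, same queue depth
print(achievable_iops(64, 1.5, 15_000))  # 15000: deep queue reaches the limit
```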
Legacy MySQL high availability option
Until Q1 2021, you have the option of using the legacy process for adding high availability to MySQL instances, which uses a failover replica. The legacy functionality is not available in the Cloud Console. Instead, use gcloud or cURL commands. See Legacy configuration: Creating a new instance configured for high availability or Legacy configuration: Configuring an existing instance for high availability.
What's next
- Enabling and disabling high availability on an instance.
- Initiate failover.
- Learn more about managing your database connections.
- Learn more about regions and zones in Cloud SQL.