This page gives an overview of manual failover for Memorystore for Redis. To learn how to perform a failover, see Initiating a manual failover.
What is a manual failover?
A standard tier Memorystore for Redis instance uses a replica node to back up the primary node. A normal failover occurs when the primary node becomes unhealthy, causing the replica to be designated as the new primary. A manual failover differs from a normal failover because you initiate it yourself. For more information on how Memorystore for Redis replication works, see High availability.
Why initiate a manual failover?
Initiating a manual failover allows you to test how your application responds to a failover. This knowledge can ensure a smoother failover process if an unexpected failover occurs later on.
Optional data protection mode
The two available data protection modes are:
- limited-data-loss: the default mode. Verifies that the replica is no more than 30 MB behind the primary before initiating the failover.
- force-data-loss: skips the synchronization check and initiates the failover regardless of how far behind the replica is.

Manual failover always runs in limited-data-loss mode unless you change the mode.
To change the data protection mode, use one of the following commands:
gcloud redis instances failover INSTANCE_NAME --data-protection-mode=force-data-loss
gcloud redis instances failover INSTANCE_NAME --data-protection-mode=limited-data-loss
How data protection modes work
If you want to test how your application will behave in a real failover scenario, use the force-data-loss mode because it most accurately represents the conditions of a failover in disaster recovery.
Any failover from the primary to the replica risks some data loss. The limited-data-loss mode keeps that data loss to a minimum by verifying that the difference in synchronization between your primary and replica is below 30 MB before initiating the failover. The force-data-loss mode overrides this check on primary-replica synchronization. If you use force-data-loss mode when the replica is more than 30 MB behind the primary, you could lose 30 MB of data or more.
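The threshold check described above can be sketched as a small shell helper. This is an illustrative sketch, not part of Memorystore: the choose_mode name is made up, and 30 MB is treated here as 30 * 1024 * 1024 bytes.

```shell
# Sketch of the limited-data-loss check: given the replica's lag in bytes,
# report which data protection mode a failover would need. The helper name
# and the byte interpretation of "30 MB" are assumptions for illustration.
THRESHOLD=$((30 * 1024 * 1024))  # 30 MB in bytes

choose_mode() {
  lag_bytes=$1
  if [ "$lag_bytes" -lt "$THRESHOLD" ]; then
    echo "limited-data-loss"   # the default check would pass
  else
    echo "force-data-loss"     # the default check would block the failover
  fi
}
```

For example, `choose_mode 1048576` (1 MB of lag) prints `limited-data-loss`.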
Bytes pending replication metric
The bytes pending replication metric tells you how many bytes the replica still needs to copy before the primary is fully backed up. You can access this metric in the Google Cloud Console on the instance details page. To view the instance details page, click the instance ID in your project's instances list page.
Alternatively, open Metrics Explorer in Google Cloud's operations suite for your project, and search for the redis.googleapis.com/replication/offset_diff metric.
When to run a manual failover
Manual failovers using the default limited-data-loss protection mode only succeed if the bytes pending replication metric is less than 30 MB. If you want to run a manual failover with bytes pending replication higher than 30 MB, use the force-data-loss protection mode.
If you are trying to preserve as much data as possible, temporarily stop your application from writing to the Redis instance, and wait to run your manual failover until the bytes pending replication metric is as low as you deem acceptable.
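One way to follow this advice is to poll the lag until it drops to an acceptable level before failing over. The sketch below is hypothetical: wait_for_low_lag is a made-up helper, and the reader command stands in for however you fetch the redis.googleapis.com/replication/offset_diff metric for your instance.

```shell
# Sketch: poll a lag-reporting command until bytes pending replication falls
# to an acceptable level, then return so the failover can proceed. The
# reader argument is a placeholder for your own metric lookup.
wait_for_low_lag() {
  reader=$1        # command that prints the current lag in bytes
  max_bytes=$2     # largest lag you deem acceptable (e.g. 0)
  while :; do
    lag=$("$reader")
    [ "$lag" -le "$max_bytes" ] && break
    sleep 5        # wait before re-checking the metric
  done
}
```

After pausing writes from your application, you would call this with your metric reader and then run `gcloud redis instances failover`.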
Potential issues blocking a manual failover
Running a manual failover on a Basic Tier instance does not work because Basic Tier instances do not have replicas.
If your Redis instance is unhealthy, then the manual failover operation is blocked.
If your instance has incomplete operations pending, such as scaling or updating, the manual failover operation is blocked. You must wait until your instance is in the READY state to run a manual failover.
Client application connection
When your primary node fails over to the replica, existing connections to Memorystore for Redis are dropped. However, on reconnect, your application is automatically redirected to the new primary node using the same connection string or IP address.
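A client can absorb the dropped connection with a simple retry loop around its reconnect attempt. The retry helper below is an illustrative sketch, not a Memorystore API; the redis-cli example in the comment assumes a hypothetical instance IP.

```shell
# Sketch: retry a command with a short backoff until it succeeds, e.g.
# while the replica is being promoted to primary during a failover.
# Usage (illustrative): retry 10 redis-cli -h 10.0.0.3 PING
retry() {
  attempts=$1; shift
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1  # give up after N attempts
    i=$((i + 1))
    sleep 1                               # brief pause before reconnecting
  done
  return 0
}
```

Because the instance keeps the same IP address after a failover, the retried command needs no changes; it simply reaches the new primary once promotion completes.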
Verifying a manual failover
You can verify the success of a manual failover operation with the Google Cloud Console, Google Cloud's operations suite, or the gcloud command-line tool.
Cloud Console verification
Before you start a manual failover, go to the Memorystore for Redis instances list page, and click the name of your instance.
Then, under Instance Properties, view which zones your primary and replica are in. Make a note of the zones. Check this page again when you complete your manual failover to confirm that the primary node and replica node switched zones.
Cloud Monitoring verification
To view the metrics for a monitored resource using Metrics Explorer, do the following:
- In the Google Cloud Console navigation pane, select Monitoring:
Go to Google Cloud Console
If this is your first time accessing Cloud Monitoring for this Google Cloud project, Cloud Monitoring creates a Workspace. Typically, this process is automatic and completes within a few minutes. If you are prompted to either select a Workspace or create a Workspace, select create.
- In the Monitoring navigation pane, click Metrics Explorer.
- Ensure Metric is the selected tab.
Click in the box labeled Find resource type and metric, and then select the resource and metric from the menu or enter their names. Use the following information to complete the fields for this text box:
- For the Resource, select or enter Cloud Memorystore Redis.
- For the Metric, select or enter Node role.
- Use the Filter, Group By, and Aggregator menus to modify how the data is displayed. For example, you can group by resource or metric labels. For more information, see Selecting metrics.
The Cloud Monitoring chart represents the primary and replica nodes with two lines. When a node's line has a value of zero on the chart, it is the replica node. When a node's line has a value of one on the chart, it is the primary node. The chart represents a failover by showing how the lines switch from one to zero, and zero to one, respectively.
gcloud verification
Before you initiate a manual failover, use the following command to check which zone your primary node is in:
gcloud redis instances describe [INSTANCE_ID] --region=[REGION]
Your primary node is in the zone labeled currentLocationId. Make a note of the zone.
After you complete a manual failover, you can confirm that your primary node switched to a new zone by running the gcloud redis instances describe command again and checking that the currentLocationId changed zones.
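Because the describe command prints its output as key: value pairs, the zone can be pulled out with a small filter. The get_field helper below is an illustrative sketch; gcloud can also extract the field directly with --format="value(currentLocationId)".

```shell
# Sketch: extract a single top-level field (e.g. currentLocationId) from
# the "key: value" output of `gcloud redis instances describe`.
# The helper name and the example instance/region are illustrative.
get_field() {
  awk -v f="$1:" '$1 == f { print $2 }'
}

# Example (illustrative):
#   gcloud redis instances describe my-instance --region=us-central1 \
#     | get_field currentLocationId
```

Run it once before the failover and once after; a successful failover shows a different zone the second time.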
The locationId label tells you the zone in which you originally provisioned your primary node. The alternativeLocationId label tells you the zone in which the system originally provisioned your replica node. Each time a failover occurs, the primary and replica switch between these two zones. However, the zones associated with locationId and alternativeLocationId do not change after a failover.