Zero-downtime scaling with Memorystore for Redis Cluster: under the hood
Senior Product Manager, Google Cloud Databases
Ahead of the general availability of Memorystore for Redis Cluster, a fully-managed and highly-scalable low latency Redis Cluster service, we want to share some of its differentiating capabilities in a series of “under the hood” blogs. Just like other Google Cloud services like AlloyDB, where Google made significant enhancements to open source software, we’ve similarly made under-the-hood improvements to the Redis engine. This blog is the first of a series discussing the enhancements we’ve made to Redis Cluster.
Why Redis Cluster?
With the continued growth of Redis, many developers have turned to Redis Cluster for its improved scalability and performance. In fact, the primary motivation for the release of OSS (open source software) Redis Cluster in 2015 was the need for a highly scalable, in-memory data store that can provide lookups at ultra-low latencies. Because Redis is single-threaded, a clustered offering that scales “out” (or “horizontally”) by adding additional nodes provides vastly superior performance over a single “standalone” node that only scales “up” by growing the VM size. Redis Cluster integrates high availability directly within the engine, eliminating the need for external tools such as Sentinel. Moreover, the architecture of Redis Cluster offers operational flexibility. During updates or maintenance, only specific segments of the cluster are affected, allowing for continuous service while minimizing the impact on user data.
Despite the tantalizing opportunity to scale performance with Redis Cluster, the lack of reliable automation and the risky nature of Redis Cluster scaling operations, even today, often limits users from realizing its full benefits. In this blog we detail the specific enhancements the Memorystore team made to Redis to address the shortcomings of OSS scaling. For each of these enhancements, we’ve been actively engaging with the OSS Redis community, sharing our designs, code and improvements. Contributing to OSS and keeping our work aligned with the core Redis Cluster project ensures compatibility, embodies our commitment to open-source principles, and helps to avoid any divergent behaviors.
With Memorystore for Redis Cluster, we’ve made significant enhancements to the Redis engine to de-risk both scale-in and scale-out operations. With true zero-downtime scalability, you can fully take advantage of our pay-as-you-go model, increasing capacity ahead of peak events and shrinking it afterwards, empowering you to only pay for what you need. Consider the Black Friday/Cyber Monday sales period when many businesses must prepare for an overwhelming volume of transactions and user interactions. By leveraging the fully-managed Memorystore for Redis Cluster, you can quickly and efficiently scale your infrastructure in preparation for this peak and subsequently reduce it when the demand subsides, ensuring seamless user experiences while optimizing costs.
Issues with OSS Redis scaling
Below, we’ve detailed challenges associated with scaling OSS Redis. Then, in the following section we’ll discuss how we’re addressing these risks.
In theory, scaling out a self-managed Redis Cluster is fairly straightforward. If you’re running on a service like Compute Engine, you first provision a new set of VMs with the appropriate resource and network configurations. Next, you employ the Redis CLI to integrate these nodes into the cluster, designating some as primary nodes and others as replicas. Finally, you use the Redis CLI to redistribute the slots to these newly integrated nodes.
However, in practice things are often much, much more complicated. Scaling a cluster is often time-sensitive — perhaps you received an alert about high CPU utilization or low memory — and like many manual operations, error-prone. And, if you’re managing a cluster with many nodes or with replicas, the complexity only increases. Further, in the current state of OSS Redis Cluster, the resharding process isn't entirely autonomous. An external agent, running outside of Redis, is required to drive the procedure by extracting keys from the source node and subsequently dispatching them to the target node.
Redis Cluster partitions its data across 16,384 slots, each assigned to a specific node. The primary reason for resharding is to recalibrate storage and compute capacity in line with changing workloads. This ensures that nodes are adequately primed to manage their data responsibilities.
Each slot must have one of the three “migration states” of “stable”, “importing”, or “migrating”, which each play a crucial role in the resharding mechanism. Redis maintains these states in memory. The "stable" state signifies the slot’s regular operational state. The node with a slot in the "importing" state is not yet the data owner and only serves requests prefixed with the
ASKING command. The node with a slot in the “migrating” state retains data ownership. If keys requested by clients aren't found on a given node, then that node responds with an "-ASK" error, directing them to the target node which contains the keys. Clients then send the "ASKING" command to the target node to access the keys during the slot's migration. This intricate choreography of “migration state” management not only adds significant overhead but also makes it challenging to identify and address all potential failure cases. The inherent complexity reduces the predictability and reliability of the system, making it difficult to design a straightforward error recovery strategy hence impacting the overall robustness and performance of the OSS Redis system.
During a scaling operation the external agent leverages the
CLUSTER SETSLOT command to manipulate the migration state of slots on both originating and receiving nodes. Subsequently, it applies
CLUSTER GETKEYSINSLOT to extract the keys to be moved, and
MIGRATE to move them from source to target, ensuring each key, associated with a slot by its hash value, is transported intact.
Upon completion, the cluster's configuration across nodes updates, reflecting the new slot-to-node assignments. This synchronization is vital for maintaining a consistent and accurate view of the cluster's architecture.
With this as the backdrop, here are the most common challenges with scaling OSS Redis Cluster.
High availability lapse in empty shards
In OSS Redis, empty shards (those without assigned slots) present operational challenges. For one, they lack identification until they complete the migration of their first slot, making it difficult for external agents to pinpoint issues without sophisticated state tracking. Moreover, even with replicas in place, these empty shards don't support automatic failover until they receive their initial slot. If a primary node in one of these shards becomes unresponsive during the import of its very first slot, it can lead to both service interruptions and potential data loss pertaining to that slot.
Single point of failure in slot-ownership finalization
Migration states in OSS Redis clusters aren't propagated to replicas. Consequently, during automatic failovers, slot migration states may be lost, and newly elected primaries might remain unaware of ongoing migrations. The lack of state continuity increases the risk of potential data loss when the original primaries experience a crash or out-of-memory (OOM) events. Rectifying this requires a sophisticated external agent, introducing complexity and operation overheads.
Higher impact on workload
The current OSS migration process is driven by an agent that runs externally to the Redis server. The agent’s external nature both presents risks associated with the agent’s availability and durability as well as increases the risk of scaling activities because, for example, the agent cannot easily check available memory to determine a safe key count to migrate. Because of the agent’s limited control over data volume movement, scaling operations from that external agent can lead to prolonged blocks on running user workloads, directly affecting the cluster’s operations. Due to the agent’s lack of real-time visibility into both memory pressure and client workload levels, it becomes challenging for the agent to balance the migration operation with ongoing customer workloads. For instance, migration might inadvertently coincide with a surge in customer activity, amplifying the impact on user operations and potentially leading to pronounced service disruptions.
Higher risk of OOM
Externally-driven migration poses a significant OOM risk, stemming from limited control over the amount of data migrated simultaneously. When handling the migration of large keys or dense data structures like a populated hash map, the source node can experience sudden and intense memory pressure. A distinct challenge for external agents is their lack of real-time visibility into this memory pressure. An out-of-memory condition not only jeopardizes the ongoing migration process but can also lead to the OOM killing of the Redis process, causing disruptions to the application itself.
Low resilience to transient errors
The migration process is susceptible to disruptions from issues like network glitches or node failovers. In such events, the external agent gets minimal error feedback from migration commands. To determine the root cause, the agent executes Redis commands on the affected node, adding load to it even while it’s managing customer workloads. The information retrieved can age quickly. Additionally, resuming an interrupted migration demands explicit operations from the agent. Such high-touch interventions are not only tedious but also introduce a higher risk of errors.
Higher migration overhead
The existing OSS slot migration process involves multiple inter-process communications (IPCs) between Redis and the external agent. This process requires the agent to pull data from the source and then relay it to the target, effectively doubling the network bandwidth utilization. Moreover, this approach increases the collective CPU overhead. The data being read into the agent is processed and then dispatched to the target, resulting in redundant computational work on both the agent and the source. This not only strains resources but can also slow down the overall migration process, impacting performance.
The external execution of the slot migration protocol, detailed in Redis official documentation, necessitates a sophisticated external agent for the migration's initiation, oversight, and conclusion. Beyond just initiating the migration and overseeing its progression, this agent effectively becomes a complex state machine. Each step or state within this machine not only progresses the data migration but also must account for error handling to ensure state consistency. Given the myriad of steps involved, each with its potential failure modes, maintaining this consistent state while navigating the nuances of each phase compounds the operational intricacies and risks.
Absence of stable shard identification
OSS Redis 7.0 introduced the CLUSTER SHARDS command, yet it still lacks a mechanism for constant, stable identification of each shard. This omission complicates operations, hindering precise referencing of specific shards and understanding node-to-shard relationships, especially when slot ownership changes over time.
Solutions, from Memorystore for Redis Cluster
To address these complexities, and their associated risks, Memorystore for Redis Cluster provides users with zero-downtime scaling (in or out) from a single click or API. In addition, the following four improvements to Memorystore for Redis Cluster systematically address the inherent challenges in traditional OSS Redis resharding, offering a platform with enhanced efficiency, reliability, and operational simplicity.
1. Engine-driven slot migration (addresses challenges #1, #2, #3, #4)
By moving the migration protocol into the Redis engine, Memorystore for Redis Cluster streamlines slot migration and reduces the overall complexity of scaling operations to improve the availability and durability of the cluster. Utilizing the engine's main thread, Redis directly manages migrations, eliminating the dependency on an external agent. This approach reduces network and CPU overhead as data is directly transferred from source to target. Moreover, the system's design inherently balances customer and migration workloads, adjusting migration pace based on server activity. If interruptions occur, the architecture automatically resumes migration after a failover. This design provides Memorystore for Redis Cluster with a more efficient and resilient migration mechanism.
2. Introduction of Shard ID (addresses challenge #4)
Memorystore for Redis Cluster introduced a SHARD ID enabling direct referencing of a shard. This facilitates efficient verification of whether two nodes belong to the same shard, particularly in scenarios when nodes are down or don't own slots. By eliminating the need to depend on large slot ranges, the system becomes more robust and navigable. Google contributed this feature to OSS and it has been merged into OSS Redis 7.2.
3. Automatic failover in empty shards (addresses challenge #1)
Building on the concept of first-class citizen shards, Memorystore for Redis Cluster enhances the Redis engine to initiate leader elections for empty shards. This ensures that migrated keys on the replicas remain accessible even if the primary node fails. This mechanism enhances reliability, especially during scaling transition periods.
4. Mitigating single point of failure in scaling (addresses challenge #1)
Memorystore for Redis Cluster introduced two key solutions:
a. Migration states are now replicated to replicas, ensuring they're safeguarded during automatic failovers.
b. Proper sequencing for slot ownership, preventing potential data loss if primary nodes crash during migrations. We have been working closely with the OSS community to incorporate these improvements into Redis 8.0.
With Memorystore for Redis Cluster, you’re unlocking the true potential of a scalable Redis Cluster that also provides microsecond latency. You can easily get started today by heading over to the Cloud console and creating a cluster, and then scaling in or out with just a few clicks. If you want to learn more about migrating to Memorystore for Redis Cluster, take a look at this step-by-step migration blog. Stay tuned for our next blog in this series of deep dives and let us know if you have any feedback by reaching out to us at email@example.com.