From millions to billions: Announcing vector search in Memorystore for Valkey and Redis Cluster
Kyle Meggs
Product Manager, Google Cloud
Jacob Murphy
Software Engineer
With the addition of vector search earlier this year, Memorystore for Redis emerged as an ideal platform for gen AI use cases such as Retrieval Augmented Generation (RAG), recommendation systems, semantic search, and more. Why? Because of its ultra-low latency vector search. Just a single Memorystore for Redis instance can perform vector search at single-digit millisecond latency over tens of millions of vectors. But what if you want to store more vectors than can fit into a single VM?
Today, we’re excited to announce vector search on both the new Memorystore for Valkey and Memorystore for Redis Cluster, combining 1) ultra-low latency in-memory vector search, 2) zero-downtime scalability (in or out), and 3) high-performance vector search across millions or even billions of vectors. Currently in preview, vector support for these Memorystore offerings means you can scale your cluster out to 250 shards and store billions of vectors in a single instance. In fact, a single Memorystore for Redis Cluster instance can perform vector search at single-digit millisecond latency over more than a billion vectors with greater than 99% recall! This scale enables demanding enterprise applications such as semantic search over a global corpus of data.
Scalable in-memory vector search
The key to this performance and scalability is partitioning the vector index across the nodes in the cluster. Memorystore uses a local index partitioning strategy, meaning that each node contains a partition of the index that corresponds to the portion of the keyspace that is stored locally. Since the keyspace is already uniformly sharded using the OSS cluster protocol, each index partition is of roughly equal size.
Because of this design, index build times improve linearly with the number of nodes for all vector indices. Additionally, when the number of vectors is held constant, adding nodes improves Hierarchical Navigable Small World (HNSW) search performance logarithmically, and brute-force search performance linearly. Putting it all together, a single cluster can index and search a billion vectors while maintaining fast index build times and low search latencies at high recall.
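To make that concrete, here is a toy Python sketch of the scatter-gather pattern that local index partitioning implies. This is an illustration, not Memorystore's actual implementation: a brute-force scan stands in for each node's HNSW index, and the shard contents and item keys are made up for the example.

```python
import heapq

def knn_local(partition, query, k):
    # Stand-in for one node's local index: score every vector in this
    # shard's partition by squared L2 distance to the query.
    scored = ((sum((a - b) ** 2 for a, b in zip(vec, query)), key)
              for key, vec in partition.items())
    return heapq.nsmallest(k, scored)

def knn_cluster(partitions, query, k):
    # Scatter the query to every shard, then merge the per-shard
    # top-k candidates into a single final top-k.
    candidates = [hit for part in partitions for hit in knn_local(part, query, k)]
    return heapq.nsmallest(k, candidates)

# Two shards, each indexing only the portion of the keyspace it stores locally.
shard_a = {"item:1": (0.0, 0.0), "item:2": (1.0, 1.0)}
shard_b = {"item:3": (0.2, 0.1), "item:4": (5.0, 5.0)}

# Returns item:1 (distance 0.0) and item:3 (distance ~0.05).
print(knn_cluster([shard_a, shard_b], query=(0.0, 0.0), k=2))
```

Because each shard searches only its own partition, adding shards shrinks every partition, which is what drives the build-time and search-performance improvements described above.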
Hybrid queries
In addition to improved scalability, we are also excited to launch support for hybrid queries on Memorystore for Valkey and Memorystore for Redis Cluster. Hybrid queries let you combine vector searches with filters on numeric and tag fields. By combining numeric, tag, and vector search, you can use Memorystore to answer complex queries.
Suppose you are an online clothing retailer and you want to recommend similar items. Using a vector index, you can find semantically similar items with embeddings and vector similarity search. But with vector search alone, you may surface irrelevant results that should be filtered out: a user searching for a red dress could see results that are a different article of clothing (e.g. red hats) or items much more expensive than the original.
To solve this problem with hybrid search, you can:
1. Use `FT.CREATE` to create a new vector index with additional fields for filtering:
`FT.CREATE inventory_index SCHEMA embedding VECTOR HNSW 6 DIM 128 TYPE FLOAT32 DISTANCE_METRIC L2 clothing_type TAG clothing_price_usd NUMERIC`
This creates an index `inventory_index` with:
- A vector field `embedding` for the semantic embedding of the clothing item
- A tag field `clothing_type` for the type of the article of clothing (e.g. “dress” or “hat”)
- A numeric field `clothing_price_usd` for the price of the article of clothing
2. Use `FT.SEARCH` to perform a hybrid query on `inventory_index`. For example, we can query for 10 results while filtering to only articles of clothing of type “dress” priced between $100 and $200:
`FT.SEARCH inventory_index "(@clothing_type:{dress} @clothing_price_usd:[100 200])=>[KNN 10 @embedding $query_vector]" PARAMS 2 query_vector "..." DIALECT 2`
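Those two commands are all you need server-side. As a client-side illustration, here is a minimal sketch using the redis-py library against a hypothetical cluster endpoint; the endpoint, key name, and embedding values are all placeholder assumptions for the example, and exact command routing can vary by client version:

```python
import struct

import redis  # redis-py; Memorystore speaks the OSS cluster protocol

# Hypothetical Memorystore cluster endpoint.
r = redis.RedisCluster(host="10.0.0.1", port=6379)

# Store one catalog item as a hash: the embedding plus the filterable fields.
# The index created in step 1 picks up matching hashes automatically.
embedding = struct.pack("128f", *([0.1] * 128))  # 128-dim FLOAT32, placeholder values
r.hset("item:123", mapping={
    "embedding": embedding,
    "clothing_type": "dress",
    "clothing_price_usd": 149,
})

# Hybrid query: KNN over `embedding`, pre-filtered to dresses priced $100-$200.
query = ("(@clothing_type:{dress} @clothing_price_usd:[100 200])"
         "=>[KNN 10 @embedding $query_vector]")
results = r.execute_command(
    "FT.SEARCH", "inventory_index", query,
    "PARAMS", "2", "query_vector", embedding,  # reusing the placeholder vector
    "DIALECT", "2",
)
print(results)
```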
These filter expressions also support boolean logic, so multiple fields and values can be combined to narrow the search results to only those that matter. With this new functionality, applications can tune vector search queries to their needs and get even richer results than before.
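For example, to widen the query above to several clothing types at once, the tag filter accepts OR’ed values (this variant is illustrative):

`FT.SEARCH inventory_index "(@clothing_type:{dress|skirt} @clothing_price_usd:[100 200])=>[KNN 10 @embedding $query_vector]" PARAMS 2 query_vector "..." DIALECT 2`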
Standing behind OSS Valkey
In the open-source community, there’s a lot of enthusiasm for the Valkey key-value datastore. As part of our commitment to making Valkey amazing, we’ve coauthored an RFC (Request For Comments submission) and are working with the open-source community to donate our vector search capabilities to Valkey. An RFC is the first step in driving alignment within the community, and we welcome feedback on our proposal and implementation. Our primary goal is to enable Valkey developers around the world to use Valkey vector search to build amazing gen AI applications.
The search is over for fast and scalable vector search
With today’s addition of fast and scalable vector search on Memorystore for Valkey and Memorystore for Redis Cluster, alongside the existing functionality on Memorystore for Redis, Memorystore now offers ultra-low latency vector search across all of its most popular engines. So when you’re building generative AI applications that require robust and consistently low-latency vector search, Memorystore will be hard to beat. Get started today by creating a Memorystore for Valkey or Memorystore for Redis Cluster instance and experience the speed of in-memory search.