Choose a vector index in AlloyDB AI

This page describes AlloyDB AI vector search strategies and explains when to use each one. By default, AlloyDB uses k-nearest neighbors (KNN) search to find vectors that are similar to a query. When you create a vector index, AlloyDB AI instead uses a search strategy called approximate nearest neighbor (ANN), which typically returns results faster than KNN. When you select a vector index, you need to balance query latency against recall.

Recall measures how effectively a search retrieves all relevant items for a given query. For example, imagine you have 100 embeddings, each one representing an entity in your database. You query your embeddings with a target vector and limit the query to 10 results. A KNN vector search finds the 10 exact closest vectors using a brute-force calculation, which results in 100% recall. AlloyDB AI uses this method by default if no vector search index is created or chosen. When you create a vector index in AlloyDB for PostgreSQL, the search typically uses ANN, which might partition vectors according to similarity to speed up retrieval. As a result, the 10 vectors that ANN returns in the earlier example might not be exactly the 10 vectors that are closest in distance. If only 8 of the 10 retrieved vectors are among the 10 closest to your query vector, then your recall is 80%.
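The recall arithmetic in the example above can be sketched in a few lines of Python. This is an illustration only: the helper names `knn_brute_force` and `recall` are hypothetical, the vectors are random, and the "approximate" result is simulated by hand rather than produced by a real index.

```python
import math
import random


def knn_brute_force(vectors, query, k):
    """Return the indexes of the k vectors closest to query (exact search)."""
    distances = [(math.dist(vec, query), idx) for idx, vec in enumerate(vectors)]
    distances.sort()
    return [idx for _, idx in distances[:k]]


def recall(exact_ids, approx_ids):
    """Fraction of the exact nearest neighbors that the other result contains."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


random.seed(0)
# 100 random 8-dimensional embeddings and one query vector (made-up data).
vectors = [[random.random() for _ in range(8)] for _ in range(100)]
query = [random.random() for _ in range(8)]

exact_ids = knn_brute_force(vectors, query, k=10)

# Simulate an approximate result that misses two true neighbors: keep the
# 8 closest vectors and substitute the 11th and 12th closest.
ranked = knn_brute_force(vectors, query, k=12)
approx_ids = ranked[:8] + ranked[10:12]

print(recall(exact_ids, exact_ids))   # exact search compared with itself
print(recall(exact_ids, approx_ids))  # 8 of the 10 true neighbors found
```

The first value corresponds to the 100% recall of an exact KNN search, and the second to the 80% case described above.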

Query latency measures how quickly search results are generated. It is the time that elapses between submitting a query and receiving the matching vectors.
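One way to observe latency from the client side is to time the call that runs the search. The following is a minimal sketch, not an AlloyDB API: `measure_latency_ms` and `example_search` are hypothetical names, and `example_search` is an in-memory stand-in that you would replace with your own database call.

```python
import statistics
import time


def measure_latency_ms(search_fn, repeats=5):
    """Run search_fn several times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        search_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def example_search():
    # Hypothetical stand-in for a vector query; replace with a real database call.
    data = list(range(100_000))
    return sorted(data, key=lambda x: abs(x - 42_000))[:10]


latency_ms = measure_latency_ms(example_search)
print(f"median latency: {latency_ms:.2f} ms")
```

Taking the median over several runs reduces the effect of one-off delays such as a cold cache.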

Choose your search strategy

When you perform vector search in AlloyDB, choose one of the following search strategies:

Search strategy: K-nearest neighbors (KNN)
Description: An algorithm that finds the k data points nearest to a given query data point. When you perform a vector search without creating an index, a KNN search is performed by default.
Use cases:
  • Your application is very sensitive to accuracy and you need the exact closest matches.
  • You have fewer than 100,000 vectors.

Search strategy: Approximate Nearest Neighbors (ANN)
Description: An algorithm that finds approximately the closest data points. ANN divides existing data points into small groups based on similarity.
Use cases:
  • Your application requires low latency.
  • You have more than 100,000 vectors.
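
The grouping idea behind ANN described above can be illustrated with a toy partition-based search. This is a simplified sketch and not how AlloyDB AI or ScaNN is implemented: the function names are hypothetical, the centroids are picked naively at random, and only the single closest partition is scanned.

```python
import math
import random


def nearest(points, target):
    """Index of the point in points that is closest to target."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], target))


def build_partitions(vectors, centroids):
    """Assign every vector index to the partition of its closest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        partitions[nearest(centroids, vec)].append(idx)
    return partitions


def ann_search(vectors, centroids, partitions, query, k):
    """Search only the partition whose centroid is closest to the query."""
    candidates = partitions[nearest(centroids, query)]
    candidates = sorted(candidates, key=lambda i: math.dist(vectors[i], query))
    return candidates[:k]


random.seed(1)
vectors = [[random.random(), random.random()] for _ in range(1000)]
centroids = random.sample(vectors, 10)  # naive centroid choice, illustration only
partitions = build_partitions(vectors, centroids)

query = [0.5, 0.5]
approx = ann_search(vectors, centroids, partitions, query, k=10)
# Exact result for comparison: scan every vector.
exact = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))[:10]

scanned = len(partitions[nearest(centroids, query)])
print(scanned, "of", len(vectors), "vectors scanned")
print("recall:", len(set(approx) & set(exact)) / len(exact))
```

Scanning one partition instead of every vector is what lowers latency. True neighbors that fall in an adjacent partition are missed, which is why recall can drop below 100%.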

Google recommends that you create a vector index to optimize performance on your vector search queries. For more information about how the ANN index is used for similarity searches, see Create indexes and query vectors using ScaNN.

What's next