ScaNN vector query performance overview

This page provides a conceptual overview of improving vector query performance using AlloyDB AI's Scalable Nearest Neighbor (ScaNN) index. For more information, see Create indexes and query vectors.

The ScaNN index uses tree-quantization-based indexing: the index learns a search tree together with a quantization (or hashing) function. When you run a query, the search tree prunes the search space, and quantization compresses the index. Pruning reduces the number of database vectors that must be scored, and the compressed representation makes each similarity computation (in other words, each distance between the query vector and a database vector) cheaper.
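To make the prune-then-score idea concrete, here is a minimal NumPy sketch of a toy tree-quantization index. It assumes a single-level tree built with plain k-means, int8 scalar quantization, and squared L2 distance. Production ScaNN uses more sophisticated partitioning and quantization, so the names `ToyTreeQuantIndex`, `kmeans`, and `quantize_int8` are hypothetical and do not describe AlloyDB internals.

```python
import numpy as np


def nearest_centroid(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid (squared L2) for each row of x."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(x: np.ndarray, num_leaves: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one centroid per leaf of the (one-level) tree."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), num_leaves, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_centroid(x, centroids)
        for c in range(num_leaves):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a leaf empties out
                centroids[c] = members.mean(axis=0)
    return centroids


def quantize_int8(x: np.ndarray):
    """Per-dimension scalar quantization to int8 codes plus a float scale."""
    scale = np.abs(x).max(axis=0) / 127.0
    scale[scale == 0] = 1.0
    codes = np.round(x / scale).astype(np.int8)
    return codes, scale


class ToyTreeQuantIndex:
    """Hypothetical toy index: k-means leaves for pruning, int8 codes for scoring."""

    def __init__(self, data: np.ndarray, num_leaves: int, seed: int = 0):
        self.centroids = kmeans(data, num_leaves, seed=seed)
        self.labels = nearest_centroid(data, self.centroids)
        self.codes, self.scale = quantize_int8(data)

    def search(self, query: np.ndarray, k: int, leaves_to_search: int) -> np.ndarray:
        # 1. Prune: visit only the leaves whose centroids are closest to the query.
        leaf_dist = ((self.centroids - query) ** 2).sum(axis=1)
        probed = np.argsort(leaf_dist)[:leaves_to_search]
        candidates = np.flatnonzero(np.isin(self.labels, probed))
        # 2. Score: approximate distances from the compressed int8 codes.
        #    (Real implementations compute on the codes directly; this toy
        #    dequantizes them for readability.)
        approx = self.codes[candidates].astype(np.float32) * self.scale
        dist = ((approx - query) ** 2).sum(axis=1)
        return candidates[np.argsort(dist)[:k]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((5000, 32)).astype(np.float32)
    index = ToyTreeQuantIndex(data, num_leaves=50)
    query = rng.standard_normal(32).astype(np.float32)
    print(index.search(query, k=5, leaves_to_search=5))
```

Only the vectors in the probed leaves are scored at all, which is where the pruning saves work; the int8 codes are what shrink the index.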

To achieve both a high queries-per-second (QPS) rate and high recall for your nearest-neighbor queries, you must partition the tree of your ScaNN index in the way that best suits your data and your queries.
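The partitioning choice trades recall against work per query. The sketch below uses hypothetical helpers, not an AlloyDB API: it partitions random vectors into leaves, then sweeps how many leaves each query searches, reporting recall@k against brute-force results and the fraction of vectors scanned as a rough stand-in for query cost. It omits quantization, so searching every leaf reproduces the exact results.

```python
import numpy as np


def build_leaves(data: np.ndarray, num_leaves: int, seed: int = 0):
    """Toy partitioning: pick random rows as centroids and assign every vector
    to its nearest centroid (squared L2)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), num_leaves, replace=False)]
    d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d.argmin(axis=1)


def search(data, centroids, labels, query, k, leaves_to_search):
    """Score only vectors in the `leaves_to_search` closest leaves.
    Returns (top-k ids, number of vectors scanned)."""
    leaf_dist = ((centroids - query) ** 2).sum(axis=1)
    probed = np.argsort(leaf_dist)[:leaves_to_search]
    candidates = np.flatnonzero(np.isin(labels, probed))
    dist = ((data[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(dist)[:k]], len(candidates)


def sweep(data, queries, num_leaves, k=10):
    """Map leaves_to_search -> (mean recall@k, mean fraction of vectors scanned)."""
    centroids, labels = build_leaves(data, num_leaves)
    # Brute-force ground truth for each query.
    truth = [set(np.argsort(((data - q) ** 2).sum(axis=1))[:k].tolist()) for q in queries]
    results = {}
    for leaves in range(1, num_leaves + 1):
        recalls, scanned = [], []
        for q, true_ids in zip(queries, truth):
            ids, n_scanned = search(data, centroids, labels, q, k, leaves)
            recalls.append(len(true_ids & set(ids.tolist())) / k)
            scanned.append(n_scanned / len(data))
        results[leaves] = (float(np.mean(recalls)), float(np.mean(scanned)))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((2000, 32))
    queries = rng.standard_normal((20, 32))
    for leaves, (recall, frac) in sweep(data, queries, num_leaves=20).items():
        print(f"leaves_to_search={leaves:2d}  recall@10={recall:.2f}  scanned={frac:.0%}")
```

Searching more leaves can only raise recall here, but it also raises the number of vectors scored per query, which is what lowers QPS.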

Embeddings from high-dimensional models can often retain much of their information at a much lower dimensionality. For example, you might retain 90% of the information with only 20% of the embedding's dimensions. To speed up queries on such datasets, the AlloyDB AI ScaNN index automatically performs dimension reduction on the indexed vectors using Principal Component Analysis (PCA), which further reduces CPU and memory usage for vector search. For more information, see scann.enable_pca.
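The following NumPy sketch illustrates the dimension-reduction idea, assuming that "information" means explained variance. It fits PCA with an SVD and keeps the fewest components that explain at least 90% of the variance. `fit_pca` and `project` are hypothetical helpers; AlloyDB performs PCA inside the index (see scann.enable_pca), so you don't call anything like this yourself.

```python
import numpy as np


def fit_pca(x: np.ndarray, variance_to_keep: float = 0.9):
    """Fit PCA with an SVD and keep the fewest components whose cumulative
    explained-variance ratio reaches `variance_to_keep`.

    Returns (mean, components, fraction_of_variance_kept)."""
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    n = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(s))
    return mean, vt[:n], float(cumulative[n - 1])


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project vectors onto the retained principal components."""
    return (x - mean) @ components.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 128-d "embeddings" driven by 16 hidden factors plus small noise.
    latent = rng.standard_normal((1000, 16))
    mixing = rng.standard_normal((16, 128))
    x = latent @ mixing + 0.05 * rng.standard_normal((1000, 128))
    mean, components, kept = fit_pca(x, variance_to_keep=0.9)
    print(f"kept {components.shape[0]} of 128 dims, {kept:.1%} of variance")
    print(project(x, mean, components).shape)
```

Because the synthetic vectors are generated from only 16 factors, a small number of components captures nearly all of their variance; real embeddings compress less cleanly.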

Because dimension reduction causes a minor loss of recall, the AlloyDB AI ScaNN index compensates by first ranking a larger number of candidates from the index using their PCA-reduced vectors. ScaNN then re-ranks those candidates using the original, full-dimensional vectors. For more information, see scann.pre_reordering_num_neighbors.
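The two-stage ranking can be sketched as follows, assuming squared L2 distance and a PCA projection computed with NumPy. For brevity, stage 1 ranks every row instead of walking a tree. The `pre_reordering_num_neighbors` argument is a hypothetical stand-in that plays the same role as the scann.pre_reordering_num_neighbors setting: it controls how many PCA-ranked candidates are re-ranked with the original vectors. This is an illustration, not the AlloyDB implementation.

```python
import numpy as np


def fit_pca(x: np.ndarray, n_components: int):
    """Return (mean, components) for projecting onto the top principal axes."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def two_stage_search(data, reduced_data, mean, components, query, k,
                     pre_reordering_num_neighbors):
    """Rank by PCA-reduced vectors, then re-rank the top candidates by original vectors."""
    reduced_query = (query - mean) @ components.T
    # Stage 1: cheap ranking in the reduced space; keep a larger candidate pool.
    coarse = ((reduced_data - reduced_query) ** 2).sum(axis=1)
    pool = np.argsort(coarse)[:pre_reordering_num_neighbors]
    # Stage 2: re-rank only the pool with full-dimensional distances.
    fine = ((data[pool] - query) ** 2).sum(axis=1)
    return pool[np.argsort(fine)[:k]]


def exact_search(data, query, k):
    """Brute-force top-k by squared L2 distance, used as ground truth."""
    return np.argsort(((data - query) ** 2).sum(axis=1))[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scales = np.linspace(3.0, 0.1, 64)  # variance decays across dimensions
    data = rng.standard_normal((2000, 64)) * scales
    mean, components = fit_pca(data, n_components=8)
    reduced_data = (data - mean) @ components.T  # computed once, at "index build"
    query = rng.standard_normal(64) * scales
    truth = set(exact_search(data, query, 10).tolist())
    for pool_size in (10, 50, 200):
        found = two_stage_search(data, reduced_data, mean, components, query, 10, pool_size)
        print(pool_size, len(truth & set(found.tolist())) / 10)
```

A larger candidate pool recovers more of the recall lost to PCA, at the cost of more full-dimensional distance computations per query.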

What's next