This document shows you how to tune your indexes to achieve faster query performance and better recall.
Tune a ScaNN index
A ScaNN index uses tree-quantization-based indexing. In tree-quantization techniques, the index learns a search tree together with a quantization (or hashing) function. When you run a query, the search tree prunes the search space, while quantization compresses the index. This pruning speeds up the scoring of the similarity (that is, the distance) between the query vector and the database vectors.
To achieve both a high query-per-second (QPS) rate and high recall with your nearest-neighbor queries, you must partition the tree of your ScaNN index in the way that is most appropriate to your data and your queries.
Before you build a ScaNN index, complete the following:
- Make sure that a table with your data is already created.
- Make sure that the values you set for the maintenance_work_mem and shared_buffers flags are less than the total machine memory, to avoid issues while generating the index.
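As a quick check before building the index, you can inspect the current values of both flags with the standard PostgreSQL SHOW command. This is only a minimal sketch: the statements report the configured values, and comparing them against the total memory of your machine is left to you, because that figure depends on your instance.

```sql
-- Memory available to maintenance operations such as CREATE INDEX.
SHOW maintenance_work_mem;

-- Memory reserved for the shared buffer cache.
SHOW shared_buffers;
```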
Tuning parameters
The following index parameters and database flags are used together to find the right balance of recall and QPS. All the parameters apply to both ScaNN index types.
Tuning parameter | Description | Parameter type |
---|---|---|
num_leaves | The number of partitions to apply to this index. The number of partitions you set when creating an index affects index performance. Increasing the number of partitions for a fixed number of vectors creates a more fine-grained index, which improves recall and query performance, at the cost of longer index creation times. Because three-level trees build faster than two-level trees, you can increase the num_leaves value when creating a three-level tree index to achieve better performance. | Index creation |
quantizer | The type of quantizer to use for the K-means tree. The default value is SQ8, for better query performance. Set it to FLAT for better recall. | Index creation |
enable_pca | Enables Principal Component Analysis (PCA), a dimension reduction technique used to automatically reduce the size of the embedding when possible. This option is enabled by default. Set it to false if you observe deterioration in recall. | Index creation |
scann.num_leaves_to_search | This database flag controls the trade-off between recall and QPS. The default value is 1% of the value set in num_leaves. A higher value gives better recall but lower QPS, and a lower value gives the opposite. | Query runtime |
scann.max_top_neighbors_buffer_size | This database flag specifies the size of the cache used to improve performance for filtered queries by scoring or ranking the scanned candidate neighbors in memory instead of on disk. The default value is 20000. A higher value gives better QPS for filtered queries but higher memory usage, and a lower value gives the opposite. | Query runtime |
scann.pre_reordering_num_neighbors | When set, this database flag specifies the number of candidate neighbors to consider during the reordering stages, after the initial search identifies a set of candidates. Set this to a value higher than the number of neighbors you want the query to return. Higher values give better recall but lower QPS. | Query runtime |
scann.num_search_threads | The number of searcher threads for multi-threaded search. The default value is 2. | Query runtime |
scann.max_num_prefetch_datasets | The maximum number of data batches to prefetch during index search, where a batch is a group of buffer pages. The default value is 100. When you use multi-threaded search, batch locking locks the buffer pages first. This might lead to conflicts on the Data Manipulation Language (DML) and replication paths for certain workloads. To reduce conflicts, try reducing this value, but doing so might reduce parallelism. | Query runtime |
max_num_levels | The maximum number of levels of the K-means clustering tree. Set it to 1 for a two-level tree or to 2 for a three-level tree. | Index creation |
Tune a ScaNN index
Consider the following examples for two-level and three-level ScaNN indexes that show how tuning parameters are set:
Two-level index
SET LOCAL scann.num_leaves_to_search = 1;
SET LOCAL scann.pre_reordering_num_neighbors = 50;
-- num_leaves is the square root of the row count: sqrt(1000000) = 1000.
CREATE INDEX my-scann-index ON my-table
USING scann (vector_column cosine)
WITH (num_leaves = 1000);
Three-level index
SET LOCAL scann.num_leaves_to_search = 10;
SET LOCAL scann.pre_reordering_num_neighbors = 50;
-- num_leaves is the row count raised to the power 2/3: 1000000^(2/3) = 10000.
CREATE INDEX my-scann-index ON my-table
USING scann (vector_column cosine)
WITH (num_leaves = 10000, max_num_levels = 2);
Any insert or update operation on a table where a ScaNN index is already generated affects how the learned tree optimizes the index. If your table is prone to frequent updates or insertions, then we recommend periodically reindexing the existing ScaNN index to improve recall accuracy.
You can monitor index metrics to determine the number of mutations since the index was built, and then reindex accordingly. For more information about metrics, see Vector index metrics.
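A minimal sketch of that workflow follows. REINDEX is the standard PostgreSQL command for rebuilding an index, and this sketch assumes that it applies to your ScaNN index. The index name my_scann_index is a hypothetical placeholder. The query selects all columns of pg_stat_ann_indexes rather than naming a specific mutation metric, because this document doesn't list the column names.

```sql
-- Review the index metrics, including mutations since the index was built.
SELECT * FROM pg_stat_ann_indexes;

-- Rebuild the index so that the tree is relearned from the current data.
-- Assumption: my_scann_index stands in for the name of your ScaNN index.
REINDEX INDEX my_scann_index;
```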
Best practices for tuning
Tuning recommendations vary with the type of ScaNN index you plan to use. This section provides recommendations about how to tune index parameters for an optimal balance between recall and QPS.
Two-level tree index
To find the optimal values of num_leaves and scann.num_leaves_to_search for your dataset, follow these steps:
1. Create the ScaNN index with num_leaves set to the square root of the indexed table's row count.
2. Run your test queries, increasing the value of scann.num_leaves_to_search until you achieve your target recall range, for example, 95%. For more information about analyzing your queries, see Analyze your queries.
3. Take note of the ratio between scann.num_leaves_to_search and num_leaves, which is used in subsequent steps. This ratio provides an approximation for your dataset that helps you achieve your target recall.
   If you are working with high-dimensional vectors (500 dimensions or higher) and want to improve recall, then try tuning the value of scann.pre_reordering_num_neighbors. As a starting point, set the value to 100 * sqrt(K), where K is the limit that you set in your query.
4. If your QPS is too low after your queries achieve the target recall, then follow these steps:
   1. Recreate the index, increasing the values of num_leaves and scann.num_leaves_to_search according to the following guidance:
      - Set num_leaves to a larger factor of the square root of your row count. For example, if the index has num_leaves set to the square root of your row count, try setting it to double the square root. If the value is already double, then try setting it to triple the square root.
      - Increase scann.num_leaves_to_search as needed to maintain its ratio with num_leaves, which you noted in Step 3.
      - Set num_leaves to a value less than or equal to the row count divided by 100.
   2. Run the test queries again. While you're running the test queries, experiment with reducing scann.num_leaves_to_search to find a value that increases QPS while keeping your recall high. You can try different values of scann.num_leaves_to_search without rebuilding the index.
5. Repeat Step 4 until both the QPS and the recall range have reached acceptable values.
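The arithmetic in these steps can be sketched in plain SQL. The table name my_table is a hypothetical placeholder, and the column aliases are illustrative names only. The queries compute candidate values and don't change any setting.

```sql
-- Assumption: my_table stands in for the table that holds your vectors.
SELECT
  count(*)                          AS row_count,
  ceil(sqrt(count(*)))::bigint      AS num_leaves_start,   -- Step 1: sqrt(rows)
  ceil(2 * sqrt(count(*)))::bigint  AS num_leaves_double,  -- Step 4: double the square root
  ceil(3 * sqrt(count(*)))::bigint  AS num_leaves_triple,  -- Step 4: triple the square root
  count(*) / 100                    AS num_leaves_cap      -- Step 4: upper bound, rows / 100
FROM my_table;

-- Starting point for scann.pre_reordering_num_neighbors when the query
-- uses LIMIT 10 (K = 10): 100 * sqrt(K), rounded up.
SELECT ceil(100 * sqrt(10)) AS pre_reordering_start;
```

For a table with 1,000,000 rows, the first query gives 1000, 2000, 3000, and 10000. To keep the ratio from Step 3, scale scann.num_leaves_to_search with num_leaves. For example, if you reached your target recall with scann.num_leaves_to_search set to 10 at num_leaves = 1000, then start from 20 at num_leaves = 2000.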
Three-level tree index
In addition to the recommendations for the two-level tree ScaNN index, use the following guidance and steps to tune the index:
- Increasing max_num_levels from 1 for a two-level tree to 2 for a three-level tree significantly reduces the time to create an index, but at the expense of recall accuracy. Set max_num_levels using the following recommendations:
  - Set the value to 2 if the number of vector rows exceeds 100 million.
  - Set the value to 1 if the number of vector rows is less than 10 million.
  - Set the value to either 1 or 2 if the number of vector rows is between 10 million and 100 million, based on the balance of index creation time and recall accuracy that you need.
To find the optimal values of the num_leaves and max_num_levels index parameters, follow these steps:
1. Create the ScaNN index with one of the following num_leaves and max_num_levels combinations, based on your dataset:
   - More than 100 million vector rows: set max_num_levels to 2 and num_leaves to power(rows, 2/3).
   - Fewer than 10 million vector rows: set max_num_levels to 1 and num_leaves to sqrt(rows).
   - Between 10 million and 100 million vector rows: start by setting max_num_levels to 1 and num_leaves to sqrt(rows).
2. Run your test queries. For more information about analyzing queries, see Analyze your queries.
   If the index creation time is satisfactory, then retain the max_num_levels value, and experiment with the num_leaves value for optimal recall accuracy.
3. If you aren't satisfied with the index creation time, then do the following:
   - If the max_num_levels value is 1, then drop the index. Rebuild the index with max_num_levels set to 2. Run the queries and tune the num_leaves value for optimal recall accuracy.
   - If the max_num_levels value is 2, then drop the index. Rebuild the index with the same max_num_levels value and tune the num_leaves value for optimal recall accuracy.
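The starting combinations from the first step can be derived from the row count with a query like the following sketch. The table name my_table and the column aliases are hypothetical. The query only suggests starting values, and for tables between 10 million and 100 million rows it returns the recommended first attempt (max_num_levels = 1).

```sql
-- Assumption: my_table stands in for the table that holds your vectors.
WITH stats AS (
  SELECT count(*) AS n FROM my_table
)
SELECT
  n AS row_count,
  -- Three-level tree (2) only above 100 million rows; otherwise start with 1.
  CASE WHEN n > 100000000 THEN 2 ELSE 1 END AS max_num_levels_start,
  -- rows^(2/3) for a three-level tree, sqrt(rows) for a two-level tree.
  (CASE
     WHEN n > 100000000 THEN round(power(n::float8, 2.0::float8 / 3.0::float8))
     ELSE round(sqrt(n))
   END)::bigint AS num_leaves_start
FROM stats;
```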
Tune an IVF index
Tuning the values you set for the lists, ivf.probes, and quantizer parameters might help optimize your application's performance:
Tuning parameter | Description | Parameter type |
---|---|---|
lists | The number of lists created during index building. The starting point for setting this value is (rows)/1000 for up to one million rows, and sqrt(rows) for more than one million rows. | Index creation |
quantizer | The type of quantizer to use for the K-means tree. The default value is SQ8, for better query performance. Set it to FLAT for better recall. | Index creation |
ivf.probes | The number of nearest lists to explore during search. The starting point for this value is sqrt(lists). | Query runtime |
Consider the following example that shows an IVF index with the tuning parameters set:
SET LOCAL ivf.probes = 10;
CREATE INDEX my-ivf-index ON my-table
USING ivf (vector_column cosine)
WITH (lists = 100, quantizer = 'SQ8');
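The starting points for lists and the probes flag follow directly from the row count, as the following sketch shows. The table name my_table and the column aliases are hypothetical placeholders. The same arithmetic applies to the lists and probes starting points of an IVFFlat index.

```sql
-- Assumption: my_table stands in for the table that holds your vectors.
WITH stats AS (
  SELECT count(*) AS n FROM my_table
),
params AS (
  SELECT
    n,
    CASE
      WHEN n <= 1000000 THEN greatest(n / 1000, 1)  -- rows / 1000, at least 1
      ELSE round(sqrt(n))::bigint                   -- sqrt(rows) above one million rows
    END AS lists_start
  FROM stats
)
SELECT
  n AS row_count,
  lists_start,
  round(sqrt(lists_start))::bigint AS probes_start  -- sqrt(lists)
FROM params;
```

For a table with 100,000 rows, this gives lists = 100 and probes = 10, which matches the values in the preceding example.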
Tune an IVFFlat index
Tuning the values you set for the lists and ivfflat.probes parameters can help optimize application performance:
Tuning parameter | Description | Parameter type |
---|---|---|
lists | The number of lists created during index building. The starting point for setting this value is (rows)/1000 for up to one million rows, and sqrt(rows) for more than one million rows. | Index creation |
ivfflat.probes | The number of nearest lists to explore during search. The starting point for this value is sqrt(lists). | Query runtime |
Before you build an IVFFlat index, make sure that your database's max_parallel_maintenance_workers flag is set to a value sufficient to expedite index creation on large tables.
Consider the following example that shows an IVFFlat index with the tuning parameters set:
SET LOCAL ivfflat.probes = 10;
CREATE INDEX my-ivfflat-index ON my-table
USING ivfflat (vector_column cosine)
WITH (lists = 100);
Tune an HNSW index
Tuning the values you set for the m, ef_construction, and hnsw.ef_search parameters can help optimize application performance.
Tuning parameter | Description | Parameter type |
---|---|---|
m | The maximum number of connections from a node in the graph. You can start with the default value of 16 and experiment with higher values based on the size of your dataset. | Index creation |
ef_construction | The size of the dynamic candidate list maintained during graph construction, which constantly updates the current best candidates for the nearest neighbors of a node. Set this to a value higher than twice the m value, for example, 64 (default). | Index creation |
hnsw.ef_search | The size of the dynamic candidate list used during search. You can start by setting this value to either the m or the ef_construction value, and then change it while observing recall. The default value is 40. | Query runtime |
Consider the following example that shows an HNSW index with the tuning parameters set:
SET LOCAL hnsw.ef_search = 40;
CREATE INDEX my-hnsw-index ON my-table
USING hnsw (vector_column cosine)
WITH (m = 16, ef_construction = 200);
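Because hnsw.ef_search is a query runtime parameter, you can try different values without rebuilding the index. In standard PostgreSQL, SET LOCAL only takes effect inside a transaction block, so one way to test a value is the following sketch. The table name my_table, the id column, and the three-dimensional vector literal are hypothetical placeholders. The `<=>` operator is the pgvector cosine distance operator, which this sketch assumes matches the cosine index shown in the preceding example.

```sql
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL hnsw.ef_search = 100;

-- Assumption: replace the literal with a vector that has the same number
-- of dimensions as vector_column.
EXPLAIN ANALYZE
SELECT id
FROM my_table
ORDER BY vector_column <=> '[0.1, 0.2, 0.3]'
LIMIT 10;

COMMIT;
```

Repeating the block with different hnsw.ef_search values lets you compare execution times while you observe recall.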
Analyze your queries
Use the EXPLAIN ANALYZE command to analyze your query insights, as shown in the following example SQL query.
EXPLAIN ANALYZE SELECT result-column FROM my-table
ORDER BY embedding_column <-> embedding('textembedding-gecko@003', 'What is a database?')::vector
LIMIT 1;
The example response QUERY PLAN includes information such as the time taken, the number of rows scanned or returned, and the resources used.
Limit (cost=0.42..15.27 rows=1 width=32) (actual time=0.106..0.132 rows=1 loops=1)
-> Index Scan using my-scann-index on my-table (cost=0.42..858027.93 rows=100000 width=32) (actual time=0.105..0.129 rows=1 loops=1)
Order By: (embedding_column <-> embedding('textembedding-gecko@003', 'What is a database?')::vector(768))
Limit value: 1
Planning Time: 0.354 ms
Execution Time: 0.141 ms
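EXPLAIN ANALYZE reports timing but not recall. One way to estimate recall for a single query is to compare the index results against an exact search, as in the following sketch. It rests on several assumptions: my_table and its id primary key column are hypothetical names, query_vector is a psql variable that you set to a vector literal beforehand, the table has at least 10 rows, and turning off the standard PostgreSQL planner flag enable_indexscan makes the planner fall back to an exact scan for the first query.

```sql
BEGIN;

-- Exact search: disable index scans so that every row is compared.
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_ids ON COMMIT DROP AS
  SELECT id
  FROM my_table
  ORDER BY embedding_column <-> :'query_vector'::vector
  LIMIT 10;

-- Approximate search: allow the planner to use the vector index again.
SET LOCAL enable_indexscan = on;
CREATE TEMP TABLE ann_ids ON COMMIT DROP AS
  SELECT id
  FROM my_table
  ORDER BY embedding_column <-> :'query_vector'::vector
  LIMIT 10;

-- Recall at 10: the share of exact neighbors that the index also returned.
SELECT count(*)::float8 / 10 AS recall_at_10
FROM ann_ids
JOIN exact_ids USING (id);

COMMIT;
```

A single query gives a noisy estimate. Averaging the result over a representative set of test queries gives a more useful figure for tuning.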
View vector index metrics
You can use the vector index metrics to review performance of your vector index, identify areas for improvement, and tune your index based on the metrics, if needed.
To view all vector index metrics, run the following SQL query, which uses the pg_stat_ann_indexes view:
SELECT * FROM pg_stat_ann_indexes;
For more information about the complete list of metrics, see Vector index metrics.