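This page describes how to measure vector query recall in AlloyDB for PostgreSQL. In the context of vector search, recall is the percentage of the vectors returned by the index that are true nearest neighbors. For example, if a nearest neighbor query for the 20 nearest neighbors returns 19 of the ground truth nearest neighbors, then the recall is 19/20 x 100 = 95%.
In a vector query, recall is important because it measures the percentage of relevant results retrieved from a search. Recall helps you evaluate the accuracy of the results from an approximate nearest neighbor (ANN) search as compared to the results from a k-nearest neighbors (KNN) search.
ANN is an algorithm that finds data points similar to a given query point, and it improves speed by finding approximate neighbors rather than the actual neighbors. When you use ANN, you balance speed with recall.
KNN is an algorithm that finds the "k" most similar vectors to a given query vector within a dataset, based on a similarity metric. k is the number of neighbors that you want the query to return.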
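You can measure the recall of your vector search query for different vector indexes, including the following:
Scalable Nearest Neighbors (ScaNN): an algorithm for efficient vector similarity search.
Hierarchical Navigable Small World (HNSW): a graph-based algorithm used for efficient approximate nearest neighbor search in vector databases.
Inverted File with Flat Compression (IVFFLAT) and Inverted File Flat (IVF): types of vector indexes that are used for ANN searches, particularly in databases like the PostgreSQL pgvector extension.
This page assumes that you're familiar with PostgreSQL, AlloyDB, and vector search.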
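Before you begin
Install or update the pgvector extension.
If the pgvector extension isn't installed, install the vector extension version 0.8.0.google-3 or later to store generated embeddings as vector values. The vector extension includes pgvector functions and operators. Google extends this version of pgvector with optimizations for AlloyDB.
CREATE EXTENSION IF NOT EXISTS vector WITH VERSION '0.8.0.google-3';
If the pgvector extension is already installed, upgrade the vector extension to version 0.8.0.google-3 or later to get recall evaluator capabilities.
ALTER EXTENSION vector UPDATE TO '0.8.0.google-3';
To create ScaNN indexes, install the alloydb_scann extension.
CREATE EXTENSION IF NOT EXISTS alloydb_scann;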
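To confirm that both extensions are installed at the expected versions before you measure recall, a check along the following lines may help. This is a minimal sketch that relies only on the standard PostgreSQL catalogs pg_extension and pg_available_extensions; the version strings you see depend on your instance.

```sql
-- Installed extensions and their versions in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'alloydb_scann');

-- Versions that the server offers, alongside what is currently installed.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'alloydb_scann');
```

If the vector row reports a version earlier than 0.8.0.google-3, run the ALTER EXTENSION statement shown earlier in this section.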
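Evaluate recall for vector queries on a vector index
You can find the recall for a vector query on a vector index for a given configuration using the evaluate_query_recall function. This function lets you tune your parameters to achieve the vector query recall results that you want.
Recall is the metric used for search quality, and is defined as the percentage of the returned results that are objectively closest to the query vectors. The evaluate_query_recall function is turned on by default.
Note: Improved recall can result in slower query execution (QPS).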
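To make the definition concrete, recall for a single query can also be approximated by hand. The following is a rough sketch, not a description of how evaluate_query_recall works internally. It assumes a hypothetical table t1 with a unique id column and a vector column val that has a vector index, assumes that enable_indexscan is on by default in the session, and uses two illustrative temporary tables, exact_ids and ann_ids.

```sql
-- Ground truth: disable index scans for this transaction only, so the
-- planner falls back to an exact scan over the table.
BEGIN;
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_ids AS
  SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10;
COMMIT;

-- ANN result: the same query with index scans allowed again.
CREATE TEMP TABLE ann_ids AS
  SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10;

-- Recall = (ANN results that are also true neighbors) / k, with k = 10.
SELECT count(*)::float8 / 10 AS recall
FROM ann_ids
JOIN exact_ids USING (id);
```

The evaluate_query_recall function reports the same kind of ratio in one call, together with execution times, so the manual version is mainly useful for understanding what the metric measures.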
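Find the recall for a vector query
Open a SQL editor in AlloyDB Studio or open a psql client.
Create a ScaNN, HNSW, or IVFFLAT vector index. You can create indexes of different methods on a single vector column and compare the performance of your query using each of them. This is recommended for developmental workloads only.
Ensure that the enable_indexscan flag is on. If the flag is off, no index scan is chosen and the recall for all indexes is 1.
Run the evaluate_query_recall function, which takes the query as a parameter and returns its recall:
SELECT * FROM evaluate_query_recall( QUERY_TEXT, QUERY_TIME_CONFIGURATIONS, INDEX_METHODS )
Before you run this command, make the following replacements:
QUERY_TEXT: the SQL query, enclosed in $$.
QUERY_TIME_CONFIGURATIONS: Optional: the configuration
that you can set for the ANN query. This must be in JSON format.
The default value is NULL.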
INDEX_METHODS: Optional: a text array that contains
different vector index methods for which you want to calculate the
recall. If you set an index method for which a corresponding index
doesn't exist, the recall is 1. The input must be a subset of
{scann, hnsw, ivf, ivfflat}. If no value is provided, the ScaNN
method is used.
To view differences between query recall and execution time, change the
query time parameters for your index.
The following table lists query time parameters for ScaNN, HNSW, and IVF/IVFFLAT
index methods. The parameters are formatted as
{"scann.num_leaves_to_search":1, "scann.pre_reordering_num_neighbors":10, "hnsw.ef_search": 1}.
For more information about ScaNN index methods, see
AlloyDB ScaNN Index reference.
For more information about HNSW and IVF/IVFFLAT index methods, see
pgvector.
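As an illustration of changing query time parameters to compare recall against execution time, the following sketch runs the same query twice against a ScaNN index with different search effort. The table t1 and column val are taken from the example output on this page; the specific parameter values (1 and 10 leaves, 10 and 100 pre-reordering neighbors) are arbitrary assumptions chosen only to show the pattern.

```sql
-- Lower search effort: typically faster, possibly lower recall.
SELECT recall, ann_execution_time, ground_truth_execution_time, index_type
FROM evaluate_query_recall(
  $$ SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10 $$,
  '{"scann.num_leaves_to_search": 1, "scann.pre_reordering_num_neighbors": 10}',
  ARRAY['scann']);

-- Higher search effort: typically slower, possibly higher recall.
SELECT recall, ann_execution_time, ground_truth_execution_time, index_type
FROM evaluate_query_recall(
  $$ SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10 $$,
  '{"scann.num_leaves_to_search": 10, "scann.pre_reordering_num_neighbors": 100}',
  ARRAY['scann']);
```

Comparing the recall and ann_execution_time columns of the two results shows how much recall each additional unit of search effort buys for this particular query.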
Optional: You can also add configurations from pg_settings to the
QUERY_TIME_CONFIGURATIONS. For example, to run a query with columnar
engine scan enabled, add the following config from pg_settings as
{"google_columnar_engine.enable_columnar_scan" : on}.
The configurations are set locally in the function. Adding these
configurations doesn't impact the configurations that you set in your
session. If you don't set any configurations, AlloyDB uses
all of the configurations that you set in the session. You can also set
only those configurations that are best suited for your use case.
Optional: To view the current configuration settings, run the SHOW command
or query the pg_settings view.
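The interaction between session settings and the configurations passed to the function can be explored with a sketch like the following. It assumes the hypothetical table t1 from the example on this page and assumes that the ScaNN and pgvector parameters are visible in pg_settings under the scann., hnsw., and ivfflat. prefixes, which might require the corresponding extension to be loaded in the session.

```sql
-- Set a session-level value.
SET scann.num_leaves_to_search = 5;

-- Override the value only inside the function call.
SELECT recall, ann_execution_time
FROM evaluate_query_recall(
  $$ SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10 $$,
  '{"scann.num_leaves_to_search": 50}',
  ARRAY['scann']);

-- Per the description above, the session value should be unchanged here.
SHOW scann.num_leaves_to_search;

-- List the vector-related settings that are currently in effect.
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'scann.%'
   OR name LIKE 'hnsw.%'
   OR name LIKE 'ivfflat.%';
```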
Optional: If you have a ScaNN index for which you want to tune the recall, see the
tuning parameters in ScaNN index reference.
The following is an example output, where ann_execution_time is the time
that it takes a vector query to execute using index scans.
ground_truth_execution_time is the time that it takes the query to run
using a sequential scan.
ann_execution_time and ground_truth_execution_time are different
from but directly dependent on Execution time in the query plan.
Execution time is the total time to execute the query from the client.
t=# SELECT * FROM evaluate_query_recall( $$ SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10 $$, '{"scann.num_leaves_to_search":1, "scann.pre_reordering_num_neighbors":10, "hnsw.ef_search": 1}', ARRAY['scann', 'hnsw']);
NOTICE: Recall is 1. This might mean that the vector index is not present on the table or index scan not chosen during query execution.
id| query | configurations | recall |ann_execution_time | ground_truth_execution_time | index_type
----+-------------------------------------------------------------------+------------------------------------------------------------------------------------------------+--------+--------------------+-----------------------------+------------
1 | SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10 | {"scann.num_leaves_to_search":1, "scann.pre_reordering_num_neighbors":10, "hnsw.ef_search": 1} | 0.5 | 4.23 | 118.211 | scann
2 | SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10 | {"scann.num_leaves_to_search":1, "scann.pre_reordering_num_neighbors":10, "hnsw.ef_search": 1} | 1 | 107.198 | 118.211 | hnsw
(2 rows)
In the preceding example, the SQL query passed as QUERY_TEXT is enclosed in $$.
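If the result is Recall is 1 (the recall of the query is 1), this might indicate that the vector index isn't present on the table or that the vector index wasn't chosen during query execution. This situation occurs when no vector index exists on the table or when the planner doesn't choose the vector index scan.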
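When the recall is reported as 1, a few standard PostgreSQL checks can help distinguish a missing index from an index that the planner didn't choose. This is a sketch that uses the hypothetical table t1 from the example above; substitute your own table and query.

```sql
-- 1. Confirm that index scans are enabled in the session.
SHOW enable_indexscan;

-- 2. Confirm that a vector index exists on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 't1';

-- 3. Inspect the plan to see whether the planner picks an index scan
--    or falls back to a sequential scan.
EXPLAIN
SELECT id FROM t1 ORDER BY val <=> '[1000,1000,49000]' LIMIT 10;
```

If the plan shows a sequential scan even though an index exists, the index's distance function might not match the operator used in the ORDER BY clause, or the planner might estimate the sequential scan as cheaper for a small table.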
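Note: Don't use columns in queries whose values might be NULL. If those columns are required, add the COALESCE function to the query.
If the query is select id, name from table order by embedding <-> '[1,2,3]' LIMIT 10; and the expected value of the column name is NULL, then change the query to one of the following:
select id, COALESCE(name, 'NULL') as name from table order by embedding <-> '[1,2,3]' LIMIT 10;
Or
select id from table order by embedding <-> '[1,2,3]' LIMIT 10;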
Last updated: 2025-09-04 (UTC)