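When a dataset is small, you can use K-nearest neighbors (KNN) to find the exact k-nearest vectors. However, as your dataset grows, the latency and cost of a KNN search also increase. You can use ANN to find the approximate k-nearest neighbors with significantly reduced latency and cost.
In an ANN search, the k returned vectors might not be the true top k-nearest neighbors, because the ANN search calculates approximate distances and might not look at all the vectors in the dataset. Occasionally, a few vectors that aren't among the top k-nearest neighbors are returned. This is known as *recall loss*.
How much recall loss is acceptable depends on the use case, but in most cases, losing a bit of recall in return for improved database performance is an acceptable tradeoff.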
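To make recall loss concrete, the following minimal Python sketch compares an exact search with a stand-in for an approximate one and reports the fraction of true neighbors that were found (often called recall at k). It is an illustration only, not how Spanner implements ANN: the toy vectors, the helper names `exact_knn` and `recall_at_k`, and the idea of simulating ANN by skipping one vector are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, vectors, k):
    """Return the ids of the k vectors closest to query, scanning every vector."""
    ranked = sorted(vectors, key=lambda vid: euclidean(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors present in the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy data: id -> 2-dimensional embedding.
vectors = {
    1: [0.0, 0.0],
    2: [1.0, 0.0],
    3: [0.0, 2.0],
    4: [3.0, 3.0],
    5: [5.0, 5.0],
}
query = [0.0, 0.0]
k = 3

# Exact search looks at every vector.
exact_ids = exact_knn(query, vectors, k)

# Stand-in for an approximate search that never examined vector 3
# (for example, because its partition was not searched).
searched = {vid: vec for vid, vec in vectors.items() if vid != 3}
approx_ids = exact_knn(query, searched, k)

recall = recall_at_k(approx_ids, exact_ids)
print(f"exact={exact_ids} approx={approx_ids} recall@{k}={recall:.2f}")
```

In this toy run the approximate result contains two of the three true nearest neighbors, so one third of the recall is lost. Whether that is acceptable depends on the application, as described above.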
**PostgreSQL interface note:** The examples in this topic are intended for GoogleSQL-dialect databases. This feature doesn't support the PostgreSQL interface.

**Note:** This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).

This page describes how to find approximate nearest neighbors (ANN) and query vector embeddings using the ANN distance functions. For exact search on small datasets, see [K-nearest neighbors (KNN)](/spanner/docs/find-k-nearest-neighbors).

For more details about the approximate distance functions supported in Spanner, see the following GoogleSQL reference pages:

- [`APPROX_COSINE_DISTANCE`](/spanner/docs/reference/standard-sql/mathematical_functions#approx_cosine_distance)
- [`APPROX_EUCLIDEAN_DISTANCE`](/spanner/docs/reference/standard-sql/mathematical_functions#approx_euclidean_distance)
- [`APPROX_DOT_PRODUCT`](/spanner/docs/reference/standard-sql/mathematical_functions#approx_dot_product)

Query vector embeddings

Spanner accelerates approximate nearest neighbor (ANN) vector searches by using a [vector index](/spanner/docs/vector-indexes). Before you can query vector embeddings, you must [create a vector index](/spanner/docs/vector-indexes#create-vector-index). You can then use any one of the three approximate distance functions to find the approximate nearest neighbors.

Restrictions when using the approximate distance functions include the following:

- The approximate distance function must calculate the distance between an embedding column and a constant expression (for example, a parameter or a literal).
- The approximate distance function output must be used in an `ORDER BY` clause as the sole sort key, and a `LIMIT` must be specified after the `ORDER BY`.
- The query must explicitly filter out rows that aren't indexed. In most cases, this means that the query must include a `WHERE <column_name> IS NOT NULL` clause that matches the vector index definition, unless the column is already marked as `NOT NULL` in the table definition.

For a detailed list of limitations, see the [approximate distance function reference page](/spanner/docs/reference/standard-sql/mathematical_functions).

**Examples**

Consider a `Documents` table that has a `DocEmbedding` column of precomputed text embeddings from the `DocContents` bytes column, and a `NullableDocEmbedding` column populated from other sources that might be null.

    CREATE TABLE Documents (
      UserId INT64 NOT NULL,
      DocId INT64 NOT NULL,
      Author STRING(1024),
      DocContents BYTES(MAX),
      DocEmbedding ARRAY<FLOAT32> NOT NULL,
      NullableDocEmbedding ARRAY<FLOAT32>,
      WordCount INT64
    ) PRIMARY KEY (UserId, DocId);

To search for the nearest 100 vectors to `[1.0, 2.0, 3.0]`:

    SELECT DocId
    FROM Documents
    WHERE WordCount > 1000
    ORDER BY APPROX_EUCLIDEAN_DISTANCE(
      ARRAY<FLOAT32>[1.0, 2.0, 3.0], DocEmbedding,
      options => JSON '{"num_leaves_to_search": 10}')
    LIMIT 100

If the embedding column is nullable, add an `IS NOT NULL` filter on that column:

    SELECT DocId
    FROM Documents
    WHERE NullableDocEmbedding IS NOT NULL AND WordCount > 1000
    ORDER BY APPROX_EUCLIDEAN_DISTANCE(
      ARRAY<FLOAT32>[1.0, 2.0, 3.0], NullableDocEmbedding,
      options => JSON '{"num_leaves_to_search": 10}')
    LIMIT 100

What's next

- Learn more about Spanner [vector indexes](/spanner/docs/vector-indexes).
- Learn more about the [GoogleSQL `APPROX_COSINE_DISTANCE()`, `APPROX_EUCLIDEAN_DISTANCE()`, `APPROX_DOT_PRODUCT()`](/spanner/docs/reference/standard-sql/mathematical_functions) functions.
- Learn more about the [GoogleSQL `VECTOR INDEX` statements](/spanner/docs/reference/standard-sql/data-definition-language#vector_index_statements).
- Learn more about [vector index best practices](/spanner/docs/vector-index-best-practices).
- Try the [Getting started with Spanner Vector Search](https://codelabs.developers.google.com/codelabs/spanner-getting-started-vector-search) codelab for a step-by-step example of using ANN.

Last updated 2025-09-05 UTC.