Last updated (UTC): 2025-09-05.

| **PostgreSQL interface note:** The examples in this topic are intended for GoogleSQL-dialect databases. This feature doesn't support the PostgreSQL interface.

| **Note:** This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).

This page describes how to find approximate nearest neighbors (ANN) and query
vector embeddings using the ANN distance functions.

When a dataset is small, you can use [K-nearest neighbors (KNN)](/spanner/docs/find-k-nearest-neighbors)
to find the exact k-nearest vectors. However, as your dataset grows, the latency
and cost of a KNN search also increase. You can use ANN to find the approximate
k-nearest neighbors with significantly reduced latency and cost.

In an ANN search, the k returned vectors aren't guaranteed to be the true top
k-nearest neighbors, because the ANN search calculates approximate distances and
might not look at all the vectors in the dataset. Occasionally, a few vectors that
aren't among the top k-nearest neighbors are returned.
This is known as *recall loss*.
How much recall loss is acceptable depends on your use case, but in most
cases, losing a bit of recall in return for improved database performance is an
acceptable tradeoff.

For more details about the approximate distance functions supported in
Spanner, see the following GoogleSQL reference pages:

- [`APPROX_COSINE_DISTANCE`](/spanner/docs/reference/standard-sql/mathematical_functions#approx_cosine_distance)
- [`APPROX_EUCLIDEAN_DISTANCE`](/spanner/docs/reference/standard-sql/mathematical_functions#approx_euclidean_distance)
- [`APPROX_DOT_PRODUCT`](/spanner/docs/reference/standard-sql/mathematical_functions#approx_dot_product)

Query vector embeddings

Spanner accelerates approximate nearest neighbor (ANN) vector
searches by using a [vector index](/spanner/docs/vector-indexes). To query vector
embeddings this way, you must first
[create a vector index](/spanner/docs/vector-indexes#create-vector-index).
You can then use any one of the three approximate distance functions to find the
ANN.

The approximate distance functions have the following restrictions:

- The approximate distance function must calculate the distance between an embedding column and a constant expression (for example, a parameter or a literal).
- The approximate distance function output must be used in an `ORDER BY` clause as the sole sort key, and a `LIMIT` must be specified after the `ORDER BY`.
- The query must explicitly filter out rows that aren't indexed.
  In most cases, this means that the query must include a `WHERE <column_name> IS NOT NULL` clause that matches the vector index definition, unless the column is already marked as `NOT NULL` in the table definition.

For a detailed list of limitations, see the
[approximate distance function reference page](/spanner/docs/reference/standard-sql/mathematical_functions).

**Examples**

Consider a `Documents` table that has a `DocEmbedding` column of precomputed
text embeddings from the `DocContents` bytes column, and a
`NullableDocEmbedding` column that is populated from other sources and might be null.

```sql
CREATE TABLE Documents (
  UserId INT64 NOT NULL,
  DocId INT64 NOT NULL,
  Author STRING(1024),
  DocContents BYTES(MAX),
  DocEmbedding ARRAY<FLOAT32> NOT NULL,
  NullableDocEmbedding ARRAY<FLOAT32>,
  WordCount INT64
) PRIMARY KEY (UserId, DocId);
```

To search for the 100 nearest vectors to `[1.0, 2.0, 3.0]` among documents that have more than 1,000 words:

```sql
SELECT DocId
FROM Documents
WHERE WordCount > 1000
ORDER BY APPROX_EUCLIDEAN_DISTANCE(
  ARRAY<FLOAT32>[1.0, 2.0, 3.0], DocEmbedding,
  options => JSON '{"num_leaves_to_search": 10}')
LIMIT 100
```

If the embedding column is nullable, the query must also filter out rows where it is `NULL`:

```sql
SELECT DocId
FROM Documents
WHERE NullableDocEmbedding IS NOT NULL AND WordCount > 1000
ORDER BY APPROX_EUCLIDEAN_DISTANCE(
  ARRAY<FLOAT32>[1.0, 2.0, 3.0], NullableDocEmbedding,
  options => JSON '{"num_leaves_to_search": 10}')
LIMIT 100
```

What's next

- Learn more about Spanner [vector indexes](/spanner/docs/vector-indexes).

- Learn more about the [GoogleSQL `APPROX_COSINE_DISTANCE()`, `APPROX_EUCLIDEAN_DISTANCE()`, and `APPROX_DOT_PRODUCT()`](/spanner/docs/reference/standard-sql/mathematical_functions) functions.

- Learn more about the [GoogleSQL `VECTOR INDEX` statements](/spanner/docs/reference/standard-sql/data-definition-language#vector_index_statements).

- Learn more about [vector index best
practices](/spanner/docs/vector-index-best-practices).

- Try the [Getting started with Spanner Vector Search](https://codelabs.developers.google.com/codelabs/spanner-getting-started-vector-search)
  codelab for a step-by-step example of using ANN.
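One way to gauge the recall loss of an ANN query is to run the same query twice, once with an exact distance function such as `EUCLIDEAN_DISTANCE` (KNN) and once with `APPROX_EUCLIDEAN_DISTANCE`, using the same filter and `LIMIT`, and then compare the returned `DocId` values. The following Python sketch is illustrative only and doesn't call Spanner. It assumes the embeddings are held in memory as a dictionary keyed by `DocId`; `exact_top_k` and `recall_at_k` are hypothetical helpers, and the vectors and IDs are made up.

```python
"""Illustrative sketch: measure recall loss by comparing ANN results with exact KNN.

Assumptions (hypothetical, not part of any Spanner API): embeddings are held
in memory as {DocId: vector}, and the ANN result is a list of DocId values,
for example the rows returned by an APPROX_EUCLIDEAN_DISTANCE query.
"""
import math
from typing import Dict, List, Sequence


def exact_top_k(query: Sequence[float],
                embeddings: Dict[int, Sequence[float]],
                k: int) -> List[int]:
    """Brute-force KNN: rank every vector by its exact Euclidean distance to the query."""
    ranked = sorted(embeddings, key=lambda doc_id: math.dist(query, embeddings[doc_id]))
    return ranked[:k]


def recall_at_k(exact_ids: Sequence[int], approx_ids: Sequence[int]) -> float:
    """Return the fraction of the true top-k neighbors that the approximate result contains."""
    true_top_k = set(exact_ids)
    if not true_top_k:
        raise ValueError("exact_ids must not be empty")
    hits = len(true_top_k.intersection(approx_ids))
    return hits / len(true_top_k)


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings keyed by DocId.
    embeddings = {
        1: [1.0, 2.0, 3.0],
        2: [1.0, 2.0, 4.0],
        3: [10.0, 10.0, 10.0],
        4: [0.0, 0.0, 0.0],
    }
    query = [1.0, 2.0, 3.0]
    exact = exact_top_k(query, embeddings, k=3)  # [1, 2, 4]
    # Suppose an ANN search missed DocId 4 and returned DocId 3 instead.
    approx = [1, 2, 3]
    print(f"recall@3 = {recall_at_k(exact, approx):.2f}")  # recall@3 = 0.67
```

A recall of 1.0 means the approximate query returned every true nearest neighbor. Because an exact KNN query scans the whole table, this comparison is typically run on a small sample of query vectors while tuning options such as `num_leaves_to_search`, where searching more leaves generally trades extra latency for higher recall.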