如果資料表定義中未將嵌入資料欄標示為 NOT NULL,您必須在向量索引定義中,使用 WHERE COLUMN_NAME IS NOT NULL 子句宣告該資料欄,其中 COLUMN_NAME 是嵌入資料欄的名稱。如要使用餘弦距離,在可為空值的嵌入項目資料欄 NullableDocEmbedding 上建立含有三層樹狀結構和 1000000 個葉節點的向量索引,請執行下列操作:
[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-05 (世界標準時間)。"],[],[],null,["| **PostgreSQL interface note:** The examples in this topic are intended for GoogleSQL-dialect databases. This feature doesn't support PostgreSQL interface.\n\n\u003cbr /\u003e\n\n\n| **Note:** This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).\n\n\u003cbr /\u003e\n\nThis page explains how to create and manage Spanner vector indexes,\nwhich use approximate nearest neighbor (ANN) search and tree-based structures to\naccelerate vector similarity searches on your data.\n\nSpanner accelerates approximate nearest neighbor (ANN) vector\nsearches by using a specialized vector index. This index leverages Google\nResearch's [Scalable Nearest Neighbor (ScaNN)](https://github.com/google-research/google-research/tree/master/scann),\na highly efficient nearest neighbor algorithm.\n\nThe vector index uses a tree-based structure to partition data and facilitate\nfaster searches. Spanner offers both two-level and three-level\ntree configurations:\n\n- Two-level tree configuration: Leaf nodes (`num_leaves`) contain groups of closely related vectors along with their corresponding centroid. The root level consists of the centroids from all leaf nodes.\n- Three-level tree configuration: Similar in concept to a two-level tree, while introducing an additional branch layer (`num_branches`), from which leaf node centroids are further partitioned to form the root level (`num_leaves`).\n\nSpanner picks an index for you. However, if you know that a\nspecific index works best, then you can use the [`FORCE_INDEX` hint](/spanner/docs/secondary-indexes#index-directive)\nto choose to use the most appropriate vector index for your use case.\n\nFor more information, see [`VECTOR INDEX` statements](/spanner/docs/reference/standard-sql/data-definition-language#vector_index_statements).\n\nLimitations\n\n- You can't pre-split vector indexes. For more information, see [Pre-splitting overview](/spanner/docs/pre-splitting-overview#limitations).\n\nCreate vector index\n\nTo optimize the recall and performance of a vector index, we recommend that you:\n\n- Create your vector index after most of the rows with embeddings are\n written to your database. You might also need to periodically\n rebuild the vector index after you insert new data. For more information, see\n [Rebuild the vector index](/spanner/docs/vector-index-best-practices#rebuild).\n\n- Use the `STORING` clause to store a copy of a column in the vector index. If a\n column value is stored in the vector index, then Spanner\n performs filtering at the index's leaf level to improve query\n performance. We recommend that you store a column if it's used in a filtering\n condition. For more information about using `STORING` in an index, see\n [Create an index for index-only scans](/spanner/docs/secondary-indexes#storing-clause).\n\nWhen you create your table, the embedding column must be an array of the\n`FLOAT32` (recommended) or `FLOAT64` data type, and have a *vector_length*\nannotation, indicating the dimension of the vectors.\n\nThe following DDL statement creates a `Documents` table with an embedding\ncolumn `DocEmbedding` with a vector length: \n\n CREATE TABLE Documents (\n UserId INT64 NOT NULL,\n DocId INT64 NOT NULL,\n Author STRING (1024),\n DocContents Bytes(MAX),\n DocEmbedding ARRAY\u003cFLOAT32\u003e(vector_length=\u003e128) NOT NULL,\n NullableDocEmbedding ARRAY\u003cFLOAT32\u003e(vector_length=\u003e128),\n WordCount INT64,\n ) PRIMARY KEY (DocId);\n\nAfter you populate your `Documents` table, you can create a vector index with a\ntwo-level tree and 1000 leaf nodes on the `Documents` table with an embedding\ncolumn `DocEmbedding` using the cosine distance: \n\n CREATE VECTOR INDEX DocEmbeddingIndex\n ON Documents(DocEmbedding)\n STORING (WordCount)\n OPTIONS (distance_type = 'COSINE', tree_depth = 2, num_leaves = 1000);\n\nIf your embedding column isn't marked as `NOT NULL` in the table definition, you\nmust declare it with a `WHERE COLUMN_NAME IS NOT NULL` clause in the vector\nindex definition, where `COLUMN_NAME` is the name of your embedding column. To\ncreate a vector index with a three-level tree and 1000000 leaf nodes on the\nnullable embedding column `NullableDocEmbedding` using the cosine distance: \n\n CREATE VECTOR INDEX DocEmbeddingThreeLevelIndex\n ON Documents(NullableDocEmbedding)\n STORING (WordCount)\n WHERE NullableDocEmbedding IS NOT NULL\n OPTIONS (distance_type = 'COSINE', tree_depth = 3, num_branches=1000, num_leaves = 1000000);\n\nFilter a vector index\n\nYou can also create a filtered vector index to find the most similar items in\nyour database that match the filter condition. A filtered vector index\nselectively indexes rows that satisfy the specified filter conditions, improving\nsearch performance.\n\nIn the following example, the table `Documents2` has a column called `Category`.\nIn our vector search, we want to index the \"Tech\" category so we create a\ngenerated column that evaluates to `NULL` if the category condition isn't met. \n\n CREATE TABLE Documents2 (\n DocId INT64 NOT NULL,\n Category STRING(MAX),\n NullIfFiltered BOOL AS (IF(Category = 'Tech', TRUE, NULL)) HIDDEN,\n DocEmbedding ARRAY\u003cFLOAT32\u003e(vector_length=\u003e128),\n ) PRIMARY KEY (DocId);\n\nThen, we create a vector index with a filter. The `TechDocEmbeddingIndex` vector\nindex only indexes documents in the \"Tech\" category. \n\n CREATE VECTOR INDEX TechDocEmbeddingIndex\n ON Documents2(DocEmbedding)\n STORING(NullIfFiltered)\n WHERE DocEmbedding IS NOT NULL AND NullIfFiltered IS NOT NULL\n OPTIONS (...);\n\nWhen Spanner runs the following query, which has filters that\nmatch the `TechDocEmbeddingIndex`, it automatically picks and is accelerated by\n`TechDocEmbeddingIndex`. The query only searches documents in the \"Tech\"\ncategory. You can also use `{@FORCE_INDEX=TechDocEmbeddingIndex}` to force\nSpanner to use `TechDocEmbeddingIndex` explicitly. \n\n SELECT *\n FROM Documents2\n WHERE DocEmbedding IS NOT NULL AND NullIfFiltered IS NOT NULL\n ORDER BY APPROX_(....)\n LIMIT 10;\n\n| **Note:** In this query, if you replace `NullIfFiltered IS NOT NULL` with `Category = 'Tech'`, then the query won't match the vector index `TechDocEmbeddingIndex`.\n\nWhat's next\n\n- Learn more about Spanner [approximate nearest neighbors](/spanner/docs/find-approximate-nearest-neighbors).\n\n- Learn more about the [GoogleSQL `APPROXIMATE_COSINE_DISTANCE()`, `APPROXIMATE_EUCLIDEAN_DISTANCE()`, `APPROXIMATE_DOT_PRODUCT()`](/spanner/docs/reference/standard-sql/mathematical_functions) functions.\n\n- Learn more about the [GoogleSQL `VECTOR INDEX` statements](/spanner/docs/reference/standard-sql/data-definition-language#vector_index_statements).\n\n- Learn more about [vector index best practices](/spanner/docs/vector-index-best-practices)."]]