[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-05 (世界標準時間)。"],[],[],null,["# Find approximate matches with fuzzy search\n\n\u003cbr /\u003e\n\n\n| **Note:** This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see the [Spanner editions overview](/spanner/docs/editions-overview).\n\n\u003cbr /\u003e\n\nThis page describes how to use a fuzzy search as part of a\n[full-text search](/spanner/docs/full-text-search).\n\nIn addition to performing exact token searches using the\n[`SEARCH`](/spanner/docs/reference/standard-sql/search_functions#search_fulltext)\nand\n[`SEARCH_SUBSTRING`](/spanner/docs/reference/standard-sql/search_functions#search_substring)\nfunctions, Spanner also supports approximate (or fuzzy) searches. Fuzzy\nsearches find matching documents despite small differences between the query and\nthe document.\n\nSpanner supports the following types of fuzzy search:\n\n- N-grams-based approximate search\n- Phonetic search using [Soundex](https://en.wikipedia.org/wiki/Soundex)\n\nUse an n-grams-based approximate search\n---------------------------------------\n\nN-grams-based fuzzy search relies on the same\nsubstring tokenization that a\n[substring search](/spanner/docs/full-text-search/substring-search)\nrequires. The configuration of the tokenizer is important as it affects\nsearch quality and performance. The following example shows how to create a\nquery with misspelled or differently spelled words to find approximate matches\nin the search index.\n\n**Schema** \n\n### GoogleSQL\n\n CREATE TABLE Albums (\n AlbumId STRING(MAX) NOT NULL,\n AlbumTitle STRING(MAX),\n AlbumTitle_Tokens TOKENLIST AS (\n TOKENIZE_SUBSTRING(AlbumTitle, ngram_size_min=\u003e2, ngram_size_max=\u003e3,\n relative_search_types=\u003e[\"word_prefix\", \"word_suffix\"])) HIDDEN\n ) PRIMARY KEY(AlbumId);\n\n CREATE SEARCH INDEX AlbumsIndex\n ON Albums(AlbumTitle_Tokens)\n STORING (AlbumTitle);\n\n### PostgreSQL\n\nThis example uses\n[`spanner.tokenize_substring`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions). \n\n CREATE TABLE albums (\n albumid character varying NOT NULL,\n albumtitle character varying,\n albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (\n spanner.tokenize_substring(albumtitle, ngram_size_min=\u003e2, ngram_size_max=\u003e3,\n relative_search_types=\u003e'{word_prefix, word_suffix}'::text[])) VIRTUAL HIDDEN,\n PRIMARY KEY(albumid));\n\n CREATE SEARCH INDEX albumsindex\n ON albums(albumtitle_tokens)\n INCLUDE (albumtitle);\n\n**Query**\n\nThe following query finds the albums with titles that are the closest to\n\"Hatel Kaliphorn\", such as \"Hotel California\". \n\n### GoogleSQL\n\n SELECT AlbumId\n FROM Albums\n WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, \"Hatel Kaliphorn\")\n ORDER BY SCORE_NGRAMS(AlbumTitle_Tokens, \"Hatel Kaliphorn\") DESC\n LIMIT 10\n\n### PostgreSQL\n\nThis examples uses\n[`spanner.score_ngrams`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions)\nand\n[`spanner.search_ngrams`](/spanner/docs/reference/postgresql/functions-and-operators#search_functions). 
Usually the search query is the same across both the `SEARCH_NGRAMS` and
`SCORE_NGRAMS` functions. The recommended way to do this is to pass the search
string with
[query parameters](/spanner/docs/reference/standard-sql/lexical#query_parameters)
rather than with string literals, and to specify the same query parameter in
the `SEARCH_NGRAMS` and `SCORE_NGRAMS` functions.

The following configuration arguments can be used to tune `SEARCH_NGRAMS`:

- The minimum and maximum sizes of the n-grams, specified with the
  [`TOKENIZE_SUBSTRING`](/spanner/docs/reference/standard-sql/search_functions#tokenize_substring)
  or
  [`TOKENIZE_NGRAMS`](/spanner/docs/reference/standard-sql/search_functions#tokenize_ngrams)
  functions. We don't recommend one-character n-grams because they can match a
  very large number of documents. On the other hand, long n-grams cause
  `SEARCH_NGRAMS` to miss short misspelled words.
- The minimum number of n-grams that `SEARCH_NGRAMS` must match, set with the
  `min_ngrams` and `min_ngrams_percent` arguments of `SEARCH_NGRAMS`. Higher
  values typically make the query faster, but reduce recall (see the sketch
  after this list).
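For example, the following GoogleSQL sketch requires at least three shared n-grams instead of the default two. It assumes that `SEARCH_NGRAMS` accepts `min_ngrams` as a named argument (`min_ngrams=>3`), and that `@p` is a query parameter that holds the search string:

    SELECT AlbumId
    FROM Albums
    -- Require at least three shared n-grams before a row becomes a candidate.
    WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, @p, min_ngrams=>3)
    ORDER BY SCORE_NGRAMS(AlbumTitle_Tokens, @p) DESC
    LIMIT 10

Raising the threshold shrinks the candidate set that `SCORE_NGRAMS` has to rank, which typically lowers query cost at the expense of some recall.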
To achieve a good balance between performance and recall, configure these
arguments to fit the specific query and workload.

We also recommend including an inner `LIMIT` to avoid creating very expensive
queries when a combination of popular n-grams is encountered.

### GoogleSQL

    SELECT AlbumId
    FROM (
      SELECT AlbumId,
             SCORE_NGRAMS(AlbumTitle_Tokens, @p) AS score
      FROM Albums
      WHERE SEARCH_NGRAMS(AlbumTitle_Tokens, @p)
      LIMIT 10000 # inner limit
    )
    ORDER BY score DESC
    LIMIT 10 # outer limit

### PostgreSQL

This example uses query parameter `$1`, which is bound to `'Hatel Kaliphorn'`.

    SELECT albumid
    FROM
    (
      SELECT albumid, spanner.score_ngrams(albumtitle_tokens, $1) AS score
      FROM albums
      WHERE spanner.search_ngrams(albumtitle_tokens, $1)
      LIMIT 10000
    ) AS inner_query
    ORDER BY inner_query.score DESC
    LIMIT 10

### N-grams-based fuzzy search versus enhanced query mode

Alongside n-grams-based fuzzy search, the
[enhanced query mode](/spanner/docs/full-text-search/query-overview#enhanced_query_mode)
also handles some misspelled words, so there is some overlap between the two
features. The following table summarizes the differences:

Perform a phonetic search with Soundex
--------------------------------------

Spanner provides the
[`SOUNDEX`](/spanner/docs/reference/standard-sql/string_functions#soundex)
function for finding words that are spelled differently but sound the same. For
example, `SOUNDEX("steven")`, `SOUNDEX("stephen")`, and `SOUNDEX("stefan")` are
all "s315", while `SOUNDEX("stella")` is "s340". `SOUNDEX` is case sensitive and
only works for Latin-based alphabets.

Phonetic search with `SOUNDEX` can be implemented with a generated column and a
search index, as shown in the following example:

### GoogleSQL

    CREATE TABLE Singers (
      SingerId INT64,
      AlbumTitle STRING(MAX),
      AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN,
      Name STRING(MAX),
      NameSoundex STRING(MAX) AS (LOWER(SOUNDEX(Name))),
      NameSoundex_Tokens TOKENLIST AS (TOKEN(NameSoundex)) HIDDEN
    ) PRIMARY KEY(SingerId);

    CREATE SEARCH INDEX SingersPhoneticIndex ON Singers(AlbumTitle_Tokens, NameSoundex_Tokens);

### PostgreSQL

This example uses
[`spanner.soundex`](/spanner/docs/reference/postgresql/functions-and-operators#string_functions).

    CREATE TABLE singers (
      singerid bigint,
      albumtitle character varying,
      albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.tokenize_fulltext(albumtitle)) VIRTUAL HIDDEN,
      name character varying,
      namesoundex character varying GENERATED ALWAYS AS (lower(spanner.soundex(name))) VIRTUAL,
      namesoundex_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.token(lower(spanner.soundex(name)))) VIRTUAL HIDDEN,
      PRIMARY KEY(singerid));

    CREATE SEARCH INDEX singersphoneticindex ON singers(albumtitle_tokens, namesoundex_tokens);

The following query uses `SOUNDEX` to match "stefan" to "Steven", and also
requires `AlbumTitle` to contain "cat":

### GoogleSQL

    SELECT SingerId
    FROM Singers
    WHERE NameSoundex = LOWER(SOUNDEX("stefan")) AND SEARCH(AlbumTitle_Tokens, "cat")

### PostgreSQL

    SELECT singerid
    FROM singers
    WHERE namesoundex = lower(spanner.soundex('stefan')) AND spanner.search(albumtitle_tokens, 'cat')

What's next
-----------

- Learn about [tokenization and Spanner tokenizers](/spanner/docs/full-text-search/tokenization).
- Learn about [search indexes](/spanner/docs/full-text-search/search-indexes).
- Learn about [full-text search queries](/spanner/docs/full-text-search/query-overview).