In addition to full token matching, Spanner search indexes support substring searches. This page describes how to perform a substring search as part of a full-text search in Spanner.
Substring searches have the following characteristics:
- Case insensitive, discards most punctuation, and normalizes whitespace.
- No Chinese, Japanese, Korean (CJK) segmentation, since partial CJK queries often segment incorrectly.
- For multiple search terms, the result must contain a substring from each
term. For example,
'happ momen'
matches"happy moment"
, because both substrings are found in the text. It doesn't match"happy day"
.
Examples
Stored text | Substring query | Match |
---|---|---|
Bridge over Troubled Water | ridg roub | Yes |
Bridge over Troubled Water | ridg , roub | Yes |
Bridge over Troubled Water | over brid | Yes |
Bridge over Troubled Water | ate bridge | Yes |
Bridge over Troubled Water | Bridge bridge bridge | Yes |
Bridge over Troubled Water | bri trou ter | Yes |
Bridge over Troubled Water | bri dge | Yes |
Bridge over Troubled Water | troubledwater | No |
Bridge over Troubled Water | trubled | No |
For a substring search, use the TOKENIZE_SUBSTRING
function in the TOKENLIST
column definition, as shown in the following DDL example:
CREATE TABLE Albums (
AlbumId STRING(MAX) NOT NULL,
AlbumTitle STRING(MAX),
AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_SUBSTRING(AlbumTitle)) HIDDEN
) PRIMARY KEY(AlbumId);
In the SQL query, use the SEARCH_SUBSTRING
function in the WHERE
clause. For
example, the following query matches an album with title "happy" from the table
created in the previous example:
SELECT Album
FROM Albums
WHERE SEARCH_SUBSTRING(AlbumTitle_Tokens, 'happ');
TOKENIZE_SUBSTRING
generates n-grams
for each token and stores these n-grams in the search index. The minimum and
maximum length of n-grams to generate are configured through optional arguments.
Substring search indexes can use 10-30x more storage as full-text indexes over
the same data, because the tokenization produces a lot more tokens. This is
especially true if as the difference between ngram_size_min
and
ngram_size_max
grows. Substring queries also use more resources to execute.
Like TOKENIZE_FULLTEXT
,
you can configure TOKENIZE_SUBSTRING
to use specific types of content.
Enable a relative substring search
In addition to the basic substring search,
SEARCH_SUBSTRING
supports the relative search mode. A relative search refines substring search
results.
To enable the relative search mode, set the
TOKENIZE_SUBSTRING
relative_search_types
parameter to TRUE.
When relative search is enabled in tokenization, SEARCH_SUBSTRING
can perform
queries with the following relative search types:
phrase
: matches contiguous substringsExamples
Stored text Substring query. Match Bridge over Troubled Water bridge over Yes Bridge over Troubled Water Bridge bridge bridge No Bridge over Troubled Water brid over No Bridge over Troubled Water ridge over trouble Yes Bridge over Troubled Water bridge ove troubled No Bridge over Troubled Water idge ove Yes Bridge over Troubled Water idge , ove Yes Bridge over Troubled Water RIDGE OVE Yes Bridge over Troubled Water bridge water No value_prefix
: matches contiguous substrings and the match has to start at the beginning of the value. This is conceptually similar to theSTARTS_WITH
function for case and whitespace normalized strings.Examples
Stored text Substring query Match Bridge over Troubled Water bridge over Yes Bridge over Troubled Water bridge , over Yes Bridge over Troubled Water ridge over No Bridge over Troubled Water troubled water No value_suffix
: matches contiguous substrings and the match has to match at the end of the value. This is conceptually similar to theENDS_WITH
function for case and whitespace normalized strings.Examples
Stored text Substring query. Match Bridge over Troubled Water troubled water Yes Bridge over Troubled Water troubled ; water Yes Bridge over Troubled Water roubled water Yes Bridge over Troubled Water troubled wate No Bridge over Troubled Water trouble water No Bridge over Troubled Water bridge over No word_prefix:
likevalue_prefix
, but the string has to match at a term boundary (rather than a value boundary).Examples
Stored text Substring query Match Bridge over Troubled Water over trouble Yes Bridge over Troubled Water Over , trouble Yes Bridge over Troubled Water troub water No Bridge over Troubled Water over water No Bridge over Troubled Water ove troubled No Bridge over Troubled Water ver troubled Yes word_suffix
: likevalue_suffix
, but the string has to match at the end of a term boundary.Examples
Stored text Substring query Match Bridge over Troubled Water ver troubled Yes Bridge over Troubled Water over trouble No Bridge over Troubled Water over water No Bridge over Troubled Water ove troubled No
What's next
- Learn about full-text search queries.
- Learn how to rank search results.
- Learn how to paginate search results.
- Learn how to mix full-text and non-text queries.
- Learn how to search multiple columns.