In addition to full token matching, Spanner search indexes support substring searches. This page describes how to perform a substring search as part of a full-text search in Spanner.
Substring searches have the following characteristics:
- Case insensitive, discards most punctuation, and normalizes whitespace.
- No Chinese, Japanese, Korean (CJK) segmentation, since partial CJK queries often segment incorrectly.
- For multiple search terms, the result must contain a substring from each
term. For example, 'happ momen'matches"happy moment", because both substrings are found in the text. It doesn't match"happy day".
Examples
| Stored text | Substring query | Match | 
|---|---|---|
| Bridge over Troubled Water | ridg roub | Yes | 
| Bridge over Troubled Water | ridg , roub | Yes | 
| Bridge over Troubled Water | over brid | Yes | 
| Bridge over Troubled Water | ate bridge | Yes | 
| Bridge over Troubled Water | Bridge bridge bridge | Yes | 
| Bridge over Troubled Water | bri trou ter | Yes | 
| Bridge over Troubled Water | bri dge | Yes | 
| Bridge over Troubled Water | troubledwater | No | 
| Bridge over Troubled Water | trubled | No | 
For a substring search, use the TOKENIZE_SUBSTRING
function in the TOKENLIST column definition, as shown in the following DDL
example:
GoogleSQL
CREATE TABLE Albums (
AlbumId STRING(MAX) NOT NULL,
AlbumTitle STRING(MAX),
AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_SUBSTRING(AlbumTitle)) HIDDEN
) PRIMARY KEY(AlbumId);
PostgreSQL
This example uses
spanner.tokenize_substring.
CREATE TABLE albums (
albumid character varying NOT NULL,
albumtitle character varying,
albumtitle_tokens spanner.tokenlist
    GENERATED ALWAYS AS (spanner.tokenize_substring(albumtitle)) VIRTUAL HIDDEN,
PRIMARY KEY(albumid));
In the SQL query, use the SEARCH_SUBSTRING function in the WHERE clause. For
example, the following query matches an album with title "happy" from the table
created in the previous example:
GoogleSQL
SELECT Album
FROM Albums
WHERE SEARCH_SUBSTRING(AlbumTitle_Tokens, 'happ');
PostgreSQL
This example uses
spanner.search_substring.
SELECT album
FROM albums
WHERE spanner.search_substring(albumtitle_tokens, 'happ');
TOKENIZE_SUBSTRING generates n-grams
for each token and stores these n-grams in the search index. The minimum and
maximum length of n-grams to generate are configured through optional arguments.
Substring search indexes can use 10-30x more storage as full-text indexes over
the same data, because the tokenization produces a lot more tokens. This is
especially true if as the difference between ngram_size_min and
ngram_size_max grows. Substring queries also use more resources to execute.
Like TOKENIZE_FULLTEXT,
you can configure TOKENIZE_SUBSTRING to use specific types of content.
Enable a relative substring search
In addition to the basic substring search,
SEARCH_SUBSTRING
supports the relative search mode. A relative search refines substring search
results.
To enable the relative search mode, set the relative_search_types parameter of
TOKENIZE_SUBSTRING
 to a non-empty array with elements of supported relative search types.
When relative search is enabled in tokenization, SEARCH_SUBSTRING can perform
queries with the following relative search types:
- phrase: matches contiguous substrings- Examples - Stored text - Substring query. - Match - Bridge over Troubled Water - bridge over - Yes - Bridge over Troubled Water - Bridge bridge bridge - No - Bridge over Troubled Water - brid over - No - Bridge over Troubled Water - ridge over trouble - Yes - Bridge over Troubled Water - bridge ove troubled - No - Bridge over Troubled Water - idge ove - Yes - Bridge over Troubled Water - idge , ove - Yes - Bridge over Troubled Water - RIDGE OVE - Yes - Bridge over Troubled Water - bridge water - No 
- value_prefix: matches contiguous substrings and the match has to start at the beginning of the value. This is conceptually similar to the- STARTS_WITHfunction for case and whitespace normalized strings.- Examples - Stored text - Substring query - Match - Bridge over Troubled Water - bridge over - Yes - Bridge over Troubled Water - bridge , over - Yes - Bridge over Troubled Water - ridge over - No - Bridge over Troubled Water - troubled water - No 
- value_suffix: matches contiguous substrings and the match has to match at the end of the value. This is conceptually similar to the- ENDS_WITHfunction for case and whitespace normalized strings.- Examples - Stored text - Substring query. - Match - Bridge over Troubled Water - troubled water - Yes - Bridge over Troubled Water - troubled ; water - Yes - Bridge over Troubled Water - roubled water - Yes - Bridge over Troubled Water - troubled wate - No - Bridge over Troubled Water - trouble water - No - Bridge over Troubled Water - bridge over - No 
- word_prefix:like- value_prefix, but the string has to match at a term boundary (rather than a value boundary).- Examples - Stored text - Substring query - Match - Bridge over Troubled Water - over trouble - Yes - Bridge over Troubled Water - Over , trouble - Yes - Bridge over Troubled Water - troub water - No - Bridge over Troubled Water - over water - No - Bridge over Troubled Water - ove troubled - No - Bridge over Troubled Water - ver troubled - Yes 
- word_suffix: like- value_suffix, but the string has to match at the end of a term boundary.- Examples - Stored text - Substring query - Match - Bridge over Troubled Water - ver troubled - Yes - Bridge over Troubled Water - over trouble - No - Bridge over Troubled Water - over water - No - Bridge over Troubled Water - ove troubled - No 
What's next
- Learn about full-text search queries.
- Learn how to rank search results.
- Learn how to paginate search results.
- Learn how to mix full-text and non-text queries.
- Learn how to search multiple columns.