This page describes the SEARCH
function and the enhanced query mode, which are
used to perform full-text search
queries on Spanner tables.
Query a search index
Spanner provides the
SEARCH
function for search index queries. The SEARCH
function requires two arguments:
- A search index name
- An rquery
The SEARCH
function only works when a search index is defined. The SEARCH
function can be combined with any arbitrary SQL constructs, such as filters,
aggregations, or joins.
The following query uses the SEARCH
function to return all albums that have
either friday
or monday
in the title:
SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'friday OR monday')
The rquery language
The second argument of the SEARCH
function is a domain-specific language (DSL)
called rquery. The rquery language is similar to what internet users are
accustomed to using on google.com:
- Multiple terms imply
AND
. For example, "Big Time" is equivalent tobig AND time
. The
OR
operation implies disjunction between two terms, such asbig OR time
. The statementSEARCH(tl, 'big time OR fast car')
is equivalent to:SEARCH(tl, 'big') AND (SEARCH(tl, 'time') OR SEARCH(tl, 'fast')) AND SEARCH(tl, 'cat');
OR
only applies to the two adjacent terms. For example, the search expressionhappy friday OR monday
searches for all the documents that have the termhappy
and either the termfriday
ormonday
.Double quotes mean a phrase search. For example, the rquery
"big cat"
matches "I have a big cat", but doesn't match "My cat is big".The
AROUND
operator matches terms that are within a certain distance of each other. For example, rquerybig AROUND cat
matches "big, white, and fluffy cat", but doesn't match "big dog was sitting next to a small cat". The default is to match terms separated by, at most, five positions. To adjust the distance, pass an argument to theAROUND
operator. Spanner supports two syntaxes forAROUND
:big AROUND(10) cat
big AROUND 10 cat
The
AROUND
operator, when used as part of a phrase, can match terms that are within a certain distance of each other, and in the same order (the default is five tokens). For example, the stringbig AROUND cat
matches "big white and fluffy cat", but not "cat was big".Negation of a single token is expressed with a dash (
-
). For example-dog
matches all documents that don't contain the termdog
.Punctuation is generally ignored. For example, "Big Time!" is equivalent to "Big Time".
Search is case insensitive. For example, "Big Time" matches "big time".
The OR
andAROUND
operators are case sensitive. The pipe character (|
) is a shortcut forOR
.
How the rquery language works
The rquery language follows the same rules as the plain-text tokenizer when splitting the input string into distinct terms. This includes segmentation of Asian languages.
Many applications provide a way for their users to enter a search query in a
search box. The easiest way to integrate such end user queries is to send the
user input directly into the SEARCH
function.
The following table explains the meaning of various rquery strings:
rquery | Explanation |
---|---|
Big Cat |
Matches documents that contain both terms "big" and "cat". |
cat OR dog |
Matches documents that contain at least one of the terms "cat" and "dog". |
-cat |
Matches all documents that don't contain the term "cat". |
"big dog" -"big cat" |
Matches documents that contain two adjacent terms "big" and "dog", but don't contain adjacent "big" and "cat". For example, this query matches "I have a big dog and a small cat", but doesn't match "I have a big dog and a big cat". |
cat|dog |
This is the same as cat OR dog. |
and OR or |
Matches documents that have either the term "and" or the term "or" (the OR operator must be upper case) |
Enhanced query mode
In addition to a purely token-based full-text search, Spanner
supports a richer mode called enhance_query
. When enabled, this mode extends
the search query to include more token variants. These rewrites increase search
recall.
To enable this option, set the optional argument enhance_query=>true
in the
SEARCH
function. For example, the search query hotl cal
matches the album
Hotel California
.
SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'hotl cal', enhance_query=>true)
The enhance_query
mode is a query-time option. It doesn't affect tokenization.
You can use the same search index with or without enhance_query
.
Google is continuously improving the query enhancement algorithms. As the
result, a query with enhance_query == true
might yield slightly different
results over time..
When the enhance_query
mode is enabled, it may increase the number of terms
that the SEARCH
function is looking for and may result in slightly elevated
latency.
For example, the following query uses a three-second timeout and fails if
enhance_query
is unavailable:
@{require_enhance_query=true, enhance_query_timeout_ms=3000}
SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'cat', enhance_query=>true)
SQL query requirements
There are several conditions that a SQL query must meet to use a search index. If these conditions aren't met, the query uses either an alternative query plan or fails if no alternative plan exists.
Queries must meet the following conditions:
SEARCH
andSEARCH_SUBSTRING
functions require a search index. Spanner doesn't support these functions in queries against the base table or secondary indexes. Queries againstPartitioned indexes must have all partition columns bound by an equality condition in the
WHERE
clause of the query.For example, if a search index is defined as
PARTITION BY x, y
, the query must have a conjunct in theWHERE
clause ofx = <parameter or constant> AND y = <parameter or constant>
. That search index isn't considered by the query optimizer if such a condition is missing.All
TOKENLIST
columns referenced bySEARCH
andSEARCH_SUBSTRING
operators must be indexed in the same search index.For example, consider the following table and index definition:
CREATE TABLE Albums ( AlbumId STRING(MAX) NOT NULL, AlbumTitle STRING(MAX), AlbumStudio STRING(MAX), AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN, AlbumStudio_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumStudio)) HIDDEN ) PRIMARY KEY(AlbumId); CREATE SEARCH INDEX AlbumsTitleIndex ON Albums(AlbumTitle_Tokens); CREATE SEARCH INDEX AlbumsStudioIndex ON Albums(AlbumStudio_Tokens);
The following query fails because there's no single search index that indexes both
AlbumTitle_Tokens
andAlbumStudio_Tokens
:SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, @p1) AND SEARCH(AlbumStudio_Tokens, @p2)
If the sort order column is nullable, both the schema and the query must exclude rows where the sort order column is NULL. See Search index sort order for details.
If the search index is NULL filtered, the query must include the same NULL-filtering expression that's used in an index. See NULL-filtered search indexes for details.
Search indexes aren't supported in DML, partitioned DML, and partitioned queries .
Search indexes are typically used in read-only transactions. If application requirements allow stale results, we recommend running search queries with a staleness duration of 10 seconds or longer. For more information, see Read stale data. This is particularly useful for large search queries that fan out to multiple Paxos groups.
Search indexes aren't recommended in read-write transactions. During execution, search queries lock an entire index partition; as a result, a high rate of search queries in read-write transactions might cause lock conflicts leading to latency spikes. By default, search indexes aren't automatically selected in read-write transactions. If a query is forced to use a search index in a read-write transaction it fails by default. This behavior can be overridden with the
@{ALLOW_SEARCH_INDEXES_IN_TRANSACTION=TRUE}
statement-level hint (but queries are still prone to lock conflicts).
Once index eligibility conditions are met, the query optimizer tries to
accelerate non-text query conditions (like Rating > 4
). If the search index
doesn't include the appropriate TOKENLIST
column, the condition isn't
accelerated and remains a residual condition.
Query parameters
rquery and other parameters like OFFSET
are specified as either a literal or
a query parameter.
We recommend using query parameters rather than string literals. Parameterized
queries have better query
cache hit rates resulting in
lower query latency and lower overall CPU usage.
For example, instead of using a query like the following:
SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, 'cat')
use the following syntax:
SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, @p)
Spanner runs the query optimizer on distinct SQL text. The fewer distinct SQL texts the application uses, the fewer times that the query optimization is invoked.
Search index sort order
Search indexes sort order behavior is different from secondary indexes.
For example, consider the following table:
CREATE TABLE Albums (
AlbumId STRING(MAX) NOT NULL,
ReleaseTimestamp INT64 NOT NULL,
AlbumName STRING(MAX),
AlbumName_Token TOKENLIST AS (TOKEN(AlbumName)) HIDDEN
) PRIMARY KEY(AlbumId);
The application might define a secondary index to look up information using the
AlbumName
sorted by ReleaseTimestamp
:
CREATE INDEX AlbumsSecondaryIndex ON Albums(AlbumName, ReleaseTimestamp DESC);
The equivalent search index looks as following (this uses exact-match tokenization, since secondary indexes don't support full-text searches):
CREATE SEARCH INDEX AlbumsSearchIndex
ON Albums(AlbumName_Token)
ORDER BY ReleaseTimestamp DESC;
Search index sort order must conform to the following rules:
- Only use
INT64
columns for the sort order of a search index. Columns that have arbitrary sizes use too many resources on the search index because Spanner needs to store the docid next to every token. Specifically, the sort order column can't use theTIMESTAMP
type becauseTIMESTAMP
uses nanosecond precision which doesn't fit in a 64-bit integer. Sort order columns must not be
NULL
. There are two ways to meet this requirement:- Declare the sort order column as
NOT NULL
. - Configure the index to exclude NULL values.
- Declare the sort order column as
In practice, a timestamp is often used to determine the sort order. A common practice is to use microseconds since the Unix epoch for such timestamps.
Applications usually retrieve the newest data first using a search index that's sorted in descending order.
Index selection
Spanner typically selects the most efficient index for a query
using cost-based modeling. However, the FORCE_INDEX
hint explicitly instructs
Spanner to use a specific search index. For example, the
following shows how to force Spanner to use the AlbumsIndex
:
SELECT AlbumId
FROM Albums @{FORCE_INDEX=AlbumsIndex}
WHERE SEARCH(AlbumTitle_Tokens, @p1)
If the specified search index isn't eligible, the query fails, even if there are other eligible search indexes.
Snippets in search results
A snippet is a piece of text extracted from a given string that gives users a sense of what a search result contains, and the reason why the result is relevant to their query.
For example, Gmail uses snippets to indicate the portion of an email that matches the search query:
Having the database generate a snippet has several benefits:
- Convenience: You don't need to implement logic to generate snippets from a search query.
- Efficiency: Snippets reduce the output size from the server.
The SNIPPET
function creates the snippet. It returns the relevant portion of
the original string value along with positions of characters to highlight. The
client can then choose how to display the snippet to the end user (for example,
using highlighted or bold text). The SNIPPET
function removes all HTML tags
from the original string.
For example, the following uses SNIPPET
to retrieve text from AlbumTitle
:
SELECT AlbumId, SNIPPET(AlbumTitle, "Fast Car")
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, "Fast Car")
What's next
- Learn how to rank search results.
- Learn how to perform a substring search.
- Learn how to paginate search results.
- Learn how to mix full-text and non-text queries.
- Learn how to search multiple columns.