Spanner supports both unpartitioned and partitioned search indexes. This page describes how to create a partitioned search index in Spanner.
Overview
An unpartitioned index is created when the PARTITION BY
clause is omitted in
the index definition. In an unpartitioned index, a query needs to read from
all the index splits. This limits the potential scalability of full-text search
queries.
Partitioned indexes, on the other hand, subdivide the index into smaller units,
one for each unique partition. Queries can only search within a single partition
at a time, specified by an equality condition in the WHERE
clause. Queries
against partitioned indexes are generally more efficient than queries against
unpartitioned indexes because Spanner only needs to read data for a
single partition.
Partitioning of the search index is analogous to the key prefix of a secondary
index.
For example, suppose there are 1,000,000 SingerIds
in a database and the
following two indexes:
CREATE TABLE Albums (
AlbumId STRING(MAX) NOT NULL,
SingerId STRING(MAX) NOT NULL,
ReleaseTimestamp INT64 NOT NULL,
AlbumTitle STRING(MAX),
AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN,
SingerId_Tokens TOKENLIST AS (TOKEN(SingerId)) HIDDEN
) PRIMARY KEY(SingerId, AlbumId);
CREATE SEARCH INDEX AlbumsUnpartitionedIndex
ON Albums(AlbumTitle_Tokens, SingerId_Tokens);
CREATE SEARCH INDEX AlbumsIndexBySingerId
ON Albums(AlbumTitle_Tokens)
PARTITION BY SingerId;
The following query selects the AlbumsIndexBySingerId
index because it only
searches data for a single singer. This type of query typically uses fewer
resources.
SELECT AlbumId
FROM Albums
WHERE SingerId = "singer1"
AND SEARCH(AlbumTitle_Tokens, 'happy')
It's also possible to force
a query to use AlbumsUnpartitionedIndex
to return the same results.
However, it uses more resources, because the query needs to access all index
splits and filter through all albums for all singers to find the token "happy",
rather than just the splits corresponding to singer singer1
.
However, there are times when the application needs to search through all of the albums rather than the albums for a specific singer. In these cases, you must use an unpartitioned index:
SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'piano concerto 1')
The general recommendation is to use the finest granularity of partitioning that's practical and appropriate for the query. For example, if the application queries an email mailbox where each query is restricted to a specific mailbox, partition the search index on the mailbox ID. However, if the query needs to search through all mailboxes, an unpartitioned index is a better fit.
Certain applications might require multiple partitioning strategies to accommodate their specific search requirements. For example, an inventory management system might need to support queries filtered by product type or manufacturer. Additionally, some applications might need multiple presorts, such as sorting by creation or modification time. In these scenarios, it's recommended that you create multiple search indexes, each optimized for the corresponding queries.
What's next
- Learn about tokenization and Spanner tokenizers.
- Learn about search indexes.
- Learn about numeric indexes.