Partition search indexes

Spanner supports both unpartitioned and partitioned search indexes. This page describes how to create a partitioned search index in Spanner.

An unpartitioned index is created when the PARTITION BY clause is omitted in the index definition. In an unpartitioned index, a query needs to read from all the index splits. This limits the potential scalability of full-text search queries.

Partitioned indexes, on the other hand, subdivide the index into smaller units, one for each unique partition. Queries can only search within a single partition at a time, specified by an equality condition in the WHERE clause. Queries against partitioned indexes are generally more efficient than queries against unpartitioned indexes because Spanner only needs to read data for a single partition. Partitioning the search index is analogous to the key prefix of a secondary index.

For example, suppose there are 1,000,000 SingerIds in a database and the following two indexes:

GoogleSQL

CREATE TABLE Albums (
  AlbumId STRING(MAX) NOT NULL,
  SingerId STRING(MAX) NOT NULL,
  ReleaseTimestamp INT64 NOT NULL,
  AlbumTitle STRING(MAX),
  AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN,
  SingerId_Tokens TOKENLIST AS (TOKEN(SingerId)) HIDDEN
) PRIMARY KEY(SingerId, AlbumId);

CREATE SEARCH INDEX AlbumsUnpartitionedIndex
ON Albums(AlbumTitle_Tokens, SingerId_Tokens);

CREATE SEARCH INDEX AlbumsIndexBySingerId
ON Albums(AlbumTitle_Tokens)
PARTITION BY SingerId;

PostgreSQL

CREATE TABLE albums (
  albumid character varying NOT NULL,
  singerid character varying NOT NULL,
  releasetimestamp bigint NOT NULL,
  albumtitle character varying,
  albumtitle_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.tokenize_fulltext(albumtitle)) VIRTUAL HIDDEN,
  singerid_tokens spanner.tokenlist GENERATED ALWAYS AS (spanner.token(singerid)) VIRTUAL HIDDEN,
PRIMARY KEY(singerid, albumid));

CREATE SEARCH INDEX albumsunpartitionedindex
ON albums(albumtitle_tokens, singerid_tokens);

CREATE SEARCH INDEX albumsindexbysingerid
ON albums(albumtitle_tokens)
PARTITION BY singerid;

The following query selects the AlbumsIndexBySingerId index because it only searches data for a single singer. This type of query typically uses fewer resources.

GoogleSQL

SELECT AlbumId
FROM Albums
WHERE SingerId = "singer1"
AND SEARCH(AlbumTitle_Tokens, 'happy')

PostgreSQL

SELECT albumid
FROM albums
WHERE singerid = 'singer1'
AND spanner.search(albumtitle_tokens, 'happy')

It's also possible to force a query to use AlbumsUnpartitionedIndex to return the same results. However, it uses more resources, because the query needs to access all index splits and filter through all albums for all singers to find the token "happy", rather than just the splits corresponding to singer singer1.

However, there are times when the application needs to search through all of the albums rather than the albums for a specific singer. In these cases, you must use an unpartitioned index:

GoogleSQL

SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, 'piano concerto 1')

PostgreSQL

SELECT albumid
FROM albums
WHERE spanner.search(albumtitle_tokens, 'piano concerto 1')

The general recommendation is to use the finest granularity of partitioning that's practical and appropriate for the query. For example, if the application queries an email mailbox where each query is restricted to a specific mailbox, partition the search index on the mailbox ID. However, if the query needs to search through all mailboxes, an unpartitioned index is a better fit.

Certain applications might require multiple partitioning strategies to accommodate their specific search requirements. For example, an inventory management system might need to support queries filtered by product type or manufacturer. Additionally, some applications might need multiple presorts, such as sorting by creation or modification time. In these scenarios, it's recommended that you create multiple search indexes, each optimized for the corresponding queries. The Spanner query optimizer automatically selects an index for each query.

What's next

Learn about tokenization and Spanner tokenizers.
Learn about search indexes.
Learn about numeric indexes.