Search multiple columns in search indexes

Search indexes can index multiple tokenized columns, making queries on these columns more efficient. This page describes how to perform a search on multiple columns, which is a type of full-text search.

The structure of the search index ensures that queries don't need a distributed join, ensuring predictable performance of the queries. The distributed join is avoided due to the colocation of all tokens that correspond to a base table row on the same split.

For example, consider the following schema:

CREATE TABLE Albums (
  AlbumId STRING(MAX) NOT NULL,
  Title STRING(MAX),
  Studio STRING(MAX),
  Title_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Title)) HIDDEN,
  Studio_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(Studio)) HIDDEN
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(Title_Tokens, Studio_Tokens);

A query can now search both fields:

SELECT AlbumId
FROM Albums
WHERE SEARCH(Title_Tokens, "fifth symphony")
  AND SEARCH(Studio_Tokens, "Blue Note Studio")

Spanner supports multi-column search queries in conjunction, disjunction, and negation operators in the WHERE clause. You can use all of the following types of queries with a search index:

  • Conjunction: Find documents where Title has the term "car" and Studio has the term "sun".

    SELECT AlbumId
    FROM Albums
    WHERE SEARCH(Title_Tokens, 'car') AND SEARCH(Studio_Tokens, 'sun')
    
  • Disjunction: Find documents where either Title has the term "car" or Studio has the term "sun"

    SELECT AlbumId
    FROM Albums
    WHERE SEARCH(Title_Tokens, 'car') OR SEARCH(Studio_Tokens, 'sun')
    
  • Negation: Find all documents where Title doesn't contain the term "car".

    SELECT AlbumId
    FROM Albums
    WHERE NOT SEARCH(Title_Tokens, 'car')
    

    The rquery language can perform the same type of searches:

    SELECT AlbumId
    FROM Albums
    WHERE SEARCH(Title_Tokens, '-car')
    

    Both forms filter documents where Title is NULL. Tokenization and search functions are defined to return NULL on NULL input. SQL defines NOT NULL as NULL.

Additionally, you can reference the same TOKENLIST column multiple times.

SELECT AlbumId
FROM Albums
WHERE (SEARCH(Title_Tokens, 'car') OR SEARCH(Studio_Tokens, 'sun'))
  AND (SEARCH(Title_Tokens, 'guy') OR SEARCH(Studio_Tokens, electric))

Use either the rquery language or SQL to search for multiple terms in the same column. rquery is recommended due to its efficient query caching for parameterized queries. Aside from the better query cache hit rate, the rquery and SQL languages have the same latency and performance rates.

SELECT AlbumId
FROM Albums
WHERE SEARCH(Title_Tokens, 'car OR guy')

SELECT AlbumId
FROM Albums
WHERE SEARCH(Title_Tokens, 'car') OR SEARCH(Title_Tokens, 'guy')

You can also use non-text conditions accelerated with search indexes in the same query with full-text search functions.

What's next