Web applications often paginate data as it's presented to users. The end user receives one page of results, and when they navigate to the next page, the next batch of results is retrieved and presented. This page describes how to add pagination to search results when performing a full-text search in Spanner.
Overview
There are two ways to implement paginated queries in Spanner: key-based pagination (recommended) and offset-based pagination.
Key-based pagination is a method for retrieving search results in smaller, more manageable chunks while ensuring consistent results across requests. A unique identifier (the "key") from the last result of a page is used as a reference point to fetch the next set of results.
Spanner generally recommends using key-based pagination. While offset-based pagination is easier to implement, it has two significant drawbacks:
- Higher query cost:Offset-based pagination repeatedly retrieves and discards the same results, leading to increased costs and decreased performance.
- Inconsistent Results: In paginated queries, each page is typically retrieved at a different read timestamp. For example, the first page might come from a query at 1 PM, and the next from a query at 1:10 PM. This means that search results can change between queries, leading to inconsistent results across pages.
Key-based pagination, on the other hand, uses a unique identifier (key) from the last result of a page to fetch the next set of results. This ensures both efficient retrieval and consistent results, even if the underlying data changes.
To provide stability in page results, the application could issue all queries
for different pages at the same timestamp. This, however, might fail if the
query exceeds the version retention
period
(the default is 1 hour). For example, this failure happens if version_gc
is
one hour, and the end user fetched the first results at 1 PM and clicked Next
at 3 PM.
Use key-based pagination
Key-based pagination remembers the last item of the previous page and uses it as
a starting point for the next page query. To achieve this, the query must return
the columns specified in the ORDER BY
clause and limit the number of rows
using LIMIT
.
For key-based pagination to work, the query must order results by some strict total order. The easiest way to get one is to choose any total order and then add tie-breaker columns, if needed. In most cases, the total order is the search index sort order and the unique combination of columns is the base table primary key.
Using our Albums
sample schema, for the first page, the query looks like the
following:
SELECT AlbumId, ReleaseTimestamp
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, "green")
ORDER BY ReleaseTimestamp DESC, AlbumId
LIMIT @page_size;
The AlbumId
is the tie breaker since ReleaseTimestamp
isn't a key. There
might be two different albums with the same value for ReleaseTimestamp
.
To resume, the application runs the same query again, but with a WHERE
clause
that restricts the results from the previous page. The additional condition
needs to account for key direction (ascending versus descending), tie breakers,
and the order of NULL values for nullable columns.
In our example, AlbumId
is the only key column (in ascending order) and it
can't be NULL, so the condition is the following:
SELECT AlbumId, ReleaseTimestamp
FROM Albums
WHERE (ReleaseTimestamp < @last_page_release_timestamp
OR (ReleaseTimestamp = @last_page_release_timestamp
AND AlbumId > @last_page_album_id))
AND SEARCH(AlbumTitle_Tokens, @p)
ORDER BY ReleaseTimestamp DESC, AlbumId ASC
LIMIT @page_size;
Spanner interprets this kind of condition as seekable. This means that Spanner doesn't read the index for documents you're filtering out. This optimization is what makes key-based pagination much more efficient than offset-based pagination.
Use offset-based pagination
Offset-based pagination leverages the LIMIT
and OFFSET
clauses of SQL query
to simulate pages. The LIMIT
value indicates the number of results per page.
The OFFSET
value is set to zero for the first page, page size for the second
page, and double the page size for the third page.
For example, the following query fetches the third page, with a page size of 50:
SELECT AlbumId
FROM Albums
WHERE SEARCH(AlbumTitle_Tokens, "green")
ORDER BY ReleaseTimestamp DESC, AlbumId
LIMIT 50 OFFSET 100;
Usage Notes:
- The
ORDER BY
clause is highly recommended to ensure consistent ordering between pages. - In production queries, use query parameters rather than constants to specify
LIMIT
andOFFSET
to make query caching more efficient. For more information, see Query parameters.
What's next
- Learn how to rank search results.
- Learn how to perform a substring search.
- Learn how to mix full-text and non-text queries.
- Learn how to search multiple columns.