Query optimizer

What is a query optimizer?

The Cloud Spanner SQL query optimizer converts a declarative SQL statement, that describes what data the query wants, into an imperative execution plan, that describes one way to precisely obtain that data. The process of transforming a declarative statement into a query execution plan involves performing transformations to tree structures used to represent the query. It is expected that the optimizer, in the process of producing an execution plan, preserves the logical meaning of the original SQL query so that the correct rows are returned.

Another important role of the optimizer is to produce an execution plan that is efficient.

How does the Cloud Spanner optimizer produce efficient execution plans?

Cloud Spanner's optimizer uses a combination of well-established heuristics and cost-based optimization to produce efficient plans. Some heuristics are very straightforward, such as "perform filters on remote machines rather than pulling data to the local machine". Other heuristics are more complex, but still embody the principle of moving logic that reduces data volume closer to the data. That principle is particularly important in a system that shards data across multiple machines.

Not all execution decisions can be made effectively using such fixed rules, so Cloud Spanner's optimizer also makes decisions based on an estimated cost of alternatives. Such cost estimates are calculated using the structure of the query, the schema of the database, and estimates of the data volume that will be produced by fragments of the query. For example, Cloud Spanner will estimate how many rows of the Songs table qualify the filter SongGenre = "Country" if that filter appears in a query. To help the computation of such estimates, Cloud Spanner periodically gathers statistics on the user data.

To learn more about query execution plans and how they are used by Cloud Spanner to perform queries in a distributed environment, see Query execution plans.

Query optimizer versioning

Over time, the query optimizer will evolve, broadening the set of choices in the query execution plan that are cost-based, and improving the accuracy of the estimates that inform those choices, leading to more efficient query execution plans.

Cloud Spanner rolls out optimizer updates as new query optimizer versions. By default, each database always uses the latest version of the optimizer. To put more control in your hands, we've introduced the ability to manage the query optimizer version that your queries use. Before committing to the latest version, you can compare query performance profiles between older versions and the latest version. To learn more, see Manage query optimizer.

Query optimizer version history

The following is a summary of the updates made to the query optimizer in each release.

Version 2: March 1st, 2020 (latest)

  • Adds optimizations in index selection.
  • Improves the performance of REGEXP_CONTAINS and LIKE predicates under certain circumstances.
  • Improves the performance of a scan under a GROUP BY in certain situations.

Version 1: June 18th, 2019

  • Includes many rule based optimizations such as predicate pushdown, limit pushdown, redundant join and redundant expression removal, to name a few.

  • Uses statistics on user data to select which index to use to access each table.

To manage the optimizer version for your scenario, see Manage the query optimizer