Query Execution Operators

Introduction

This page describes details about operators used in Cloud Spanner Query Execution Plans.

The queries and execution plans in this topic are based on the following database schema:

CREATE TABLE Singers (
  SingerId   INT64 NOT NULL,
  FirstName  STRING(1024),
  LastName   STRING(1024),
  SingerInfo BYTES(MAX),
  BirthDate  DATE,
) PRIMARY KEY(SingerId);

CREATE INDEX SingersByFirstLastName ON Singers(FirstName, LastName);

CREATE TABLE Albums (
  SingerId        INT64 NOT NULL,
  AlbumId         INT64 NOT NULL,
  AlbumTitle      STRING(MAX),
  MarketingBudget INT64,
) PRIMARY KEY(SingerId, AlbumId),
  INTERLEAVE IN PARENT Singers ON DELETE CASCADE;

CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);

CREATE INDEX AlbumsByAlbumTitle2 ON Albums(AlbumTitle) STORING (MarketingBudget);

CREATE TABLE Songs (
  SingerId  INT64 NOT NULL,
  AlbumId   INT64 NOT NULL,
  TrackId   INT64 NOT NULL,
  SongName  STRING(MAX),
  Duration  INT64,
  SongGenre STRING(25),
) PRIMARY KEY(SingerId, AlbumId, TrackId),
  INTERLEAVE IN PARENT Albums ON DELETE CASCADE;

CREATE INDEX SongsBySingerAlbumSongNameDesc ON Songs(SingerId, AlbumId, SongName DESC), INTERLEAVE IN Albums;

CREATE INDEX SongsBySongName ON Songs(SongName);

CREATE TABLE Concerts (
  VenueId      INT64 NOT NULL,
  SingerId     INT64 NOT NULL,
  ConcertDate  DATE NOT NULL,
  BeginTime    TIMESTAMP,
  EndTime      TIMESTAMP,
  TicketPrices ARRAY<INT64>,
) PRIMARY KEY(VenueId, SingerId, ConcertDate);

You can run queries and retrieve execution plans even if the tables have no data.

Leaf operators

A leaf operator is an operator that has no children. The types of leaf operators are:

Array unnest

An array unnest operator flattens an input array into rows of elements. Each resulting row contains up to two columns: the actual value from the array, and optionally the zero-based position in the array.

For example, using this query:

SELECT a, b FROM UNNEST([1,2,3]) a WITH OFFSET b;

The query flattens the array [1,2,3] in column a and shows the array position in column b.

These are the results:

+------+------+
| a    | b    |
+------+------+
|    1 |    0 |
|    2 |    1 |
|    3 |    2 |
+------+------+

This is the execution plan:

array unnest operator

Enumerate rows

An enumerate rows operator returns a row set from zero or more parameters. One scenarios is when the SQL query contains no tables.

For example, using this query:

SELECT 1 + 2 AS Result;

These are the results:

+--------+
| Result |
+--------+
|      3 |
+--------+

This is the execution plan:

enumerate rows operator

Scan

A scan operator returns rows by scanning a source of rows. There are three types of scan operators:

  • Table scan: The scan occurs on a table.
  • Index scan: The scan occurs on an index.
  • Batch scan: The scan occurs on intermediate tables created by other relational operators (for example, a table created by a distributed cross apply).

Whenever possible, Cloud Spanner applies simple predicates on keys as part of a scan. Scans execute more efficiently when predicates are applied because the scan does not need to read the entire table or index. Predicates appear in the execution plan in the form KeyPredicate: column=value.

For example, using this query:

SELECT s.LastName
FROM singers@{FORCE_INDEX=SingersByFirstLastName} AS s
WHERE s.FirstName = 'Catalina';

These are the results:

+----------+
| LastName |
+----------+
| Smith    |
+----------+

This is the execution plan:

scan operator

In the execution plan, the top-level distributed union operator sends subplans to remote servers. Each subplan has a serialize result operator and an index scan operator. The predicate Key Predicate: FirstName = 'Catalina' restricts the scan to rows in the index SingersByFirstLastname that have FirstName equal to Catalina. The output of the index scan is returned to the serialize result operator.

Unary operators

A unary operator is an operator that has a single relational child.

The following operators are unary operators:

Aggregate

An aggregate operator implements GROUP BY SQL statements and aggregate functions (such as COUNT). The input for an aggregate operator is logically partitioned into groups arranged on key columns (or into a single group if GROUP BY is not present). For each group, zero or more aggregates are computed.

For example, using this query:

SELECT s.SingerId, AVG(s.duration) AS average, COUNT(*) AS count
FROM Songs AS s
GROUP BY SingerId;

The query groups by SingerId and performs an AVG aggregation and a COUNT aggregation.

These are the results:

+----------+---------+-------+
| SingerId | average | count |
+----------+---------+-------+
|        3 | 278     |     1 |
|        2 | 225.875 |     8 |
+----------+---------+-------+

The results are aggregated by SingerId with each group is arranged by columns average and count.

This is the execution plan:

aggregate operator

Aggregate operators can be stream-based or hash-based. The execution plan above shows a stream-based aggregate. Stream-based aggregates read from already pre-sorted input (if GROUP BY is present) and compute the groups without blocking. Hash-based aggregates build hash tables to maintain incremental aggregates of multiple input rows simultaneously. Stream-based aggregates are faster and use less memory than hash-based aggregates, but require the input to be sorted (either by key columns or secondary indexes).

For distributed scenarios, an aggregate operator can be separated into a local/global pair. Each remote server performs the local aggregation on its input rows, and then returns its results to the root server. The remote server performs the global aggregation.

Create batch

A create batch operator batches its input rows into a sequence. A create batch operation usually occurs as a part of a distributed cross apply operation. The input rows can be re-ordered during the batching. The number of input rows that get batched in each execution of the batch operator is variable.

See the distributed cross apply operator for an example of a create batch operator in an execution plan.

Compute

A compute operator produces output by reading its input rows and adding one or more additional columns that are computed via scalar expressions. See the union all operator for an example of a compute operator in an execution plan.

Compute struct

A compute struct operator creates a variable for a structure that contains fields for each of the input columns.

For example, using this query:

SELECT FirstName,
ARRAY(SELECT AS STRUCT a.AlbumTitle, a.ReleaseDate
      FROM Albums AS a
      WHERE a.SingerId = s.SingerId)
FROM singers AS s

These are the results:

+-----------+------------------------------------------------------ +
| FirstName | Unspecified                                           |
+-----------+------------------------------------------------------ +
| Marc      | [["Total Junk",2015-03-01],["Go, Go, Go",2017-02-01]] |
| Catalina  | [["Green",2016-11-01]]                                |
| Alice     | [["Nothing To Do With Me",2016-09-01]]                |
| Lea       | [["Play",2015-03-01]]                                 |
+-----------+------------------------------------------------------ +

This is the execution plan:

compute struct operator

In the execution plan, the array subquery operator receives input from a compute struct operator. The compute struct operator creates a structure from the AlbumTitle and ReleaseDate columns in the Albums table.

Filter

A filter operator reads all rows from its input, applies a scalar predicate on each row, and then returns only the rows that satisfy the predicate.

For example, using this query:

SELECT s.LastName
FROM Singers AS s
WHERE s.LastName LIKE 'Rich%';

These are the results:

+----------+
| LastName |
+----------+
| Richards |
+----------+

This is the execution plan:

filter operator

The predicate for singers whose last name starts with Rich is implemented as a filter. The filter's input is the output from an index scan, and the filter's output are rows where LastName starts with Rich.

For performance, whenever a filter is directly positioned above a scan, the filter impacts how data is read. For example, consider a table with key k. A filter with predicate k = 5 directly on top of a scan of the table will look for rows that match k = 5, without reading the entire input. This results in more efficient execution of the query. In the example above, the filter operator reads only the rows that satisfy the WHERE s.LastName LIKE 'Rich%' predicate.

Limit

A limit operator constrains the number of rows returned. An optional OFFSET parameter specifies the starting row to return. For distributed scenarios, a limit operator can be separated into a local/global pair. Each remote server applies the local limit for its output rows, and then returns its results to the root server. The root server aggregates the rows sent by the remote servers and then applies the global limit.

For example, using this query:

SELECT s.SongName
FROM Songs AS s
LIMIT 3;

These are the results:

+----------------------+
| SongName             |
+----------------------+
| Not About The Guitar |
| The Second Time      |
| Starting Again       |
+----------------------+

This is the execution plan:

limit operator

The local limit is the limit for each remote server. The root server aggregates the rows from the remote servers and then applies the global limit.

Serialize result

A serialize result operator is a special case of the compute struct operator that serializes each row of the final result of the query, for returning to the client.

For example, using this query:

SELECT ARRAY(SELECT AS STRUCT so.SongName, so.SongGenre
             FROM Songs AS so
             WHERE so.SingerId = s.SingerId)
FROM Singers AS s

The query asks for an array of SongNameand SongGenre based on SingerId.

These are the results:

+------------------------------------------------------------------+
| []                                                               |
| [[Let's Get Back Together, COUNTRY], [Starting Again, ROCK], ... |
| [[Not About The Guitar, BLUES]]                                  |
| []                                                               |
| []                                                               |
+------------------------------------------------------------------+

This is the execution plan:

Serialize result operator

The serialize result operator creates a result that contains SongName and SongGenre for each row of the Songs table.

Sort

A sort operator reads its input rows, orders them by column(s), and then returns the sorted results.

For example, using this query:

SELECT s.SongName
FROM Songs AS s
ORDER By SongName;

These are the results:

+--------------------------+
| SongName                 |
+--------------------------+
| 42                       |
| Blue                     |
| Fight Story              |
| I Knew You Were Magic    |
| Let Us Get Back Together |
| Not About The Guitar     |
| Nothing Is The Same      |
| Starting Again           |
| The Second Time          |
+--------------------------+

This is the execution plan:

sort operator

In this execution plan, the sort operator receives its input rows from a distributed union operator, sorts the input rows, and returns the sorted rows to a serialize result operator.

To constrain the number of rows returned, a sort operator can optionally have LIMIT and OFFSET parameters. For distributed scenarios, a sort operator with a LIMIT and/or OFFSET operator is separated into a local/global pair. Each remote server applies the sort order and the local limit/offset for its input rows, and then returns its results to the root server. The root server aggregates the rows sent by the remote servers, sorts them, and then applies the global limit/offset.

For example, using this query:

SELECT s.SongName
FROM Songs AS s
ORDER By SongName
LIMIT 3;

These are the results:

+--------------------------+
| SongName                 |
+--------------------------+
| 42                       |
| Blue                     |
| Fight Story              |
+--------------------------+

This is the execution plan:

sort operator with limit

The execution plan shows the local limit for the remote servers and the global limit for the root server.

Union input

A union input operator returns results to a union all operator. See the union all operator for an example of a union input operator in an execution plan.

Binary operators

A binary operator is an operator that has two relational children. The following operators are binary operators:

Cross apply

A cross apply operator runs a table query on each row retrieved by a query of another table, and returns the union of all the table query runs. Cross apply and outer apply operators execute row-oriented processing, unlike operators that execute set-based processing such as hash join . The cross apply operator has two inputs, input and map. The cross apply operator applies each row in the input side to the map side. The result of the cross apply has columns from both the input and map sides.

For example, using this query:

SELECT si.FirstName,
  (SELECT so.SongName
   FROM Songs AS so
   WHERE so.SingerId=si.SingerId
   LIMIT 1)
FROM Singers AS si;

The query asks for the first name of each singer, along with the name of only one of the singer's songs.

These are the results:

+-----------+--------------------------+
| FirstName |                          |
+-----------+--------------------------+
| Alice     | Not About The Guitar     |
| Catalina  | Let Us Get Back Together |
| David     | NULL                     |
| Lea       | NULL                     |
| Marc      | NULL                     |
+-----------+--------------------------+

The first column is populated from the Singers table, and the second column is populated from the Songs table. In cases where a SingerId existed in the Singers table but there was no matching SingerId in the Songs table, the second column contains NULL.

This is the execution plan:

cross apply operator

The top-level node is a distributed union operator. The distributed union operator distributes subplans to remote servers. The subplan contains a serialize result operator that computes the singer's first name and the name of one of the singer's songs and serializes each row of the output.

The serialize result operator receives its input from a cross apply operator. The input side for the cross apply operator is a table scan on the Songs table.

The map side for the cross apply operation contains the following (from top to bottom):

  • An aggregate operator that returns Songs.SongName.
  • A limit operator that limits the number of songs returned to one per singer.
  • An index scan on the SongsBySingerAlbumSongNameDesc index.

The cross apply operator maps each row from the input side to a row in the map side that has the same SingerId. The cross apply operator output is the FirstName value from the input row, and the SongName value from the map row. (The SongName value will be NULL if there is no map row that matches on SingerId.) The distributed union operator at the top of the execution plan then combines all of the output rows from the remote servers and returns them as the query results.

Hash join

A hash join operator is a hash-based implementation of SQL joins. Hash joins execute set-based processing. The hash join operator reads rows from input marked as build and inserts them into a hash table based on a join condition. The hash join operator then reads rows from input marked as probe. For each row it reads from the probe input, the hash join operator looks for matching rows in the hash table. The hash join operator returns the matching rows as its result.

For example, using this query:

SELECT a.AlbumTitle, s.SongName
FROM Albums AS a HASH JOIN Songs AS s
ON a.SingerId = s.SingerId AND a.AlbumId = s.AlbumId;

These are the results:

+-----------------------+--------------------------+
| AlbumTitle            | SongName                 |
+-----------------------+--------------------------+
| Nothing To Do With Me | Not About The Guitar     |
| Green                 | The Second Time          |
| Green                 | Starting Again           |
| Green                 | Nothing Is The Same      |
| Green                 | Let Us Get Back Together |
| Green                 | I Knew You Were Magic    |
| Green                 | Blue                     |
| Green                 | 42                       |
| Terrified             | Fight Story              |
+-----------------------+--------------------------+

This is the execution plan:

hash join operator

In the execution plan, build is a distributed union that distributes scans on the table Albums. Probe is a distributed union operator that distributes scans on the index SongsBySingerAlbumSongNameDesc. The hash join operator reads all rows from the build side. Each build row is placed in a hash table based on the columns in the condition a.SingerId = s.SingerId AND a.AlbumId = s.AlbumId. Next, the hash join operator reads all rows from the probe side. For each probe row, the hash join operator looks for matches in the hash table. The resulting matches are returned by the hash join operator.

Resulting matches in the hash table may also be filtered by a residual condition before they are returned. (An example of where residual conditions appear is in non-equality joins). Hash join execution plans can be complex due to memory management and join variants. The main hash join algorithm is adapted to handle inner, semi, anti, and outer join variants.

Loop join

A loop join operator is a nested iteration that uses two inputs, outer and inner. Each row in outer is matched by a join condition to rows in inner.

For example, using this query:

SELECT a.AlbumTitle, s.SongName
FROM Albums AS a JOIN@{join_type=loop_join} Songs AS s
ON a.SingerId = s.SingerId AND a.AlbumId = s.AlbumId;

These are the results:

+-----------------------+--------------------------+
| AlbumTitle            | SongName                 |
+-----------------------+--------------------------+
| Nothing To Do With Me | Not About The Guitar     |
| Green                 | The Second Time          |
| Green                 | Starting Again           |
| Green                 | Nothing Is The Same      |
| Green                 | Let Us Get Back Together |
| Green                 | I Knew You Were Magic    |
| Green                 | Blue                     |
| Green                 | 42                       |
| Terrified             | Fight Story              |
+-----------------------+--------------------------+

This is the execution plan:

loop join operator

In the execution plan, outer is a distributed union that distributes scans on the table Albums. Inner is a distributed union operator that distributes scans on the index SongsBySingerAlbumSongNameDesc. The loop join operator reads all rows from outer. For each row in outer, the loop join operator reads all rows from inner and evaluates the condition a.SingerId = s.SingerId AND a.AlbumId = s.AlbumId. The loop join combines columns from inner and outer, and returns its results to a serialize result operator.

Outer apply

An outer apply operator is similar to a cross apply operator, except an outer apply operator ensures that each execution on the map side returns at least one row by manufacturing a NULL-padded row if needed. (In other words, it provides left outer join semantics.)

N-ary operators

An N-ary operator is an operator that has more than two relational children. The following operators are N-ary operators:

Union all

A union all operator combines all row sets of its children without removing duplicates. Union all operators receive their input from union input operators that are distributed across multiple servers. The union all operator requires that its inputs have the same schema, that is, the same set of data types for each column.

For example, using this query:

SELECT 1 a, 2 b
UNION ALL
SELECT 3 a, 4 b
UNION ALL
SELECT 5 a, 6 b;

The row type for the children consists of two integers.

These are the results:

+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    3 |    4 |
|    5 |    6 |
+------+------+

This is the execution plan:

union_all_operator

The union all operator combines its input rows, and in this example it sends the results to a serialize result operator.

A query such as the following would succeed, because the same set of data types is used for each column, even though the children use different variables for the column names:

SELECT 1 a, 2 b
UNION ALL
SELECT 3 c, 4 e;

A query such as the following would not succeed, because the children use different data types for the columns:

SELECT 1 a, 2 b
UNION ALL
SELECT 3 a, 'This is a string' b;

Scalar subqueries

A scalar subquery is a SQL sub-expression that is part of a scalar expression. Cloud Spanner attempts to remove scalar subqueries whenever possible. In certain scenarios, however, plans can explicitly contain scalar subqueries.

For example, using this query:

SELECT FirstName,
IF(FirstName='Alice',
   (SELECT COUNT(*)
    FROM Songs
    WHERE Duration > 300),
   0)
FROM Singers

This is the SQL sub-expression:

SELECT COUNT(*)
FROM Songs
WHERE Duration > 300

These are the results (of the complete query):

+-----------+----+
| FirstName |    |
+-----------+----+
| Alice     | 1  |
| Catalina  | 0  |
| David     | 0  |
| Lea       | 0  |
| Marc      | 0  |
+-----------+----+

This is the execution plan:

scalar subquery operator

The execution plan contains a scalar subquery, shown as Scalar Subquery, above an aggregate operator.

Cloud Spanner sometimes converts scalar subqueries into another operator such as a join or cross apply, to possibly improve performance.

For example, using this query:

SELECT *
FROM Songs
WHERE Duration = (SELECT MAX(Duration) FROM Songs)

This is the SQL sub-expression:

SELECT MAX(Duration) FROM Songs

These are the results (of the complete query):

+----------+---------+---------+---------------------+----------+-----------+
| SingerId | AlbumId | TrackId | SongName            | Duration | SongGenre |
+----------+---------+---------+---------------------+----------+-----------+
|        2 |       1 |       6 | Nothing Is The Same |      303 |     BLUES |
+----------+---------+---------+---------------------+----------+-----------+

This is the execution plan:

scalar subquery operator not display in plan

The execution plan does not contain a scalar subquery because Cloud Spanner converted the scalar subquery to a cross apply.

Array subqueries

An array subquery is similar to a scalar subquery, except that the subquery is allowed to consume more than one input row. The consumed rows are converted to a single scalar output array that contains one element per consumed input row.

For example, using this query:

SELECT a.AlbumId,
ARRAY(SELECT ConcertDate
      FROM Concerts
      WHERE Concerts.SingerId = a.SingerId)
FROM Albums AS a

This is the subquery:

SELECT ConcertDate
FROM Concerts
WHERE Concerts.SingerId = a.SingerId

The results of the subquery for each AlbumId are converted into an array of ConcertDate rows against that AlbumId. The execution plan contains an array subquery, shown as Array Subquery, above a distributed union operator:

array subquery operator

Distributed operators

The operators described previously on this page execute within the boundaries of a single machine. Distributed operators execute across multiple servers.

The following operators are distributed operators:

The distributed union operator is the primitive operator from which distributed cross apply and distributed outer apply are derived.

Distributed operators appear in execution plans with a distributed union variant on top of one or more local distributed union variants. A distributed union variant performs the remote distribution of subplans. A local distributed union variant is on top of each of the scans performed for the query, as shown in this execution plan:

distributed operator

The local distributed union variants ensure stable query execution when restarts occur for dynamically changing split boundaries.

Whenever possible, a distributed union variant has a split predicate that results in split pruning, meaning the remote servers execute subplans on only the splits that satisfy the predicate. This improves both latency and overall query performance.

Distributed union

A distributed union operator conceptually divides one or more tables into multiple splits, remotely evaluates a subquery independently on each split, and then unions all results.

For example, using this query:

SELECT s.SongName, s.SongGenre
FROM Songs AS s
WHERE s.SingerId = 2 AND s.SongGenre = 'ROCK';

These are the results:

+-----------------+-----------+
| SongName        | SongGenre |
+-----------------+-----------+
| Starting Again  | ROCK      |
| The Second Time | ROCK      |
| Fight Story     | ROCK      |
+-----------------+-----------+

This is the execution plan:

distributed union operator

The distributed union operator sends subplans to remote servers, which perform a table scan across splits that satisfy the query's predicate WHERE s.SingerId = 2 AND s.SongGenre = 'ROCK'. A serialize result operator computes the SongName and SongGenre values from the rows returned by the table scans. The distributed union operator then returns the combined results from the remote servers as the SQL query results.

Distributed cross apply

A distributed cross apply (DCA) operator extends the cross apply operator by executing across multiple servers. The DCA input side groups batches of rows (unlike a regular cross apply operator, which acts on only one input row at a time. The DCA map side is a set of cross apply operators that execute on remote servers.

For example, using this query:

SELECT AlbumTitle FROM Songs
JOIN Albums ON Albums.AlbumId=Songs.AlbumId

The results are in the format:

+-----------------------+
| AlbumTitle            |
+-----------------------+
| Green                 |
| Nothing To Do With Me |
| Play                  |
| Total Junk            |
| Green                 |
...

This is the execution plan:

distributed cross apply operator

The DCA input contains an index scan on the Albums table that batches rows of AlbumId and SingerId data. The map side for this cross apply operator is an index scan on the index SongsBySingerAlbumSongNameDesc, subject to the predicate of SingerId in the input row matching the SingerId key in the SongsBySingerAlbumSongNameDesc index. The mapping returns the SongName for the SingerId values in the batched input rows.

To summarize the DCA process for this example, the DCA's input is the batched rows from the Albums table, and the DCA's output is the application of these rows to the map of the index scan.

Distributed outer apply

A distributed outer apply operator extends the outer apply operator by executing over multiple servers, similar to the way a distributed cross apply operator extends a cross apply operator.

For example, using this query:

SELECT AlbumTitle FROM Songs
LEFT OUTER JOIN@{JOIN_TYPE=APPLY_JOIN} Albums
ON Albums.AlbumId=Songs.AlbumId

The results are in the format:

+-----------------------+
| AlbumTitle            |
+-----------------------+
| Green                 |
| Nothing To Do With Me |
| Play                  |
| Total Junk            |
| Green                 |
...

This is the execution plan:

distributed outer apply operator

Additional information

This section describes items that are not standalone operators, but instead execute tasks to support one or more of the operators listed above. The items described here are technically operators, but they are not separate operators in your query plan.

Struct constructor

A struct constructor creates a struct, or a collection of fields. It typically creates a struct for rows that result from a compute operation. A struct constructor is not a standalone operator. Instead, it appears in compute struct operators or serialize result operators.

For a compute struct operation, the struct constructor creates a struct so columns for the computed rows can use a single variable reference to the struct.

For a serialize result operation, the struct constructor creates a struct to serialize the results.

For example, using this query:

SELECT IF(TRUE, struct(1 AS A, 1 AS B), struct(2 AS A , 2 AS B)).A;

These are the results:

+---+
| A |
+---+
| 1 |
+---+

This is the execution plan:

struct constructor

In the execution plan, struct constructors appear inside a serialize result operator.

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...

Cloud Spanner Documentation