Query syntax

Query statements scan one or more tables or expressions and return the computed result rows. This topic describes the syntax for SQL queries in Cloud Spanner SQL.

SQL syntax

query_statement:
    [ statement_hint_expr ][ table_hint_expr ][ join_hint_expr ]
query_expr statement_hint_expr: '@{' statement_hint_key = statement_hint_value [, ...] '}' statement_hint_key: { USE_ADDITIONAL_PARALLELISM| OPTIMIZER_VERSION } query_expr: [ WITH with_query_name AS ( query_expr ) [, ...] ] { select | ( query_expr ) | query_expr set_op query_expr } [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ LIMIT count [ OFFSET skip_rows ] ] select: SELECT [{ ALL | DISTINCT }] { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...] [ FROM from_item [ tablesample_type ] [, ...] ] [ WHERE bool_expression ] [ GROUP BY expression [, ...] ] [ HAVING bool_expression ] set_op: UNION { ALL | DISTINCT } | INTERSECT { ALL | DISTINCT } | EXCEPT { ALL | DISTINCT } from_item: { table_name [ table_hint_expr ] [ [ AS ] alias ] | join | ( query_expr ) [ table_hint_expr ] [ [ AS ] alias ] | field_path | { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } [ table_hint_expr ] [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | with_query_name [ table_hint_expr ] [ [ AS ] alias ] } table_hint_expr: '@{' table_hint_key = table_hint_value '}' table_hint_key: { FORCE_INDEX | GROUPBY_SCAN_OPTIMIZATION } join: from_item [ join_type ] [ join_method ] JOIN [ join_hint_expr ] from_item [ { ON bool_expression | USING ( join_column [, ...] ) } ] join_type: { INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }

join_method:
{ HASH }
join_hint_expr: '@{' join_hint_key = join_hint_value [, ...] '}' join_hint_key: { FORCE_JOIN_ORDER | JOIN_METHOD } tablesample_type: TABLESAMPLE sample_method (sample_size percent_or_rows ) sample_method: { BERNOULLI | RESERVOIR } sample_size: numeric_value_expression percent_or_rows: { PERCENT | ROWS }

Notation rules

  • Square brackets "[ ]" indicate optional clauses.
  • Parentheses "( )" indicate literal parentheses.
  • The vertical bar "|" indicates a logical OR.
  • Curly braces "{ }" enclose a set of options.
  • A comma followed by an ellipsis within square brackets "[, ... ]" indicates that the preceding item can repeat in a comma-separated list.

Statement hints

The following hints are supported for query statements:

Hint key Possible values Description
USE_ADDITIONAL_PARALLELISM TRUE
FALSE (default)
If TRUE, the execution engine will favor using more parallelism when possible. Because this can reduce resources available to other operations, you may want to avoid this hint if you run latency-sensitive operations on the same instance.
OPTIMIZER_VERSION 1 to N|latest Executes the query using the specified optimizer version. Possible values are 1 to N (the latest optimizer version) or latest. If the hint is not set, the optimizer executes against the version that is set in database options or specified through the client API. If neither of those are set, the optimizer defaults to the latest version.

In terms of version setting precedence, the value set by the client API takes precedence over the value in the database options and the value set by this hint takes precedence over everything else.

For more information, see Query optimizer.

Sample tables

The following tables are used to illustrate the behavior of different query clauses in this reference.

Roster table

The Roster table includes a list of player names (LastName) and the unique ID assigned to their school (SchoolID). It looks like this:

+-----------------------+
| LastName   | SchoolID |
+-----------------------+
| Adams      | 50       |
| Buchanan   | 52       |
| Coolidge   | 52       |
| Davis      | 51       |
| Eisenhower | 77       |
+-----------------------+

You can use this WITH clause to emulate a temporary table name for the examples in this reference:

WITH Roster AS
 (SELECT 'Adams' as LastName, 50 as SchoolID UNION ALL
  SELECT 'Buchanan', 52 UNION ALL
  SELECT 'Coolidge', 52 UNION ALL
  SELECT 'Davis', 51 UNION ALL
  SELECT 'Eisenhower', 77)
SELECT * FROM Roster

PlayerStats table

The PlayerStats table includes a list of player names (LastName) and the unique ID assigned to the opponent they played in a given game (OpponentID) and the number of points scored by the athlete in that game (PointsScored).

+----------------------------------------+
| LastName   | OpponentID | PointsScored |
+----------------------------------------+
| Adams      | 51         | 3            |
| Buchanan   | 77         | 0            |
| Coolidge   | 77         | 1            |
| Davis      | 52         | 4            |
| Eisenhower | 50         | 13           |
+----------------------------------------+

You can use this WITH clause to emulate a temporary table name for the examples in this reference:

WITH PlayerStats AS
 (SELECT 'Adams' as LastName, 51 as OpponentID, 3 as PointsScored UNION ALL
  SELECT 'Buchanan', 77, 0 UNION ALL
  SELECT 'Coolidge', 77, 1 UNION ALL
  SELECT 'Adams', 52, 4 UNION ALL
  SELECT 'Buchanan', 50, 13)
SELECT * FROM PlayerStats

TeamMascot table

The TeamMascot table includes a list of unique school IDs (SchoolID) and the mascot for that school (Mascot).

+---------------------+
| SchoolID | Mascot   |
+---------------------+
| 50       | Jaguars  |
| 51       | Knights  |
| 52       | Lakers   |
| 53       | Mustangs |
+---------------------+

You can use this WITH clause to emulate a temporary table name for the examples in this reference:

WITH TeamMascot AS
 (SELECT 50 as SchoolID, 'Jaguars' as Mascot UNION ALL
  SELECT 51, 'Knights' UNION ALL
  SELECT 52, 'Lakers' UNION ALL
  SELECT 53, 'Mustangs')
SELECT * FROM TeamMascot

SELECT list

SELECT  [{ ALL | DISTINCT }]
    { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...]

The SELECT list defines the columns that the query will return. Expressions in the SELECT list can refer to columns in any of the from_items in its corresponding FROM clause.

Each item in the SELECT list is one of:

  • *
  • expression
  • expression.*

SELECT *

SELECT *, often referred to as select star, produces one output column for each column that is visible after executing the full query.

SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable);

+-------+-----------+
| fruit | vegetable |
+-------+-----------+
| apple | carrot    |
+-------+-----------+

SELECT expression

Items in a SELECT list can be expressions. These expressions evaluate to a single value and produce one output column, with an optional explicit alias.

If the expression does not have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.

SELECT expression.*

An item in a SELECT list can also take the form of expression.*. This produces one output column for each column or top-level field of expression. The expression must either be a table alias or evaluate to a single value of a data type with fields, such as a STRUCT.

The following query produces one output column for each column in the table groceries, aliased as g.

WITH groceries AS
  (SELECT "milk" AS dairy,
   "eggs" AS protein,
   "bread" AS grain)
SELECT g.*
FROM groceries AS g;

+-------+---------+-------+
| dairy | protein | grain |
+-------+---------+-------+
| milk  | eggs    | bread |
+-------+---------+-------+

More examples:

WITH locations AS
  (SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location
  UNION ALL
  SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location)
SELECT l.location.*
FROM locations l;

+---------+------------+
| city    | state      |
+---------+------------+
| Seattle | Washington |
| Phoenix | Arizona    |
+---------+------------+
WITH locations AS
  (SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"),
    ("Phoenix", "Arizona")] AS location)
SELECT l.LOCATION[offset(0)].*
FROM locations l;

+---------+------------+
| city    | state      |
+---------+------------+
| Seattle | Washington |
+---------+------------+

SELECT modifiers

You can modify the results returned from a SELECT query, as follows.

SELECT DISTINCT

A SELECT DISTINCT statement discards duplicate rows and returns only the remaining rows. SELECT DISTINCT cannot return columns of the following types:

  • STRUCT
  • ARRAY

SELECT * EXCEPT

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.

WITH orders AS
  (SELECT 5 as order_id,
  "sprocket" as item_name,
  200 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;

+-----------+----------+
| item_name | quantity |
+-----------+----------+
| sprocket  | 200      |
+-----------+----------+

SELECT * REPLACE

A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement. In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.

A SELECT * REPLACE statement does not change the names or order of columns. However, it can change the value and the value type.

WITH orders AS
  (SELECT 5 as order_id,
  "sprocket" as item_name,
  200 as quantity)
SELECT * REPLACE ("widget" AS item_name)
FROM orders;

+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5        | widget    | 200      |
+----------+-----------+----------+

WITH orders AS
  (SELECT 5 as order_id,
  "sprocket" as item_name,
  200 as quantity)
SELECT * REPLACE (quantity/2 AS quantity)
FROM orders;

+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5        | sprocket  | 100      |
+----------+-----------+----------+

SELECT ALL

A SELECT ALL statement returns all rows, including duplicate rows. SELECT ALL is the default behavior of SELECT.

Using STRUCTs with SELECT

  • Queries that return a STRUCT at the root of the return type are not supported in Cloud Spanner APIs. For example, the following query is supported only as a subquery:

    SELECT STRUCT(1, 2) FROM Users;
    
  • Returning an array of structs is supported. For example, the following queries are supported in Cloud Spanner APIs:

    SELECT ARRAY(SELECT STRUCT(1 AS A, 2 AS B)) FROM Users;
    
    SELECT ARRAY(SELECT AS STRUCT 1 AS a, 2 AS b) FROM Users;
    
  • However, query shapes that can return an ARRAY<STRUCT<...>> typed NULL value or an ARRAY<STRUCT<...>> typed value with an element that is NULL are not supported in Cloud Spanner APIs, so the following query is supported only as a subquery:

    SELECT ARRAY(SELECT IF(STARTS_WITH(Users.username, "a"), NULL, STRUCT(1, 2)))
    FROM Users;
    

See Querying STRUCT elements in an ARRAY for more examples on how to query STRUCTs inside an ARRAY.

Also see notes about using STRUCTs in subqueries.

Aliases

See Using Aliases for information on syntax and visibility for SELECT list aliases.

FROM clause

from_item: {
    table_name [ table_hint_expr ] [ [ AS ] alias ] |
    join |
    ( query_expr ) [ table_hint_expr ] [ [ AS ] alias ] |
    field_path |
    { UNNEST( array_expression ) | UNNEST( array_path ) | array_path }
        [ table_hint_expr ] [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] |
    with_query_name [ table_hint_expr ] [ [ AS ] alias ]
}
table_hint_expr: '@{' table_hint_key = table_hint_value '}' table_hint_key: { FORCE_INDEX | GROUPBY_SCAN_OPTIMIZATION }

The FROM clause indicates the table or tables from which to retrieve rows, and specifies how to join those rows together to produce a single stream of rows for processing in the rest of the query.

table_name

The name of an existing table.

SELECT * FROM Roster;
Table hints

The following hints are supported for tables:

Hint key Possible values Description
FORCE_INDEX String. The name of an existing index in the database or _BASE_TABLE to use the base table rather than an index.
  • If set to the name of an index, use that index instead of the base table. If the index cannot provide all needed columns, perform a back join with the base table.
  • If set to the string _BASE_TABLE, use the base table for the index strategy instead of an index. Note that this is the only valid value when FORCE_INDEX is used in a statement hint expression.

Note: FORCE_INDEX is actually a directive, not a hint, which means an error is raised if the index does not exist.

GROUPBY_SCAN_OPTIMIZATION TRUE
FALSE

The group by scan optimization can make queries faster if they use GROUP BY or SELECT DISTINCT. It can be applied if the grouping keys can form a prefix of the underlying table or index key, and if the query requires only the first row from each group.

The optimization is applied if the optimizer estimates that it will make the query more efficient. The hint overrides that decision. If the hint is set to FALSE, the optimization is not considered. If the hint is set to TRUE, the optimization will be applied as long as it is legal to do so.

The following example shows how to use a secondary index when reading from a table, by appending an index directive of the form @{FORCE_INDEX=index_name} to the table name:

SELECT s.SingerId, s.FirstName, s.LastName, s.SingerInfo
FROM Singers@{FORCE_INDEX=SingersByFirstLastName} AS s
WHERE s.FirstName = "Catalina" AND s.LastName > "M";

You can include multiple indexes in a query, though only a single index is supported for each distinct table reference. Example:

SELECT s.SingerId, s.FirstName, s.LastName, s.SingerInfo, c.ConcertDate
FROM Singers@{FORCE_INDEX=SingersByFirstLastName} AS s JOIN
     Concerts@{FORCE_INDEX=ConcertsBySingerId} AS c ON s.SingerId = c.SingerId
WHERE s.FirstName = "Catalina" AND s.LastName > "M";

Read more about index directives in Secondary Indexes.

join

See JOIN Types.

select

( select ) [ [ AS ] alias ] is a table subquery.

field_path

In the FROM clause, field_path is any path that resolves to a field within a data type. field_path can go arbitrarily deep into a nested data structure.

Some examples of valid field_path values include:

SELECT * FROM T1 t1, t1.array_column;

SELECT * FROM T1 t1, t1.struct_column.array_field;

SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1;

SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a;

SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1;

Field paths in the FROM clause must end in an array field. In addition, field paths cannot contain arrays before the end of the path. For example, the path array_column.some_array.some_array_field is invalid because it contains an array before the end of the path.

UNNEST

The UNNEST operator takes an ARRAY and returns a table, with one row for each element in the ARRAY. You can also use UNNEST outside of the FROM clause with the IN operator.

For input ARRAYs of most element types, the output of UNNEST generally has one column. This single column has an optional alias, which you can use to refer to the column elsewhere in the query. ARRAYS with these element types return multiple columns:

  • STRUCT

UNNEST destroys the order of elements in the input ARRAY. Use the optional WITH OFFSET clause to return a second column with the array element indexes (see below).

For an input ARRAY of STRUCTs, UNNEST returns a row for each STRUCT, with a separate column for each field in the STRUCT. The alias for each column is the name of the corresponding STRUCT field.

Example

SELECT *
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]);

+---+-----+
| x | y   |
+---+-----+
| 3 | bar |
| 1 | foo |
+---+-----+

ARRAY unnesting can be either explicit or implicit. In explicit unnesting, array_expression must return an ARRAY value but does not need to resolve to an ARRAY, and the UNNEST keyword is required.

Example:

SELECT * FROM UNNEST ([1, 2, 3]);

In implicit unnesting, array_path must resolve to an ARRAY and the UNNEST keyword is optional.

Example:

SELECT x
FROM mytable AS t,
  t.struct_typed_column.array_typed_field1 AS x;

In this scenario, array_path can go arbitrarily deep into a data structure, but the last field must be ARRAY-typed. No previous field in the expression can be ARRAY-typed because it is not possible to extract a named field from an ARRAY.

UNNEST treats NULLs as follows:

  • NULL and empty arrays produces zero rows.
  • An array containing NULLs produces rows containing NULL values.

The optional WITH OFFSET clause returns a separate column containing the "offset" value (i.e. counting starts at zero) for each row produced by the UNNEST operation. This column has an optional alias; the default alias is offset.

Example:

SELECT * FROM UNNEST ( ) WITH OFFSET AS num;

See the Arrays topic for more ways to use UNNEST, including construction, flattening, and filtering.

with_query_name

The query names in a WITH clause (see WITH Clause) act like names of temporary tables that you can reference anywhere in the FROM clause. In the example below, subQ1 and subQ2 are with_query_names.

Example:

WITH
  subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52),
  subQ2 AS (SELECT SchoolID FROM subQ1)
SELECT DISTINCT * FROM subQ2;

The WITH clause hides any permanent tables with the same name for the duration of the query, unless you qualify the table name, for example:

db.Roster.

TABLESAMPLE operator

tablesample_type:
    TABLESAMPLE sample_method (sample_size percent_or_rows )

sample_method:
    { BERNOULLI | RESERVOIR }

sample_size:
    numeric_value_expression

percent_or_rows:
    { PERCENT | ROWS }

partition_by:
    PARTITION BY partition_expression [, ...]

Description

You can use the TABLESAMPLE operator to select a random sample of a data set. This operator is useful when working with tables that have large amounts of data and precise answers are not required.

  • sample_method: When using the TABLESAMPLE operator, you must specify the sampling algorithm to use:
    • BERNOULLI: Each row is independently selected with the probability given in the percent clause. As a result, you get approximately N * percent/100 rows.
    • RESERVOIR: Takes as parameter an actual sample size K (expressed as a number of rows). If the input is smaller than K, it outputs the entire input relation. If the input is larger than K, reservoir sampling outputs a sample of size exactly K, where any sample of size K is equally likely.
  • sample_size: The size of the sample.
  • percent_or_rows: The TABLESAMPLE operator requires that you choose either ROWS or PERCENT. If you choose PERCENT, the value must be between 0 and 100. If you choose ROWS, the value must be greater than or equal to 0.

Examples

The following examples illustrate the use of the TABLESAMPLE operator.

Select from a table using the RESERVOIR sampling method:

SELECT MessageId
FROM Messages TABLESAMPLE RESERVOIR (100 ROWS);

Select from a table using the BERNOULLI sampling method:

SELECT MessageId
FROM Messages TABLESAMPLE BERNOULLI (0.1 PERCENT);

Use TABLESAMPLE with a subquery:

SELECT Subject FROM
(SELECT MessageId, Subject FROM Messages WHERE ServerId="test")
TABLESAMPLE BERNOULLI(50 PERCENT)
WHERE MessageId > 3;

Use a TABLESAMPLE operation with a join to another table.

SELECT S.Subject
FROM
(SELECT MessageId, ThreadId FROM Messages WHERE ServerId="test") AS R
TABLESAMPLE RESERVOIR(5 ROWS),
Threads AS S
WHERE S.ServerId="test" AND R.ThreadId = S.ThreadId;

Aliases

See Using Aliases for information on syntax and visibility for FROM clause aliases.

JOIN types

join:
    from_item [ join_type ] [ join_method ] JOIN  [ join_hint_expr ] from_item
    [ ON bool_expression | USING ( join_column [, ...] ) ]

join_type:
    { INNER | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }

join_method:
{ HASH }
join_hint_expr: '@{' join_hint_key = join_hint_value [, ...] '}' join_hint_key: { FORCE_JOIN_ORDER | JOIN_METHOD }

The JOIN clause merges two from_items so that the SELECT clause can query them as one source. The join_type and ON or USING clause (a "join condition") specify how to combine and discard rows from the two from_items to form a single source.

All JOIN clauses require a join_type.

A JOIN clause requires a join condition unless one of the following conditions is true:

  • join_type is CROSS.
  • One or both of the from_items is not a table, e.g. an array_path or field_path.

Join hints

The following hints are supported for JOIN:

Hint key Possible values Description
FORCE_JOIN_ORDER TRUE
FALSE (default)
If set to true, use the join order that's specified in the query.
JOIN_METHOD HASH_JOIN
APPLY_JOIN
When implementing a logical join, choose a specific alternative to use for the underlying join method. Learn more in Join methods.
To use a HASH join, either use HASH JOIN or JOIN@{JOIN_METHOD=HASH_JOIN}, but not both.
HASH_JOIN_BUILD_SIDE BUILD_LEFT
BUILD_RIGHT
Specifies which side of the hash join is used as the build side. Can only be used with JOIN_METHOD=HASH_JOIN
BATCH_MODE TRUE (default)
FALSE
Used to disable batched apply join in favor of row-at-a-time apply join. Can only be used with JOIN_METHOD=APPLY_JOIN.

Join methods

Join methods are specific implementations of the various logical join types. Some join methods are available only for certain join types. The choice of which join method to use depends on the specifics of your query and of the data being queried. The best way to figure out if a particular join method helps with the performance of your query is to try the method and view the resulting query execution plan. See Query Execution Operators, in particular the sections about the apply and hash join operators, for more details.

Join Method Description Operands
HASH_JOIN The hash join operator builds a hash table out of one side (the build side), and probes in the hash table for all the elements in the other side (the probe side). Different variants are used for various join types. View the query execution plan for your query to see which variant is used. Read more about the Hash join operator.
APPLY_JOIN The apply join operator gets each item from one side (the input side), and evaluates the subquery on other side (the map side) using the values of the item from the input side. Different variants are used for various join types. Cross apply is used for inner join, and outer apply is used for left joins. Read more about the Cross apply and Outer apply operators.

[INNER] JOIN

An INNER JOIN, or simply JOIN, effectively calculates the Cartesian product of the two from_items and discards all rows that do not meet the join condition. "Effectively" means that it is possible to implement an INNER JOIN without actually calculating the Cartesian product.

FROM A INNER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------+
| w | x |  *  | y | z |  =  | w | x | y | z |
+-------+     +-------+     +---------------+
| 1 | a |     | 2 | k |     | 2 | b | 2 | k |
| 2 | b |     | 3 | m |     | 3 | c | 3 | m |
| 3 | c |     | 3 | n |     | 3 | c | 3 | n |
| 3 | d |     | 4 | p |     | 3 | d | 3 | m |
+-------+     +-------+     | 3 | d | 3 | n |
                            +---------------+
FROM A INNER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +-----------+
| x | y |  *  | x | z |  =  | x | y | z |
+-------+     +-------+     +-----------+
| 1 | a |     | 2 | k |     | 2 | b | k |
| 2 | b |     | 3 | m |     | 3 | c | m |
| 3 | c |     | 3 | n |     | 3 | c | n |
| 3 | d |     | 4 | p |     | 3 | d | m |
+-------+     +-------+     | 3 | d | n |
                            +-----------+

Example

This query performs an INNER JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
+---------------------------+

CROSS JOIN

CROSS JOIN returns the Cartesian product of the two from_items. In other words, it combines each row from the first from_item with each row from the second from_item.

If the rows of the two from_items are independent, then the result has M * N rows, given M rows in one from_item and N in the other. Note that this still holds for the case when either from_item has zero rows.

FROM A CROSS JOIN B

Table A       Table B       Result
+-------+     +-------+     +---------------+
| w | x |  *  | y | z |  =  | w | x | y | z |
+-------+     +-------+     +---------------+
| 1 | a |     | 2 | c |     | 1 | a | 2 | c |
| 2 | b |     | 3 | d |     | 1 | a | 3 | d |
+-------+     +-------+     | 2 | b | 2 | c |
                            | 2 | b | 3 | d |
                            +---------------+

You can use correlated CROSS JOINs to flatten ARRAY columns. In this case, the rows of the second from_item vary for each row of the first from_item.

FROM A CROSS JOIN A.y

Table A                    Result
+-------------------+      +-----------+
| w | x | y         |  ->  | w | x | y |
+-------------------+      +-----------+
| 1 | a | [P, Q]    |      | 1 | a | P |
| 2 | b | [R, S, T] |      | 1 | a | Q |
+-------------------+      | 2 | b | R |
                           | 2 | b | S |
                           | 2 | b | T |
                           +-----------+

CROSS JOINs can be written explicitly like this:

FROM a CROSS JOIN b

Or implicitly as a comma cross join like this:

FROM a, b

You cannot write comma cross joins inside parentheses:

FROM a CROSS JOIN (b, c)  // INVALID

See Sequences of JOINs for details on how a comma cross join behaves in a sequence of JOINs.

Examples

This query performs an explicit CROSS JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Adams      | Knights      |
| Adams      | Lakers       |
| Adams      | Mustangs     |
| Buchanan   | Jaguars      |
| Buchanan   | Knights      |
| Buchanan   | Lakers       |
| Buchanan   | Mustangs     |
| ...                       |
+---------------------------+

This query performs a comma cross join that produces the same results as the explicit CROSS JOIN above:

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster, TeamMascot;

FULL [OUTER] JOIN

A FULL OUTER JOIN (or simply FULL JOIN) returns all fields for all rows in both from_items that meet the join condition.

FULL indicates that all rows from both from_items are returned, even if they do not meet the join condition.

OUTER indicates that if a given row from one from_item does not join to any row in the other from_item, the row will return with NULLs for all columns from the other from_item.

FROM A FULL OUTER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------------------+
| w | x |  *  | y | z |  =  | w    | x    | y    | z    |
+-------+     +-------+     +---------------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | 2    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | 3    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | 3    | n    |
+-------+     +-------+     | 3    | d    | 3    | m    |
                            | 3    | d    | 3    | n    |
                            | NULL | NULL | 4    | p    |
                            +---------------------------+
FROM A FULL OUTER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +--------------------+
| x | y |  *  | x | z |  =  | x    | y    | z    |
+-------+     +-------+     +--------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | n    |
+-------+     +-------+     | 3    | d    | m    |
                            | 3    | d    | n    |
                            | 4    | NULL | p    |
                            +--------------------+

Example

This query performs a FULL JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster FULL JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
| Eisenhower | NULL         |
| NULL       | Mustangs     |
+---------------------------+

LEFT [OUTER] JOIN

The result of a LEFT OUTER JOIN (or simply LEFT JOIN) for two from_items always retains all rows of the left from_item in the JOIN clause, even if no rows in the right from_item satisfy the join predicate.

LEFT indicates that all rows from the left from_item are returned; if a given row from the left from_item does not join to any row in the right from_item, the row will return with NULLs for all columns from the right from_item. Rows from the right from_item that do not join to any row in the left from_item are discarded.

FROM A LEFT OUTER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------------------+
| w | x |  *  | y | z |  =  | w    | x    | y    | z    |
+-------+     +-------+     +---------------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | 2    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | 3    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | 3    | n    |
+-------+     +-------+     | 3    | d    | 3    | m    |
                            | 3    | d    | 3    | n    |
                            +---------------------------+
FROM A LEFT OUTER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +--------------------+
| x | y |  *  | x | z |  =  | x    | y    | z    |
+-------+     +-------+     +--------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | n    |
+-------+     +-------+     | 3    | d    | m    |
                            | 3    | d    | n    |
                            +--------------------+

Example

This query performs a LEFT JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster LEFT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
| Eisenhower | NULL         |
+---------------------------+

RIGHT [OUTER] JOIN

The result of a RIGHT OUTER JOIN (or simply RIGHT JOIN) is similar and symmetric to that of LEFT OUTER JOIN.

FROM A RIGHT OUTER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------------------+
| w | x |  *  | y | z |  =  | w    | x    | y    | z    |
+-------+     +-------+     +---------------------------+
| 1 | a |     | 2 | k |     | 2    | b    | 2    | k    |
| 2 | b |     | 3 | m |     | 3    | c    | 3    | m    |
| 3 | c |     | 3 | n |     | 3    | c    | 3    | n    |
| 3 | d |     | 4 | p |     | 3    | d    | 3    | m    |
+-------+     +-------+     | 3    | d    | 3    | n    |
                            | NULL | NULL | 4    | p    |
                            +---------------------------+
FROM A RIGHT OUTER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +--------------------+
| x | y |  *  | x | z |  =  | x    | y    | z    |
+-------+     +-------+     +--------------------+
| 1 | a |     | 2 | k |     | 2    | b    | k    |
| 2 | b |     | 3 | m |     | 3    | c    | m    |
| 3 | c |     | 3 | n |     | 3    | c    | n    |
| 3 | d |     | 4 | p |     | 3    | d    | m    |
+-------+     +-------+     | 3    | d    | n    |
                            | 4    | NULL | p    |
                            +--------------------+

Example

This query performs a RIGHT JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster RIGHT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
| NULL       | Mustangs     |
+---------------------------+

ON clause

The ON clause contains a bool_expression. A combined row (the result of joining two rows) meets the join condition if bool_expression returns TRUE.

FROM A JOIN B ON A.x = B.x

Table A   Table B   Result (A.x, B.x)
+---+     +---+     +-------+
| x |  *  | x |  =  | x | x |
+---+     +---+     +-------+
| 1 |     | 2 |     | 2 | 2 |
| 2 |     | 3 |     | 3 | 3 |
| 3 |     | 4 |     +-------+
+---+     +---+

Example

This query performs an INNER JOIN on the Roster and TeamMascot table.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
+---------------------------+

USING clause

The USING clause requires a column_list of one or more columns which occur in both input tables. It performs an equality comparison on that column, and the rows meet the join condition if the equality comparison returns TRUE.

FROM A JOIN B USING (x)

Table A   Table B   Result
+---+     +---+     +---+
| x |  *  | x |  =  | x |
+---+     +---+     +---+
| 1 |     | 2 |     | 2 |
| 2 |     | 3 |     | 3 |
| 3 |     | 4 |     +---+
+---+     +---+

Example

This query performs an INNER JOIN on the Roster and TeamMascot table.

This statement returns the rows from Roster and TeamMascot where Roster.SchooldID is the same as TeamMascot.SchooldID. The results include a single SchooldID column.

SELECT * FROM Roster INNER JOIN TeamMascot USING (SchoolID);

+----------------------------------------+
| SchoolID   | LastName   | Mascot       |
+----------------------------------------+
| 50         | Adams      | Jaguars      |
| 52         | Buchanan   | Lakers       |
| 52         | Coolidge   | Lakers       |
| 51         | Davis      | Knights      |
+----------------------------------------+

ON and USING equivalency

The ON and USING keywords are not equivalent, but they are similar. ON returns multiple columns, and USING returns one.

FROM A JOIN B ON A.x = B.x
FROM A JOIN B USING (x)

Table A   Table B   Result ON     Result USING
+---+     +---+     +-------+     +---+
| x |  *  | x |  =  | x | x |     | x |
+---+     +---+     +-------+     +---+
| 1 |     | 2 |     | 2 | 2 |     | 2 |
| 2 |     | 3 |     | 3 | 3 |     | 3 |
| 3 |     | 4 |     +-------+     +---+
+---+     +---+

Although ON and USING are not equivalent, they can return the same results if you specify the columns you want to return.

SELECT x FROM A JOIN B USING (x);
SELECT A.x FROM A JOIN B ON A.x = B.x;

Table A   Table B   Result
+---+     +---+     +---+
| x |  *  | x |  =  | x |
+---+     +---+     +---+
| 1 |     | 2 |     | 2 |
| 2 |     | 3 |     | 3 |
| 3 |     | 4 |     +---+
+---+     +---+

Sequences of JOINs

The FROM clause can contain multiple JOIN clauses in a sequence. JOINs are bound from left to right. For example:

FROM A JOIN B USING (x) JOIN C USING (x)

-- A JOIN B USING (x)        = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2                  = return value

You can also insert parentheses to group JOINs:

FROM ( (A JOIN B USING (x)) JOIN C USING (x) )

-- A JOIN B USING (x)        = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2                  = return value

With parentheses, you can group JOINs so that they are bound in a different order:

FROM ( A JOIN (B JOIN C USING (x)) USING (x) )

-- B JOIN C USING (x)       = result_1
-- A JOIN result_1          = result_2
-- result_2                 = return value

When comma cross joins are present in a query with a sequence of JOINs, they group from left to right like other JOIN types:

FROM A JOIN B USING (x) JOIN C USING (x), D

-- A JOIN B USING (x)        = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 CROSS JOIN D     = return value

There cannot be a RIGHT JOIN or FULL JOIN after a comma join:

FROM A, B RIGHT JOIN C ON TRUE // INVALID
FROM A, B FULL JOIN C ON TRUE  // INVALID
FROM A, B JOIN C ON TRUE       // VALID

WHERE clause

WHERE bool_expression

The WHERE clause filters out rows by evaluating each row against bool_expression, and discards all rows that do not return TRUE (that is, rows that return FALSE or NULL).

Example:

SELECT * FROM Roster
WHERE SchoolID = 52;

The bool_expression can contain multiple sub-conditions.

Example:

SELECT * FROM Roster
WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac");

You cannot reference column aliases from the SELECT list in the WHERE clause.

Expressions in an INNER JOIN have an equivalent expression in the WHERE clause. For example, a query using INNER JOIN and ON has an equivalent expression using CROSS JOIN and WHERE.

Example - this query:

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster INNER JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;

is equivalent to:

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot
WHERE Roster.SchoolID = TeamMascot.SchoolID;

GROUP BY clause

GROUP BY expression [, ...]

The GROUP BY clause groups together rows in a table with non-distinct values for the expression in the GROUP BY clause. For multiple rows in the source table with non-distinct values for expression, the GROUP BY clause produces a single combined row. GROUP BY is commonly used when aggregate functions are present in the SELECT list, or to eliminate redundancy in the output. The data type of expression must be groupable.

Example:

SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName;

The GROUP BY clause can refer to expression names in the SELECT list. The GROUP BY clause also allows ordinal references to expressions in the SELECT list using integer values. 1 refers to the first expression in the SELECT list, 2 the second, and so forth. The expression list can combine ordinals and expression names.

Example:

SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;

The query above is equivalent to:

SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY 2, FirstName;

GROUP BY clauses may also refer to aliases. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.

Example:

SELECT SUM(PointsScored), LastName as last_name
FROM PlayerStats
GROUP BY last_name;

HAVING clause

HAVING bool_expression

The HAVING clause is similar to the WHERE clause: it filters out rows that do not return TRUE when they are evaluated against the bool_expression.

As with the WHERE clause, the bool_expression can be any expression that returns a boolean, and can contain multiple sub-conditions.

The HAVING clause differs from the WHERE clause in that:

  • The HAVING clause requires GROUP BY or aggregation to be present in the query.
  • The HAVING clause occurs after GROUP BY and aggregation, and before ORDER BY. This means that the HAVING clause is evaluated once for every aggregated row in the result set. This differs from the WHERE clause, which is evaluated before GROUP BY and aggregation.

The HAVING clause can reference columns available via the FROM clause, as well as SELECT list aliases. Expressions referenced in the HAVING clause must either appear in the GROUP BY clause or they must be the result of an aggregate function:

SELECT LastName
FROM Roster
GROUP BY LastName
HAVING SUM(PointsScored) > 15;

If a query contains aliases in the SELECT clause, those aliases override names in a FROM clause.

SELECT LastName, SUM(PointsScored) AS ps
FROM Roster
GROUP BY LastName
HAVING ps > 0;

Mandatory aggregation

Aggregation does not have to be present in the HAVING clause itself, but aggregation must be present in at least one of the following forms:

Aggregation function in the SELECT list.

SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
GROUP BY LastName
HAVING total > 15;

Aggregation function in the 'HAVING' clause.

SELECT LastName
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;

Aggregation in both the SELECT list and HAVING clause.

When aggregation functions are present in both the SELECT list and HAVING clause, the aggregation functions and the columns they reference do not need to be the same. In the example below, the two aggregation functions, COUNT() and SUM(), are different and also use different columns.

SELECT LastName, COUNT(*)
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;

ORDER BY clause

ORDER BY expression
  [{ ASC | DESC }]
  [, ...]

The ORDER BY clause specifies a column or expression as the sort criterion for the result set. If an ORDER BY clause is not present, the order of the results of a query is not defined. Column aliases from a FROM clause or SELECT list are allowed. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.

Optional Clauses

  • ASC | DESC: Sort the results in ascending or descending order of expression values. ASC is the default value.

Examples

Use the default sort order (ascending).

SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
      SELECT 9, true)
ORDER BY x;
+------+-------+
| x    | y     |
+------+-------+
| 1    | true  |
| 9    | true  |
+------+-------+

Use descending sort order.

SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
      SELECT 9, true)
ORDER BY x DESC;
+------+-------+
| x    | y     |
+------+-------+
| 9    | true  |
| 1    | true  |
+------+-------+

It is possible to order by multiple columns. In the example below, the result set is ordered first by SchoolID and then by LastName:

SELECT LastName, PointsScored, OpponentID
FROM PlayerStats
ORDER BY SchoolID, LastName;

The following rules apply when ordering values:

  • NULLs: In the context of the ORDER BY clause, NULLs are the minimum possible value; that is, NULLs appear first in ASC sorts and last in DESC sorts.
  • Floating point data types: see Floating Point Semantics on ordering and grouping.

When used in conjunction with set operators, the ORDER BY clause applies to the result set of the entire query; it does not apply only to the closest SELECT statement. For this reason, it can be helpful (though it is not required) to use parentheses to show the scope of the ORDER BY.

This query without parentheses:

SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot
ORDER BY SchoolID;

is equivalent to this query with parentheses:

( SELECT * FROM Roster
  UNION ALL
  SELECT * FROM TeamMascot )
ORDER BY SchoolID;

but is not equivalent to this query, where the ORDER BY clause applies only to the second SELECT statement:

SELECT * FROM Roster
UNION ALL
( SELECT * FROM TeamMascot
  ORDER BY SchoolID );

You can also use integer literals as column references in ORDER BY clauses. An integer literal becomes an ordinal (for example, counting starts at 1) into the SELECT list.

Example - the following two queries are equivalent:

SELECT SUM(PointsScored), LastName
FROM PlayerStats
ORDER BY LastName;
SELECT SUM(PointsScored), LastName
FROM PlayerStats
ORDER BY 2;

COLLATE

You can use the COLLATE clause to refine how a data is ordered from an ORDER BY clause. Collation refers to a set of rules that determine how STRINGs are compared according to the conventions and standards of a particular language, region or country. These rules might define the correct character sequence, with options for specifying case-insensitivity.

You add collation to your statement as follows:

SELECT ...
FROM ...
ORDER BY value COLLATE collation_string

A collation_string contains a collation_name and can have an optional collation_attribute as a suffix, separated by a colon. The collation_string is a literal or a parameter. Usually, this name is two letters that represent the language optionally followed by an underscore and two letters that represent the region—for example, en_US. These names are defined by the Common Locale Data Repository (CLDR). A statement can also have a collation_name of unicode. This value means that the statement should return data using the default unicode collation.

In addition to the collation_name, a collation_string can have an optional collation_attribute as a suffix, separated by a colon. This attribute specifies if the data comparisons should be case sensitive. Allowed values are cs, for case sensitive, and ci, for case insensitive. If a collation_attribute is not supplied, the CLDR defaults are used.

Examples

Collate results using English - Canada:

SELECT Place
FROM Locations
ORDER BY Place COLLATE "en_CA"

Collate results using a parameter:

#@collate_param = "arg_EG"
SELECT Place
FROM Locations
ORDER BY Place COLLATE @collate_param

Using multiple COLLATE clauses in a statement:

SELECT APlace, BPlace, CPlace
FROM Locations
ORDER BY APlace COLLATE "en_US" ASC,
         BPlace COLLATE "ar_EG" DESC,
         CPlace COLLATE "en" DESC

Case insensitive collation:

SELECT Place
FROM Locations
ORDER BY Place COLLATE "en_US:ci"

Default Unicode case-insensitive collation:

SELECT Place
FROM Locations
ORDER BY Place COLLATE "unicode:ci"

Set operators

UNION { ALL | DISTINCT } | INTERSECT { ALL | DISTINCT } | EXCEPT { ALL | DISTINCT }

Set operators combine results from two or more input queries into a single result set. You must specify ALL or DISTINCT; if you specify ALL, then all rows are retained. If DISTINCT is specified, duplicate rows are discarded.

If a given row R appears exactly m times in the first input query and n times in the second input query (m >= 0, n >= 0):

  • For UNION ALL, R appears exactly m + n times in the result.
  • For INTERSECT ALL, R will appear exactly MIN(m, n) in the result.
  • For EXCEPT ALL, R appears exactly MAX(m - n, 0) in the result.
  • For UNION DISTINCT, the DISTINCT is computed after the UNION is computed, so R appears exactly one time.
  • For INTERSECT DISTINCT, the DISTINCT is computed after the result above is computed.
  • For EXCEPT DISTINCT, row R appears once in the output if m > 0 and n = 0.
  • If there are more than two input queries, the above operations generalize and the output is the same as if the inputs were combined incrementally from left to right.

The following rules apply:

  • For set operations other than UNION ALL, all column types must support equality comparison.
  • The input queries on each side of the operator must return the same number of columns.
  • The operators pair the columns returned by each input query according to the columns' positions in their respective SELECT lists. That is, the first column in the first input query is paired with the first column in the second input query.
  • The result set always uses the column names from the first input query.
  • The result set always uses the supertypes of input types in corresponding columns, so paired columns must also have either the same data type or a common supertype.
  • You must use parentheses to separate different set operations; for this purpose, set operations such as UNION ALL and UNION DISTINCT are different. If the statement only repeats the same set operation, parentheses are not necessary.

Examples:

query1 UNION ALL (query2 UNION DISTINCT query3)
query1 UNION ALL query2 UNION ALL query3

Invalid:

query1 UNION ALL query2 UNION DISTINCT query3
query1 UNION ALL query2 INTERSECT ALL query3;  // INVALID.

UNION

The UNION operator combines the result sets of two or more input queries by pairing columns from the result set of each query and vertically concatenating them.

INTERSECT

The INTERSECT operator returns rows that are found in the result sets of both the left and right input queries. Unlike EXCEPT, the positioning of the input queries (to the left versus right of the INTERSECT operator) does not matter.

EXCEPT

The EXCEPT operator returns rows from the left input query that are not present in the right input query.

Example:

SELECT * FROM UNNEST(ARRAY<int64>[1, 2, 3]) AS number
EXCEPT DISTINCT SELECT 1;

+--------+
| number |
+--------+
| 2      |
| 3      |
+--------+

LIMIT and OFFSET clauses

LIMIT count [ OFFSET skip_rows ]

LIMIT specifies a non-negative count of type INT64, and no more than count rows will be returned. LIMIT 0 returns 0 rows.

If there is a set operation, LIMIT is applied after the set operation is evaluated.

OFFSET specifies a non-negative number of rows to skip before applying LIMIT. skip_rows is of type INT64.

These clauses accept only literal or parameter values. The rows that are returned by LIMIT and OFFSET are unspecified unless these operators are used after ORDER BY.

Examples:

SELECT *
FROM UNNEST(ARRAY<STRING>['a', 'b', 'c', 'd', 'e']) AS letter
ORDER BY letter ASC LIMIT 2

+---------+
| letter  |
+---------+
| a       |
| b       |
+---------+
SELECT *
FROM UNNEST(ARRAY<STRING>['a', 'b', 'c', 'd', 'e']) AS letter
ORDER BY letter ASC LIMIT 3 OFFSET 1

+---------+
| letter  |
+---------+
| b       |
| c       |
| d       |
+---------+

WITH clause

The WITH clause binds the results of one or more named subqueries to temporary table names. Each introduced table name is visible in subsequent SELECT expressions within the same query expression. This includes the following kinds of SELECT expressions:

  • Any SELECT expressions in subsequent WITH bindings
  • Top level SELECT expressions in the query expression on both sides of a set operator such as UNION
  • SELECT expressions inside subqueries within the same query expression

Example:

WITH subQ1 AS (SELECT SchoolID FROM Roster),
     subQ2 AS (SELECT OpponentID FROM PlayerStats)
SELECT * FROM subQ1
UNION ALL
SELECT * FROM subQ2;

WITH is not supported in a subquery. This returns an error:

SELECT account
FROM (
  WITH result AS (SELECT * FROM NPCs)
  SELECT *
  FROM result);

WITH RECURSIVE is not supported.

WITH clause is not supported in DML statements.

Temporary tables defined by the WITH clause are stored in memory. Cloud Spanner SQL dynamically allocates memory for all temporary tables created by a query. If the available resources are not sufficient then the query will fail.

Using aliases

An alias is a temporary name given to a table, column, or expression present in a query. You can introduce explicit aliases in the SELECT list or FROM clause, or Cloud Spanner SQL will infer an implicit alias for some expressions. Expressions with neither an explicit nor implicit alias are anonymous and the query cannot reference them by name.

Explicit aliases

You can introduce explicit aliases in either the FROM clause or the SELECT list.

In a FROM clause, you can introduce explicit aliases for any item, including tables, arrays, subqueries, and UNNEST clauses, using [AS] alias. The AS keyword is optional.

Example:

SELECT s.FirstName, s2.SongName
FROM Singers AS s, (SELECT * FROM Songs) AS s2;

You can introduce explicit aliases for any expression in the SELECT list using [AS] alias. The AS keyword is optional.

Example:

SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname
FROM Singers s;

Implicit aliases

In the SELECT list, if there is an expression that does not have an explicit alias, Cloud Spanner SQL assigns an implicit alias according to the following rules. There can be multiple columns with the same alias in the SELECT list.

  • For identifiers, the alias is the identifier. For example, SELECT abc implies AS abc.
  • For path expressions, the alias is the last identifier in the path. For example, SELECT abc.def.ghi implies AS ghi.
  • For field access using the "dot" member field access operator, the alias is the field name. For example, SELECT (struct_function()).fname implies AS fname.

In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.

In a FROM clause, from_items are not required to have an alias. The following rules apply:

  • If there is an expression that does not have an explicit alias, Cloud Spanner SQL assigns an implicit alias in these cases:
    • For identifiers, the alias is the identifier. For example, FROM abc implies AS abc.
    • For path expressions, the alias is the last identifier in the path. For example, FROM abc.def.ghi implies AS ghi
    • The column produced using WITH OFFSET has the implicit alias offset.
  • Table subqueries do not have implicit aliases.
  • FROM UNNEST(x) does not have an implicit alias.

Alias visibility

After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of Cloud Spanner SQL's name scoping rules.

Visibility in the FROM clause

Cloud Spanner SQL processes aliases in a FROM clause from left to right, and aliases are visible only to subsequent path expressions in a FROM clause.

Example:

Assume the Singers table had a Concerts column of ARRAY type.

SELECT FirstName
FROM Singers AS s, s.Concerts;

Invalid:

SELECT FirstName
FROM s.Concerts, Singers AS s;  // INVALID.

FROM clause aliases are not visible to subqueries in the same FROM clause. Subqueries in a FROM clause cannot contain correlated references to other tables in the same FROM clause.

Invalid:

SELECT FirstName
FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s)  // INVALID.

You can use any column name from a table in the FROM as an alias anywhere in the query, with or without qualification with the table name.

Example:

SELECT FirstName, s.ReleaseDate
FROM Singers s WHERE ReleaseDate = 1975;

If the FROM clause contains an explicit alias, you must use the explicit alias instead of the implicit alias for the remainder of the query (see Implicit Aliases). A table alias is useful for brevity or to eliminate ambiguity in cases such as self-joins, where the same table is scanned multiple times during query processing.

Example:

SELECT * FROM Singers as s, Songs as s2
ORDER BY s.LastName

Invalid — ORDER BY does not use the table alias:

SELECT * FROM Singers as s, Songs as s2
ORDER BY Singers.LastName;  // INVALID.

Visibility in the SELECT list

Aliases in the SELECT list are visible only to the following clauses:

  • GROUP BY clause
  • ORDER BY clause
  • HAVING clause

Example:

SELECT LastName AS last, SingerID
FROM Singers
ORDER BY last;

Visibility in the GROUP BY, ORDER BY, and HAVING clauses

These three clauses, GROUP BY, ORDER BY, and HAVING, can refer to only the following values:

  • Tables in the FROM clause and any of their columns.
  • Aliases from the SELECT list.

GROUP BY and ORDER BY can also refer to a third group:

  • Integer literals, which refer to items in the SELECT list. The integer 1 refers to the first item in the SELECT list, 2 refers to the second item, etc.

Example:

SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY 1
ORDER BY 2 DESC;

The previous query is equivalent to:

SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY sid
ORDER BY s2id DESC;

Duplicate aliases

A SELECT list or subquery containing multiple explicit or implicit aliases of the same name is allowed, as long as the alias name is not referenced elsewhere in the query, since the reference would be ambiguous.

Example:

SELECT 1 AS a, 2 AS a;

+---+---+
| a | a |
+---+---+
| 1 | 2 |
+---+---+

Ambiguous aliases

Cloud Spanner SQL provides an error if accessing a name is ambiguous, meaning it can resolve to more than one unique object in the query or in a table schema, including the schema of a destination table.

Examples:

This query contains column names that conflict between tables, since both Singers and Songs have a column named SingerID:

SELECT SingerID
FROM Singers, Songs;

This query contains aliases that are ambiguous in the GROUP BY clause because they are duplicated in the SELECT list:

SELECT FirstName AS name, LastName AS name,
FROM Singers
GROUP BY name;

This query contains aliases that are ambiguous in the SELECT list and FROM clause because they share the same name. Assume table has columns x, y, and z. z is of type STRUCT and has fields v, w, and x.

Example:

SELECT x, z AS T
FROM table AS T
GROUP BY T.x;

The alias T is ambiguous and will produce an error because T.x in the GROUP BY clause could refer to either table.x or table.z.x.

A name is not ambiguous in GROUP BY, ORDER BY or HAVING if it is both a column name and a SELECT list alias, as long as the name resolves to the same underlying object.

Example:

SELECT LastName, BirthYear AS BirthYear
FROM Singers
GROUP BY BirthYear;

The alias BirthYear is not ambiguous because it resolves to the same underlying column, Singers.BirthYear.

Implicit aliases

In the SELECT list, if there is an expression that does not have an explicit alias, Cloud Spanner SQL assigns an implicit alias according to the following rules. There can be multiple columns with the same alias in the SELECT list.

  • For identifiers, the alias is the identifier. For example, SELECT abc implies AS abc.
  • For path expressions, the alias is the last identifier in the path. For example, SELECT abc.def.ghi implies AS ghi.
  • For field access using the "dot" member field access operator, the alias is the field name. For example, SELECT (struct_function()).fname implies AS fname.

In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.

In a FROM clause, from_items are not required to have an alias. The following rules apply:

  • If there is an expression that does not have an explicit alias, Cloud Spanner SQL assigns an implicit alias in these cases:
    • For identifiers, the alias is the identifier. For example, FROM abc implies AS abc.
    • For path expressions, the alias is the last identifier in the path. For example, FROM abc.def.ghi implies AS ghi
    • The column produced using WITH OFFSET has the implicit alias offset.
  • Table subqueries do not have implicit aliases.
  • FROM UNNEST(x) does not have an implicit alias.

Range variables

In Cloud Spanner SQL, a range variable is a table expression alias in the FROM clause. Sometimes a range variable is known as a table alias. A range variable lets you reference rows being scanned from a table expression. A table expression represents an item in the FROM clause that returns a table. Common items that this expression can represent include tables, subqueries, joins, and parenthesized joins.

In general, a range variable provides a reference to the rows of a table expression. A range variable can be used to qualify a column reference and unambiguously identify the related table, for example range_variable.column_1.

When referencing a range variable on its own without a specified column suffix, the result of a table expression is the row type of the related table. Value tables have explicit row types, so for range variables related to value tables, the result type is the value table's row type. Other tables do not have explicit row types, and for those tables, the range variable type is a dynamically defined STRUCT that includes all of the columns in the table.

Examples

In these examples, the WITH clause is used to emulate a temporary table called Grid. This table has columns x and y. A range variable called Coordinate refers to the current row as the table is scanned. Coordinate can be used to access the entire row or columns in the row.

The following example selects column x from range variable Coordinate, which in effect selects column x from table Grid.

WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate.x FROM Grid AS Coordinate;

+---+
| x |
+---+
| 1 |
+---+

The following example selects all columns from range variable Coordinate, which in effect selects all columns from table Grid.

WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate.* FROM Grid AS Coordinate;

+---+---+
| x | y |
+---+---+
| 1 | 2 |
+---+---+

The following example selects the range variable Coordinate, which is a reference to rows in table Grid. Since Grid is not a value table, the result type of Coordinate is a STRUCT that contains all the columns from Grid.

WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate FROM Grid AS Coordinate;

+--------------+
| Coordinate   |
+--------------+
| {x: 1, y: 2} |
+--------------+

Appendix A: examples with sample data

These examples include statements which perform queries on the Roster and TeamMascot, and PlayerStats tables.

GROUP BY clause

Example:

SELECT LastName, SUM(PointsScored)
FROM PlayerStats
GROUP BY LastName;
LastName SUM
Adams 7
Buchanan 13
Coolidge 1

UNION

The UNION operator combines the result sets of two or more SELECT statements by pairing columns from the result set of each SELECT statement and vertically concatenating them.

Example:

SELECT Mascot AS X, SchoolID AS Y
FROM TeamMascot
UNION ALL
SELECT LastName, PointsScored
FROM PlayerStats;

Results:

X Y
Jaguars 50
Knights 51
Lakers 52
Mustangs 53
Adams 3
Buchanan 0
Coolidge 1
Adams 4
Buchanan 13

INTERSECT

This query returns the last names that are present in both Roster and PlayerStats.

SELECT LastName
FROM Roster
INTERSECT ALL
SELECT LastName
FROM PlayerStats;

Results:

LastName
Adams
Coolidge
Buchanan

EXCEPT

The query below returns last names in Roster that are not present in PlayerStats.

SELECT LastName
FROM Roster
EXCEPT DISTINCT
SELECT LastName
FROM PlayerStats;

Results:

LastName
Eisenhower
Davis

Reversing the order of the SELECT statements will return last names in PlayerStats that are not present in Roster:

SELECT LastName
FROM PlayerStats
EXCEPT DISTINCT
SELECT LastName
FROM Roster;

Results:

(empty)