Query syntax in Standard SQL

Query statements scan one or more tables or expressions and return the computed result rows. This topic describes the syntax for SQL queries in BigQuery.

SQL syntax

query_statement:
    query_expr

query_expr:
    [ WITH cte[, ...] ]
    { select | ( query_expr ) | set_operation }
    [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
    [ LIMIT count [ OFFSET skip_rows ] ]

select:
    SELECT [ AS { STRUCT | VALUE } ] [{ ALL | DISTINCT }]
        { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...] [ FROM from_clause[, ...] ] [ WHERE bool_expression ] [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ] [ HAVING bool_expression ] [ QUALIFY bool_expression ] [ WINDOW window_clause ]

Notation rules

  • Square brackets "[ ]" indicate optional clauses.
  • Parentheses "( )" indicate literal parentheses.
  • The vertical bar "|" indicates a logical OR.
  • Curly braces "{ }" enclose a set of options.
  • A comma followed by an ellipsis within square brackets "[, ... ]" indicates that the preceding item can repeat in a comma-separated list.

Sample tables

The following tables are used to illustrate the behavior of different query clauses in this reference.

Roster table

The Roster table includes a list of player names (LastName) and the unique ID assigned to their school (SchoolID). It looks like this:

+-----------------------+
| LastName   | SchoolID |
+-----------------------+
| Adams      | 50       |
| Buchanan   | 52       |
| Coolidge   | 52       |
| Davis      | 51       |
| Eisenhower | 77       |
+-----------------------+

You can use this WITH clause to emulate a temporary table name for the examples in this reference:

WITH Roster AS
 (SELECT 'Adams' as LastName, 50 as SchoolID UNION ALL
  SELECT 'Buchanan', 52 UNION ALL
  SELECT 'Coolidge', 52 UNION ALL
  SELECT 'Davis', 51 UNION ALL
  SELECT 'Eisenhower', 77)
SELECT * FROM Roster

PlayerStats table

The PlayerStats table includes a list of player names (LastName) and the unique ID assigned to the opponent they played in a given game (OpponentID) and the number of points scored by the athlete in that game (PointsScored).

+----------------------------------------+
| LastName   | OpponentID | PointsScored |
+----------------------------------------+
| Adams      | 51         | 3            |
| Buchanan   | 77         | 0            |
| Coolidge   | 77         | 1            |
| Adams      | 52         | 4            |
| Buchanan   | 50         | 13           |
+----------------------------------------+

You can use this WITH clause to emulate a temporary table name for the examples in this reference:

WITH PlayerStats AS
 (SELECT 'Adams' as LastName, 51 as OpponentID, 3 as PointsScored UNION ALL
  SELECT 'Buchanan', 77, 0 UNION ALL
  SELECT 'Coolidge', 77, 1 UNION ALL
  SELECT 'Adams', 52, 4 UNION ALL
  SELECT 'Buchanan', 50, 13)
SELECT * FROM PlayerStats

TeamMascot table

The TeamMascot table includes a list of unique school IDs (SchoolID) and the mascot for that school (Mascot).

+---------------------+
| SchoolID | Mascot   |
+---------------------+
| 50       | Jaguars  |
| 51       | Knights  |
| 52       | Lakers   |
| 53       | Mustangs |
+---------------------+

You can use this WITH clause to emulate a temporary table name for the examples in this reference:

WITH TeamMascot AS
 (SELECT 50 as SchoolID, 'Jaguars' as Mascot UNION ALL
  SELECT 51, 'Knights' UNION ALL
  SELECT 52, 'Lakers' UNION ALL
  SELECT 53, 'Mustangs')
SELECT * FROM TeamMascot

SELECT list

SELECT [ AS { typename | STRUCT | VALUE } ] [{ ALL | DISTINCT }]
    { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...]

The SELECT list defines the columns that the query will return. Expressions in the SELECT list can refer to columns in any of the from_items in its corresponding FROM clause.

Each item in the SELECT list is one of:

  • *
  • expression
  • expression.*

SELECT *

SELECT *, often referred to as select star, produces one output column for each column that is visible after executing the full query.

SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable);

+-------+-----------+
| fruit | vegetable |
+-------+-----------+
| apple | carrot    |
+-------+-----------+

SELECT expression

Items in a SELECT list can be expressions. These expressions evaluate to a single value and produce one output column, with an optional explicit alias.

If the expression does not have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.

SELECT expression.*

An item in a SELECT list can also take the form of expression.*. This produces one output column for each column or top-level field of expression. The expression must either be a table alias or evaluate to a single value of a data type with fields, such as a STRUCT.

The following query produces one output column for each column in the table groceries, aliased as g.

WITH groceries AS
  (SELECT "milk" AS dairy,
   "eggs" AS protein,
   "bread" AS grain)
SELECT g.*
FROM groceries AS g;

+-------+---------+-------+
| dairy | protein | grain |
+-------+---------+-------+
| milk  | eggs    | bread |
+-------+---------+-------+

More examples:

WITH locations AS
  (SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location
  UNION ALL
  SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location)
SELECT l.location.*
FROM locations l;

+---------+------------+
| city    | state      |
+---------+------------+
| Seattle | Washington |
| Phoenix | Arizona    |
+---------+------------+
WITH locations AS
  (SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"),
    ("Phoenix", "Arizona")] AS location)
SELECT l.LOCATION[offset(0)].*
FROM locations l;

+---------+------------+
| city    | state      |
+---------+------------+
| Seattle | Washington |
+---------+------------+

Modifiers for * operator

SELECT * EXCEPT

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.

WITH orders AS
  (SELECT 5 as order_id,
  "sprocket" as item_name,
  200 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;

+-----------+----------+
| item_name | quantity |
+-----------+----------+
| sprocket  | 200      |
+-----------+----------+

SELECT * REPLACE

A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement. In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.

A SELECT * REPLACE statement does not change the names or order of columns. However, it can change the value and the value type.

WITH orders AS
  (SELECT 5 as order_id,
  "sprocket" as item_name,
  200 as quantity)
SELECT * REPLACE ("widget" AS item_name)
FROM orders;

+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5        | widget    | 200      |
+----------+-----------+----------+

WITH orders AS
  (SELECT 5 as order_id,
  "sprocket" as item_name,
  200 as quantity)
SELECT * REPLACE (quantity/2 AS quantity)
FROM orders;

+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5        | sprocket  | 100      |
+----------+-----------+----------+

Duplicate row handling

You can modify the results returned from a SELECT query, as follows.

SELECT DISTINCT

A SELECT DISTINCT statement discards duplicate rows and returns only the remaining rows. SELECT DISTINCT cannot return columns of the following types:

  • STRUCT
  • ARRAY

SELECT ALL

A SELECT ALL statement returns all rows, including duplicate rows. SELECT ALL is the default behavior of SELECT.

Value tables

In BigQuery, a value table is a table where the row type is a single value. In a regular table, each row is made up of columns, each of which has a name and a type. In a value table, the row type is just a single value, and there are no column names.

Note: In BigQuery, a query can only return a value table with a type of STRUCT.

In contexts where a query with exactly one column is expected, a value table query can be used instead. For example, scalar subqueries and array subqueries (see Subqueries) normally require a single-column query, but in BigQuery, they also allow using a value table query.

A query will produce a value table if it uses SELECT AS, using one of the syntaxes below:

SELECT AS STRUCT

SELECT AS STRUCT expr [[AS] struct_field_name1] [,...]

This produces a value table with a STRUCT row type, where the STRUCT field names and types match the column names and types produced in the SELECT list.

Example:

SELECT ARRAY(SELECT AS STRUCT 1 a, 2 b)

SELECT AS STRUCT can be used in a scalar or array subquery to produce a single STRUCT type grouping multiple values together. Scalar and array subqueries (see Subqueries) are normally not allowed to return multiple columns, but can return a single column with STRUCT type.

SELECT AS VALUE

SELECT AS VALUE produces a value table from any SELECT list that produces exactly one column. Instead of producing an output table with one column, possibly with a name, the output will be a value table where the row type is just the value type that was produced in the one SELECT column. Any alias the column had will be discarded in the value table.

Example:

SELECT AS VALUE STRUCT(1 AS a, 2 AS b) xyz

The query above produces a table with row type STRUCT<a int64, b int64>.

Aliases

See Using Aliases for information on syntax and visibility for SELECT list aliases.

FROM clause

FROM from_clause[, ...]

from_clause:
    from_item
    [ { pivot_operator | unpivot_operator } ]
    [ tablesample_operator ]

from_item:
    {
      table_name [ as_alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ] 
      | { join_operation | ( join_operation ) }
      | ( query_expr ) [ as_alias ]
      | field_path
      | unnest_operator
      | cte_name [ as_alias ]
    }

as_alias:
    [ AS ] alias

The FROM clause indicates the table or tables from which to retrieve rows, and specifies how to join those rows together to produce a single stream of rows for processing in the rest of the query.

pivot_operator

See PIVOT operator.

unpivot_operator

See UNPIVOT operator.

tablesample_operator

See TABLESAMPLE operator.

table_name

The name (optionally qualified) of an existing table.

SELECT * FROM Roster;
SELECT * FROM dataset.Roster;
SELECT * FROM project.dataset.Roster;

FOR SYSTEM_TIME AS OF

FOR SYSTEM_TIME AS OF references the historical versions of the table definition and rows that were current at timestamp_expression.

Limitations:

The source table in the FROM clause containing FOR SYSTEM_TIME AS OF must not be any of the following:

  • An ARRAY scan, including a flattened array or the output of the UNNEST operator.
  • A common table expression defined by a WITH clause.

timestamp_expression must be a constant expression. It cannot contain the following:

  • Subqueries.
  • Correlated references (references to columns of a table that appear at a higher level of the query statement, such as in the SELECT list).
  • User-defined functions (UDFs).

The value of timestamp_expression cannot fall into the following ranges:

  • After the current timestamp (in the future).
  • More than seven (7) days before the current timestamp.

A single query statement cannot reference a single table at more than one point in time, including the current time. That is, a query can reference a table multiple times at the same timestamp, but not the current version and a historical version, or two different historical versions.

Examples:

The following query returns a historical version of the table from one hour ago.

SELECT *
FROM t
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

The following query returns a historical version of the table at an absolute point in time.

SELECT *
FROM t
  FOR SYSTEM_TIME AS OF '2017-01-01 10:00:00-07:00';

The following query returns an error because the timestamp_expression contains a correlated reference to a column in the containing query.

SELECT *
FROM t1
WHERE t1.a IN (SELECT t2.a
               FROM t2 FOR SYSTEM_TIME AS OF t1.timestamp_column);

The following operations show accessing a historical version of the table before table is replaced.

DECLARE before_replace_timestamp TIMESTAMP;

-- Create table books.
CREATE TABLE books AS
SELECT 'Hamlet' title, 'William Shakespeare' author;

-- Get current timestamp before table replacement.
SET before_replace_timestamp = CURRENT_TIMESTAMP();

-- Replace table with different schema(title and release_date).
CREATE OR REPLACE TABLE books AS
SELECT 'Hamlet' title, DATE '1603-01-01' release_date;

-- This query returns Hamlet, William Shakespeare as result.
SELECT * FROM books FOR SYSTEM_TIME AS OF before_replace_timestamp;

The following operations show accessing a historical version of the table before a DML job.

DECLARE JOB_START_TIMESTAMP TIMESTAMP;

-- Create table books.
CREATE OR REPLACE TABLE books AS
SELECT 'Hamlet' title, 'William Shakespeare' author;

-- Insert two rows into the books.
INSERT books (title, author)
VALUES('The Great Gatsby', 'F. Scott Fizgerald'),
      ('War and Peace', 'Leo Tolstoy');

SELECT * FROM books;

SET JOB_START_TIMESTAMP = (
  SELECT start_time
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
  WHERE job_type="QUERY"
    AND statement_type="INSERT"
  ORDER BY start_time DESC
  LIMIT 1
 );

-- This query only returns Hamlet, William Shakespeare as result.
SELECT * FROM books FOR SYSTEM_TIME AS OF JOB_START_TIMESTAMP;

join_operation

See JOIN operation.

query_expr

( query_expr ) [ [ AS ] alias ] is a table subquery.

field_path

In the FROM clause, field_path is any path that resolves to a field within a data type. field_path can go arbitrarily deep into a nested data structure.

Some examples of valid field_path values include:

SELECT * FROM T1 t1, t1.array_column;

SELECT * FROM T1 t1, t1.struct_column.array_field;

SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1;

SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a;

SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1;

Field paths in the FROM clause must end in an array field. In addition, field paths cannot contain arrays before the end of the path. For example, the path array_column.some_array.some_array_field is invalid because it contains an array before the end of the path.

unnest_operator

See UNNEST operator.

cte_name

Common table expressions (CTEs) in a WITH Clause act like temporary tables that you can reference anywhere in the FROM clause. In the example below, subQ1 and subQ2 are CTEs.

Example:

WITH
  subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52),
  subQ2 AS (SELECT SchoolID FROM subQ1)
SELECT DISTINCT * FROM subQ2;

The WITH clause hides any permanent tables with the same name for the duration of the query, unless you qualify the table name, for example:

dataset.Roster or project.dataset.Roster.

UNNEST operator

unnest_operator:
    {
      UNNEST( array_expression )
      | UNNEST( array_path )
      | array_path
    }
    [ as_alias ]
    [ WITH OFFSET [ as_alias ] ]

The UNNEST operator takes an ARRAY and returns a table, with one row for each element in the ARRAY. You can also use UNNEST outside of the FROM clause with the IN operator.

For input ARRAYs of most element types, the output of UNNEST generally has one column. This single column has an optional alias, which you can use to refer to the column elsewhere in the query. ARRAYS with these element types return multiple columns:

  • STRUCT

UNNEST destroys the order of elements in the input ARRAY. Use the optional WITH OFFSET clause to return a second column with the array element indexes (see following).

For several ways to use UNNEST, including construction, flattening, and filtering, see Working with arrays.

UNNEST and STRUCTs

For an input ARRAY of STRUCTs, UNNEST returns a row for each STRUCT, with a separate column for each field in the STRUCT. The alias for each column is the name of the corresponding STRUCT field.

Example:

SELECT *
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]);

+---+-----+
| x | y   |
+---+-----+
| 3 | bar |
| 1 | foo |
+---+-----+

Because the UNNEST operator returns a value table, you can alias UNNEST to define a range variable that you can reference elsewhere in the query. If you reference the range variable in the SELECT list, the query returns a STRUCT containing all of the fields of the original STRUCT in the input table.

Example:

SELECT *, struct_value
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')])
       AS struct_value;

+---+-----+--------------+
| x | y   | struct_value |
+---+-----+--------------+
| 3 | bar | {3, bar}     |
| 1 | foo | {1, foo}     |
+---+-----+--------------+

Explicit and implicit UNNEST

ARRAY unnesting can be either explicit or implicit. In explicit unnesting, array_expression must return an ARRAY value but does not need to resolve to an ARRAY, and the UNNEST keyword is required.

Example:

SELECT * FROM UNNEST ([1, 2, 3]);

In implicit unnesting, array_path must resolve to an ARRAY and the UNNEST keyword is optional.

Example:

SELECT x
FROM mytable AS t,
  t.struct_typed_column.array_typed_field1 AS x;

In this scenario, array_path can go arbitrarily deep into a data structure, but the last field must be ARRAY-typed. No previous field in the expression can be ARRAY-typed because it is not possible to extract a named field from an ARRAY.

UNNEST and NULLs

UNNEST treats NULLs as follows:

  • NULL and empty arrays produces zero rows.
  • An array containing NULLs produces rows containing NULL values.

The optional WITH OFFSET clause returns a separate column containing the "offset" value (i.e. counting starts at zero) for each row produced by the UNNEST operation. This column has an optional alias; the default alias is offset.

Example:

SELECT * FROM UNNEST ( ) WITH OFFSET AS num;

PIVOT operator

FROM from_item[, ...] pivot_operator

pivot_operator:
    PIVOT(
        aggregate_function_call [as_alias][, ...]
        FOR input_column
        IN ( pivot_column [as_alias][, ...] )
    ) [AS alias]

as_alias:
    [AS] alias

The PIVOT operator rotates rows into columns, using aggregation. PIVOT is part of the FROM clause.

  • PIVOT can be used to modify any table expression.
  • Combining PIVOT with FOR SYSTEM_TIME AS OF is not allowed, although users may use PIVOT against a subquery input which itself uses FOR SYSTEM_TIME AS OF.
  • A WITH OFFSET clause immediately preceding the PIVOT operator is not allowed.

Conceptual example:

-- Before PIVOT is used to rotate sales and quarter into Q1, Q2, Q3, Q4 columns:
+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale    | 51    | Q1      |
| Kale    | 23    | Q2      |
| Kale    | 45    | Q3      |
| Kale    | 3     | Q4      |
| Apple   | 77    | Q1      |
| Apple   | 0     | Q2      |
| Apple   | 25    | Q3      |
| Apple   | 2     | Q4      |
+---------+-------+---------+

-- After PIVOT is used to rotate sales and quarter into Q1, Q2, Q3, Q4 columns:
+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Apple   | 77 | 0  | 25 | 2  |
| Kale    | 51 | 23 | 45 | 3  |
+---------+----+----+----+----+

Definitions

Top-level definitions:

  • from_item: The table or subquery on which to perform a pivot operation. The from_item must follow these rules.
  • pivot_operator: The pivot operation to perform on a from_item.
  • alias: An alias to use for an item in the query.

pivot_operator definitions:

  • aggregate_function_call: An aggregate function call that aggregates all input rows such that input_column matches a particular value in pivot_column. Each aggregation corresponding to a different pivot_column value produces a different column in the output. Follow these rules when creating an aggregate function call.
  • input_column: Takes a column and retrieves the row values for the column, following these rules.
  • pivot_column: A pivot column to create for each aggregate function call. If an alias is not provided, a default alias is created. A pivot column value type must match the value type in input_column so that the values can be compared. It is possible to have a value in pivot_column that does not match a value in input_column. Must be a constant and follow these rules.

Rules

Rules for a from_item passed to PIVOT:

  • The from_item may consist of any table or subquery result.
  • The from_item may not produce a value table.
  • The from_item may not be a subquery using SELECT AS STRUCT.

Rules for aggregate_function_call:

  • Must be an aggregate function. For example, SUM.
  • You may reference columns in a table passed to PIVOT, as well as correlated columns, but may not access columns defined by the PIVOT clause itself.
  • A table passed to PIVOT may be accessed through its alias if one is provided.
  • You can only use an aggregate function that takes one argument.
  • Except for COUNT, you can only use aggregate functions that ignore NULL inputs.
  • If you are using COUNT, you can use * as an argument.

Rules for input_column:

  • May access columns from the input table, as well as correlated columns, not columns defined by the PIVOT clause, itself.
  • Evaluated against each row in the input table; aggregate and analytic function calls are prohibited.
  • Non-determinism is okay.
  • The type must be groupable.
  • The input table may be accessed through its alias if one is provided.

Rules for pivot_column:

  • A pivot_column must be a constant.
  • Named constants, such as variables, are not supported.
  • Query parameters are not supported.
  • If a name is desired for a named constant or query parameter, specify it explicitly with an alias.
  • Corner cases exist where a distinct pivot_columns can end up with the same default column names. For example, an input column might contain both a NULL value and the string literal "NULL". When this happens, multiple pivot columns are created with the same name. To avoid this situation, use aliases for pivot column names.
  • If a pivot_column does not specify an alias, a column name is constructed as follows:
From To Example
NULL NULL Input: NULL
Output: "NULL"
INT64
NUMERIC
BIGNUMERIC
The number in string format with the following rules:
  • Positive numbers are preceded with _.
  • Negative numbers are preceded with minus_.
  • A decimal point is replaced with _point_.
Input: 1
Output: _1

Input: -1
Output: minus_1

Input: 1.0
Output: _1_point_0
BOOL TRUE or FALSE. Input: TRUE
Output: TRUE

Input: FALSE
Output: FALSE
STRING The string value. Input: "PlayerName"
Output: PlayerName
DATE The date in _YYYY_MM_DD format. Input: DATE '2013-11-25'
Output: _2013_11_25
ENUM The name of the enumeration constant. Input: COLOR.RED
Output: RED
STRUCT A string formed by computing the pivot_column name for each field and joining the results together with an underscore. The following rules apply:
  • If the field is named: <field_name>_<pivot_column_name_for_field_name>.
  • If the field is unnamed: <pivot_column_name_for_field_name>.

<pivot_column_name_for_field_name> is determined by applying the rules in this table, recursively. If no rule is available for any STRUCT field, the entire pivot column is unnamed.

Due to implicit type coercion from the IN list values to the type of <value-expression>, field names must be present in input_column to have an effect on the names of the pivot columns.

Input: STRUCT("one", "two")
Output: one_two

Input: STRUCT("one" AS a, "two" AS b)
Output: one_a_two_b
All other data types Not supported. You must provide an alias.

Examples

The following examples reference a table called Produce that looks like this:

WITH Produce AS (
  SELECT 'Kale' as product, 51 as sales, 'Q1' as quarter UNION ALL
  SELECT 'Kale', 23, 'Q2' UNION ALL
  SELECT 'Kale', 45, 'Q3' UNION ALL
  SELECT 'Kale', 3, 'Q4' UNION ALL
  SELECT 'Apple', 77, 'Q1' UNION ALL
  SELECT 'Apple', 0, 'Q2' UNION ALL
  SELECT 'Apple', 25, 'Q3' UNION ALL
  SELECT 'Apple', 2, 'Q4')
SELECT * FROM Produce

+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale    | 51    | Q1      |
| Kale    | 23    | Q2      |
| Kale    | 45    | Q3      |
| Kale    | 3     | Q4      |
| Apple   | 77    | Q1      |
| Apple   | 0     | Q2      |
| Apple   | 25    | Q3      |
| Apple   | 2     | Q4      |
+---------+-------+---------+

With the PIVOT operator, the rows in the quarter column are rotated into these new columns: Q1, Q2, Q3, Q4.

SELECT * FROM
  (SELECT * FROM Produce)
  PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))

+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Apple   | 77 | 0  | 25 | 2  |
| Kale    | 51 | 23 | 45 | 3  |
+---------+----+----+----+----+
SELECT * FROM
  (SELECT sales, quarter FROM Produce)
  PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))

+-----+----+----+----+
| Q1  | Q2 | Q3 | Q4 |
+-----+----+----+----+
| 128 | 23 | 70 | 5  |
+-----+----+----+----+

You can return selected columns only:

SELECT * FROM
  (SELECT * FROM Produce)
  PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3'))

+---------+----+----+----+
| product | Q1 | Q2 | Q3 |
+---------+----+----+----+
| Apple   | 77 | 0  | 25 |
| Kale    | 51 | 23 | 45 |
+---------+----+----+----+
SELECT * FROM
  (SELECT sales, quarter FROM Produce)
  PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3'))

+-----+----+----+
| Q1  | Q2 | Q3 |
+-----+----+----+
| 128 | 23 | 70 |
+-----+----+----+

UNPIVOT operator

FROM from_item[, ...] unpivot_operator

unpivot_operator:
    UNPIVOT [ { INCLUDE NULLS | EXCLUDE NULLS } ] (
        { single_column_unpivot | multi_column_unpivot }
    ) [unpivot_alias]

single_column_unpivot:
    values_column
    FOR name_column
    IN (columns_to_unpivot)

multi_column_unpivot:
    values_column_set
    FOR name_column
    IN (column_sets_to_unpivot)

values_column_set:
    (values_column[, ...])

columns_to_unpivot:
    unpivot_column [row_value_alias][, ...]

column_sets_to_unpivot:
    (unpivot_column [row_value_alias][, ...])

unpivot_alias and row_value_alias:
    [AS] alias

The UNPIVOT operator rotates columns into rows. UNPIVOT is part of the FROM clause.

  • UNPIVOT can be used to modify any table expression.
  • Combining UNPIVOT with FOR SYSTEM_TIME AS OF is not allowed, although users may use UNPIVOT against a subquery input which itself uses FOR SYSTEM_TIME AS OF.
  • A WITH OFFSET clause immediately preceding the UNPIVOT operator is not allowed.
  • PIVOT aggregations cannot be reversed with UNPIVOT.

Conceptual example:

-- Before UNPIVOT is used to rotate Q1, Q2, Q3, Q4 into sales and quarter columns:
+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale    | 51 | 23 | 45 | 3  |
| Apple   | 77 | 0  | 25 | 2  |
+---------+----+----+----+----+

-- After UNPIVOT is used to rotate Q1, Q2, Q3, Q4 into sales and quarter columns:
+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale    | 51    | Q1      |
| Kale    | 23    | Q2      |
| Kale    | 45    | Q3      |
| Kale    | 3     | Q4      |
| Apple   | 77    | Q1      |
| Apple   | 0     | Q2      |
| Apple   | 25    | Q3      |
| Apple   | 2     | Q4      |
+---------+-------+---------+

Definitions

Top-level definitions:

  • from_item: The table or subquery on which to perform a pivot operation. The from_item must follow these rules.
  • unpivot_operator: The pivot operation to perform on a from_item.

unpivot_operator definitions:

  • INCLUDE NULLS: Add rows with NULL values to the result.
  • EXCLUDE NULLS: Do not add rows with NULL values to the result. By default, UNPIVOT excludes rows with NULL values.
  • single_column_unpivot: Rotates columns into one values_column and one name_column.
  • multi_column_unpivot: Rotates columns into multiple values_columns and one name_column.
  • unpivot_alias: An alias for the results of the UNPIVOT operation. This alias can be referenced elsewhere in the query.

single_column_unpivot definitions:

  • values_column: A column to contain the row values from columns_to_unpivot. Follow these rules when creating a values column.
  • name_column: A column to contain the column names from columns_to_unpivot. Follow these rules when creating a name column.
  • columns_to_unpivot: The columns from the from_item to populate values_column and name_column. Follow these rules when creating an unpivot column.
    • row_value_alias: An optional alias for a column that is displayed for the column in name_column. If not specified, the string value of the column name is used. Follow these rules when creating a row value alias.

multi_column_unpivot definitions:

  • values_column_set: A set of columns to contain the row values from columns_to_unpivot. Follow these rules when creating a values column.
  • name_column: A set of columns to contain the column names from columns_to_unpivot. Follow these rules when creating a name column.
  • column_sets_to_unpivot: The columns from the from_item to unpivot. Follow these rules when creating an unpivot column.
    • row_value_alias: An optional alias for a column set that is displayed for the column set in name_column. If not specified, a string value for the column set is used and each column in the string is separated with an underscore (_). For example, (col1, col2) outputs col1_col2. Follow these rules when creating a row value alias.

Rules

Rules for a from_item passed to UNPIVOT:

  • The from_item may consist of any table or subquery result.
  • The from_item may not produce a value table.
  • Duplicate columns in a from_item cannot be referenced in the UNPIVOT clause.

Rules for unpivot_operator:

  • Expressions are not permitted.
  • Qualified names are not permitted. For example, mytable.mycolumn is not allowed.
  • In the case where the UNPIVOT result has duplicate column names:
    • SELECT * is allowed.
    • SELECT values_column causes ambiguity.

Rules for values_column:

  • It cannot be a name used for a name_column or an unpivot_column.
  • It can be the same name as a column from the from_item.

Rules for name_column:

  • It cannot be a name used for a values_column or an unpivot_column.
  • It can be the same name as a column from the from_item.

Rules for unpivot_column:

  • Must be a column name from the from_item.
  • It cannot reference duplicate from_item column names.
  • All columns in a column set must have equivalent data types.
    • Data types cannot be coerced to a common supertype.
    • If the data types are exact matches (for example, a struct with different field names), the data type of the first input is the data type of the output.
  • You cannot have the same name in the same column set. For example, (emp1, emp1) results in an error.
  • You can have a the same name in different column sets. For example, (emp1, emp2), (emp1, emp3) is valid.

Rules for row_value_alias:

  • This can be a STRING or an INT64 literal.
  • The data type for all row_value_alias clauses must be the same.
  • If the value is an INT64, the row_value_alias for each unpivot_column must be specified.

Examples

The following examples reference a table called Produce that looks like this:

WITH Produce AS (
  SELECT 'Kale' as product, 51 as Q1, 23 as Q2, 45 as Q3, 3 as Q4 UNION ALL
  SELECT 'Apple', 77, 0, 25, 2)
SELECT * FROM Produce

+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale    | 51 | 23 | 45 | 3  |
| Apple   | 77 | 0  | 25 | 2  |
+---------+----+----+----+----+

With the UNPIVOT operator, the columns Q1, Q2, Q3, and Q4 are rotated. The values of these columns now populate a new column called Sales and the names of these columns now populate a new column called Quarter. This is a single-column unpivot operation.

SELECT * FROM Produce
UNPIVOT(sales FOR quarter IN (Q1, Q2, Q3, Q4))

+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale    | 51    | Q1      |
| Kale    | 23    | Q2      |
| Kale    | 45    | Q3      |
| Kale    | 3     | Q4      |
| Apple   | 77    | Q1      |
| Apple   | 0     | Q2      |
| Apple   | 25    | Q3      |
| Apple   | 2     | Q4      |
+---------+-------+---------+

In this example, we UNPIVOT four quarters into two semesters. This is a multi-column unpivot operation.

SELECT * FROM Produce
UNPIVOT(
  (first_half_sales, second_half_sales)
  FOR semesters
  IN ((Q1, Q2) AS 'semester_1', (Q3, Q4) AS 'semester_2'))

+---------+------------------+-------------------+------------+
| product | first_half_sales | second_half_sales | semesters  |
+---------+------------------+-------------------+------------+
| Kale    | 51               | 23                | semester_1 |
| Kale    | 45               | 3                 | semester_2 |
| Apple   | 77               | 0                 | semester_1 |
| Apple   | 25               | 2                 | semester_2 |
+---------+------------------+-------------------+------------+

TABLESAMPLE operator

TABLESAMPLE SYSTEM ( percent PERCENT )

Description

You can use the TABLESAMPLE operator to select a random sample of a dataset. This operator is useful when you're working with tables that have large amounts of data and you don't need precise answers.

Sampling returns a variety of records while avoiding the costs associated with scanning and processing an entire table. Each execution of the query might return different results because each execution processes an independently computed sample. BigQuery does not cache the results of queries that include a TABLESAMPLE clause.

Replace percent with the percentage of the dataset that you want to include in the results. The value must be between 0 and 100. The value can be a literal value or a query parameter. It cannot be a variable.

For more information, see Table sampling.

Example

The following query selects approximately 10% of a table's data:

SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT)

JOIN operation

join_operation:
    { cross_join_operation | condition_join_operation }

cross_join_operation:
    from_item cross_join_operator from_item

condition_join_operation:
    from_item condition_join_operator from_item join_condition

cross_join_operator:
    { CROSS JOIN | , }

condition_join_operator:
    {
      [INNER] JOIN
      | FULL [OUTER] JOIN
      | LEFT [OUTER] JOIN
      | RIGHT [OUTER] JOIN
    }

join_condition:
    { on_clause | using_clause }

on_clause:
    ON bool_expression

using_clause:
    USING ( join_column [, ...] )

The JOIN operation merges two from_items so that the SELECT clause can query them as one source. The join_type and ON or USING clause (a "join condition") specify how to combine and discard rows from the two from_items to form a single source.

[INNER] JOIN

An INNER JOIN, or simply JOIN, effectively calculates the Cartesian product of the two from_items and discards all rows that do not meet the join condition. "Effectively" means that it is possible to implement an INNER JOIN without actually calculating the Cartesian product.

FROM A INNER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------+
| w | x |  *  | y | z |  =  | w | x | y | z |
+-------+     +-------+     +---------------+
| 1 | a |     | 2 | k |     | 2 | b | 2 | k |
| 2 | b |     | 3 | m |     | 3 | c | 3 | m |
| 3 | c |     | 3 | n |     | 3 | c | 3 | n |
| 3 | d |     | 4 | p |     | 3 | d | 3 | m |
+-------+     +-------+     | 3 | d | 3 | n |
                            +---------------+
FROM A INNER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +-----------+
| x | y |  *  | x | z |  =  | x | y | z |
+-------+     +-------+     +-----------+
| 1 | a |     | 2 | k |     | 2 | b | k |
| 2 | b |     | 3 | m |     | 3 | c | m |
| 3 | c |     | 3 | n |     | 3 | c | n |
| 3 | d |     | 4 | p |     | 3 | d | m |
+-------+     +-------+     | 3 | d | n |
                            +-----------+

Example

This query performs an INNER JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
+---------------------------+

CROSS JOIN

CROSS JOIN returns the Cartesian product of the two from_items. In other words, it combines each row from the first from_item with each row from the second from_item.

If the rows of the two from_items are independent, then the result has M * N rows, given M rows in one from_item and N in the other. Note that this still holds for the case when either from_item has zero rows.

In a FROM clause, a CROSS JOIN can be written like this:

FROM A CROSS JOIN B

Table A       Table B       Result
+-------+     +-------+     +---------------+
| w | x |  *  | y | z |  =  | w | x | y | z |
+-------+     +-------+     +---------------+
| 1 | a |     | 2 | c |     | 1 | a | 2 | c |
| 2 | b |     | 3 | d |     | 1 | a | 3 | d |
+-------+     +-------+     | 2 | b | 2 | c |
                            | 2 | b | 3 | d |
                            +---------------+

You can use a correlated cross join to convert or flatten an ARRAY into a set of rows. To learn more, see Convert elements in an array to rows in a table.

Examples

This query performs an CROSS JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Adams      | Knights      |
| Adams      | Lakers       |
| Adams      | Mustangs     |
| Buchanan   | Jaguars      |
| Buchanan   | Knights      |
| Buchanan   | Lakers       |
| Buchanan   | Mustangs     |
| ...                       |
+---------------------------+

Comma cross join (,)

CROSS JOINs can be written implicitly with a comma. This is called a comma cross join.

A comma cross join looks like this in a FROM clause:

FROM A, B

Table A       Table B       Result
+-------+     +-------+     +---------------+
| w | x |  *  | y | z |  =  | w | x | y | z |
+-------+     +-------+     +---------------+
| 1 | a |     | 2 | c |     | 1 | a | 2 | c |
| 2 | b |     | 3 | d |     | 1 | a | 3 | d |
+-------+     +-------+     | 2 | b | 2 | c |
                            | 2 | b | 3 | d |
                            +---------------+

You cannot write comma cross joins inside parentheses. To learn more, see Join operations in a sequence.

FROM (A, B)  // INVALID

You can use a correlated comma cross join to convert or flatten an ARRAY into a set of rows. To learn more, see Convert elements in an array to rows in a table.

Examples

This query performs a comma cross join on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster, TeamMascot;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Adams      | Knights      |
| Adams      | Lakers       |
| Adams      | Mustangs     |
| Buchanan   | Jaguars      |
| Buchanan   | Knights      |
| Buchanan   | Lakers       |
| Buchanan   | Mustangs     |
| ...                       |
+---------------------------+

FULL [OUTER] JOIN

A FULL OUTER JOIN (or simply FULL JOIN) returns all fields for all rows in both from_items that meet the join condition.

FULL indicates that all rows from both from_items are returned, even if they do not meet the join condition.

OUTER indicates that if a given row from one from_item does not join to any row in the other from_item, the row will return with NULLs for all columns from the other from_item.

FROM A FULL OUTER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------------------+
| w | x |  *  | y | z |  =  | w    | x    | y    | z    |
+-------+     +-------+     +---------------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | 2    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | 3    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | 3    | n    |
+-------+     +-------+     | 3    | d    | 3    | m    |
                            | 3    | d    | 3    | n    |
                            | NULL | NULL | 4    | p    |
                            +---------------------------+
FROM A FULL OUTER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +--------------------+
| x | y |  *  | x | z |  =  | x    | y    | z    |
+-------+     +-------+     +--------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | n    |
+-------+     +-------+     | 3    | d    | m    |
                            | 3    | d    | n    |
                            | 4    | NULL | p    |
                            +--------------------+

Example

This query performs a FULL JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster FULL JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
| Eisenhower | NULL         |
| NULL       | Mustangs     |
+---------------------------+

LEFT [OUTER] JOIN

The result of a LEFT OUTER JOIN (or simply LEFT JOIN) for two from_items always retains all rows of the left from_item in the JOIN operation, even if no rows in the right from_item satisfy the join predicate.

LEFT indicates that all rows from the left from_item are returned; if a given row from the left from_item does not join to any row in the right from_item, the row will return with NULLs for all columns from the right from_item. Rows from the right from_item that do not join to any row in the left from_item are discarded.

FROM A LEFT OUTER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------------------+
| w | x |  *  | y | z |  =  | w    | x    | y    | z    |
+-------+     +-------+     +---------------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | 2    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | 3    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | 3    | n    |
+-------+     +-------+     | 3    | d    | 3    | m    |
                            | 3    | d    | 3    | n    |
                            +---------------------------+
FROM A LEFT OUTER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +--------------------+
| x | y |  *  | x | z |  =  | x    | y    | z    |
+-------+     +-------+     +--------------------+
| 1 | a |     | 2 | k |     | 1    | a    | NULL |
| 2 | b |     | 3 | m |     | 2    | b    | k    |
| 3 | c |     | 3 | n |     | 3    | c    | m    |
| 3 | d |     | 4 | p |     | 3    | c    | n    |
+-------+     +-------+     | 3    | d    | m    |
                            | 3    | d    | n    |
                            +--------------------+

Example

This query performs a LEFT JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster LEFT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
| Eisenhower | NULL         |
+---------------------------+

RIGHT [OUTER] JOIN

The result of a RIGHT OUTER JOIN (or simply RIGHT JOIN) is similar and symmetric to that of LEFT OUTER JOIN.

FROM A RIGHT OUTER JOIN B ON A.w = B.y

Table A       Table B       Result
+-------+     +-------+     +---------------------------+
| w | x |  *  | y | z |  =  | w    | x    | y    | z    |
+-------+     +-------+     +---------------------------+
| 1 | a |     | 2 | k |     | 2    | b    | 2    | k    |
| 2 | b |     | 3 | m |     | 3    | c    | 3    | m    |
| 3 | c |     | 3 | n |     | 3    | c    | 3    | n    |
| 3 | d |     | 4 | p |     | 3    | d    | 3    | m    |
+-------+     +-------+     | 3    | d    | 3    | n    |
                            | NULL | NULL | 4    | p    |
                            +---------------------------+
FROM A RIGHT OUTER JOIN B USING (x)

Table A       Table B       Result
+-------+     +-------+     +--------------------+
| x | y |  *  | x | z |  =  | x    | y    | z    |
+-------+     +-------+     +--------------------+
| 1 | a |     | 2 | k |     | 2    | b    | k    |
| 2 | b |     | 3 | m |     | 3    | c    | m    |
| 3 | c |     | 3 | n |     | 3    | c    | n    |
| 3 | d |     | 4 | p |     | 3    | d    | m    |
+-------+     +-------+     | 3    | d    | n    |
                            | 4    | NULL | p    |
                            +--------------------+

Example

This query performs a RIGHT JOIN on the Roster and TeamMascot tables.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster RIGHT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
| NULL       | Mustangs     |
+---------------------------+

ON clause

A combined row (the result of joining two rows) meets the ON join condition if join condition returns TRUE.

FROM A JOIN B ON A.x = B.x

Table A   Table B   Result (A.x, B.x)
+---+     +---+     +-------+
| x |  *  | x |  =  | x | x |
+---+     +---+     +-------+
| 1 |     | 2 |     | 2 | 2 |
| 2 |     | 3 |     | 3 | 3 |
| 3 |     | 4 |     +-------+
+---+     +---+

Example

This query performs an INNER JOIN on the Roster and TeamMascot table.

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;

+---------------------------+
| LastName   | Mascot       |
+---------------------------+
| Adams      | Jaguars      |
| Buchanan   | Lakers       |
| Coolidge   | Lakers       |
| Davis      | Knights      |
+---------------------------+

USING clause

The USING clause requires a column list of one or more columns which occur in both input tables. It performs an equality comparison on that column, and the rows meet the join condition if the equality comparison returns TRUE.

FROM A JOIN B USING (x)

Table A   Table B   Result
+---+     +---+     +---+
| x |  *  | x |  =  | x |
+---+     +---+     +---+
| 1 |     | 2 |     | 2 |
| 2 |     | 3 |     | 3 |
| 3 |     | 4 |     +---+
+---+     +---+

Example

This query performs an INNER JOIN on the Roster and TeamMascot table.

This statement returns the rows from Roster and TeamMascot where Roster.SchooldID is the same as TeamMascot.SchooldID. The results include a single SchooldID column.

SELECT * FROM Roster INNER JOIN TeamMascot USING (SchoolID);

+----------------------------------------+
| SchoolID   | LastName   | Mascot       |
+----------------------------------------+
| 50         | Adams      | Jaguars      |
| 52         | Buchanan   | Lakers       |
| 52         | Coolidge   | Lakers       |
| 51         | Davis      | Knights      |
+----------------------------------------+

ON and USING equivalency

The ON and USING keywords are not equivalent, but they are similar. ON returns multiple columns, and USING returns one.

FROM A JOIN B ON A.x = B.x
FROM A JOIN B USING (x)

Table A   Table B   Result ON     Result USING
+---+     +---+     +-------+     +---+
| x |  *  | x |  =  | x | x |     | x |
+---+     +---+     +-------+     +---+
| 1 |     | 2 |     | 2 | 2 |     | 2 |
| 2 |     | 3 |     | 3 | 3 |     | 3 |
| 3 |     | 4 |     +-------+     +---+
+---+     +---+

Although ON and USING are not equivalent, they can return the same results if you specify the columns you want to return.

SELECT x FROM A JOIN B USING (x);
SELECT A.x FROM A JOIN B ON A.x = B.x;

Table A   Table B   Result
+---+     +---+     +---+
| x |  *  | x |  =  | x |
+---+     +---+     +---+
| 1 |     | 2 |     | 2 |
| 2 |     | 3 |     | 3 |
| 3 |     | 4 |     +---+
+---+     +---+

Join operations in a sequence

The FROM clause can contain multiple JOIN operations in a sequence. JOINs are bound from left to right. For example:

FROM A JOIN B USING (x) JOIN C USING (x)

-- A JOIN B USING (x)        = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2                  = return value

You can also insert parentheses to group JOINs:

FROM ( (A JOIN B USING (x)) JOIN C USING (x) )

-- A JOIN B USING (x)        = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2                  = return value

With parentheses, you can group JOINs so that they are bound in a different order:

FROM ( A JOIN (B JOIN C USING (x)) USING (x) )

-- B JOIN C USING (x)       = result_1
-- A JOIN result_1          = result_2
-- result_2                 = return value

A FROM clause can have multiple joins. Provided there are no comma cross joins in the FROM clause, joins do not require parenthesis, though parenthesis can help readability:

FROM A JOIN B JOIN C JOIN D USING (w) ON B.x = C.y ON A.z = B.x

If your clause contains comma cross joins, you must use parentheses:

FROM A, B JOIN C JOIN D ON C.x = D.y ON B.z = C.x    // INVALID
FROM A, B JOIN (C JOIN D ON C.x = D.y) ON B.z = C.x  // VALID

When comma cross joins are present in a query with a sequence of JOINs, they group from left to right like other JOIN types:

FROM A JOIN B USING (x) JOIN C USING (x), D

-- A JOIN B USING (x)        = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 CROSS JOIN D     = return value

There cannot be a RIGHT JOIN or FULL JOIN after a comma cross join unless it is parenthsized:

FROM A, B RIGHT JOIN C ON TRUE // INVALID
FROM A, B FULL JOIN C ON TRUE  // INVALID
FROM A, B JOIN C ON TRUE       // VALID
FROM A, (B RIGHT JOIN C ON TRUE) // VALID
FROM A, (B FULL JOIN C ON TRUE)  // VALID

Correlated join operation

A join operation is correlated when the right from_item contains a reference to at least one range variable or column name introduced by the left from_item.

In a correlated join operation, rows from the right from_item are determined by a row from the left from_item. Consequently, RIGHT OUTER and FULL OUTER joins cannot be correlated because right from_item rows cannot be determined in the case when there is no row from the left from_item.

All correlated join operations must reference an array in the right from_item.

This is a conceptual example of a correlated join operation that includes a correlated subquery:

FROM A JOIN UNNEST(ARRAY(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)) AS C
  • Left from_item: A
  • Right from_item: UNNEST(...) AS C
  • A correlated subquery: (SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)

This is another conceptual example of a correlated join operation. array_of_IDs is part of the left from_item but is referenced in the right from_item.

FROM A JOIN UNNEST(A.array_of_IDs) AS C

The UNNEST operator can be explicit or implicit. These are both allowed:

FROM A JOIN UNNEST(A.array_of_IDs) AS IDs
FROM A JOIN A.array_of_IDs AS IDs

In a correlated join operation, the right from_item is re-evaluated against each distinct row from the left from_item. In the following conceptual example, the correlated join operation first evaluates A and B, then A and C:

FROM
  A
  JOIN
  UNNEST(ARRAY(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)) AS C
  ON A.Name = C.Name

Examples

This is an example of a correlated join, using the Roster and PlayerStats tables:

SELECT *
FROM
  Roster
JOIN
  UNNEST(
    ARRAY(
      SELECT AS STRUCT *
      FROM PlayerStats
      WHERE PlayerStats.OpponentID = Roster.SchoolID
    )) AS PlayerMatches
  ON PlayerMatches.LastName = 'Buchanan'

+------------+----------+----------+------------+--------------+
| LastName   | SchoolID | LastName | OpponentID | PointsScored |
+------------+----------+----------+------------+--------------+
| Adams      | 50       | Buchanan | 50         | 13           |
| Eisenhower | 77       | Buchanan | 77         | 0            |
+------------+----------+----------+------------+--------------+

WHERE clause

WHERE bool_expression

The WHERE clause filters the results of the FROM clause.

Only rows whose bool_expression evaluates to TRUE are included. Rows whose bool_expression evaluates to NULL or FALSE are discarded.

The evaluation of a query with a WHERE clause is typically completed in this order:

  • FROM
  • WHERE
  • GROUP BY and aggregation
  • HAVING
  • WINDOW
  • QUALIFY
  • DISTINCT
  • ORDER BY
  • LIMIT

The WHERE clause can only reference columns available via the FROM clause; it cannot reference SELECT list aliases.

Examples

This query returns returns all rows from the Roster table where the SchoolID column has the value 52:

SELECT * FROM Roster
WHERE SchoolID = 52;

The bool_expression can contain multiple sub-conditions:

SELECT * FROM Roster
WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac");

Expressions in an INNER JOIN have an equivalent expression in the WHERE clause. For example, a query using INNER JOIN and ON has an equivalent expression using CROSS JOIN and WHERE. For example, the following two queries are equivalent:

SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster INNER JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot
WHERE Roster.SchoolID = TeamMascot.SchoolID;

GROUP BY clause

GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) }

The GROUP BY clause groups together rows in a table with non-distinct values for the expression in the GROUP BY clause. For multiple rows in the source table with non-distinct values for expression, the GROUP BY clause produces a single combined row. GROUP BY is commonly used when aggregate functions are present in the SELECT list, or to eliminate redundancy in the output. The data type of expression must be groupable.

Example:

SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName;

The GROUP BY clause can refer to expression names in the SELECT list. The GROUP BY clause also allows ordinal references to expressions in the SELECT list using integer values. 1 refers to the first expression in the SELECT list, 2 the second, and so forth. The expression list can combine ordinals and expression names.

Example:

SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;

The query above is equivalent to:

SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY 2, FirstName;

GROUP BY clauses may also refer to aliases. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.

Example:

SELECT SUM(PointsScored), LastName as last_name
FROM PlayerStats
GROUP BY last_name;

GROUP BY ROLLUP returns the results of GROUP BY for prefixes of the expressions in the ROLLUP list, each of which is known as a grouping set. For the ROLLUP list (a, b, c), the grouping sets are (a, b, c), (a, b), (a), (). When evaluating the results of GROUP BY for a particular grouping set, GROUP BY ROLLUP treats expressions that are not in the grouping set as having a NULL value. A SELECT statement like this one:

SELECT a, b, SUM(c) FROM Input GROUP BY ROLLUP(a, b);

uses the rollup list (a, b). The result will include the results of GROUP BY for the grouping sets (a, b), (a), and (), which includes all rows. This returns the same rows as:

SELECT NULL, NULL, SUM(c) FROM Input               UNION ALL
SELECT a,    NULL, SUM(c) FROM Input GROUP BY a    UNION ALL
SELECT a,    b,    SUM(c) FROM Input GROUP BY a, b;

This allows the computation of aggregates for the grouping sets defined by the expressions in the ROLLUP list and the prefixes of that list.

Example:

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT 456, 3, 4.25 UNION ALL
  SELECT 789, 3, 0.99
)
SELECT
  day,
  SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(day);

The query above outputs a row for each day in addition to the rolled up total across all days, as indicated by a NULL day:

+------+-------+
| day  | total |
+------+-------+
| NULL | 39.77 |
|    1 | 23.54 |
|    2 |  9.99 |
|    3 |  6.24 |
+------+-------+

Example:

WITH Sales AS (
  SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
  SELECT 123, 1, 8.99 UNION ALL
  SELECT 456, 1, 4.56 UNION ALL
  SELECT 123, 2, 9.99 UNION ALL
  SELECT 789, 3, 1.00 UNION ALL
  SELECT 456, 3, 4.25 UNION ALL
  SELECT 789, 3, 0.99
)
SELECT
  sku,
  day,
  SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;

The query above returns rows grouped by the following grouping sets:

  • sku and day
  • sku (day is NULL)
  • The empty grouping set (day and sku are NULL)

The sums for these grouping sets correspond to the total for each distinct sku-day combination, the total for each sku across all days, and the grand total:

+------+------+-------+
| sku  | day  | total |
+------+------+-------+
| NULL | NULL | 39.77 |
|  123 | NULL | 28.97 |
|  123 |    1 | 18.98 |
|  123 |    2 |  9.99 |
|  456 | NULL |  8.81 |
|  456 |    1 |  4.56 |
|  456 |    3 |  4.25 |
|  789 |    3 |  1.99 |
|  789 | NULL |  1.99 |
+------+------+-------+

HAVING clause

HAVING bool_expression

The HAVING clause filters the results produced by GROUP BY or aggregation. GROUP BY or aggregation must be present in the query. If aggregation is present, the HAVING clause is evaluated once for every aggregated row in the result set.

Only rows whose bool_expression evaluates to TRUE are included. Rows whose bool_expression evaluates to NULL or FALSE are discarded.

The evaluation of a query with a HAVING clause is typically completed in this order:

  • FROM
  • WHERE
  • GROUP BY and aggregation
  • HAVING
  • WINDOW
  • QUALIFY
  • DISTINCT
  • ORDER BY
  • LIMIT

The HAVING clause can reference columns available via the FROM clause, as well as SELECT list aliases. Expressions referenced in the HAVING clause must either appear in the GROUP BY clause or they must be the result of an aggregate function:

SELECT LastName
FROM Roster
GROUP BY LastName
HAVING SUM(PointsScored) > 15;

If a query contains aliases in the SELECT clause, those aliases override names in a FROM clause.

SELECT LastName, SUM(PointsScored) AS ps
FROM Roster
GROUP BY LastName
HAVING ps > 0;

Mandatory aggregation

Aggregation does not have to be present in the HAVING clause itself, but aggregation must be present in at least one of the following forms:

Aggregation function in the SELECT list.

SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
GROUP BY LastName
HAVING total > 15;

Aggregation function in the 'HAVING' clause.

SELECT LastName
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;

Aggregation in both the SELECT list and HAVING clause.

When aggregation functions are present in both the SELECT list and HAVING clause, the aggregation functions and the columns they reference do not need to be the same. In the example below, the two aggregation functions, COUNT() and SUM(), are different and also use different columns.

SELECT LastName, COUNT(*)
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;

ORDER BY clause

ORDER BY expression
  [{ ASC | DESC }]
  [{ NULLS FIRST | NULLS LAST }]
  [, ...]

The ORDER BY clause specifies a column or expression as the sort criterion for the result set. If an ORDER BY clause is not present, the order of the results of a query is not defined. Column aliases from a FROM clause or SELECT list are allowed. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause. The data type of expression must be orderable.

Optional Clauses

  • NULLS FIRST | NULLS LAST:
    • NULLS FIRST: Sort null values before non-null values.
    • NULLS LAST. Sort null values after non-null values.
  • ASC | DESC: Sort the results in ascending or descending order of expression values. ASC is the default value. If null ordering is not specified with NULLS FIRST or NULLS LAST:
    • NULLS FIRST is applied by default if the sort order is ascending.
    • NULLS LAST is applied by default if the sort order is descending.

Examples

Use the default sort order (ascending).

SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
      SELECT 9, true UNION ALL
      SELECT NULL, false)
ORDER BY x;
+------+-------+
| x    | y     |
+------+-------+
| NULL | false |
| 1    | true  |
| 9    | true  |
+------+-------+

Use the default sort order (ascending), but return null values last.

SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
      SELECT 9, true UNION ALL
      SELECT NULL, false)
ORDER BY x NULLS LAST;
+------+-------+
| x    | y     |
+------+-------+
| 1    | true  |
| 9    | true  |
| NULL | false |
+------+-------+

Use descending sort order.

SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
      SELECT 9, true UNION ALL
      SELECT NULL, false)
ORDER BY x DESC;
+------+-------+
| x    | y     |
+------+-------+
| 9    | true  |
| 1    | true  |
| NULL | false |
+------+-------+

Use descending sort order, but return null values first.

SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
      SELECT 9, true UNION ALL
      SELECT NULL, false)
ORDER BY x DESC NULLS FIRST;
+------+-------+
| x    | y     |
+------+-------+
| NULL | false |
| 9    | true  |
| 1    | true  |
+------+-------+

It is possible to order by multiple columns. In the example below, the result set is ordered first by SchoolID and then by LastName:

SELECT LastName, PointsScored, OpponentID
FROM PlayerStats
ORDER BY SchoolID, LastName;

The following rules apply when ordering values:

  • NULLs: In the context of the ORDER BY clause, NULLs are the minimum possible value; that is, NULLs appear first in ASC sorts and last in DESC sorts.
  • Floating point data types: see Floating Point Semantics on ordering and grouping.

When used in conjunction with set operators, the ORDER BY clause applies to the result set of the entire query; it does not apply only to the closest SELECT statement. For this reason, it can be helpful (though it is not required) to use parentheses to show the scope of the ORDER BY.

This query without parentheses:

SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot
ORDER BY SchoolID;

is equivalent to this query with parentheses:

( SELECT * FROM Roster
  UNION ALL
  SELECT * FROM TeamMascot )
ORDER BY SchoolID;

but is not equivalent to this query, where the ORDER BY clause applies only to the second SELECT statement:

SELECT * FROM Roster
UNION ALL
( SELECT * FROM TeamMascot
  ORDER BY SchoolID );

You can also use integer literals as column references in ORDER BY clauses. An integer literal becomes an ordinal (for example, counting starts at 1) into the SELECT list.

Example - the following two queries are equivalent:

SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName
ORDER BY LastName;
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY 2
ORDER BY 2;

QUALIFY clause

QUALIFY bool_expression

The QUALIFY clause filters the results of analytic functions. An analytic function is required to be present in the QUALIFY clause or the SELECT list.

Only rows whose bool_expression evaluates to TRUE are included. Rows whose bool_expression evaluates to NULL or FALSE are discarded.

The evaluation of a query with a QUALIFY clause is typically completed in this order:

  • FROM
  • WHERE
  • GROUP BY and aggregation
  • HAVING
  • WINDOW
  • QUALIFY
  • DISTINCT
  • ORDER BY
  • LIMIT

Limitations

The QUALIFY clause has an implementation limitation in that it must be used in conjunction with at least one of these clauses:

  • WHERE
  • GROUP BY
  • HAVING

Examples

The following query returns the most popular vegetables in the Produce table and their rank.

SELECT
  item,
  RANK() OVER (PARTITION BY category ORDER BY purchases DESC) as rank
FROM Produce
WHERE Produce.category = 'vegetable'
QUALIFY rank <= 3

+---------+------+
| item    | rank |
+---------+------+
| kale    | 1    |
| lettuce | 2    |
| cabbage | 3    |
+---------+------+

You don't have to include an analytic function in the SELECT list to use QUALIFY. The following query returns the most popular vegetables in the Produce table.

SELECT item
FROM Produce
WHERE Produce.category = 'vegetable'
QUALIFY RANK() OVER (PARTITION BY category ORDER BY purchases DESC) <= 3

+---------+
| item    |
+---------+
| kale    |
| lettuce |
| cabbage |
+---------+

WINDOW clause

WINDOW named_window_expression [, ...]

named_window_expression:
  named_window AS { named_window | ( [ window_specification ] ) }

A WINDOW clause defines a list of named windows. A named window represents a group of rows in a table upon which to use an analytic function. A named window can be defined with a window specification or reference another named window. If another named window is referenced, the definition of the referenced window must precede the referencing window.

Examples

These examples reference a table called Produce. They all return the same result. Note the different ways you can combine named windows and use them in an analytic function's OVER clause.

SELECT item, purchases, category, LAST_VALUE(item)
  OVER (item_window) AS most_popular
FROM Produce
WINDOW item_window AS (
  PARTITION BY category
  ORDER BY purchases
  ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
SELECT item, purchases, category, LAST_VALUE(item)
  OVER (d) AS most_popular
FROM Produce
WINDOW
  a AS (PARTITION BY category),
  b AS (a ORDER BY purchases),
  c AS (b ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING),
  d AS (c)
SELECT item, purchases, category, LAST_VALUE(item)
  OVER (c ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS most_popular
FROM Produce
WINDOW
  a AS (PARTITION BY category),
  b AS (a ORDER BY purchases),
  c AS b

Set operators

set_operation:
  query_expr set_operator query_expr

set_operator:
  UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT

Set operators combine results from two or more input queries into a single result set. You must specify ALL or DISTINCT; if you specify ALL, then all rows are retained. If DISTINCT is specified, duplicate rows are discarded.

If a given row R appears exactly m times in the first input query and n times in the second input query (m >= 0, n >= 0):

  • For UNION ALL, R appears exactly m + n times in the result.
  • For UNION DISTINCT, the DISTINCT is computed after the UNION is computed, so R appears exactly one time.
  • For INTERSECT DISTINCT, the DISTINCT is computed after the result above is computed.
  • For EXCEPT DISTINCT, row R appears once in the output if m > 0 and n = 0.
  • If there are more than two input queries, the above operations generalize and the output is the same as if the inputs were combined incrementally from left to right.

The following rules apply:

  • For set operations other than UNION ALL, all column types must support equality comparison.
  • The input queries on each side of the operator must return the same number of columns.
  • The operators pair the columns returned by each input query according to the columns' positions in their respective SELECT lists. That is, the first column in the first input query is paired with the first column in the second input query.
  • The result set always uses the column names from the first input query.
  • The result set always uses the supertypes of input types in corresponding columns, so paired columns must also have either the same data type or a common supertype.
  • You must use parentheses to separate different set operations; for this purpose, set operations such as UNION ALL and UNION DISTINCT are different. If the statement only repeats the same set operation, parentheses are not necessary.

Examples:

query1 UNION ALL (query2 UNION DISTINCT query3)
query1 UNION ALL query2 UNION ALL query3

Invalid:

query1 UNION ALL query2 UNION DISTINCT query3
query1 UNION ALL query2 INTERSECT ALL query3;  // INVALID.

UNION

The UNION operator combines the result sets of two or more input queries by pairing columns from the result set of each query and vertically concatenating them.

INTERSECT

The INTERSECT operator returns rows that are found in the result sets of both the left and right input queries. Unlike EXCEPT, the positioning of the input queries (to the left versus right of the INTERSECT operator) does not matter.

EXCEPT

The EXCEPT operator returns rows from the left input query that are not present in the right input query.

Example:

SELECT * FROM UNNEST(ARRAY<int64>[1, 2, 3]) AS number
EXCEPT DISTINCT SELECT 1;

+--------+
| number |
+--------+
| 2      |
| 3      |
+--------+

LIMIT and OFFSET clauses

LIMIT count [ OFFSET skip_rows ]

LIMIT specifies a non-negative count of type INT64, and no more than count rows will be returned. LIMIT 0 returns 0 rows.

If there is a set operation, LIMIT is applied after the set operation is evaluated.

OFFSET specifies a non-negative number of rows to skip before applying LIMIT. skip_rows is of type INT64.

These clauses accept only literal or parameter values. The rows that are returned by LIMIT and OFFSET are unspecified unless these operators are used after ORDER BY.

Examples:

SELECT *
FROM UNNEST(ARRAY<STRING>['a', 'b', 'c', 'd', 'e']) AS letter
ORDER BY letter ASC LIMIT 2

+---------+
| letter  |
+---------+
| a       |
| b       |
+---------+
SELECT *
FROM UNNEST(ARRAY<STRING>['a', 'b', 'c', 'd', 'e']) AS letter
ORDER BY letter ASC LIMIT 3 OFFSET 1

+---------+
| letter  |
+---------+
| b       |
| c       |
| d       |
+---------+

WITH clause

WITH cte[, ...]

A WITH clause contains one or more common table expressions (CTEs). A CTE acts like a temporary table that you can reference within a single query expression. Each CTE binds the results of a subquery to a table name, which can be used elsewhere in the same query expression, but rules apply.

BigQuery does not materialize the results of non-recursive CTEs inside the WITH clause. If a non-recursive CTE is referenced in multiple places in a query, then the CTE is executed once for each reference. The WITH clause with non-recursive CTEs is useful primarily for readability.

CTEs

cte:
    cte_name AS ( query_expr )

A common table expression (CTE) contains a subquery and a name associated with the CTE.

  • A CTE cannot reference itself.
  • A CTE can be referenced by the query expression that contains the WITH clause, but rules apply.
Examples

In this example, a WITH clause defines two CTEs that are referenced in the related set operation, where one CTE is referenced by each of the set operation's input query expressions:

WITH subQ1 AS (SELECT SchoolID FROM Roster),
     subQ2 AS (SELECT OpponentID FROM PlayerStats)
SELECT * FROM subQ1
UNION ALL
SELECT * FROM subQ2

You can break up more complex queries into a WITH clause and WITH SELECT statement instead of writing nested table subqueries. For example:

WITH q1 AS (my_query)
SELECT *
FROM
  (WITH q2 AS (SELECT * FROM q1) SELECT * FROM q2)
WITH q1 AS (my_query)
SELECT *
FROM
  (WITH q2 AS (SELECT * FROM q1),  # q1 resolves to my_query
        q3 AS (SELECT * FROM q1),  # q1 resolves to my_query
        q1 AS (SELECT * FROM q1),  # q1 (in the query) resolves to my_query
        q4 AS (SELECT * FROM q1)   # q1 resolves to the WITH subquery on the previous line.
    SELECT * FROM q1)              # q1 resolves to the third inner WITH subquery.

CTE rules and constraints

Common table expressions (CTEs) can be referenced inside the query expression that contains the WITH clause.

Here are some general rules and constraints to consider when working with CTEs:

  • Each CTE in the same WITH clause must have a unique name.
  • A CTE defined in a WITH clause is only visible to other CTEs in the same WITH clause that were defined after it.
  • A local CTE overrides an outer CTE or table with the same name.
  • A CTE on a subquery may not reference correlated columns from the outer query.

CTE visibility

References between common table expressions (CTEs) in the WITH clause can go backward but not forward.

This is what happens when you have two CTEs that reference themselves or each other in a WITH clause. Assume that A is the first CTE and B is the second CTE in the clause:

  • A references A = Invalid
  • A references B = Invalid
  • B references A = Valid
  • A references B references A = Invalid (cycles are not allowed)

This produces an error. A cannot reference itself because self-references are not supported:

WITH
  A AS (SELECT 1 AS n UNION ALL (SELECT n + 1 FROM A WHERE n < 3))
SELECT * FROM A

-- Error

This produces an error. A cannot reference B because references between CTEs can go backwards but not forwards:

WITH
  A AS (SELECT * FROM B),
  B AS (SELECT 1 AS n)
SELECT * FROM B

-- Error

B can reference A because references between CTEs can go backwards:

WITH
  A AS (SELECT 1 AS n),
  B AS (SELECT * FROM A)
SELECT * FROM B

+---+
| n |
+---+
| 1 |
+---+

This produces an error. A and B reference each other, which creates a cycle:

WITH
  A AS (SELECT * FROM B),
  B AS (SELECT * FROM A)
SELECT * FROM B

-- Error

Using aliases

An alias is a temporary name given to a table, column, or expression present in a query. You can introduce explicit aliases in the SELECT list or FROM clause, or BigQuery will infer an implicit alias for some expressions. Expressions with neither an explicit nor implicit alias are anonymous and the query cannot reference them by name.

Explicit aliases

You can introduce explicit aliases in either the FROM clause or the SELECT list.

In a FROM clause, you can introduce explicit aliases for any item, including tables, arrays, subqueries, and UNNEST clauses, using [AS] alias. The AS keyword is optional.

Example:

SELECT s.FirstName, s2.SongName
FROM Singers AS s, (SELECT * FROM Songs) AS s2;

You can introduce explicit aliases for any expression in the SELECT list using [AS] alias. The AS keyword is optional.

Example:

SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname
FROM Singers s;

Implicit aliases

In the SELECT list, if there is an expression that does not have an explicit alias, BigQuery assigns an implicit alias according to the following rules. There can be multiple columns with the same alias in the SELECT list.

  • For identifiers, the alias is the identifier. For example, SELECT abc implies AS abc.
  • For path expressions, the alias is the last identifier in the path. For example, SELECT abc.def.ghi implies AS ghi.
  • For field access using the "dot" member field access operator, the alias is the field name. For example, SELECT (struct_function()).fname implies AS fname.

In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.

In a FROM clause, from_items are not required to have an alias. The following rules apply:

  • If there is an expression that does not have an explicit alias, BigQuery assigns an implicit alias in these cases:
    • For identifiers, the alias is the identifier. For example, FROM abc implies AS abc.
    • For path expressions, the alias is the last identifier in the path. For example, FROM abc.def.ghi implies AS ghi
    • The column produced using WITH OFFSET has the implicit alias offset.
  • Table subqueries do not have implicit aliases.
  • FROM UNNEST(x) does not have an implicit alias.

Alias visibility

After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of BigQuery's name scoping rules.

Visibility in the FROM clause

BigQuery processes aliases in a FROM clause from left to right, and aliases are visible only to subsequent path expressions in a FROM clause.

Example:

Assume the Singers table had a Concerts column of ARRAY type.

SELECT FirstName
FROM Singers AS s, s.Concerts;

Invalid:

SELECT FirstName
FROM s.Concerts, Singers AS s;  // INVALID.

FROM clause aliases are not visible to subqueries in the same FROM clause. Subqueries in a FROM clause cannot contain correlated references to other tables in the same FROM clause.

Invalid:

SELECT FirstName
FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s)  // INVALID.

You can use any column name from a table in the FROM as an alias anywhere in the query, with or without qualification with the table name.

Example:

SELECT FirstName, s.ReleaseDate
FROM Singers s WHERE ReleaseDate = 1975;

If the FROM clause contains an explicit alias, you must use the explicit alias instead of the implicit alias for the remainder of the query (see Implicit Aliases). A table alias is useful for brevity or to eliminate ambiguity in cases such as self-joins, where the same table is scanned multiple times during query processing.

Example:

SELECT * FROM Singers as s, Songs as s2
ORDER BY s.LastName

Invalid — ORDER BY does not use the table alias:

SELECT * FROM Singers as s, Songs as s2
ORDER BY Singers.LastName;  // INVALID.

Visibility in the SELECT list

Aliases in the SELECT list are visible only to the following clauses:

  • GROUP BY clause
  • ORDER BY clause
  • HAVING clause

Example:

SELECT LastName AS last, SingerID
FROM Singers
ORDER BY last;

Visibility in the GROUP BY, ORDER BY, and HAVING clauses

These three clauses, GROUP BY, ORDER BY, and HAVING, can refer to only the following values:

  • Tables in the FROM clause and any of their columns.
  • Aliases from the SELECT list.

GROUP BY and ORDER BY can also refer to a third group:

  • Integer literals, which refer to items in the SELECT list. The integer 1 refers to the first item in the SELECT list, 2 refers to the second item, etc.

Example:

SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY 1
ORDER BY 2 DESC;

The previous query is equivalent to:

SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY sid
ORDER BY s2id DESC;

Duplicate aliases

A SELECT list or subquery containing multiple explicit or implicit aliases of the same name is allowed, as long as the alias name is not referenced elsewhere in the query, since the reference would be ambiguous.

When a top-level SELECT list contains duplicate column names and no destination table is specified, all duplicate columns, except for the first one, are automatically renamed to make them unique. The renamed columns appear in the query result.

Example:

SELECT 1 AS a, 2 AS a;

+---+-----+
| a | a_1 |
+---+-----+
| 1 | 2   |
+---+-----+

Duplicate column names in a table or view definition are not supported. These statements with queries that contain duplicate column names will fail:

CREATE TABLE my_dataset.my_table AS (SELECT 1 AS a, 2 AS a);
CREATE VIEW my_dataset.my_view AS (SELECT 1 AS a, 2 AS a);

Ambiguous aliases

BigQuery provides an error if accessing a name is ambiguous, meaning it can resolve to more than one unique object in the query or in a table schema, including the schema of a destination table.

Examples:

This query contains column names that conflict between tables, since both Singers and Songs have a column named SingerID:

SELECT SingerID
FROM Singers, Songs;

This query contains aliases that are ambiguous in the GROUP BY clause because they are duplicated in the SELECT list:

SELECT FirstName AS name, LastName AS name,
FROM Singers
GROUP BY name;

This query contains aliases that are ambiguous in the SELECT list and FROM clause because they share the same name. Assume table has columns x, y, and z. z is of type STRUCT and has fields v, w, and x.

Example:

SELECT x, z AS T
FROM table AS T
GROUP BY T.x;

The alias T is ambiguous and will produce an error because T.x in the GROUP BY clause could refer to either table.x or table.z.x.

A name is not ambiguous in GROUP BY, ORDER BY or HAVING if it is both a column name and a SELECT list alias, as long as the name resolves to the same underlying object.

Example:

SELECT LastName, BirthYear AS BirthYear
FROM Singers
GROUP BY BirthYear;

The alias BirthYear is not ambiguous because it resolves to the same underlying column, Singers.BirthYear.

Range variables

In BigQuery, a range variable is a table expression alias in the FROM clause. Sometimes a range variable is known as a table alias. A range variable lets you reference rows being scanned from a table expression. A table expression represents an item in the FROM clause that returns a table. Common items that this expression can represent include tables, value tables, subqueries, joins, and parenthesized joins.

In general, a range variable provides a reference to the rows of a table expression. A range variable can be used to qualify a column reference and unambiguously identify the related table, for example range_variable.column_1.

When referencing a range variable on its own without a specified column suffix, the result of a table expression is the row type of the related table. Value tables have explicit row types, so for range variables related to value tables, the result type is the value table's row type. Other tables do not have explicit row types, and for those tables, the range variable type is a dynamically defined STRUCT that includes all of the columns in the table.

Examples

In these examples, the WITH clause is used to emulate a temporary table called Grid. This table has columns x and y. A range variable called Coordinate refers to the current row as the table is scanned. Coordinate can be used to access the entire row or columns in the row.

The following example selects column x from range variable Coordinate, which in effect selects column x from table Grid.

WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate.x FROM Grid AS Coordinate;

+---+
| x |
+---+
| 1 |
+---+

The following example selects all columns from range variable Coordinate, which in effect selects all columns from table Grid.

WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate.* FROM Grid AS Coordinate;

+---+---+
| x | y |
+---+---+
| 1 | 2 |
+---+---+

The following example selects the range variable Coordinate, which is a reference to rows in table Grid. Since Grid is not a value table, the result type of Coordinate is a STRUCT that contains all the columns from Grid.

WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate FROM Grid AS Coordinate;

+--------------+
| Coordinate   |
+--------------+
| {x: 1, y: 2} |
+--------------+

Appendix A: examples with sample data

These examples include statements which perform queries on the Roster and TeamMascot, and PlayerStats tables.

GROUP BY clause

Example:

SELECT LastName, SUM(PointsScored)
FROM PlayerStats
GROUP BY LastName;
LastName SUM
Adams 7
Buchanan 13
Coolidge 1

UNION

The UNION operator combines the result sets of two or more SELECT statements by pairing columns from the result set of each SELECT statement and vertically concatenating them.

Example:

SELECT Mascot AS X, SchoolID AS Y
FROM TeamMascot
UNION ALL
SELECT LastName, PointsScored
FROM PlayerStats;

Results:

X Y
Jaguars 50
Knights 51
Lakers 52
Mustangs 53
Adams 3
Buchanan 0
Coolidge 1
Adams 4
Buchanan 13

INTERSECT

This query returns the last names that are present in both Roster and PlayerStats.

SELECT LastName
FROM Roster
INTERSECT DISTINCT
SELECT LastName
FROM PlayerStats;

Results:

LastName
Adams
Coolidge
Buchanan

EXCEPT

The query below returns last names in Roster that are not present in PlayerStats.

SELECT LastName
FROM Roster
EXCEPT DISTINCT
SELECT LastName
FROM PlayerStats;

Results:

LastName
Eisenhower
Davis

Reversing the order of the SELECT statements will return last names in PlayerStats that are not present in Roster:

SELECT LastName
FROM PlayerStats
EXCEPT DISTINCT
SELECT LastName
FROM Roster;

Results:

(empty)