Query statements scan one or more tables or expressions and return the computed result rows. This topic describes the syntax for SQL queries in BigQuery.
SQL syntax
query_statement: query_expr query_expr: [ WITH [ RECURSIVE ] { non_recursive_cte | recursive_cte }[, ...] ] { select | ( query_expr ) | set_operation } [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ LIMIT count [ OFFSET skip_rows ] ] select: SELECT [ { ALL | DISTINCT } ] [ AS { STRUCT | VALUE } ] select_list [ FROM from_clause[, ...] ] [ WHERE bool_expression ] [ GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) } ] [ HAVING bool_expression ] [ QUALIFY bool_expression ] [ WINDOW window_clause ]
Notation rules
- Square brackets
[ ]
indicate optional clauses. - Parentheses
( )
indicate literal parentheses. - The vertical bar
|
indicates a logicalOR
. - Curly braces
{ }
enclose a set of options. - A comma followed by an ellipsis within square brackets
[, ... ]
indicates that the preceding item can repeat in a comma-separated list.
Sample tables
The following tables are used to illustrate the behavior of different query clauses in this reference.
Roster table
The Roster
table includes a list of player names (LastName
) and the
unique ID assigned to their school (SchoolID
). It looks like this:
+-----------------------+
| LastName | SchoolID |
+-----------------------+
| Adams | 50 |
| Buchanan | 52 |
| Coolidge | 52 |
| Davis | 51 |
| Eisenhower | 77 |
+-----------------------+
You can use this WITH
clause to emulate a temporary table name for the
examples in this reference:
WITH Roster AS
(SELECT 'Adams' as LastName, 50 as SchoolID UNION ALL
SELECT 'Buchanan', 52 UNION ALL
SELECT 'Coolidge', 52 UNION ALL
SELECT 'Davis', 51 UNION ALL
SELECT 'Eisenhower', 77)
SELECT * FROM Roster
PlayerStats table
The PlayerStats
table includes a list of player names (LastName
) and the
unique ID assigned to the opponent they played in a given game (OpponentID
)
and the number of points scored by the athlete in that game (PointsScored
).
+----------------------------------------+
| LastName | OpponentID | PointsScored |
+----------------------------------------+
| Adams | 51 | 3 |
| Buchanan | 77 | 0 |
| Coolidge | 77 | 1 |
| Adams | 52 | 4 |
| Buchanan | 50 | 13 |
+----------------------------------------+
You can use this WITH
clause to emulate a temporary table name for the
examples in this reference:
WITH PlayerStats AS
(SELECT 'Adams' as LastName, 51 as OpponentID, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 77, 0 UNION ALL
SELECT 'Coolidge', 77, 1 UNION ALL
SELECT 'Adams', 52, 4 UNION ALL
SELECT 'Buchanan', 50, 13)
SELECT * FROM PlayerStats
TeamMascot table
The TeamMascot
table includes a list of unique school IDs (SchoolID
) and the
mascot for that school (Mascot
).
+---------------------+
| SchoolID | Mascot |
+---------------------+
| 50 | Jaguars |
| 51 | Knights |
| 52 | Lakers |
| 53 | Mustangs |
+---------------------+
You can use this WITH
clause to emulate a temporary table name for the
examples in this reference:
WITH TeamMascot AS
(SELECT 50 as SchoolID, 'Jaguars' as Mascot UNION ALL
SELECT 51, 'Knights' UNION ALL
SELECT 52, 'Lakers' UNION ALL
SELECT 53, 'Mustangs')
SELECT * FROM TeamMascot
SELECT statement
SELECT [ { ALL | DISTINCT } ] [ AS { STRUCT | VALUE } ] select_list select_list: { select_all | select_expression } [, ...] select_all: [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] [ REPLACE ( expression [ AS ] column_name [, ...] ) ] select_expression: expression [ [ AS ] alias ]
The SELECT
list defines the columns that the query will return. Expressions in
the SELECT
list can refer to columns in any of the from_item
s in its
corresponding FROM
clause.
Each item in the SELECT
list is one of:
*
expression
expression.*
SELECT *
SELECT *
, often referred to as select star, produces one output column for
each column that is visible after executing the full query.
SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable);
+-------+-----------+
| fruit | vegetable |
+-------+-----------+
| apple | carrot |
+-------+-----------+
SELECT expression
Items in a SELECT
list can be expressions. These expressions evaluate to a
single value and produce one output column, with an optional explicit alias
.
If the expression does not have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.
SELECT expression.*
An item in a SELECT
list can also take the form of expression.*
. This
produces one output column for each column or top-level field of expression
.
The expression must either be a table alias or evaluate to a single value of a
data type with fields, such as a STRUCT.
The following query produces one output column for each column in the table
groceries
, aliased as g
.
WITH groceries AS
(SELECT "milk" AS dairy,
"eggs" AS protein,
"bread" AS grain)
SELECT g.*
FROM groceries AS g;
+-------+---------+-------+
| dairy | protein | grain |
+-------+---------+-------+
| milk | eggs | bread |
+-------+---------+-------+
More examples:
WITH locations AS
(SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location
UNION ALL
SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location)
SELECT l.location.*
FROM locations l;
+---------+------------+
| city | state |
+---------+------------+
| Seattle | Washington |
| Phoenix | Arizona |
+---------+------------+
WITH locations AS
(SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"),
("Phoenix", "Arizona")] AS location)
SELECT l.LOCATION[offset(0)].*
FROM locations l;
+---------+------------+
| city | state |
+---------+------------+
| Seattle | Washington |
+---------+------------+
SELECT * EXCEPT
A SELECT * EXCEPT
statement specifies the names of one or more columns to
exclude from the result. All matching column names are omitted from the output.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;
+-----------+----------+
| item_name | quantity |
+-----------+----------+
| sprocket | 200 |
+-----------+----------+
SELECT * REPLACE
A SELECT * REPLACE
statement specifies one or more
expression AS identifier
clauses. Each identifier must match a column name
from the SELECT *
statement. In the output column list, the column that
matches the identifier in a REPLACE
clause is replaced by the expression in
that REPLACE
clause.
A SELECT * REPLACE
statement does not change the names or order of columns.
However, it can change the value and the value type.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE ("widget" AS item_name)
FROM orders;
+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | widget | 200 |
+----------+-----------+----------+
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE (quantity/2 AS quantity)
FROM orders;
+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | sprocket | 100 |
+----------+-----------+----------+
SELECT DISTINCT
A SELECT DISTINCT
statement discards duplicate rows and returns only the
remaining rows. SELECT DISTINCT
cannot return columns of the following types:
STRUCT
ARRAY
SELECT ALL
A SELECT ALL
statement returns all rows, including duplicate rows.
SELECT ALL
is the default behavior of SELECT
.
SELECT AS STRUCT
SELECT AS STRUCT expr [[AS] struct_field_name1] [,...]
This produces a value table with a
STRUCT row type, where the
STRUCT field names and types match the column names
and types produced in the SELECT
list.
Example:
SELECT ARRAY(SELECT AS STRUCT 1 a, 2 b)
SELECT AS STRUCT
can be used in a scalar or array subquery to produce a single
STRUCT type grouping multiple values together. Scalar
and array subqueries (see Subqueries) are normally not
allowed to return multiple columns, but can return a single column with
STRUCT type.
SELECT AS VALUE
SELECT AS VALUE
produces a value table from any
SELECT
list that produces exactly one column. Instead of producing an
output table with one column, possibly with a name, the output will be a
value table where the row type is just the value type that was produced in the
one SELECT
column. Any alias the column had will be discarded in the
value table.
Example:
SELECT AS VALUE STRUCT(1 AS a, 2 AS b) xyz
The query above produces a table with row type STRUCT<a int64, b int64>
.
FROM clause
FROM from_clause[, ...] from_clause: from_item [ { pivot_operator | unpivot_operator } ] [ tablesample_operator ] from_item: { table_name [ as_alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ] | { join_operation | ( join_operation ) } | ( query_expr ) [ as_alias ] | field_path | unnest_operator | cte_name [ as_alias ] } as_alias: [ AS ] alias
The FROM
clause indicates the table or tables from which to retrieve rows,
and specifies how to join those rows together to produce a single stream of
rows for processing in the rest of the query.
pivot_operator
See PIVOT operator.
unpivot_operator
See UNPIVOT operator.
tablesample_operator
See TABLESAMPLE operator.
table_name
The name (optionally qualified) of an existing table.
SELECT * FROM Roster; SELECT * FROM dataset.Roster; SELECT * FROM project.dataset.Roster;
FOR SYSTEM_TIME AS OF
FOR SYSTEM_TIME AS OF
references the historical versions of the table
definition and rows that were current at timestamp_expression
.
Limitations:
The source table in the FROM
clause containing FOR SYSTEM_TIME AS OF
must
not be any of the following:
- An
ARRAY
scan, including a flattened array or the output of theUNNEST
operator. - A common table expression defined by a
WITH
clause.
timestamp_expression
must be a constant expression. It cannot
contain the following:
- Subqueries.
- Correlated references (references to columns of a table that appear at
a higher level of the query statement, such as in the
SELECT
list). - User-defined functions (UDFs).
The value of timestamp_expression
cannot fall into the following ranges:
- After the current timestamp (in the future).
- More than seven (7) days before the current timestamp.
A single query statement cannot reference a single table at more than one point in time, including the current time. That is, a query can reference a table multiple times at the same timestamp, but not the current version and a historical version, or two different historical versions.
Examples:
The following query returns a historical version of the table from one hour ago.
SELECT *
FROM t
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
The following query returns a historical version of the table at an absolute point in time.
SELECT *
FROM t
FOR SYSTEM_TIME AS OF '2017-01-01 10:00:00-07:00';
The following query returns an error because the timestamp_expression
contains
a correlated reference to a column in the containing query.
SELECT *
FROM t1
WHERE t1.a IN (SELECT t2.a
FROM t2 FOR SYSTEM_TIME AS OF t1.timestamp_column);
The following operations show accessing a historical version of the table before table is replaced.
DECLARE before_replace_timestamp TIMESTAMP;
-- Create table books.
CREATE TABLE books AS
SELECT 'Hamlet' title, 'William Shakespeare' author;
-- Get current timestamp before table replacement.
SET before_replace_timestamp = CURRENT_TIMESTAMP();
-- Replace table with different schema(title and release_date).
CREATE OR REPLACE TABLE books AS
SELECT 'Hamlet' title, DATE '1603-01-01' release_date;
-- This query returns Hamlet, William Shakespeare as result.
SELECT * FROM books FOR SYSTEM_TIME AS OF before_replace_timestamp;
The following operations show accessing a historical version of the table before a DML job.
DECLARE JOB_START_TIMESTAMP TIMESTAMP;
-- Create table books.
CREATE OR REPLACE TABLE books AS
SELECT 'Hamlet' title, 'William Shakespeare' author;
-- Insert two rows into the books.
INSERT books (title, author)
VALUES('The Great Gatsby', 'F. Scott Fizgerald'),
('War and Peace', 'Leo Tolstoy');
SELECT * FROM books;
SET JOB_START_TIMESTAMP = (
SELECT start_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_type="QUERY"
AND statement_type="INSERT"
ORDER BY start_time DESC
LIMIT 1
);
-- This query only returns Hamlet, William Shakespeare as result.
SELECT * FROM books FOR SYSTEM_TIME AS OF JOB_START_TIMESTAMP;
The following query returns an error because the DML operates on the current version of the table, and a historical version of the table from one day ago.
INSERT INTO t1
SELECT * FROM t1
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
join_operation
See JOIN operation.
query_expr
( query_expr ) [ [ AS ] alias ]
is a table subquery.
field_path
In the FROM
clause, field_path
is any path that
resolves to a field within a data type. field_path
can go
arbitrarily deep into a nested data structure.
Some examples of valid field_path
values include:
SELECT * FROM T1 t1, t1.array_column;
SELECT * FROM T1 t1, t1.struct_column.array_field;
SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1;
SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a;
SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1;
Field paths in the FROM
clause must end in an
array field. In
addition, field paths cannot contain arrays before the end of the path. For example, the path
array_column.some_array.some_array_field
is invalid because it
contains an array before the end of the path.
unnest_operator
See UNNEST operator.
cte_name
Common table expressions (CTEs) in a WITH
Clause act like
temporary tables that you can reference anywhere in the FROM
clause.
In the example below, subQ1
and subQ2
are CTEs.
Example:
WITH
subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52),
subQ2 AS (SELECT SchoolID FROM subQ1)
SELECT DISTINCT * FROM subQ2;
The WITH
clause hides any permanent tables with the same name
for the duration of the query, unless you qualify the table name, for example:
dataset.Roster
or project.dataset.Roster
.
UNNEST operator
unnest_operator: { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } [ as_alias ] [ WITH OFFSET [ as_alias ] ] as_alias: [AS] alias
The UNNEST
operator takes an ARRAY
and returns a
table, with one row for each element in the ARRAY
.
You can also use UNNEST
outside of the FROM
clause with the
IN
operator.
For input ARRAY
s of most element types, the output of UNNEST
generally has
one column. This single column has an optional alias
, which you can use to
refer to the column elsewhere in the query. ARRAYS
with these element types
return multiple columns:
- STRUCT
UNNEST
destroys the order of elements in the input
ARRAY
. Use the optional WITH OFFSET
clause to
return a second column with the array element indexes.
For several ways to use UNNEST
, including construction, flattening, and
filtering, see Working with arrays.
UNNEST and STRUCTs
For an input ARRAY
of STRUCT
s, UNNEST
returns a row for each STRUCT
, with a separate column for each field in the
STRUCT
. The alias for each column is the name of the corresponding STRUCT
field.
Example:
SELECT *
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')]);
+---+-----+
| x | y |
+---+-----+
| 3 | bar |
| 1 | foo |
+---+-----+
Because the UNNEST
operator returns a
value table,
you can alias UNNEST
to define a range variable that you can reference
elsewhere in the query. If you reference the range variable in the SELECT
list, the query returns a STRUCT
containing all of the fields of the original
STRUCT
in the input table.
Example:
SELECT *, struct_value
FROM UNNEST(ARRAY<STRUCT<x INT64, y STRING>>[(1, 'foo'), (3, 'bar')])
AS struct_value;
+---+-----+--------------+
| x | y | struct_value |
+---+-----+--------------+
| 3 | bar | {3, bar} |
| 1 | foo | {1, foo} |
+---+-----+--------------+
Explicit and implicit UNNEST
ARRAY
unnesting can be either explicit or implicit.
In explicit unnesting, array_expression
must return an
ARRAY
value but does not need to resolve to an ARRAY
, and the UNNEST
keyword is required.
Example:
SELECT * FROM UNNEST ([1, 2, 3]);
In implicit unnesting, array_path
must resolve to an ARRAY
and the
UNNEST
keyword is optional.
Example:
SELECT x
FROM mytable AS t,
t.struct_typed_column.array_typed_field1 AS x;
In this scenario, array_path
can go arbitrarily deep into a data
structure, but the last field must be ARRAY
-typed. No previous field in the
expression can be ARRAY
-typed because it is not possible to extract a named
field from an ARRAY
.
UNNEST and NULLs
UNNEST
treats NULL
s as follows:
NULL
and empty arrays produce zero rows.- An array containing
NULL
s produces rows containingNULL
values.
UNNEST and WITH OFFSET
The optional WITH OFFSET
clause returns a separate column containing the
offset value, in which counting starts at zero for each row produced by the
UNNEST
operation. This column has an optional alias; If the optional alias
is not used, the default column name is offset
.
Example:
SELECT * FROM UNNEST ([10,20,30]) as numbers WITH OFFSET;
+---------+--------+
| numbers | offset |
+---------+--------+
| 10 | 0 |
| 20 | 1 |
| 30 | 2 |
+---------+--------+
PIVOT operator
FROM from_item[, ...] pivot_operator pivot_operator: PIVOT( aggregate_function_call [as_alias][, ...] FOR input_column IN ( pivot_column [as_alias][, ...] ) ) [AS alias] as_alias: [AS] alias
The PIVOT
operator rotates rows into columns, using aggregation.
PIVOT
is part of the FROM
clause.
PIVOT
can be used to modify any table expression.- Combining
PIVOT
withFOR SYSTEM_TIME AS OF
is not allowed, although users may usePIVOT
against a subquery input which itself usesFOR SYSTEM_TIME AS OF
. - A
WITH OFFSET
clause immediately preceding thePIVOT
operator is not allowed.
Conceptual example:
-- Before PIVOT is used to rotate sales and quarter into Q1, Q2, Q3, Q4 columns:
+---------+-------+---------+------+
| product | sales | quarter | year |
+---------+-------+---------+------|
| Kale | 51 | Q1 | 2020 |
| Kale | 23 | Q2 | 2020 |
| Kale | 45 | Q3 | 2020 |
| Kale | 3 | Q4 | 2020 |
| Kale | 70 | Q1 | 2021 |
| Kale | 85 | Q2 | 2021 |
| Apple | 77 | Q1 | 2020 |
| Apple | 0 | Q2 | 2020 |
| Apple | 1 | Q1 | 2021 |
+---------+-------+---------+------+
-- After PIVOT is used to rotate sales and quarter into Q1, Q2, Q3, Q4 columns:
+---------+------+----+------+------+------+
| product | year | Q1 | Q2 | Q3 | Q4 |
+---------+------+----+------+------+------+
| Apple | 2020 | 77 | 0 | NULL | NULL |
| Apple | 2021 | 1 | NULL | NULL | NULL |
| Kale | 2020 | 51 | 23 | 45 | 3 |
| Kale | 2021 | 70 | 85 | NULL | NULL |
+---------+------+----+------+------+------+
Definitions
Top-level definitions:
from_item
: The table or subquery on which to perform a pivot operation. Thefrom_item
must follow these rules.pivot_operator
: The pivot operation to perform on afrom_item
.alias
: An alias to use for an item in the query.
pivot_operator
definitions:
aggregate_function_call
: An aggregate function call that aggregates all input rows such thatinput_column
matches a particular value inpivot_column
. Each aggregation corresponding to a differentpivot_column
value produces a different column in the output. Follow these rules when creating an aggregate function call.input_column
: Takes a column and retrieves the row values for the column, following these rules.pivot_column
: A pivot column to create for each aggregate function call. If an alias is not provided, a default alias is created. A pivot column value type must match the value type ininput_column
so that the values can be compared. It is possible to have a value inpivot_column
that does not match a value ininput_column
. Must be a constant and follow these rules.
Rules
Rules for a from_item
passed to PIVOT
:
- The
from_item
may consist of any table or subquery result. - The
from_item
may not produce a value table. - The
from_item
may not be a subquery usingSELECT AS STRUCT
.
Rules for aggregate_function_call
:
- Must be an aggregate function. For example,
SUM
. - You may reference columns in a table passed to
PIVOT
, as well as correlated columns, but may not access columns defined by thePIVOT
clause itself. - A table passed to
PIVOT
may be accessed through its alias if one is provided. - You can only use an aggregate function that takes one argument.
- Except for
COUNT
, you can only use aggregate functions that ignoreNULL
inputs. - If you are using
COUNT
, you can use*
as an argument.
- May access columns from the input table, as well as correlated columns,
not columns defined by the
PIVOT
clause, itself. - Evaluated against each row in the input table; aggregate and window function calls are prohibited.
- Non-determinism is okay.
- The type must be groupable.
- The input table may be accessed through its alias if one is provided.
- A
pivot_column
must be a constant. - Named constants, such as variables, are not supported.
- Query parameters are not supported.
- If a name is desired for a named constant or query parameter, specify it explicitly with an alias.
- Corner cases exist where a distinct
pivot_column
s can end up with the same default column names. For example, an input column might contain both aNULL
value and the string literal"NULL"
. When this happens, multiple pivot columns are created with the same name. To avoid this situation, use aliases for pivot column names. - If a
pivot_column
does not specify an alias, a column name is constructed as follows:
From | To | Example |
---|---|---|
NULL | NULL |
Input: NULL Output: "NULL" |
INT64 NUMERIC BIGNUMERIC |
The number in string format with the following rules:
|
Input: 1 Output: _1 Input: -1 Output: minus_1 Input: 1.0 Output: _1_point_0 |
BOOL | TRUE or FALSE . |
Input: TRUE Output: TRUE Input: FALSE Output: FALSE |
STRING | The string value. |
Input: "PlayerName" Output: PlayerName |
DATE | The date in _YYYY_MM_DD format. |
Input: DATE '2013-11-25' Output: _2013_11_25 |
ENUM | The name of the enumeration constant. |
Input: COLOR.RED Output: RED |
STRUCT |
A string formed by computing the pivot_column name for each
field and joining the results together with an underscore. The following
rules apply:
Due to implicit type coercion from the |
Input: STRUCT("one", "two") Output: one_two Input: STRUCT("one" AS a, "two" AS b) Output: one_a_two_b |
All other data types | Not supported. You must provide an alias. |
Examples
The following examples reference a table called Produce
that looks like this:
WITH Produce AS (
SELECT 'Kale' as product, 51 as sales, 'Q1' as quarter, 2020 as year UNION ALL
SELECT 'Kale', 23, 'Q2', 2020 UNION ALL
SELECT 'Kale', 45, 'Q3', 2020 UNION ALL
SELECT 'Kale', 3, 'Q4', 2020 UNION ALL
SELECT 'Kale', 70, 'Q1', 2021 UNION ALL
SELECT 'Kale', 85, 'Q2', 2021 UNION ALL
SELECT 'Apple', 77, 'Q1', 2020 UNION ALL
SELECT 'Apple', 0, 'Q2', 2020 UNION ALL
SELECT 'Apple', 1, 'Q1', 2021)
SELECT * FROM Produce
+---------+-------+---------+------+
| product | sales | quarter | year |
+---------+-------+---------+------|
| Kale | 51 | Q1 | 2020 |
| Kale | 23 | Q2 | 2020 |
| Kale | 45 | Q3 | 2020 |
| Kale | 3 | Q4 | 2020 |
| Kale | 70 | Q1 | 2021 |
| Kale | 85 | Q2 | 2021 |
| Apple | 77 | Q1 | 2020 |
| Apple | 0 | Q2 | 2020 |
| Apple | 1 | Q1 | 2021 |
+---------+-------+---------+------+
With the PIVOT
operator, the rows in the quarter
column are rotated into
these new columns: Q1
, Q2
, Q3
, Q4
. The aggregate function SUM
is
implicitly grouped by all unaggregated columns other than the pivot_column
:
product
and year
.
SELECT * FROM
Produce
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
+---------+------+----+------+------+------+
| product | year | Q1 | Q2 | Q3 | Q4 |
+---------+------+----+------+------+------+
| Apple | 2020 | 77 | 0 | NULL | NULL |
| Apple | 2021 | 1 | NULL | NULL | NULL |
| Kale | 2020 | 51 | 23 | 45 | 3 |
| Kale | 2021 | 70 | 85 | NULL | NULL |
+---------+------+----+------+------+------+
If you do not include year
, then SUM
is grouped only by product
.
SELECT * FROM
(SELECT product, sales, quarter FROM Produce)
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
+---------+-----+-----+------+------+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+-----+-----+------+------+
| Apple | 78 | 0 | NULL | NULL |
| Kale | 121 | 108 | 45 | 3 |
+---------+-----+-----+------+------+
You can select a subset of values in the pivot_column
:
SELECT * FROM
(SELECT product, sales, quarter FROM Produce)
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3')
+---------+-----+-----+------+
| product | Q1 | Q2 | Q3 |
+---------+-----+-----+------+
| Apple | 78 | 0 | NULL |
| Kale | 121 | 108 | 45 |
+---------+-----+-----+------+
SELECT * FROM
(SELECT sales, quarter FROM Produce)
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3')
+-----+-----+----+
| Q1 | Q2 | Q3 |
+-----+-----+----+
| 199 | 108 | 45 |
+-----+-----+----+
You can include multiple aggregation functions in the PIVOT
. In this case, you
must specify an alias for each aggregation. These aliases are used to construct
the column names in the resulting table.
SELECT * FROM
(SELECT product, sales, quarter FROM Produce)
PIVOT(SUM(sales) total_sales, COUNT(*) num_records FOR quarter IN ('Q1', 'Q2'))
+--------+----------------+----------------+----------------+----------------+
|product | total_sales_Q1 | num_records_Q1 | total_sales_Q2 | num_records_Q2 |
+--------+----------------+----------------+----------------+----------------+
| Kale | 121 | 2 | 108 | 2 |
| Apple | 78 | 2 | 0 | 1 |
+--------+----------------+----------------+----------------+----------------+
UNPIVOT operator
FROM from_item[, ...] unpivot_operator unpivot_operator: UNPIVOT [ { INCLUDE NULLS | EXCLUDE NULLS } ] ( { single_column_unpivot | multi_column_unpivot } ) [unpivot_alias] single_column_unpivot: values_column FOR name_column IN (columns_to_unpivot) multi_column_unpivot: values_column_set FOR name_column IN (column_sets_to_unpivot) values_column_set: (values_column[, ...]) columns_to_unpivot: unpivot_column [row_value_alias][, ...] column_sets_to_unpivot: (unpivot_column [row_value_alias][, ...]) unpivot_alias and row_value_alias: [AS] alias
The UNPIVOT
operator rotates columns into rows. UNPIVOT
is part of the
FROM
clause.
UNPIVOT
can be used to modify any table expression.- Combining
UNPIVOT
withFOR SYSTEM_TIME AS OF
is not allowed, although users may useUNPIVOT
against a subquery input which itself usesFOR SYSTEM_TIME AS OF
. - A
WITH OFFSET
clause immediately preceding theUNPIVOT
operator is not allowed. PIVOT
aggregations cannot be reversed withUNPIVOT
.
Conceptual example:
-- Before UNPIVOT is used to rotate Q1, Q2, Q3, Q4 into sales and quarter columns:
+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale | 51 | 23 | 45 | 3 |
| Apple | 77 | 0 | 25 | 2 |
+---------+----+----+----+----+
-- After UNPIVOT is used to rotate Q1, Q2, Q3, Q4 into sales and quarter columns:
+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale | 51 | Q1 |
| Kale | 23 | Q2 |
| Kale | 45 | Q3 |
| Kale | 3 | Q4 |
| Apple | 77 | Q1 |
| Apple | 0 | Q2 |
| Apple | 25 | Q3 |
| Apple | 2 | Q4 |
+---------+-------+---------+
Definitions
Top-level definitions:
from_item
: The table or subquery on which to perform a pivot operation. Thefrom_item
must follow these rules.unpivot_operator
: The pivot operation to perform on afrom_item
.
unpivot_operator
definitions:
INCLUDE NULLS
: Add rows withNULL
values to the result.EXCLUDE NULLS
: Do not add rows withNULL
values to the result. By default,UNPIVOT
excludes rows withNULL
values.single_column_unpivot
: Rotates columns into onevalues_column
and onename_column
.multi_column_unpivot
: Rotates columns into multiplevalues_column
s and onename_column
.unpivot_alias
: An alias for the results of theUNPIVOT
operation. This alias can be referenced elsewhere in the query.
single_column_unpivot
definitions:
values_column
: A column to contain the row values fromcolumns_to_unpivot
. Follow these rules when creating a values column.name_column
: A column to contain the column names fromcolumns_to_unpivot
. Follow these rules when creating a name column.columns_to_unpivot
: The columns from thefrom_item
to populatevalues_column
andname_column
. Follow these rules when creating an unpivot column.row_value_alias
: An optional alias for a column that is displayed for the column inname_column
. If not specified, the string value of the column name is used. Follow these rules when creating a row value alias.
multi_column_unpivot
definitions:
values_column_set
: A set of columns to contain the row values fromcolumns_to_unpivot
. Follow these rules when creating a values column.name_column
: A set of columns to contain the column names fromcolumns_to_unpivot
. Follow these rules when creating a name column.column_sets_to_unpivot
: The columns from thefrom_item
to unpivot. Follow these rules when creating an unpivot column.row_value_alias
: An optional alias for a column set that is displayed for the column set inname_column
. If not specified, a string value for the column set is used and each column in the string is separated with an underscore (_
). For example,(col1, col2)
outputscol1_col2
. Follow these rules when creating a row value alias.
Rules
Rules for a from_item
passed to UNPIVOT
:
- The
from_item
may consist of any table or subquery result. - The
from_item
may not produce a value table. - Duplicate columns in a
from_item
cannot be referenced in theUNPIVOT
clause.
- Expressions are not permitted.
- Qualified names are not permitted. For example,
mytable.mycolumn
is not allowed. - In the case where the
UNPIVOT
result has duplicate column names:SELECT *
is allowed.SELECT values_column
causes ambiguity.
- It cannot be a name used for a
name_column
or anunpivot_column
. - It can be the same name as a column from the
from_item
.
- It cannot be a name used for a
values_column
or anunpivot_column
. - It can be the same name as a column from the
from_item
.
- Must be a column name from the
from_item
. - It cannot reference duplicate
from_item
column names. - All columns in a column set must have equivalent data types.
- Data types cannot be coerced to a common supertype.
- If the data types are exact matches (for example, a struct with different field names), the data type of the first input is the data type of the output.
- You cannot have the same name in the same column set. For example,
(emp1, emp1)
results in an error. - You can have a the same name in different column sets. For example,
(emp1, emp2), (emp1, emp3)
is valid.
- This can be a
STRING
or anINT64
literal. - The data type for all
row_value_alias
clauses must be the same. - If the value is an
INT64
, therow_value_alias
for eachunpivot_column
must be specified.
Examples
The following examples reference a table called Produce
that looks like this:
WITH Produce AS (
SELECT 'Kale' as product, 51 as Q1, 23 as Q2, 45 as Q3, 3 as Q4 UNION ALL
SELECT 'Apple', 77, 0, 25, 2)
SELECT * FROM Produce
+---------+----+----+----+----+
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale | 51 | 23 | 45 | 3 |
| Apple | 77 | 0 | 25 | 2 |
+---------+----+----+----+----+
With the UNPIVOT
operator, the columns Q1
, Q2
, Q3
, and Q4
are
rotated. The values of these columns now populate a new column called Sales
and the names of these columns now populate a new column called Quarter
.
This is a single-column unpivot operation.
SELECT * FROM Produce
UNPIVOT(sales FOR quarter IN (Q1, Q2, Q3, Q4))
+---------+-------+---------+
| product | sales | quarter |
+---------+-------+---------+
| Kale | 51 | Q1 |
| Kale | 23 | Q2 |
| Kale | 45 | Q3 |
| Kale | 3 | Q4 |
| Apple | 77 | Q1 |
| Apple | 0 | Q2 |
| Apple | 25 | Q3 |
| Apple | 2 | Q4 |
+---------+-------+---------+
In this example, we UNPIVOT
four quarters into two semesters.
This is a multi-column unpivot operation.
SELECT * FROM Produce
UNPIVOT(
(first_half_sales, second_half_sales)
FOR semesters
IN ((Q1, Q2) AS 'semester_1', (Q3, Q4) AS 'semester_2'))
+---------+------------------+-------------------+------------+
| product | first_half_sales | second_half_sales | semesters |
+---------+------------------+-------------------+------------+
| Kale | 51 | 23 | semester_1 |
| Kale | 45 | 3 | semester_2 |
| Apple | 77 | 0 | semester_1 |
| Apple | 25 | 2 | semester_2 |
+---------+------------------+-------------------+------------+
TABLESAMPLE operator
TABLESAMPLE SYSTEM ( percent PERCENT )
Description
You can use the TABLESAMPLE
operator to select a random sample of a dataset.
This operator is useful when you're working with tables that have large
amounts of data and you don't need precise answers.
Sampling returns a variety of records while avoiding the costs associated with
scanning and processing an entire table. Each execution of the query might
return different results because each execution processes an independently
computed sample. BigQuery does not cache the results of queries that
include a TABLESAMPLE
clause.
Replace percent
with the percentage of the dataset that you want to include in
the results. The value must be between 0
and 100
. The value can be a literal
value or a query parameter. It cannot be a variable.
For more information, see Table sampling.
Example
The following query selects approximately 10% of a table's data:
SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT)
JOIN operation
join_operation: { cross_join_operation | condition_join_operation } cross_join_operation: from_item cross_join_operator from_item condition_join_operation: from_item condition_join_operator from_item join_condition cross_join_operator: { CROSS JOIN | , } condition_join_operator: { [INNER] JOIN | FULL [OUTER] JOIN | LEFT [OUTER] JOIN | RIGHT [OUTER] JOIN } join_condition: { on_clause | using_clause } on_clause: ON bool_expression using_clause: USING ( join_column [, ...] )
The JOIN
operation merges two from_item
s so that the SELECT
clause can
query them as one source. The join_type
and ON
or USING
clause (a
"join condition") specify how to combine and discard rows from the two
from_item
s to form a single source.
[INNER] JOIN
An INNER JOIN
, or simply JOIN
, effectively calculates the Cartesian product
of the two from_item
s and discards all rows that do not meet the join
condition. "Effectively" means that it is possible to implement an INNER JOIN
without actually calculating the Cartesian product.
FROM A INNER JOIN B ON A.w = B.y
Table A Table B Result
+-------+ +-------+ +---------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------+
| 1 | a | | 2 | k | | 2 | b | 2 | k |
| 2 | b | | 3 | m | | 3 | c | 3 | m |
| 3 | c | | 3 | n | | 3 | c | 3 | n |
| 3 | d | | 4 | p | | 3 | d | 3 | m |
+-------+ +-------+ | 3 | d | 3 | n |
+---------------+
FROM A INNER JOIN B USING (x)
Table A Table B Result
+-------+ +-------+ +-----------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +-----------+
| 1 | a | | 2 | k | | 2 | b | k |
| 2 | b | | 3 | m | | 3 | c | m |
| 3 | c | | 3 | n | | 3 | c | n |
| 3 | d | | 4 | p | | 3 | d | m |
+-------+ +-------+ | 3 | d | n |
+-----------+
Example
This query performs an INNER JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
+---------------------------+
CROSS JOIN
CROSS JOIN
returns the Cartesian product of the two from_item
s. In other
words, it combines each row from the first from_item
with each row from the
second from_item
.
If the rows of the two from_item
s are independent, then the result has M *
N rows, given M rows in one from_item
and N in the other. Note that this
still holds for the case when either from_item
has zero rows.
In a FROM
clause, a CROSS JOIN
can be written like this:
FROM A CROSS JOIN B
Table A Table B Result
+-------+ +-------+ +---------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------+
| 1 | a | | 2 | c | | 1 | a | 2 | c |
| 2 | b | | 3 | d | | 1 | a | 3 | d |
+-------+ +-------+ | 2 | b | 2 | c |
| 2 | b | 3 | d |
+---------------+
You can use a correlated cross join to convert or
flatten an ARRAY
into a set of rows. To learn more, see
Convert elements in an array to rows in a table.
Examples
This query performs an CROSS JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Adams | Knights |
| Adams | Lakers |
| Adams | Mustangs |
| Buchanan | Jaguars |
| Buchanan | Knights |
| Buchanan | Lakers |
| Buchanan | Mustangs |
| ... |
+---------------------------+
Comma cross join (,)
CROSS JOIN
s can be written implicitly with a comma. This is
called a comma cross join.
A comma cross join looks like this in a FROM
clause:
FROM A, B
Table A Table B Result
+-------+ +-------+ +---------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------+
| 1 | a | | 2 | c | | 1 | a | 2 | c |
| 2 | b | | 3 | d | | 1 | a | 3 | d |
+-------+ +-------+ | 2 | b | 2 | c |
| 2 | b | 3 | d |
+---------------+
You cannot write comma cross joins inside parentheses. To learn more, see Join operations in a sequence.
FROM (A, B) // INVALID
You can use a correlated comma cross join to convert or
flatten an ARRAY
into a set of rows. To learn more, see
Convert elements in an array to rows in a table.
Examples
This query performs a comma cross join on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster, TeamMascot;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Adams | Knights |
| Adams | Lakers |
| Adams | Mustangs |
| Buchanan | Jaguars |
| Buchanan | Knights |
| Buchanan | Lakers |
| Buchanan | Mustangs |
| ... |
+---------------------------+
FULL [OUTER] JOIN
A FULL OUTER JOIN
(or simply FULL JOIN
) returns all fields for all rows in
both from_item
s that meet the join condition.
FULL
indicates that all rows from both from_item
s are
returned, even if they do not meet the join condition.
OUTER
indicates that if a given row from one from_item
does not
join to any row in the other from_item
, the row will return with NULLs
for all columns from the other from_item
.
FROM A FULL OUTER JOIN B ON A.w = B.y
Table A Table B Result
+-------+ +-------+ +---------------------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------------------+
| 1 | a | | 2 | k | | 1 | a | NULL | NULL |
| 2 | b | | 3 | m | | 2 | b | 2 | k |
| 3 | c | | 3 | n | | 3 | c | 3 | m |
| 3 | d | | 4 | p | | 3 | c | 3 | n |
+-------+ +-------+ | 3 | d | 3 | m |
| 3 | d | 3 | n |
| NULL | NULL | 4 | p |
+---------------------------+
FROM A FULL OUTER JOIN B USING (x)
Table A Table B Result
+-------+ +-------+ +--------------------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +--------------------+
| 1 | a | | 2 | k | | 1 | a | NULL |
| 2 | b | | 3 | m | | 2 | b | k |
| 3 | c | | 3 | n | | 3 | c | m |
| 3 | d | | 4 | p | | 3 | c | n |
+-------+ +-------+ | 3 | d | m |
| 3 | d | n |
| 4 | NULL | p |
+--------------------+
Example
This query performs a FULL JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster FULL JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
| Eisenhower | NULL |
| NULL | Mustangs |
+---------------------------+
LEFT [OUTER] JOIN
The result of a LEFT OUTER JOIN
(or simply LEFT JOIN
) for two
from_item
s always retains all rows of the left from_item
in the
JOIN
operation, even if no rows in the right from_item
satisfy the join
predicate.
LEFT
indicates that all rows from the left from_item
are
returned; if a given row from the left from_item
does not join to any row
in the right from_item
, the row will return with NULLs for all
columns from the right from_item
. Rows from the right from_item
that
do not join to any row in the left from_item
are discarded.
FROM A LEFT OUTER JOIN B ON A.w = B.y
Table A Table B Result
+-------+ +-------+ +---------------------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------------------+
| 1 | a | | 2 | k | | 1 | a | NULL | NULL |
| 2 | b | | 3 | m | | 2 | b | 2 | k |
| 3 | c | | 3 | n | | 3 | c | 3 | m |
| 3 | d | | 4 | p | | 3 | c | 3 | n |
+-------+ +-------+ | 3 | d | 3 | m |
| 3 | d | 3 | n |
+---------------------------+
FROM A LEFT OUTER JOIN B USING (x)
Table A Table B Result
+-------+ +-------+ +--------------------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +--------------------+
| 1 | a | | 2 | k | | 1 | a | NULL |
| 2 | b | | 3 | m | | 2 | b | k |
| 3 | c | | 3 | n | | 3 | c | m |
| 3 | d | | 4 | p | | 3 | c | n |
+-------+ +-------+ | 3 | d | m |
| 3 | d | n |
+--------------------+
Example
This query performs a LEFT JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster LEFT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
| Eisenhower | NULL |
+---------------------------+
RIGHT [OUTER] JOIN
The result of a RIGHT OUTER JOIN
(or simply RIGHT JOIN
) is similar and
symmetric to that of LEFT OUTER JOIN
.
FROM A RIGHT OUTER JOIN B ON A.w = B.y
Table A Table B Result
+-------+ +-------+ +---------------------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------------------+
| 1 | a | | 2 | k | | 2 | b | 2 | k |
| 2 | b | | 3 | m | | 3 | c | 3 | m |
| 3 | c | | 3 | n | | 3 | c | 3 | n |
| 3 | d | | 4 | p | | 3 | d | 3 | m |
+-------+ +-------+ | 3 | d | 3 | n |
| NULL | NULL | 4 | p |
+---------------------------+
FROM A RIGHT OUTER JOIN B USING (x)
Table A Table B Result
+-------+ +-------+ +--------------------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +--------------------+
| 1 | a | | 2 | k | | 2 | b | k |
| 2 | b | | 3 | m | | 3 | c | m |
| 3 | c | | 3 | n | | 3 | c | n |
| 3 | d | | 4 | p | | 3 | d | m |
+-------+ +-------+ | 3 | d | n |
| 4 | NULL | p |
+--------------------+
Example
This query performs a RIGHT JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster RIGHT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
| NULL | Mustangs |
+---------------------------+
ON clause
A combined row (the result of joining two rows) meets the ON
join condition
if join condition returns TRUE
.
FROM A JOIN B ON A.x = B.x
Table A Table B Result (A.x, B.x)
+---+ +---+ +-------+
| x | * | x | = | x | x |
+---+ +---+ +-------+
| 1 | | 2 | | 2 | 2 |
| 2 | | 3 | | 3 | 3 |
| 3 | | 4 | +-------+
+---+ +---+
Example
This query performs an INNER JOIN
on the
Roster
and TeamMascot
table.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
+---------------------------+
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
+---------------------------+
USING clause
The USING
clause requires a column list of one or more columns which
occur in both input tables. It performs an equality comparison on that column,
and the rows meet the join condition if the equality comparison returns TRUE
.
FROM A JOIN B USING (x)
Table A Table B Result
+---+ +---+ +---+
| x | * | x | = | x |
+---+ +---+ +---+
| 1 | | 2 | | 2 |
| 2 | | 3 | | 3 |
| 3 | | 4 | +---+
+---+ +---+
Example
This query performs an INNER JOIN
on the
Roster
and TeamMascot
table.
This statement returns the rows from Roster
and TeamMascot
where
Roster.SchoolID
is the same as TeamMascot.SchoolID
. The results include a
single SchoolID
column.
SELECT * FROM Roster INNER JOIN TeamMascot USING (SchoolID);
+----------------------------------------+
| SchoolID | LastName | Mascot |
+----------------------------------------+
| 50 | Adams | Jaguars |
| 52 | Buchanan | Lakers |
| 52 | Coolidge | Lakers |
| 51 | Davis | Knights |
+----------------------------------------+
ON and USING equivalency
The ON
and USING
keywords are not equivalent, but they are similar.
ON
returns multiple columns, and USING
returns one.
FROM A JOIN B ON A.x = B.x
FROM A JOIN B USING (x)
Table A Table B Result ON Result USING
+---+ +---+ +-------+ +---+
| x | * | x | = | x | x | | x |
+---+ +---+ +-------+ +---+
| 1 | | 2 | | 2 | 2 | | 2 |
| 2 | | 3 | | 3 | 3 | | 3 |
| 3 | | 4 | +-------+ +---+
+---+ +---+
Although ON
and USING
are not equivalent, they can return the same results
if you specify the columns you want to return.
SELECT x FROM A JOIN B USING (x);
SELECT A.x FROM A JOIN B ON A.x = B.x;
Table A Table B Result
+---+ +---+ +---+
| x | * | x | = | x |
+---+ +---+ +---+
| 1 | | 2 | | 2 |
| 2 | | 3 | | 3 |
| 3 | | 4 | +---+
+---+ +---+
Join operations in a sequence
The FROM
clause can contain multiple JOIN
operations in a sequence.
JOIN
s are bound from left to right. For example:
FROM A JOIN B USING (x) JOIN C USING (x)
-- A JOIN B USING (x) = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 = return value
You can also insert parentheses to group JOIN
s:
FROM ( (A JOIN B USING (x)) JOIN C USING (x) )
-- A JOIN B USING (x) = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 = return value
With parentheses, you can group JOIN
s so that they are bound in a different
order:
FROM ( A JOIN (B JOIN C USING (x)) USING (x) )
-- B JOIN C USING (x) = result_1
-- A JOIN result_1 = result_2
-- result_2 = return value
A FROM
clause can have multiple joins. Provided there are no comma cross joins
in the FROM
clause, joins do not require parenthesis, though parenthesis can
help readability:
FROM A JOIN B JOIN C JOIN D USING (w) ON B.x = C.y ON A.z = B.x
If your clause contains comma cross joins, you must use parentheses:
FROM A, B JOIN C JOIN D ON C.x = D.y ON B.z = C.x // INVALID
FROM A, B JOIN (C JOIN D ON C.x = D.y) ON B.z = C.x // VALID
When comma cross joins are present in a query with a sequence of JOINs, they
group from left to right like other JOIN
types:
FROM A JOIN B USING (x) JOIN C USING (x), D
-- A JOIN B USING (x) = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 CROSS JOIN D = return value
There cannot be a RIGHT JOIN
or FULL JOIN
after a comma cross join unless it
is parenthsized:
FROM A, B RIGHT JOIN C ON TRUE // INVALID
FROM A, B FULL JOIN C ON TRUE // INVALID
FROM A, B JOIN C ON TRUE // VALID
FROM A, (B RIGHT JOIN C ON TRUE) // VALID
FROM A, (B FULL JOIN C ON TRUE) // VALID
Correlated join operation
A join operation is correlated when the right from_item
contains a
reference to at least one range variable or
column name introduced by the left from_item
.
In a correlated join operation, rows from the right from_item
are determined
by a row from the left from_item
. Consequently, RIGHT OUTER
and FULL OUTER
joins cannot be correlated because right from_item
rows cannot be determined
in the case when there is no row from the left from_item
.
All correlated join operations must reference an array in the right from_item
.
This is a conceptual example of a correlated join operation that includes a correlated subquery:
FROM A JOIN UNNEST(ARRAY(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)) AS C
- Left
from_item
:A
- Right
from_item
:UNNEST(...) AS C
- A correlated subquery:
(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)
This is another conceptual example of a correlated join operation.
array_of_IDs
is part of the left from_item
but is referenced in the
right from_item
.
FROM A JOIN UNNEST(A.array_of_IDs) AS C
The UNNEST
operator can be explicit or implicit.
These are both allowed:
FROM A JOIN UNNEST(A.array_of_IDs) AS IDs
FROM A JOIN A.array_of_IDs AS IDs
In a correlated join operation, the right from_item
is re-evaluated
against each distinct row from the left from_item
. In the following
conceptual example, the correlated join operation first
evaluates A
and B
, then A
and C
:
FROM
A
JOIN
UNNEST(ARRAY(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)) AS C
ON A.Name = C.Name
Caveats
- In a correlated
LEFT JOIN
, when the input table on the right side is empty for some row from the left side, it is as if no rows from the right side satisfied the join condition in a regularLEFT JOIN
. When there are no joining rows, a row withNULL
values for all columns on the right side is generated to join with the row from the left side. - In a correlated
CROSS JOIN
, when the input table on the right side is empty for some row from the left side, it is as if no rows from the right side satisfied the join condition in a regular correlatedINNER JOIN
. This means that the row is dropped from the results.
Examples
This is an example of a correlated join, using the Roster and PlayerStats tables:
SELECT *
FROM
Roster
JOIN
UNNEST(
ARRAY(
SELECT AS STRUCT *
FROM PlayerStats
WHERE PlayerStats.OpponentID = Roster.SchoolID
)) AS PlayerMatches
ON PlayerMatches.LastName = 'Buchanan'
+------------+----------+----------+------------+--------------+
| LastName | SchoolID | LastName | OpponentID | PointsScored |
+------------+----------+----------+------------+--------------+
| Adams | 50 | Buchanan | 50 | 13 |
| Eisenhower | 77 | Buchanan | 77 | 0 |
+------------+----------+----------+------------+--------------+
A common pattern for a correlated LEFT JOIN
is to have an UNNEST
operation
on the right side that references an array from some column introduced by
input on the left side. For rows where that array is empty or NULL
,
the UNNEST
operation produces no rows on the right input. In that case, a row
with a NULL
entry in each column of the right input is created to join with
the row from the left input. For example:
SELECT A.name, item, ARRAY_LENGTH(A.items) item_count_for_name
FROM
UNNEST(
[
STRUCT(
'first' AS name,
[1, 2, 3, 4] AS items),
STRUCT(
'second' AS name,
[] AS items)]) AS A
LEFT JOIN
A.items AS item;
+--------+------+---------------------+
| name | item | item_count_for_name |
+--------+------+---------------------+
| first | 1 | 4 |
| first | 2 | 4 |
| first | 3 | 4 |
| first | 4 | 4 |
| second | NULL | 0 |
+--------+------+---------------------+
In the case of a correlated CROSS JOIN
, when the input on the right side
is empty for some row from the left side, the final row is dropped from the
results. For example:
SELECT A.name, item
FROM
UNNEST(
[
STRUCT(
'first' AS name,
[1, 2, 3, 4] AS items),
STRUCT(
'second' AS name,
[] AS items)]) AS A
CROSS JOIN
A.items AS item;
+-------+------+
| name | item |
+-------+------+
| first | 1 |
| first | 2 |
| first | 3 |
| first | 4 |
+-------+------+
WHERE clause
WHERE bool_expression
The WHERE
clause filters the results of the FROM
clause.
Only rows whose bool_expression
evaluates to TRUE
are included. Rows
whose bool_expression
evaluates to NULL
or FALSE
are
discarded.
The evaluation of a query with a WHERE
clause is typically completed in this
order:
FROM
WHERE
GROUP BY
and aggregationHAVING
WINDOW
QUALIFY
DISTINCT
ORDER BY
LIMIT
Evaluation order does not always match syntax order.
The WHERE
clause only references columns available via the FROM
clause;
it cannot reference SELECT
list aliases.
Examples
This query returns returns all rows from the Roster
table
where the SchoolID
column has the value 52
:
SELECT * FROM Roster
WHERE SchoolID = 52;
The bool_expression
can contain multiple sub-conditions:
SELECT * FROM Roster
WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac");
Expressions in an INNER JOIN
have an equivalent expression in the
WHERE
clause. For example, a query using INNER
JOIN
and ON
has an
equivalent expression using CROSS JOIN
and WHERE
. For example,
the following two queries are equivalent:
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster INNER JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot
WHERE Roster.SchoolID = TeamMascot.SchoolID;
GROUP BY clause
GROUP BY { expression [, ...] | ROLLUP ( expression [, ...] ) }
The GROUP BY
clause groups together rows in a table with non-distinct values
for the expression
in the GROUP BY
clause. For multiple rows in the
source table with non-distinct values for expression
, the
GROUP BY
clause produces a single combined row. GROUP BY
is commonly used
when aggregate functions are present in the SELECT
list, or to eliminate
redundancy in the output. The data type of expression
must be
groupable.
Example:
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName;
The GROUP BY
clause can refer to expression names in the SELECT
list. The
GROUP BY
clause also allows ordinal references to expressions in the SELECT
list using integer values. 1
refers to the first expression in the
SELECT
list, 2
the second, and so forth. The expression list can combine
ordinals and expression names.
Example:
SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;
The query above is equivalent to:
SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY 2, FirstName;
GROUP BY
clauses may also refer to aliases. If a query contains aliases in
the SELECT
clause, those aliases override names in the corresponding FROM
clause.
Example:
SELECT SUM(PointsScored), LastName as last_name
FROM PlayerStats
GROUP BY last_name;
GROUP BY ROLLUP
returns the results of GROUP BY
for
prefixes of the expressions in the ROLLUP
list, each of which is known as a
grouping set. For the ROLLUP
list (a, b, c)
, the grouping sets are
(a, b, c)
, (a, b)
, (a)
, ()
. When evaluating the results of GROUP BY
for a particular grouping set, GROUP BY ROLLUP
treats expressions that are not
in the grouping set as having a NULL
value. A SELECT
statement like this
one:
SELECT a, b, SUM(c) FROM Input GROUP BY ROLLUP(a, b);
uses the rollup list (a, b)
. The result will include the
results of GROUP BY
for the grouping sets (a, b)
, (a)
, and ()
, which
includes all rows. This returns the same rows as:
SELECT NULL, NULL, SUM(c) FROM Input UNION ALL
SELECT a, NULL, SUM(c) FROM Input GROUP BY a UNION ALL
SELECT a, b, SUM(c) FROM Input GROUP BY a, b;
This allows the computation of aggregates for the grouping sets defined by the
expressions in the ROLLUP
list and the prefixes of that list.
Example:
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT 456, 3, 4.25 UNION ALL
SELECT 789, 3, 0.99
)
SELECT
day,
SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(day);
The query above outputs a row for each day in addition to the rolled up total
across all days, as indicated by a NULL
day:
+------+-------+
| day | total |
+------+-------+
| NULL | 39.77 |
| 1 | 23.54 |
| 2 | 9.99 |
| 3 | 6.24 |
+------+-------+
Example:
WITH Sales AS (
SELECT 123 AS sku, 1 AS day, 9.99 AS price UNION ALL
SELECT 123, 1, 8.99 UNION ALL
SELECT 456, 1, 4.56 UNION ALL
SELECT 123, 2, 9.99 UNION ALL
SELECT 789, 3, 1.00 UNION ALL
SELECT 456, 3, 4.25 UNION ALL
SELECT 789, 3, 0.99
)
SELECT
sku,
day,
SUM(price) AS total
FROM Sales
GROUP BY ROLLUP(sku, day)
ORDER BY sku, day;
The query above returns rows grouped by the following grouping sets:
- sku and day
- sku (day is
NULL
) - The empty grouping set (day and sku are
NULL
)
The sums for these grouping sets correspond to the total for each distinct sku-day combination, the total for each sku across all days, and the grand total:
+------+------+-------+
| sku | day | total |
+------+------+-------+
| NULL | NULL | 39.77 |
| 123 | NULL | 28.97 |
| 123 | 1 | 18.98 |
| 123 | 2 | 9.99 |
| 456 | NULL | 8.81 |
| 456 | 1 | 4.56 |
| 456 | 3 | 4.25 |
| 789 | 3 | 1.99 |
| 789 | NULL | 1.99 |
+------+------+-------+
HAVING clause
HAVING bool_expression
The HAVING
clause filters the results produced by GROUP BY
or
aggregation. GROUP BY
or aggregation must be present in the query. If
aggregation is present, the HAVING
clause is evaluated once for every
aggregated row in the result set.
Only rows whose bool_expression
evaluates to TRUE
are included. Rows
whose bool_expression
evaluates to NULL
or FALSE
are
discarded.
The evaluation of a query with a HAVING
clause is typically completed in this
order:
FROM
WHERE
GROUP BY
and aggregationHAVING
WINDOW
QUALIFY
DISTINCT
ORDER BY
LIMIT
Evaluation order does not always match syntax order.
The HAVING
clause references columns available via the FROM
clause, as
well as SELECT
list aliases. Expressions referenced in the HAVING
clause
must either appear in the GROUP BY
clause or they must be the result of an
aggregate function:
SELECT LastName
FROM Roster
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
If a query contains aliases in the SELECT
clause, those aliases override names
in a FROM
clause.
SELECT LastName, SUM(PointsScored) AS ps
FROM Roster
GROUP BY LastName
HAVING ps > 0;
Mandatory aggregation
Aggregation does not have to be present in the HAVING
clause itself, but
aggregation must be present in at least one of the following forms:
Aggregation function in the SELECT
list.
SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
GROUP BY LastName
HAVING total > 15;
Aggregation function in the 'HAVING' clause.
SELECT LastName
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
Aggregation in both the SELECT
list and HAVING
clause.
When aggregation functions are present in both the SELECT
list and HAVING
clause, the aggregation functions and the columns they reference do not need
to be the same. In the example below, the two aggregation functions,
COUNT()
and SUM()
, are different and also use different columns.
SELECT LastName, COUNT(*)
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
ORDER BY clause
ORDER BY expression [{ ASC | DESC }] [{ NULLS FIRST | NULLS LAST }] [, ...]
The ORDER BY
clause specifies a column or expression as the sort criterion for
the result set. If an ORDER BY clause is not present, the order of the results
of a query is not defined. Column aliases from a FROM
clause or SELECT
list
are allowed. If a query contains aliases in the SELECT
clause, those aliases
override names in the corresponding FROM
clause. The data type of
expression
must be orderable.
Optional Clauses
NULLS FIRST | NULLS LAST
:NULLS FIRST
: Sort null values before non-null values.NULLS LAST
. Sort null values after non-null values.
ASC | DESC
: Sort the results in ascending or descending order ofexpression
values.ASC
is the default value. If null ordering is not specified withNULLS FIRST
orNULLS LAST
:NULLS FIRST
is applied by default if the sort order is ascending.NULLS LAST
is applied by default if the sort order is descending.
Examples
Use the default sort order (ascending).
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x;
+------+-------+
| x | y |
+------+-------+
| NULL | false |
| 1 | true |
| 9 | true |
+------+-------+
Use the default sort order (ascending), but return null values last.
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x NULLS LAST;
+------+-------+
| x | y |
+------+-------+
| 1 | true |
| 9 | true |
| NULL | false |
+------+-------+
Use descending sort order.
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x DESC;
+------+-------+
| x | y |
+------+-------+
| 9 | true |
| 1 | true |
| NULL | false |
+------+-------+
Use descending sort order, but return null values first.
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x DESC NULLS FIRST;
+------+-------+
| x | y |
+------+-------+
| NULL | false |
| 9 | true |
| 1 | true |
+------+-------+
It is possible to order by multiple columns. In the example below, the result
set is ordered first by SchoolID
and then by LastName
:
SELECT LastName, PointsScored, OpponentID
FROM PlayerStats
ORDER BY SchoolID, LastName;
When used in conjunction with
set operators,
the ORDER BY
clause applies to the result set of the entire query; it does not
apply only to the closest SELECT
statement. For this reason, it can be helpful
(though it is not required) to use parentheses to show the scope of the ORDER
BY
.
This query without parentheses:
SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot
ORDER BY SchoolID;
is equivalent to this query with parentheses:
( SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot )
ORDER BY SchoolID;
but is not equivalent to this query, where the ORDER BY
clause applies only to
the second SELECT
statement:
SELECT * FROM Roster
UNION ALL
( SELECT * FROM TeamMascot
ORDER BY SchoolID );
You can also use integer literals as column references in ORDER BY
clauses. An
integer literal becomes an ordinal (for example, counting starts at 1) into
the SELECT
list.
Example - the following two queries are equivalent:
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName
ORDER BY LastName;
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY 2
ORDER BY 2;
QUALIFY clause
QUALIFY bool_expression
The QUALIFY
clause filters the results of window functions.
A window function is required to be present in the QUALIFY
clause or the
SELECT
list.
Only rows whose bool_expression
evaluates to TRUE
are included. Rows
whose bool_expression
evaluates to NULL
or FALSE
are
discarded.
The evaluation of a query with a QUALIFY
clause is typically completed in this
order:
FROM
WHERE
GROUP BY
and aggregationHAVING
WINDOW
QUALIFY
DISTINCT
ORDER BY
LIMIT
Evaluation order does not always match syntax order.
Examples
The following query returns the most popular vegetables in the
Produce
table and their rank.
SELECT
item,
RANK() OVER (PARTITION BY category ORDER BY purchases DESC) as rank
FROM Produce
WHERE Produce.category = 'vegetable'
QUALIFY rank <= 3
+---------+------+
| item | rank |
+---------+------+
| kale | 1 |
| lettuce | 2 |
| cabbage | 3 |
+---------+------+
You don't have to include a window function in the SELECT
list to use
QUALIFY
. The following query returns the most popular vegetables in the
Produce
table.
SELECT item
FROM Produce
WHERE Produce.category = 'vegetable'
QUALIFY RANK() OVER (PARTITION BY category ORDER BY purchases DESC) <= 3
+---------+
| item |
+---------+
| kale |
| lettuce |
| cabbage |
+---------+
WINDOW clause
WINDOW named_window_expression [, ...] named_window_expression: named_window AS { named_window | ( [ window_specification ] ) }
A WINDOW
clause defines a list of named windows.
A named window represents a group of rows in a table upon which to use a
window function. A named window can be defined with
a window specification or reference another
named window. If another named window is referenced, the definition of the
referenced window must precede the referencing window.
Examples
These examples reference a table called Produce
.
They all return the same result. Note the different
ways you can combine named windows and use them in a window function's
OVER
clause.
SELECT item, purchases, category, LAST_VALUE(item)
OVER (item_window) AS most_popular
FROM Produce
WINDOW item_window AS (
PARTITION BY category
ORDER BY purchases
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
SELECT item, purchases, category, LAST_VALUE(item)
OVER (d) AS most_popular
FROM Produce
WINDOW
a AS (PARTITION BY category),
b AS (a ORDER BY purchases),
c AS (b ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING),
d AS (c)
SELECT item, purchases, category, LAST_VALUE(item)
OVER (c ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS most_popular
FROM Produce
WINDOW
a AS (PARTITION BY category),
b AS (a ORDER BY purchases),
c AS b
Set operators
set_operation: query_expr set_operator query_expr set_operator: UNION { ALL | DISTINCT } | INTERSECT DISTINCT | EXCEPT DISTINCT
Set operators combine results from two or
more input queries into a single result set. You
must specify ALL
or DISTINCT
; if you specify ALL
, then all rows are
retained. If DISTINCT
is specified, duplicate rows are discarded.
If a given row R appears exactly m times in the first input query and n times in the second input query (m >= 0, n >= 0):
- For
UNION ALL
, R appears exactly m + n times in the result. - For
UNION DISTINCT
, theDISTINCT
is computed after theUNION
is computed, so R appears exactly one time. - For
INTERSECT DISTINCT
, theDISTINCT
is computed after the result above is computed. - For
EXCEPT DISTINCT
, row R appears once in the output if m > 0 and n = 0. - If there are more than two input queries, the above operations generalize and the output is the same as if the inputs were combined incrementally from left to right.
The following rules apply:
- For set operations other than
UNION ALL
, all column types must support equality comparison. - The input queries on each side of the operator must return the same number of columns.
- The operators pair the columns returned by each input query according to
the columns' positions in their respective
SELECT
lists. That is, the first column in the first input query is paired with the first column in the second input query. - The result set always uses the column names from the first input query.
- The result set always uses the supertypes of input types in corresponding columns, so paired columns must also have either the same data type or a common supertype.
- You must use parentheses to separate different set
operations; for this purpose, set operations such as
UNION ALL
andUNION DISTINCT
are different. If the statement only repeats the same set operation, parentheses are not necessary.
Examples:
query1 UNION ALL (query2 UNION DISTINCT query3)
query1 UNION ALL query2 UNION ALL query3
Invalid:
query1 UNION ALL query2 UNION DISTINCT query3
query1 UNION ALL query2 INTERSECT ALL query3; // INVALID.
UNION
The UNION
operator combines the result sets of two or more input queries by
pairing columns from the result set of each query and vertically concatenating
them.
INTERSECT
The INTERSECT
operator returns rows that are found in the result sets of both
the left and right input queries. Unlike EXCEPT
, the positioning of the input
queries (to the left versus right of the INTERSECT
operator) does not matter.
EXCEPT
The EXCEPT
operator returns rows from the left input query that are
not present in the right input query.
Example:
SELECT * FROM UNNEST(ARRAY<int64>[1, 2, 3]) AS number
EXCEPT DISTINCT SELECT 1;
+--------+
| number |
+--------+
| 2 |
| 3 |
+--------+
LIMIT and OFFSET clauses
LIMIT count [ OFFSET skip_rows ]
LIMIT
specifies a non-negative count
of type INT64,
and no more than count
rows will be returned. LIMIT
0
returns 0 rows.
If there is a set operation, LIMIT
is applied after the set operation is
evaluated.
OFFSET
specifies a non-negative number of rows to skip before applying
LIMIT
. skip_rows
is of type INT64.
These clauses accept only literal or parameter values. The rows that are
returned by LIMIT
and OFFSET
are unspecified unless these
operators are used after ORDER BY
.
Examples:
SELECT *
FROM UNNEST(ARRAY<STRING>['a', 'b', 'c', 'd', 'e']) AS letter
ORDER BY letter ASC LIMIT 2
+---------+
| letter |
+---------+
| a |
| b |
+---------+
SELECT *
FROM UNNEST(ARRAY<STRING>['a', 'b', 'c', 'd', 'e']) AS letter
ORDER BY letter ASC LIMIT 3 OFFSET 1
+---------+
| letter |
+---------+
| b |
| c |
| d |
+---------+
WITH clause
WITH [ RECURSIVE ] { non_recursive_cte | recursive_cte }[, ...]
A WITH
clause contains one or more common table expressions (CTEs).
A CTE acts like a temporary table that you can reference within a single
query expression. Each CTE binds the results of a subquery
to a table name, which can be used elsewhere in the same query expression,
but rules apply.
CTEs can be non-recursive or
recursive and you can include both of these in your
WITH
clause. A recursive CTE references itself, where a
non-recursive CTE does not. If a recursive CTE is included in the WITH
clause,
the RECURSIVE
keyword must also be included.
You can include the RECURSIVE
keyword in a WITH
clause even if no
recursive CTEs are present. You can learn more about the RECURSIVE
keyword
here.
BigQuery only materializes
the results of recursive CTEs, but does not materialize the results
of non-recursive CTEs inside the WITH
clause. If a non-recursive CTE is
referenced in multiple places in a query, then the CTE is executed once for each
reference. The WITH
clause with non-recursive CTEs is useful primarily for
readability.
RECURSIVE keyword
A WITH
clause can optionally include the RECURSIVE
keyword, which does
two things:
- Enables recursion in the
WITH
clause. If this keyword is not present, you can only include non-recursive common table expressions (CTEs). If this keyword is present, you can use both recursive and non-recursive CTEs. - Changes the visibility of CTEs in the
WITH
clause. If this keyword is not present, a CTE is only visible to CTEs defined after it in theWITH
clause. If this keyword is present, a CTE is visible to all CTEs in theWITH
clause where it was defined.
Non-recursive CTEs
non_recursive_cte: cte_name AS ( query_expr )
A non-recursive common table expression (CTE) contains a non-recursive subquery and a name associated with the CTE.
- A non-recursive CTE cannot reference itself.
- A non-recursive CTE can be referenced by the query expression that
contains the
WITH
clause, but rules apply.
Examples
In this example, a WITH
clause defines two non-recursive CTEs that
are referenced in the related set operation, where one CTE is referenced by
each of the set operation's input query expressions:
WITH subQ1 AS (SELECT SchoolID FROM Roster),
subQ2 AS (SELECT OpponentID FROM PlayerStats)
SELECT * FROM subQ1
UNION ALL
SELECT * FROM subQ2
You can break up more complex queries into a WITH
clause and
WITH
SELECT
statement instead of writing nested table subqueries.
For example:
WITH q1 AS (my_query)
SELECT *
FROM
(WITH q2 AS (SELECT * FROM q1) SELECT * FROM q2)
WITH q1 AS (my_query)
SELECT *
FROM
(WITH q2 AS (SELECT * FROM q1), # q1 resolves to my_query
q3 AS (SELECT * FROM q1), # q1 resolves to my_query
q1 AS (SELECT * FROM q1), # q1 (in the query) resolves to my_query
q4 AS (SELECT * FROM q1) # q1 resolves to the WITH subquery on the previous line.
SELECT * FROM q1) # q1 resolves to the third inner WITH subquery.
Recursive CTEs
recursive_cte: cte_name AS ( recursive_union_operation ) recursive_union_operation: base_term union_operator recursive_term base_term: query_expr recursive_term: query_expr union_operator: UNION ALL
A recursive common table expression (CTE) contains a recursive subquery and a name associated with the CTE.
- A recursive CTE references itself.
- A recursive CTE can be referenced in the query expression that contains the
WITH
clause, but rules apply. - When a recursive CTE is defined in a
WITH
clause, theRECURSIVE
keyword must be present.
A recursive CTE is defined by a recursive union operation. The recursive union operation defines how input is recursively processed to produce the final CTE result. The recursive union operation has the following parts:
base_term
: Runs the first iteration of the recursive union operation. This term must follow the base term rules.union_operator
: TheUNION
operator returns the rows that are from the union of the base term and recursive term. WithUNION ALL
, each row produced in iterationN
becomes part of the final CTE result and input for iterationN+1
. Iteration stops when an iteration produces no rows to move into the next iteration.recursive_term
: Runs the remaining iterations. It must include one self-reference (recursive reference) to the recursive CTE. Only this term can include a self-reference. This term must follow the recursive term rules.
A recursive CTE looks like this:
WITH RECURSIVE
T1 AS ( (SELECT 1 AS n) UNION ALL (SELECT n + 1 AS n FROM T1 WHERE n < 3) )
SELECT n FROM T1
+---+
| n |
+---+
| 2 |
| 1 |
| 3 |
+---+
The first iteration of a recursive union operation runs the base term. Then, each subsequent iteration runs the recursive term and produces new rows which are unioned with the previous iteration. The recursive union operation terminates when an recursive term iteration produces no new rows.
If recursion does not terminate, the query fails after reaching 100 iterations, which can be customized at the project level.
Examples of allowed recursive CTEs
This is a simple recursive CTE:
WITH RECURSIVE
T1 AS (
(SELECT 1 AS n) UNION ALL
(SELECT n + 2 FROM T1 WHERE n < 4))
SELECT * FROM T1 ORDER BY n
+---+
| n |
+---+
| 1 |
| 3 |
| 5 |
+---+
Multiple subqueries in the same recursive CTE are okay, as long as each recursion has a cycle length of 1. It is also okay for recursive entries to depend on non-recursive entries and vice-versa:
WITH RECURSIVE
T0 AS (SELECT 1 AS n),
T1 AS ((SELECT * FROM T0) UNION ALL (SELECT n + 1 FROM T1 WHERE n < 4)),
T2 AS ((SELECT 1 AS n) UNION ALL (SELECT n + 1 FROM T2 WHERE n < 4)),
T3 AS (SELECT * FROM T1 INNER JOIN T2 USING (n))
SELECT * FROM T3 ORDER BY n
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
+---+
Aggregate functions can be invoked in subqueries, as long as they are not aggregating on the table being defined:
WITH RECURSIVE
T0 AS (SELECT * FROM UNNEST ([60, 20, 30])),
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT n + (SELECT COUNT(*) FROM T0) FROM T1 WHERE n < 4))
SELECT * FROM T1 ORDER BY n
+---+
| n |
+---+
| 1 |
| 4 |
+---+
INNER JOIN
can be used inside subqueries:
WITH RECURSIVE
T0 AS (SELECT 1 AS n),
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT n + 1 FROM T1 INNER JOIN T0 USING (n)))
SELECT * FROM T1 ORDER BY n
+---+
| n |
+---+
| 1 |
| 2 |
+---+
CROSS JOIN
can be used inside subqueries:
WITH RECURSIVE
T0 AS (SELECT 2 AS p),
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT T1.n + T0.p FROM T1 CROSS JOIN T0 WHERE T1.n < 4))
SELECT * FROM T1 CROSS JOIN T0 ORDER BY n
+---+---+
| n | p |
+---+---+
| 1 | 2 |
| 3 | 2 |
| 5 | 2 |
+---+---+
Recursive CTEs can be used inside CREATE TABLE AS SELECT
statements. The
following example creates a table named new_table
in mydataset
:
CREATE OR REPLACE TABLE `myproject.mydataset.new_table` AS
WITH RECURSIVE
T1 AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM T1 WHERE n < 3)
SELECT * FROM T1
Recursive CTEs can be used inside CREATE VIEW AS SELECT
statements. The
following example creates a view named new_view
in mydataset
:
CREATE OR REPLACE VIEW `myproject.mydataset.new_view` AS
WITH RECURSIVE
T1 AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM T1 WHERE n < 3)
SELECT * FROM T1
Recursive CTEs can be used inside INSERT
statements. The following example
demonstrates how to insert data into a table by using recursive CTEs:
-- create a temp table.
CREATE TEMP TABLE tmp_table (n INT64);
-- insert some values into the temp table by using recursive CTEs.
INSERT INTO tmp_table(n)
WITH RECURSIVE
T1 AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM T1 WHERE n < 3)
SELECT * FROM T1
Examples of disallowed recursive CTEs
The following recursive CTE is disallowed because the self-reference does not include a set operator, base term, and recursive term.
WITH RECURSIVE
T1 AS (SELECT * FROM T1)
SELECT * FROM T1
-- Error
The following recursive CTE is disallowed because the self-reference to T1
is in the base term. The self reference is only allowed in the recursive term.
WITH RECURSIVE
T1 AS ((SELECT * FROM T1) UNION ALL (SELECT 1))
SELECT * FROM T1
-- Error
The following recursive CTE is disallowed because there are multiple self-references in the recursive term when there must only be one.
WITH RECURSIVE
T1 AS ((SELECT 1 AS n) UNION ALL ((SELECT * FROM T1) UNION ALL (SELECT * FROM T1)))
SELECT * FROM T1
-- Error
The following recursive CTE is disallowed because the self-reference is inside an expression subquery
WITH RECURSIVE
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT (SELECT n FROM T1)))
SELECT * FROM T1
-- Error
The following recursive CTE is disallowed because there is a self-reference as input to an outer join.
WITH RECURSIVE
T0 AS (SELECT 1 AS n),
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT * FROM T1 FULL OUTER JOIN T0 USING (n)))
SELECT * FROM T1;
-- Error
The following recursive CTE is disallowed because you cannot use aggregation with a self-reference.
WITH RECURSIVE
T1 AS (
(SELECT 1 AS n) UNION ALL
(SELECT COUNT(*) FROM T1))
SELECT * FROM T1;
-- Error
The following recursive CTE is disallowed because you cannot use the
window function OVER
clause with a self-reference.
WITH RECURSIVE
T1 AS (
(SELECT 1.0 AS n) UNION ALL
SELECT 1 + AVG(n) OVER(ROWS between 2 PRECEDING and 0 FOLLOWING) FROM T1 WHERE n < 10)
SELECT n FROM T1;
-- Error
The following recursive CTE is disallowed because you cannot use a
LIMIT
clause with a self-reference.
WITH RECURSIVE
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT n FROM T1 LIMIT 3))
SELECT * FROM T1;
-- Error
The following recursive CTEs are disallowed because you cannot use an
ORDER BY
clause with a self-reference.
WITH RECURSIVE
T1 AS ((SELECT 1 AS n) UNION ALL (SELECT n + 1 FROM T1 ORDER BY n))
SELECT * FROM T1;
-- Error
The following recursive CTE is disallowed because table T1
can't be
recursively referenced from inside an inner WITH
clause
WITH RECURSIVE
T1 AS ((SELECT 1 AS n) UNION ALL (WITH t AS (SELECT n FROM T1) SELECT * FROM t))
SELECT * FROM T1
-- Error
CTE rules and constraints
Common table expressions (CTEs) can be referenced inside the query expression
that contains the WITH
clause.
General rules
Here are some general rules and constraints to consider when working with CTEs:
- Each CTE in the same
WITH
clause must have a unique name. - You must include the
RECURSIVE
keyword keyword if theWITH
clause contains a recursive CTE. - The
RECURSIVE
keyword in theWITH
clause changes the visibility of CTEs to other CTEs in the sameWITH
clause. You can learn more here. WITH
is not allowed insideWITH RECURSIVE
.WITH RECURSIVE
is allowed in theSELECT
statement.WITH RECURSIVE
is only allowed at the top level of the query.WITH RECURSIVE
is not allowed in functions.WITH RECURSIVE
is not allowed in materialized views.WITH RECURSIVE
is not allowed in BigQuery ML.CREATE RECURSIVE VIEW
is not supported. To work around this, use theWITH RECURSIVE
clause as thequery_expression
in theCREATE VIEW
statement. For more information, see CREATE VIEW.- A local CTE overrides an outer CTE or table with the same name.
- A CTE on a subquery may not reference correlated columns from the outer query.
Base term rules
The following rules apply to the base term in a recursive CTE:
- The base term is required to be non-recursive.
- The base term determines the names and types of all of the table columns.
- The base term cannot have STRUCT output columns.
Recursive term rules
The following rules apply to the recursive term in a recursive CTE:
- The recursive term must include exactly one reference to the recursively-defined table in the base term.
- The recursive term must contain the same number of columns as the base term, and the type of each column must be implicitly coercible to the type of the corresponding column in the base term.
- A recursive table reference cannot be used as an operand to a
FULL JOIN
, a right operand to aLEFT JOIN
, or a left operand to aRIGHT JOIN
. - A recursive table reference cannot be used with the
TABLESAMPLE
operator. - The recursive term must not use any non-deterministic operators.
- Use of the
IN
andEXISTS
expression subqueries is limited within the recursive term. For example:[NOT] IN
and[NOT] EXISTS
are not allowed in theSELECT
clause.NOT IN
is not allowed in theWHERE
clause.
The following rules apply to a subquery inside an recursive term:
- A subquery with a recursive table reference must be a
SELECT
expression, not a set operation, such asUNION ALL
. - A subquery cannot contain, directly or indirectly, a
recursive table reference anywhere outside of its
FROM
clause. - A subquery with a recursive table reference cannot contain an
ORDER BY
orLIMIT
clause. - A subquery with a recursive table reference cannot invoke aggregate functions.
- A subquery with a recursive table reference cannot invoke window functions.
- A subquery with a recursive table reference cannot contain the
DISTINCT
keyword orGROUP BY
clause.
CTE visibility
The visibility of a common table expression (CTE) within a query expression
is determined by whether or not you add the RECURSIVE
keyword to the
WITH
clause where the CTE was defined. You can learn more about these
differences in the following sections.
Visibility of CTEs in a WITH
clause with the RECURSIVE
keyword
When you include the RECURSIVE
keyword, references between CTEs in the WITH
clause can go backwards and forwards. Cycles are not allowed.
This is what happens when you have two CTEs that reference
themselves or each other in a WITH
clause with the RECURSIVE
keyword. Assume that A
is the first CTE and B
is the second
CTE in the clause:
- A references A = Valid
- A references B = Valid
- B references A = Valid
- A references B references A = Invalid (cycles are not allowed)
A
can reference itself because self-references are supported:
WITH RECURSIVE
A AS (SELECT 1 AS n UNION ALL (SELECT n + 1 FROM A WHERE n < 3))
SELECT * FROM A
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+
A
can reference B
because references between CTEs can go forwards:
WITH RECURSIVE
A AS (SELECT * FROM B),
B AS (SELECT 1 AS n)
SELECT * FROM B
+---+
| n |
+---+
| 1 |
+---+
B
can reference A
because references between CTEs can go backwards:
WITH RECURSIVE
A AS (SELECT 1 AS n),
B AS (SELECT * FROM A)
SELECT * FROM B
+---+
| n |
+---+
| 1 |
+---+
This produces an error. A
and B
reference each other, which creates a cycle:
WITH RECURSIVE
A AS (SELECT * FROM B),
B AS (SELECT * FROM A)
SELECT * FROM B
-- Error
Visibility of CTEs in a WITH
clause without the RECURSIVE
keyword
When you don't include the RECURSIVE
keyword in the WITH
clause,
references between CTEs in the clause can go backward but not forward.
This is what happens when you have two CTEs that reference
themselves or each other in a WITH
clause without
the RECURSIVE
keyword. Assume that A
is the first CTE and B
is the second CTE in the clause:
- A references A = Invalid
- A references B = Invalid
- B references A = Valid
- A references B references A = Invalid (cycles are not allowed)
This produces an error. A
cannot reference itself because self-references are
not supported:
WITH
A AS (SELECT 1 AS n UNION ALL (SELECT n + 1 FROM A WHERE n < 3))
SELECT * FROM A
-- Error
This produces an error. A
cannot reference B
because references between
CTEs can go backwards but not forwards:
WITH
A AS (SELECT * FROM B),
B AS (SELECT 1 AS n)
SELECT * FROM B
-- Error
B
can reference A
because references between CTEs can go backwards:
WITH
A AS (SELECT 1 AS n),
B AS (SELECT * FROM A)
SELECT * FROM B
+---+
| n |
+---+
| 1 |
+---+
This produces an error. A
and B
reference each other, which creates a
cycle:
WITH
A AS (SELECT * FROM B),
B AS (SELECT * FROM A)
SELECT * FROM B
-- Error
Using aliases
An alias is a temporary name given to a table, column, or expression present in
a query. You can introduce explicit aliases in the SELECT
list or FROM
clause, or BigQuery will infer an implicit alias for some expressions.
Expressions with neither an explicit nor implicit alias are anonymous and the
query cannot reference them by name.
Explicit aliases
You can introduce explicit aliases in either the FROM
clause or the SELECT
list.
In a FROM
clause, you can introduce explicit aliases for any item, including
tables, arrays, subqueries, and UNNEST
clauses, using [AS] alias
. The AS
keyword is optional.
Example:
SELECT s.FirstName, s2.SongName
FROM Singers AS s, (SELECT * FROM Songs) AS s2;
You can introduce explicit aliases for any expression in the SELECT
list using
[AS] alias
. The AS
keyword is optional.
Example:
SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname
FROM Singers s;
Implicit aliases
In the SELECT
list, if there is an expression that does not have an explicit
alias, BigQuery assigns an implicit alias according to the following
rules. There can be multiple columns with the same alias in the SELECT
list.
- For identifiers, the alias is the identifier. For example,
SELECT abc
impliesAS abc
. - For path expressions, the alias is the last identifier in the path. For
example,
SELECT abc.def.ghi
impliesAS ghi
. - For field access using the "dot" member field access operator, the alias is
the field name. For example,
SELECT (struct_function()).fname
impliesAS fname
.
In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.
In a FROM
clause, from_item
s are not required to have an alias. The
following rules apply:
- If there is an expression that does not have an explicit alias, BigQuery assigns an implicit alias in these cases:
-
For identifiers, the alias is the identifier. For example,
FROM abc
impliesAS abc
. -
For path expressions, the alias is the last identifier in the path. For
example,
FROM abc.def.ghi
impliesAS ghi
-
The column produced using
WITH OFFSET
has the implicit aliasoffset
. - Table subqueries do not have implicit aliases.
-
FROM UNNEST(x)
does not have an implicit alias.
Alias visibility
After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of BigQuery's name scoping rules.
Visibility in the FROM clause
BigQuery processes aliases in a FROM
clause from left to right,
and aliases are visible only to subsequent path expressions in a FROM
clause.
Example:
Assume the Singers
table had a Concerts
column of ARRAY
type.
SELECT FirstName
FROM Singers AS s, s.Concerts;
Invalid:
SELECT FirstName
FROM s.Concerts, Singers AS s; // INVALID.
FROM
clause aliases are not visible to subqueries in the same FROM
clause. Subqueries in a FROM
clause cannot contain correlated references to
other tables in the same FROM
clause.
Invalid:
SELECT FirstName
FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s) // INVALID.
You can use any column name from a table in the FROM
as an alias anywhere in
the query, with or without qualification with the table name.
Example:
SELECT FirstName, s.ReleaseDate
FROM Singers s WHERE ReleaseDate = 1975;
If the FROM
clause contains an explicit alias, you must use the explicit alias
instead of the implicit alias for the remainder of the query (see
Implicit Aliases). A table alias is useful for brevity or
to eliminate ambiguity in cases such as self-joins, where the same table is
scanned multiple times during query processing.
Example:
SELECT * FROM Singers as s, Songs as s2
ORDER BY s.LastName
Invalid — ORDER BY
does not use the table alias:
SELECT * FROM Singers as s, Songs as s2
ORDER BY Singers.LastName; // INVALID.
Visibility in the SELECT list
Aliases in the SELECT
list are visible only to the following clauses:
GROUP BY
clauseORDER BY
clauseHAVING
clause
Example:
SELECT LastName AS last, SingerID
FROM Singers
ORDER BY last;
Visibility in the GROUP BY, ORDER BY, and HAVING clauses
These three clauses, GROUP BY
, ORDER BY
, and HAVING
, can refer to only the
following values:
- Tables in the
FROM
clause and any of their columns. - Aliases from the
SELECT
list.
GROUP BY
and ORDER BY
can also refer to a third group:
- Integer literals, which refer to items in the
SELECT
list. The integer1
refers to the first item in theSELECT
list,2
refers to the second item, etc.
Example:
SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY 1
ORDER BY 2 DESC;
The previous query is equivalent to:
SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY sid
ORDER BY s2id DESC;
Duplicate aliases
A SELECT
list or subquery containing multiple explicit or implicit aliases
of the same name is allowed, as long as the alias name is not referenced
elsewhere in the query, since the reference would be
ambiguous.
When a top-level SELECT
list contains duplicate column names and no
destination table is specified, all duplicate columns, except for the first one,
are automatically renamed to make them unique. The renamed columns appear in the
query result.
Example:
SELECT 1 AS a, 2 AS a;
+---+-----+
| a | a_1 |
+---+-----+
| 1 | 2 |
+---+-----+
Duplicate column names in a table or view definition are not supported. These statements with queries that contain duplicate column names will fail:
CREATE TABLE my_dataset.my_table AS (SELECT 1 AS a, 2 AS a);
CREATE VIEW my_dataset.my_view AS (SELECT 1 AS a, 2 AS a);
Ambiguous aliases
BigQuery provides an error if accessing a name is ambiguous, meaning it can resolve to more than one unique object in the query or in a table schema, including the schema of a destination table.
Examples:
This query contains column names that conflict between tables, since both
Singers
and Songs
have a column named SingerID
:
SELECT SingerID
FROM Singers, Songs;
This query contains aliases that are ambiguous in the GROUP BY
clause because
they are duplicated in the SELECT
list:
SELECT FirstName AS name, LastName AS name,
FROM Singers
GROUP BY name;
This query contains aliases that are ambiguous in the SELECT
list and FROM
clause because they share the same name. Assume table
has columns x
, y
,
and z
. z
is of type STRUCT and has fields
v
, w
, and x
.
Example:
SELECT x, z AS T
FROM table AS T
GROUP BY T.x;
The alias T
is ambiguous and will produce an error because T.x
in the GROUP
BY
clause could refer to either table.x
or table.z.x
.
A name is not ambiguous in GROUP BY
, ORDER BY
or HAVING
if it is both
a column name and a SELECT
list alias, as long as the name resolves to the
same underlying object.
Example:
SELECT LastName, BirthYear AS BirthYear
FROM Singers
GROUP BY BirthYear;
The alias BirthYear
is not ambiguous because it resolves to the same
underlying column, Singers.BirthYear
.
Range variables
In BigQuery, a range variable is a table expression alias in the
FROM
clause. Sometimes a range variable is known as a table alias
. A
range variable lets you reference rows being scanned from a table expression.
A table expression represents an item in the FROM
clause that returns a table.
Common items that this expression can represent include
tables, value tables, subqueries,
joins, and parenthesized joins.
In general, a range variable provides a reference to the rows of a table
expression. A range variable can be used to qualify a column reference and
unambiguously identify the related table, for example range_variable.column_1
.
When referencing a range variable on its own without a specified column suffix,
the result of a table expression is the row type of the related table.
Value tables have explicit row types, so for range variables related
to value tables, the result type is the value table's row type. Other tables
do not have explicit row types, and for those tables, the range variable
type is a dynamically defined STRUCT
that includes all of the
columns in the table.
Examples
In these examples, the WITH
clause is used to emulate a temporary table
called Grid
. This table has columns x
and y
. A range variable called
Coordinate
refers to the current row as the table is scanned. Coordinate
can be used to access the entire row or columns in the row.
The following example selects column x
from range variable Coordinate
,
which in effect selects column x
from table Grid
.
WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate.x FROM Grid AS Coordinate;
+---+
| x |
+---+
| 1 |
+---+
The following example selects all columns from range variable Coordinate
,
which in effect selects all columns from table Grid
.
WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate.* FROM Grid AS Coordinate;
+---+---+
| x | y |
+---+---+
| 1 | 2 |
+---+---+
The following example selects the range variable Coordinate
, which is a
reference to rows in table Grid
. Since Grid
is not a value table,
the result type of Coordinate
is a STRUCT
that contains all the columns
from Grid
.
WITH Grid AS (SELECT 1 x, 2 y)
SELECT Coordinate FROM Grid AS Coordinate;
+--------------+
| Coordinate |
+--------------+
| {x: 1, y: 2} |
+--------------+
Working with value tables
In BigQuery, a value table is a table where the row type is a single value. In a regular table, each row is made up of columns, each of which has a name and a type. In a value table, the row type is just a single value, and there are no column names.
A query will produce a value table if it uses SELECT AS STRUCT
or
SELECT AS VALUE
.
Note: In BigQuery, a query can only return a value table with a type of
STRUCT
.
In contexts where a query with exactly one column is expected, a value table query can be used instead. For example, scalar subqueries and array subqueries (see Subqueries) normally require a single-column query, but in BigQuery, they also allow using a value table query.
Appendix A: examples with sample data
These examples include statements which perform queries on the
Roster
and TeamMascot
,
and PlayerStats
tables.
GROUP BY clause
Example:
SELECT LastName, SUM(PointsScored)
FROM PlayerStats
GROUP BY LastName;
LastName | SUM |
---|---|
Adams | 7 |
Buchanan | 13 |
Coolidge | 1 |
UNION
The UNION
operator combines the result sets of two or more SELECT
statements
by pairing columns from the result set of each SELECT
statement and vertically
concatenating them.
Example:
SELECT Mascot AS X, SchoolID AS Y
FROM TeamMascot
UNION ALL
SELECT LastName, PointsScored
FROM PlayerStats;
Results:
X | Y |
---|---|
Jaguars | 50 |
Knights | 51 |
Lakers | 52 |
Mustangs | 53 |
Adams | 3 |
Buchanan | 0 |
Coolidge | 1 |
Adams | 4 |
Buchanan | 13 |
INTERSECT
This query returns the last names that are present in both Roster and PlayerStats.
SELECT LastName
FROM Roster
INTERSECT DISTINCT
SELECT LastName
FROM PlayerStats;
Results:
LastName |
---|
Adams |
Coolidge |
Buchanan |
EXCEPT
The query below returns last names in Roster that are not present in PlayerStats.
SELECT LastName
FROM Roster
EXCEPT DISTINCT
SELECT LastName
FROM PlayerStats;
Results:
LastName |
---|
Eisenhower |
Davis |
Reversing the order of the SELECT
statements will return last names in
PlayerStats that are not present in Roster:
SELECT LastName
FROM PlayerStats
EXCEPT DISTINCT
SELECT LastName
FROM Roster;
Results:
(empty)