Query statements scan one or more tables or expressions and return the computed result rows. This topic describes the syntax for SQL queries in GoogleSQL for BigQuery.
SQL syntax notation rules
The GoogleSQL documentation commonly uses the following syntax notation rules:
- Square brackets
[ ]
: Optional clause. - Curly braces with vertical bars
{ a | b | c }
: LogicalOR
. Select one option. - Ellipsis
...
: Preceding item can repeat. - Double quotes
"
: Syntax wrapped in double quotes (""
) is required.
SQL syntax
query_statement: query_expr query_expr: [ WITH [ RECURSIVE ] { non_recursive_cte | recursive_cte }[, ...] ] { select | ( query_expr ) | set_operation } [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ LIMIT count [ OFFSET skip_rows ] ] select: SELECT [ WITH differential_privacy_clause ] [ { ALL | DISTINCT } ] [ AS { STRUCT | VALUE } ] select_list [ FROM from_clause[, ...] ] [ WHERE bool_expression ] [ GROUP BY group_by_specification ] [ HAVING bool_expression ] [ QUALIFY bool_expression ] [ WINDOW window_clause ]
SELECT
statement
SELECT [ WITH differential_privacy_clause ] [ { ALL | DISTINCT } ] [ AS { STRUCT | VALUE } ] select_list select_list: { select_all | select_expression } [, ...] select_all: [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] [ REPLACE ( expression [ AS ] column_name [, ...] ) ] select_expression: expression [ [ AS ] alias ]
The SELECT
list defines the columns that the query will return. Expressions in
the SELECT
list can refer to columns in any of the from_item
s in its
corresponding FROM
clause.
Each item in the SELECT
list is one of:
*
expression
expression.*
SELECT *
SELECT *
, often referred to as select star, produces one output column for
each column that is visible after executing the full query.
SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable);
/*-------+-----------*
| fruit | vegetable |
+-------+-----------+
| apple | carrot |
*-------+-----------*/
SELECT expression
Items in a SELECT
list can be expressions. These expressions evaluate to a
single value and produce one output column, with an optional explicit alias
.
If the expression doesn't have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.
SELECT expression.*
An item in a SELECT
list can also take the form of expression.*
. This
produces one output column for each column or top-level field of expression
.
The expression must either be a table alias or evaluate to a single value of a
data type with fields, such as a STRUCT.
The following query produces one output column for each column in the table
groceries
, aliased as g
.
WITH groceries AS
(SELECT "milk" AS dairy,
"eggs" AS protein,
"bread" AS grain)
SELECT g.*
FROM groceries AS g;
/*-------+---------+-------*
| dairy | protein | grain |
+-------+---------+-------+
| milk | eggs | bread |
*-------+---------+-------*/
More examples:
WITH locations AS
(SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location
UNION ALL
SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location)
SELECT l.location.*
FROM locations l;
/*---------+------------*
| city | state |
+---------+------------+
| Seattle | Washington |
| Phoenix | Arizona |
*---------+------------*/
WITH locations AS
(SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"),
("Phoenix", "Arizona")] AS location)
SELECT l.LOCATION[offset(0)].*
FROM locations l;
/*---------+------------*
| city | state |
+---------+------------+
| Seattle | Washington |
*---------+------------*/
SELECT * EXCEPT
A SELECT * EXCEPT
statement specifies the names of one or more columns to
exclude from the result. All matching column names are omitted from the output.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;
/*-----------+----------*
| item_name | quantity |
+-----------+----------+
| sprocket | 200 |
*-----------+----------*/
SELECT * REPLACE
A SELECT * REPLACE
statement specifies one or more
expression AS identifier
clauses. Each identifier must match a column name
from the SELECT *
statement. In the output column list, the column that
matches the identifier in a REPLACE
clause is replaced by the expression in
that REPLACE
clause.
A SELECT * REPLACE
statement doesn't change the names or order of columns.
However, it can change the value and the value type.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE ("widget" AS item_name)
FROM orders;
/*----------+-----------+----------*
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | widget | 200 |
*----------+-----------+----------*/
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE (quantity/2 AS quantity)
FROM orders;
/*----------+-----------+----------*
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | sprocket | 100 |
*----------+-----------+----------*/
SELECT DISTINCT
A SELECT DISTINCT
statement discards duplicate rows and returns only the
remaining rows. SELECT DISTINCT
cannot return columns of the following types:
In the following example, SELECT DISTINCT
is used to produce distinct arrays:
WITH PlayerStats AS (
SELECT ['Coolidge', 'Adams'] as Name, 3 as PointsScored UNION ALL
SELECT ['Adams', 'Buchanan'], 0 UNION ALL
SELECT ['Coolidge', 'Adams'], 1 UNION ALL
SELECT ['Kiran', 'Noam'], 1)
SELECT DISTINCT Name
/*------------------+
| Name |
+------------------+
| [Coolidge,Adams] |
| [Adams,Buchanan] |
| [Kiran,Noam] |
+------------------*/
In the following example, SELECT DISTINCT
is used to produce distinct structs:
WITH
PlayerStats AS (
SELECT
STRUCT<last_name STRING, first_name STRING, age INT64>(
'Adams', 'Noam', 20) AS Player,
3 AS PointsScored UNION ALL
SELECT ('Buchanan', 'Jie', 19), 0 UNION ALL
SELECT ('Adams', 'Noam', 20), 4 UNION ALL
SELECT ('Buchanan', 'Jie', 19), 13
)
SELECT DISTINCT Player
FROM PlayerStats;
/*--------------------------+
| player |
+--------------------------+
| { |
| last_name: "Adams", |
| first_name: "Noam", |
| age: 20 |
| } |
+--------------------------+
| { |
| last_name: "Buchanan", |
| first_name: "Jie", |
| age: 19 |
| } |
+---------------------------*/
SELECT ALL
A SELECT ALL
statement returns all rows, including duplicate rows.
SELECT ALL
is the default behavior of SELECT
.
SELECT AS STRUCT
SELECT AS STRUCT expr [[AS] struct_field_name1] [,...]
This produces a value table with a
STRUCT row type, where the
STRUCT field names and types match the column names
and types produced in the SELECT
list.
Example:
SELECT ARRAY(SELECT AS STRUCT 1 a, 2 b)
SELECT AS STRUCT
can be used in a scalar or array subquery to produce a single
STRUCT type grouping multiple values together. Scalar
and array subqueries (see Subqueries) are normally not
allowed to return multiple columns, but can return a single column with
STRUCT type.
SELECT AS VALUE
SELECT AS VALUE
produces a value table from any
SELECT
list that produces exactly one column. Instead of producing an
output table with one column, possibly with a name, the output will be a
value table where the row type is just the value type that was produced in the
one SELECT
column. Any alias the column had will be discarded in the
value table.
Example:
SELECT AS VALUE STRUCT(1 AS a, 2 AS b) xyz
The query above produces a table with row type STRUCT<a int64, b int64>
.
FROM
clause
FROM from_clause[, ...] from_clause: from_item [ { pivot_operator | unpivot_operator } ] [ tablesample_operator ] from_item: { table_name [ as_alias ] [ FOR SYSTEM_TIME AS OF timestamp_expression ] | { join_operation | ( join_operation ) } | ( query_expr ) [ as_alias ] | field_path | unnest_operator | cte_name [ as_alias ] } as_alias: [ AS ] alias
The FROM
clause indicates the table or tables from which to retrieve rows,
and specifies how to join those rows together to produce a single stream of
rows for processing in the rest of the query.
pivot_operator
See PIVOT operator.
unpivot_operator
See UNPIVOT operator.
tablesample_operator
See TABLESAMPLE operator.
table_name
The name (optionally qualified) of an existing table.
SELECT * FROM Roster; SELECT * FROM dataset.Roster; SELECT * FROM project.dataset.Roster;
FOR SYSTEM_TIME AS OF
FOR SYSTEM_TIME AS OF
references the historical versions of the table
definition and rows that were current at timestamp_expression
.
Limitations:
The source table in the FROM
clause containing FOR SYSTEM_TIME AS OF
must
not be any of the following:
- An array scan, including a
flattened array or the output
of the
UNNEST
operator. - A common table expression defined by a
WITH
clause.
timestamp_expression
must be a constant expression. It cannot
contain the following:
- Subqueries.
- Correlated references (references to columns of a table that appear at
a higher level of the query statement, such as in the
SELECT
list). - User-defined functions (UDFs).
The value of timestamp_expression
cannot fall into the following ranges:
- After the current timestamp (in the future).
- More than seven (7) days before the current timestamp.
A single query statement cannot reference a single table at more than one point in time, including the current time. That is, a query can reference a table multiple times at the same timestamp, but not the current version and a historical version, or two different historical versions.
The default time zone for timestamp_expression
in a
FOR SYSTEM_TIME AS OF
expression is America/Los_Angeles
, even though the
default time zone for timestamp literals is UTC
.
Examples:
The following query returns a historical version of the table from one hour ago.
SELECT *
FROM t
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
The following query returns a historical version of the table at an absolute point in time.
SELECT *
FROM t
FOR SYSTEM_TIME AS OF '2017-01-01 10:00:00-07:00';
The following query returns an error because the timestamp_expression
contains
a correlated reference to a column in the containing query.
SELECT *
FROM t1
WHERE t1.a IN (SELECT t2.a
FROM t2 FOR SYSTEM_TIME AS OF t1.timestamp_column);
The following operations show accessing a historical version of the table before table is replaced.
DECLARE before_replace_timestamp TIMESTAMP;
-- Create table books.
CREATE TABLE books AS
SELECT 'Hamlet' title, 'William Shakespeare' author;
-- Get current timestamp before table replacement.
SET before_replace_timestamp = CURRENT_TIMESTAMP();
-- Replace table with different schema(title and release_date).
CREATE OR REPLACE TABLE books AS
SELECT 'Hamlet' title, DATE '1603-01-01' release_date;
-- This query returns Hamlet, William Shakespeare as result.
SELECT * FROM books FOR SYSTEM_TIME AS OF before_replace_timestamp;
The following operations show accessing a historical version of the table before a DML job.
DECLARE JOB_START_TIMESTAMP TIMESTAMP;
-- Create table books.
CREATE OR REPLACE TABLE books AS
SELECT 'Hamlet' title, 'William Shakespeare' author;
-- Insert two rows into the books.
INSERT books (title, author)
VALUES('The Great Gatsby', 'F. Scott Fizgerald'),
('War and Peace', 'Leo Tolstoy');
SELECT * FROM books;
SET JOB_START_TIMESTAMP = (
SELECT start_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_type="QUERY"
AND statement_type="INSERT"
ORDER BY start_time DESC
LIMIT 1
);
-- This query only returns Hamlet, William Shakespeare as result.
SELECT * FROM books FOR SYSTEM_TIME AS OF JOB_START_TIMESTAMP;
The following query returns an error because the DML operates on the current version of the table, and a historical version of the table from one day ago.
INSERT INTO t1
SELECT * FROM t1
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
join_operation
See Join operation.
query_expr
( query_expr ) [ [ AS ] alias ]
is a table subquery.
field_path
In the FROM
clause, field_path
is any path that
resolves to a field within a data type. field_path
can go
arbitrarily deep into a nested data structure.
Some examples of valid field_path
values include:
SELECT * FROM T1 t1, t1.array_column;
SELECT * FROM T1 t1, t1.struct_column.array_field;
SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1;
SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a;
SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1;
Field paths in the FROM
clause must end in an
array field. In
addition, field paths cannot contain arrays before the end of the path. For example, the path
array_column.some_array.some_array_field
is invalid because it
contains an array before the end of the path.
unnest_operator
See UNNEST operator.
cte_name
Common table expressions (CTEs) in a WITH
Clause act like
temporary tables that you can reference anywhere in the FROM
clause.
In the example below, subQ1
and subQ2
are CTEs.
Example:
WITH
subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52),
subQ2 AS (SELECT SchoolID FROM subQ1)
SELECT DISTINCT * FROM subQ2;
The WITH
clause hides any permanent tables with the same name
for the duration of the query, unless you qualify the table name, for example:
dataset.Roster
or project.dataset.Roster
.
UNNEST
operator
unnest_operator: { UNNEST( array ) [ as_alias ] | array_path [ as_alias ] } [ WITH OFFSET [ as_alias ] ] array: { array_expression | array_path } as_alias: [AS] alias
The UNNEST
operator takes an array and returns a table with one row for each
element in the array. The output of UNNEST
is one value table column.
For these ARRAY
element types, SELECT *
against the value table column
returns multiple columns:
STRUCT
Input values:
array_expression
: An expression that produces an array.array_path
: The path to anARRAY
type.- In an implicit
UNNEST
operation, the path must start with a range variable name. - In an explicit
UNNEST
operation, the path can optionally start with a range variable name.
The
UNNEST
operation with any correlatedarray_path
must be on the right side of aCROSS JOIN
,LEFT JOIN
, orINNER JOIN
operation.- In an implicit
as_alias
: If specified, defines the explicit name of the value table column containing the array element values. It can be used to refer to the column elsewhere in the query.WITH OFFSET
:UNNEST
destroys the order of elements in the input array. Use this optional clause to return an additional column with the array element indexes, or offsets. Offset counting starts at zero for each row produced by theUNNEST
operation. This column has an optional alias; If the optional alias is not used, the default column name isoffset
.Example:
SELECT * FROM UNNEST ([10,20,30]) as numbers WITH OFFSET; /*---------+--------* | numbers | offset | +---------+--------+ | 10 | 0 | | 20 | 1 | | 30 | 2 | *---------+--------*/
You can also use UNNEST
outside of the FROM
clause with the
IN
operator.
For several ways to use UNNEST
, including construction, flattening, and
filtering, see Work with arrays.
To learn more about the ways you can use UNNEST
explicitly and implicitly,
see Explicit and implicit UNNEST
.
UNNEST
and structs
For an input array of structs, UNNEST
returns a row for each struct, with a separate column for each field in the
struct. The alias for each column is the name of the corresponding struct
field.
Example:
SELECT *
FROM UNNEST(
ARRAY<
STRUCT<
x INT64,
y STRING,
z STRUCT<a INT64, b INT64>>>[
(1, 'foo', (10, 11)),
(3, 'bar', (20, 21))]);
/*---+-----+----------*
| x | y | z |
+---+-----+----------+
| 1 | foo | {10, 11} |
| 3 | bar | {20, 21} |
*---+-----+----------*/
Because the UNNEST
operator returns a
value table,
you can alias UNNEST
to define a range variable that you can reference
elsewhere in the query. If you reference the range variable in the SELECT
list, the query returns a struct containing all of the fields of the original
struct in the input table.
Example:
SELECT *, struct_value
FROM UNNEST(
ARRAY<
STRUCT<
x INT64,
y STRING>>[
(1, 'foo'),
(3, 'bar')]) AS struct_value;
/*---+-----+--------------*
| x | y | struct_value |
+---+-----+--------------+
| 3 | bar | {3, bar} |
| 1 | foo | {1, foo} |
*---+-----+--------------*/
Explicit and implicit UNNEST
Array unnesting can be either explicit or implicit. To learn more, see the following sections.
Explicit unnesting
The UNNEST
keyword is required in explicit unnesting. For example:
WITH Coordinates AS (SELECT [1,2] AS position)
SELECT results FROM Coordinates, UNNEST(Coordinates.position) AS results;
In explicit unnesting, array_expression
must return an
array value but doesn't need to resolve to an array.
Implicit unnesting
The UNNEST
keyword is not used in implicit unnesting.
For example:
WITH Coordinates AS (SELECT [1,2] AS position)
SELECT results FROM Coordinates, Coordinates.position AS results;
Tables and implicit unnesting
When you use array_path
with implicit UNNEST
, array_path
must be prepended
with the table. For example:
WITH Coordinates AS (SELECT [1,2] AS position)
SELECT results FROM Coordinates, Coordinates.position AS results;
UNNEST
and NULL
values
UNNEST
treats NULL
values as follows:
NULL
and empty arrays produce zero rows.- An array containing
NULL
values produces rows containingNULL
values.
PIVOT
operator
FROM from_item[, ...] pivot_operator pivot_operator: PIVOT( aggregate_function_call [as_alias][, ...] FOR input_column IN ( pivot_column [as_alias][, ...] ) ) [AS alias] as_alias: [AS] alias
The PIVOT
operator rotates rows into columns, using aggregation.
PIVOT
is part of the FROM
clause.
PIVOT
can be used to modify any table expression.- Combining
PIVOT
withFOR SYSTEM_TIME AS OF
is not allowed, although users may usePIVOT
against a subquery input which itself usesFOR SYSTEM_TIME AS OF
. - A
WITH OFFSET
clause immediately preceding thePIVOT
operator is not allowed.
Conceptual example:
-- Before PIVOT is used to rotate sales and quarter into Q1, Q2, Q3, Q4 columns:
/*---------+-------+---------+------*
| product | sales | quarter | year |
+---------+-------+---------+------|
| Kale | 51 | Q1 | 2020 |
| Kale | 23 | Q2 | 2020 |
| Kale | 45 | Q3 | 2020 |
| Kale | 3 | Q4 | 2020 |
| Kale | 70 | Q1 | 2021 |
| Kale | 85 | Q2 | 2021 |
| Apple | 77 | Q1 | 2020 |
| Apple | 0 | Q2 | 2020 |
| Apple | 1 | Q1 | 2021 |
*---------+-------+---------+------*/
-- After PIVOT is used to rotate sales and quarter into Q1, Q2, Q3, Q4 columns:
/*---------+------+----+------+------+------*
| product | year | Q1 | Q2 | Q3 | Q4 |
+---------+------+----+------+------+------+
| Apple | 2020 | 77 | 0 | NULL | NULL |
| Apple | 2021 | 1 | NULL | NULL | NULL |
| Kale | 2020 | 51 | 23 | 45 | 3 |
| Kale | 2021 | 70 | 85 | NULL | NULL |
*---------+------+----+------+------+------*/
Definitions
Top-level definitions:
from_item
: The table or subquery on which to perform a pivot operation. Thefrom_item
must follow these rules.pivot_operator
: The pivot operation to perform on afrom_item
.alias
: An alias to use for an item in the query.
pivot_operator
definitions:
aggregate_function_call
: An aggregate function call that aggregates all input rows such thatinput_column
matches a particular value inpivot_column
. Each aggregation corresponding to a differentpivot_column
value produces a different column in the output. Follow these rules when creating an aggregate function call.input_column
: Takes a column and retrieves the row values for the column, following these rules.pivot_column
: A pivot column to create for each aggregate function call. If an alias is not provided, a default alias is created. A pivot column value type must match the value type ininput_column
so that the values can be compared. It is possible to have a value inpivot_column
that doesn't match a value ininput_column
. Must be a constant and follow these rules.
Rules
Rules for a from_item
passed to PIVOT
:
- The
from_item
may consist of any table or subquery result. - The
from_item
may not produce a value table. - The
from_item
may not be a subquery usingSELECT AS STRUCT
.
Rules for aggregate_function_call
:
- Must be an aggregate function. For example,
SUM
. - You may reference columns in a table passed to
PIVOT
, as well as correlated columns, but may not access columns defined by thePIVOT
clause itself. - A table passed to
PIVOT
may be accessed through its alias if one is provided. - You can only use an aggregate function that takes one argument.
- Except for
COUNT
, you can only use aggregate functions that ignoreNULL
inputs. - If you are using
COUNT
, you can use*
as an argument.
- May access columns from the input table, as well as correlated columns,
not columns defined by the
PIVOT
clause, itself. - Evaluated against each row in the input table; aggregate and window function calls are prohibited.
- Non-determinism is okay.
- The type must be groupable.
- The input table may be accessed through its alias if one is provided.
- A
pivot_column
must be a constant. - Named constants, such as variables, are not supported.
- Query parameters are not supported.
- If a name is desired for a named constant or query parameter, specify it explicitly with an alias.
- Corner cases exist where a distinct
pivot_column
s can end up with the same default column names. For example, an input column might contain both aNULL
value and the string literal"NULL"
. When this happens, multiple pivot columns are created with the same name. To avoid this situation, use aliases for pivot column names. - If a
pivot_column
doesn't specify an alias, a column name is constructed as follows:
From | To | Example |
---|---|---|
NULL | NULL |
Input: NULL Output: "NULL" |
INT64 NUMERIC BIGNUMERIC |
The number in string format with the following rules:
|
Input: 1 Output: _1 Input: -1 Output: minus_1 Input: 1.0 Output: _1_point_0 |
BOOL | TRUE or FALSE . |
Input: TRUE Output: TRUE Input: FALSE Output: FALSE |
STRING | The string value. |
Input: "PlayerName" Output: PlayerName |
DATE | The date in _YYYY_MM_DD format. |
Input: DATE '2013-11-25' Output: _2013_11_25 |
ENUM | The name of the enumeration constant. |
Input: COLOR.RED Output: RED |
STRUCT |
A string formed by computing the pivot_column name for each
field and joining the results together with an underscore. The following
rules apply:
Due to implicit type coercion from the |
Input: STRUCT("one", "two") Output: one_two Input: STRUCT("one" AS a, "two" AS b) Output: one_a_two_b |
All other data types | Not supported. You must provide an alias. |
Examples
The following examples reference a table called Produce
that looks like this:
WITH Produce AS (
SELECT 'Kale' as product, 51 as sales, 'Q1' as quarter, 2020 as year UNION ALL
SELECT 'Kale', 23, 'Q2', 2020 UNION ALL
SELECT 'Kale', 45, 'Q3', 2020 UNION ALL
SELECT 'Kale', 3, 'Q4', 2020 UNION ALL
SELECT 'Kale', 70, 'Q1', 2021 UNION ALL
SELECT 'Kale', 85, 'Q2', 2021 UNION ALL
SELECT 'Apple', 77, 'Q1', 2020 UNION ALL
SELECT 'Apple', 0, 'Q2', 2020 UNION ALL
SELECT 'Apple', 1, 'Q1', 2021)
SELECT * FROM Produce
/*---------+-------+---------+------*
| product | sales | quarter | year |
+---------+-------+---------+------|
| Kale | 51 | Q1 | 2020 |
| Kale | 23 | Q2 | 2020 |
| Kale | 45 | Q3 | 2020 |
| Kale | 3 | Q4 | 2020 |
| Kale | 70 | Q1 | 2021 |
| Kale | 85 | Q2 | 2021 |
| Apple | 77 | Q1 | 2020 |
| Apple | 0 | Q2 | 2020 |
| Apple | 1 | Q1 | 2021 |
*---------+-------+---------+------*/
With the PIVOT
operator, the rows in the quarter
column are rotated into
these new columns: Q1
, Q2
, Q3
, Q4
. The aggregate function SUM
is
implicitly grouped by all unaggregated columns other than the pivot_column
:
product
and year
.
SELECT * FROM
Produce
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
/*---------+------+----+------+------+------*
| product | year | Q1 | Q2 | Q3 | Q4 |
+---------+------+----+------+------+------+
| Apple | 2020 | 77 | 0 | NULL | NULL |
| Apple | 2021 | 1 | NULL | NULL | NULL |
| Kale | 2020 | 51 | 23 | 45 | 3 |
| Kale | 2021 | 70 | 85 | NULL | NULL |
*---------+------+----+------+------+------*/
If you don't include year
, then SUM
is grouped only by product
.
SELECT * FROM
(SELECT product, sales, quarter FROM Produce)
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
/*---------+-----+-----+------+------*
| product | Q1 | Q2 | Q3 | Q4 |
+---------+-----+-----+------+------+
| Apple | 78 | 0 | NULL | NULL |
| Kale | 121 | 108 | 45 | 3 |
*---------+-----+-----+------+------*/
You can select a subset of values in the pivot_column
:
SELECT * FROM
(SELECT product, sales, quarter FROM Produce)
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3'))
/*---------+-----+-----+------*
| product | Q1 | Q2 | Q3 |
+---------+-----+-----+------+
| Apple | 78 | 0 | NULL |
| Kale | 121 | 108 | 45 |
*---------+-----+-----+------*/
SELECT * FROM
(SELECT sales, quarter FROM Produce)
PIVOT(SUM(sales) FOR quarter IN ('Q1', 'Q2', 'Q3'))
/*-----+-----+----*
| Q1 | Q2 | Q3 |
+-----+-----+----+
| 199 | 108 | 45 |
*-----+-----+----*/
You can include multiple aggregation functions in the PIVOT
. In this case, you
must specify an alias for each aggregation. These aliases are used to construct
the column names in the resulting table.
SELECT * FROM
(SELECT product, sales, quarter FROM Produce)
PIVOT(SUM(sales) AS total_sales, COUNT(*) AS num_records FOR quarter IN ('Q1', 'Q2'))
/*--------+----------------+----------------+----------------+----------------*
|product | total_sales_Q1 | num_records_Q1 | total_sales_Q2 | num_records_Q2 |
+--------+----------------+----------------+----------------+----------------+
| Kale | 121 | 2 | 108 | 2 |
| Apple | 78 | 2 | 0 | 1 |
*--------+----------------+----------------+----------------+----------------*/
UNPIVOT
operator
FROM from_item[, ...] unpivot_operator unpivot_operator: UNPIVOT [ { INCLUDE NULLS | EXCLUDE NULLS } ] ( { single_column_unpivot | multi_column_unpivot } ) [unpivot_alias] single_column_unpivot: values_column FOR name_column IN (columns_to_unpivot) multi_column_unpivot: values_column_set FOR name_column IN (column_sets_to_unpivot) values_column_set: (values_column[, ...]) columns_to_unpivot: unpivot_column [row_value_alias][, ...] column_sets_to_unpivot: (unpivot_column [row_value_alias][, ...]) unpivot_alias and row_value_alias: [AS] alias
The UNPIVOT
operator rotates columns into rows. UNPIVOT
is part of the
FROM
clause.
UNPIVOT
can be used to modify any table expression.- Combining
UNPIVOT
withFOR SYSTEM_TIME AS OF
is not allowed, although users may useUNPIVOT
against a subquery input which itself usesFOR SYSTEM_TIME AS OF
. - A
WITH OFFSET
clause immediately preceding theUNPIVOT
operator is not allowed. PIVOT
aggregations cannot be reversed withUNPIVOT
.
Conceptual example:
-- Before UNPIVOT is used to rotate Q1, Q2, Q3, Q4 into sales and quarter columns:
/*---------+----+----+----+----*
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale | 51 | 23 | 45 | 3 |
| Apple | 77 | 0 | 25 | 2 |
*---------+----+----+----+----*/
-- After UNPIVOT is used to rotate Q1, Q2, Q3, Q4 into sales and quarter columns:
/*---------+-------+---------*
| product | sales | quarter |
+---------+-------+---------+
| Kale | 51 | Q1 |
| Kale | 23 | Q2 |
| Kale | 45 | Q3 |
| Kale | 3 | Q4 |
| Apple | 77 | Q1 |
| Apple | 0 | Q2 |
| Apple | 25 | Q3 |
| Apple | 2 | Q4 |
*---------+-------+---------*/
Definitions
Top-level definitions:
from_item
: The table or subquery on which to perform a pivot operation. Thefrom_item
must follow these rules.unpivot_operator
: The pivot operation to perform on afrom_item
.
unpivot_operator
definitions:
INCLUDE NULLS
: Add rows withNULL
values to the result.EXCLUDE NULLS
: don't add rows withNULL
values to the result. By default,UNPIVOT
excludes rows withNULL
values.single_column_unpivot
: Rotates columns into onevalues_column
and onename_column
.multi_column_unpivot
: Rotates columns into multiplevalues_column
s and onename_column
.unpivot_alias
: An alias for the results of theUNPIVOT
operation. This alias can be referenced elsewhere in the query.
single_column_unpivot
definitions:
values_column
: A column to contain the row values fromcolumns_to_unpivot
. Follow these rules when creating a values column.name_column
: A column to contain the column names fromcolumns_to_unpivot
. Follow these rules when creating a name column.columns_to_unpivot
: The columns from thefrom_item
to populatevalues_column
andname_column
. Follow these rules when creating an unpivot column.row_value_alias
: An optional alias for a column that is displayed for the column inname_column
. If not specified, the string value of the column name is used. Follow these rules when creating a row value alias.
multi_column_unpivot
definitions:
values_column_set
: A set of columns to contain the row values fromcolumns_to_unpivot
. Follow these rules when creating a values column.name_column
: A set of columns to contain the column names fromcolumns_to_unpivot
. Follow these rules when creating a name column.column_sets_to_unpivot
: The columns from thefrom_item
to unpivot. Follow these rules when creating an unpivot column.row_value_alias
: An optional alias for a column set that is displayed for the column set inname_column
. If not specified, a string value for the column set is used and each column in the string is separated with an underscore (_
). For example,(col1, col2)
outputscol1_col2
. Follow these rules when creating a row value alias.
Rules
Rules for a from_item
passed to UNPIVOT
:
- The
from_item
may consist of any table or subquery result. - The
from_item
may not produce a value table. - Duplicate columns in a
from_item
cannot be referenced in theUNPIVOT
clause.
- Expressions are not permitted.
- Qualified names are not permitted. For example,
mytable.mycolumn
is not allowed. - In the case where the
UNPIVOT
result has duplicate column names:SELECT *
is allowed.SELECT values_column
causes ambiguity.
- It cannot be a name used for a
name_column
or anunpivot_column
. - It can be the same name as a column from the
from_item
.
- It cannot be a name used for a
values_column
or anunpivot_column
. - It can be the same name as a column from the
from_item
.
- Must be a column name from the
from_item
. - It cannot reference duplicate
from_item
column names. - All columns in a column set must have equivalent data types.
- Data types cannot be coerced to a common supertype.
- If the data types are exact matches (for example, a struct with different field names), the data type of the first input is the data type of the output.
- You cannot have the same name in the same column set. For example,
(emp1, emp1)
results in an error. - You can have a the same name in different column sets. For example,
(emp1, emp2), (emp1, emp3)
is valid.
- This can be a string or an
INT64
literal. - The data type for all
row_value_alias
clauses must be the same. - If the value is an
INT64
, therow_value_alias
for eachunpivot_column
must be specified.
Examples
The following examples reference a table called Produce
that looks like this:
WITH Produce AS (
SELECT 'Kale' as product, 51 as Q1, 23 as Q2, 45 as Q3, 3 as Q4 UNION ALL
SELECT 'Apple', 77, 0, 25, 2)
SELECT * FROM Produce
/*---------+----+----+----+----*
| product | Q1 | Q2 | Q3 | Q4 |
+---------+----+----+----+----+
| Kale | 51 | 23 | 45 | 3 |
| Apple | 77 | 0 | 25 | 2 |
*---------+----+----+----+----*/
With the UNPIVOT
operator, the columns Q1
, Q2
, Q3
, and Q4
are
rotated. The values of these columns now populate a new column called Sales
and the names of these columns now populate a new column called Quarter
.
This is a single-column unpivot operation.
SELECT * FROM Produce
UNPIVOT(sales FOR quarter IN (Q1, Q2, Q3, Q4))
/*---------+-------+---------*
| product | sales | quarter |
+---------+-------+---------+
| Kale | 51 | Q1 |
| Kale | 23 | Q2 |
| Kale | 45 | Q3 |
| Kale | 3 | Q4 |
| Apple | 77 | Q1 |
| Apple | 0 | Q2 |
| Apple | 25 | Q3 |
| Apple | 2 | Q4 |
*---------+-------+---------*/
In this example, we UNPIVOT
four quarters into two semesters.
This is a multi-column unpivot operation.
SELECT * FROM Produce
UNPIVOT(
(first_half_sales, second_half_sales)
FOR semesters
IN ((Q1, Q2) AS 'semester_1', (Q3, Q4) AS 'semester_2'))
/*---------+------------------+-------------------+------------*
| product | first_half_sales | second_half_sales | semesters |
+---------+------------------+-------------------+------------+
| Kale | 51 | 23 | semester_1 |
| Kale | 45 | 3 | semester_2 |
| Apple | 77 | 0 | semester_1 |
| Apple | 25 | 2 | semester_2 |
*---------+------------------+-------------------+------------*/
TABLESAMPLE
operator
TABLESAMPLE SYSTEM ( percent PERCENT )
Description
You can use the TABLESAMPLE
operator to select a random sample of a dataset.
This operator is useful when you're working with tables that have large
amounts of data and you don't need precise answers.
Sampling returns a variety of records while avoiding the costs associated with
scanning and processing an entire table. Each execution of the query might
return different results because each execution processes an independently
computed sample. GoogleSQL doesn't cache the results of queries that
include a TABLESAMPLE
clause.
Replace percent
with the percentage of the dataset that you want to include in
the results. The value must be between 0
and 100
. The value can be a literal
value or a query parameter. It cannot be a variable.
For more information, see Table sampling.
Example
The following query selects approximately 10% of a table's data:
SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (10 PERCENT)
Join operation
join_operation: { cross_join_operation | condition_join_operation } cross_join_operation: from_item cross_join_operator from_item condition_join_operation: from_item condition_join_operator from_item join_condition cross_join_operator: { CROSS JOIN | , } condition_join_operator: { [INNER] JOIN | FULL [OUTER] JOIN | LEFT [OUTER] JOIN | RIGHT [OUTER] JOIN } join_condition: { on_clause | using_clause } on_clause: ON bool_expression using_clause: USING ( column_list )
The JOIN
operation merges two from_item
s so that the SELECT
clause can
query them as one source. The join operator and join condition specify how to
combine and discard rows from the two from_item
s to form a single source.
[INNER] JOIN
An INNER JOIN
, or simply JOIN
, effectively calculates the Cartesian product
of the two from_item
s and discards all rows that don't meet the join
condition. Effectively means that it is possible to implement an INNER JOIN
without actually calculating the Cartesian product.
FROM A INNER JOIN B ON A.w = B.y
/*
Table A Table B Result
+-------+ +-------+ +---------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------+
| 1 | a | | 2 | k | | 2 | b | 2 | k |
| 2 | b | | 3 | m | | 3 | c | 3 | m |
| 3 | c | | 3 | n | | 3 | c | 3 | n |
| 3 | d | | 4 | p | | 3 | d | 3 | m |
+-------+ +-------+ | 3 | d | 3 | n |
+---------------+
*/
FROM A INNER JOIN B USING (x)
/*
Table A Table B Result
+-------+ +-------+ +-----------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +-----------+
| 1 | a | | 2 | k | | 2 | b | k |
| 2 | b | | 3 | m | | 3 | c | m |
| 3 | c | | 3 | n | | 3 | c | n |
| 3 | d | | 4 | p | | 3 | d | m |
+-------+ +-------+ | 3 | d | n |
+-----------+
*/
Example
This query performs an INNER JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
/*---------------------------*
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
*---------------------------*/
CROSS JOIN
CROSS JOIN
returns the Cartesian product of the two from_item
s. In other
words, it combines each row from the first from_item
with each row from the
second from_item
.
If the rows of the two from_item
s are independent, then the result has
M * N rows, given M rows in one from_item
and N in the other. Note that
this still holds for the case when either from_item
has zero rows.
In a FROM
clause, a CROSS JOIN
can be written like this:
FROM A CROSS JOIN B
/*
Table A Table B Result
+-------+ +-------+ +---------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------+
| 1 | a | | 2 | c | | 1 | a | 2 | c |
| 2 | b | | 3 | d | | 1 | a | 3 | d |
+-------+ +-------+ | 2 | b | 2 | c |
| 2 | b | 3 | d |
+---------------+
*/
You can use a correlated cross join to convert or flatten an array into a set of rows. To learn more, see Convert elements in an array to rows in a table.
Examples
This query performs an CROSS JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot;
/*---------------------------*
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Adams | Knights |
| Adams | Lakers |
| Adams | Mustangs |
| Buchanan | Jaguars |
| Buchanan | Knights |
| Buchanan | Lakers |
| Buchanan | Mustangs |
| ... |
*---------------------------*/
Comma cross join (,)
CROSS JOIN
s can be written implicitly with a comma. This is
called a comma cross join.
A comma cross join looks like this in a FROM
clause:
FROM A, B
/*
Table A Table B Result
+-------+ +-------+ +---------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------+
| 1 | a | | 2 | c | | 1 | a | 2 | c |
| 2 | b | | 3 | d | | 1 | a | 3 | d |
+-------+ +-------+ | 2 | b | 2 | c |
| 2 | b | 3 | d |
+---------------+
*/
You cannot write comma cross joins inside parentheses. To learn more, see Join operations in a sequence.
FROM (A, B) // INVALID
You can use a correlated comma cross join to convert or flatten an array into a set of rows. To learn more, see Convert elements in an array to rows in a table.
Examples
This query performs a comma cross join on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster, TeamMascot;
/*---------------------------*
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Adams | Knights |
| Adams | Lakers |
| Adams | Mustangs |
| Buchanan | Jaguars |
| Buchanan | Knights |
| Buchanan | Lakers |
| Buchanan | Mustangs |
| ... |
*---------------------------*/
FULL [OUTER] JOIN
A FULL OUTER JOIN
(or simply FULL JOIN
) returns all fields for all matching
rows in both from_items
that meet the join condition. If a given row from one
from_item
doesn't join to any row in the other from_item
, the row returns
with NULL
values for all columns from the other from_item
.
FROM A FULL OUTER JOIN B ON A.w = B.y
/*
Table A Table B Result
+-------+ +-------+ +---------------------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------------------+
| 1 | a | | 2 | k | | 1 | a | NULL | NULL |
| 2 | b | | 3 | m | | 2 | b | 2 | k |
| 3 | c | | 3 | n | | 3 | c | 3 | m |
| 3 | d | | 4 | p | | 3 | c | 3 | n |
+-------+ +-------+ | 3 | d | 3 | m |
| 3 | d | 3 | n |
| NULL | NULL | 4 | p |
+---------------------------+
*/
FROM A FULL OUTER JOIN B USING (x)
/*
Table A Table B Result
+-------+ +-------+ +--------------------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +--------------------+
| 1 | a | | 2 | k | | 1 | a | NULL |
| 2 | b | | 3 | m | | 2 | b | k |
| 3 | c | | 3 | n | | 3 | c | m |
| 3 | d | | 4 | p | | 3 | c | n |
+-------+ +-------+ | 3 | d | m |
| 3 | d | n |
| 4 | NULL | p |
+--------------------+
*/
Example
This query performs a FULL JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster FULL JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
/*---------------------------*
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
| Eisenhower | NULL |
| NULL | Mustangs |
*---------------------------*/
LEFT [OUTER] JOIN
The result of a LEFT OUTER JOIN
(or simply LEFT JOIN
) for two
from_item
s always retains all rows of the left from_item
in the
JOIN
operation, even if no rows in the right from_item
satisfy the join
predicate.
All rows from the left from_item
are retained;
if a given row from the left from_item
doesn't join to any row
in the right from_item
, the row will return with NULL
values for all
columns exclusively from the right from_item
. Rows from the right
from_item
that don't join to any row in the left from_item
are discarded.
FROM A LEFT OUTER JOIN B ON A.w = B.y
/*
Table A Table B Result
+-------+ +-------+ +---------------------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------------------+
| 1 | a | | 2 | k | | 1 | a | NULL | NULL |
| 2 | b | | 3 | m | | 2 | b | 2 | k |
| 3 | c | | 3 | n | | 3 | c | 3 | m |
| 3 | d | | 4 | p | | 3 | c | 3 | n |
+-------+ +-------+ | 3 | d | 3 | m |
| 3 | d | 3 | n |
+---------------------------+
*/
FROM A LEFT OUTER JOIN B USING (x)
/*
Table A Table B Result
+-------+ +-------+ +--------------------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +--------------------+
| 1 | a | | 2 | k | | 1 | a | NULL |
| 2 | b | | 3 | m | | 2 | b | k |
| 3 | c | | 3 | n | | 3 | c | m |
| 3 | d | | 4 | p | | 3 | c | n |
+-------+ +-------+ | 3 | d | m |
| 3 | d | n |
+--------------------+
*/
Example
This query performs a LEFT JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster LEFT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
/*---------------------------*
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
| Eisenhower | NULL |
*---------------------------*/
RIGHT [OUTER] JOIN
The result of a RIGHT OUTER JOIN
(or simply RIGHT JOIN
) for two
from_item
s always retains all rows of the right from_item
in the
JOIN
operation, even if no rows in the left from_item
satisfy the join
predicate.
All rows from the right from_item
are returned;
if a given row from the right from_item
doesn't join to any row
in the left from_item
, the row will return with NULL
values for all
columns exclusively from the left from_item
. Rows from the left from_item
that don't join to any row in the right from_item
are discarded.
FROM A RIGHT OUTER JOIN B ON A.w = B.y
/*
Table A Table B Result
+-------+ +-------+ +---------------------------+
| w | x | * | y | z | = | w | x | y | z |
+-------+ +-------+ +---------------------------+
| 1 | a | | 2 | k | | 2 | b | 2 | k |
| 2 | b | | 3 | m | | 3 | c | 3 | m |
| 3 | c | | 3 | n | | 3 | c | 3 | n |
| 3 | d | | 4 | p | | 3 | d | 3 | m |
+-------+ +-------+ | 3 | d | 3 | n |
| NULL | NULL | 4 | p |
+---------------------------+
*/
FROM A RIGHT OUTER JOIN B USING (x)
/*
Table A Table B Result
+-------+ +-------+ +--------------------+
| x | y | * | x | z | = | x | y | z |
+-------+ +-------+ +--------------------+
| 1 | a | | 2 | k | | 2 | b | k |
| 2 | b | | 3 | m | | 3 | c | m |
| 3 | c | | 3 | n | | 3 | c | n |
| 3 | d | | 4 | p | | 3 | d | m |
+-------+ +-------+ | 3 | d | n |
| 4 | NULL | p |
+--------------------+
*/
Example
This query performs a RIGHT JOIN
on the Roster
and TeamMascot
tables.
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster RIGHT JOIN TeamMascot ON Roster.SchoolID = TeamMascot.SchoolID;
/*---------------------------*
| LastName | Mascot |
+---------------------------+
| Adams | Jaguars |
| Buchanan | Lakers |
| Coolidge | Lakers |
| Davis | Knights |
| NULL | Mustangs |
*---------------------------*/
Join conditions
In a join operation, a join condition helps specify how to
combine rows in two from_items
to form a single source.
The two types of join conditions are the ON
clause and
USING
clause. You must use a join condition when you perform a
conditional join operation. You can't use a join condition when you perform a
cross join operation.
ON
clause
ON bool_expression
Description
Given a row from each table, if the ON
clause evaluates to TRUE
, the query
generates a consolidated row with the result of combining the given rows.
Definitions:
bool_expression
: The boolean expression that specifies the condition for the join. This is frequently a comparison operation or logical combination of comparison operators.
Details:
Similarly to CROSS JOIN
, ON
produces a column once for each column in each
input table.
A NULL
join condition evaluation is equivalent to a FALSE
evaluation.
If a column-order sensitive operation such as UNION
or SELECT *
is used
with the ON
join condition, the resulting table contains all of the columns
from the left-hand input in order, and then all of the columns from the
right-hand input in order.
Examples
The following examples show how to use the ON
clause:
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3),
B AS ( SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT * FROM A INNER JOIN B ON A.x = B.x;
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3),
B AS ( SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT A.x, B.x FROM A INNER JOIN B ON A.x = B.x;
/*
Table A Table B Result (A.x, B.x)
+---+ +---+ +-------+
| x | * | x | = | x | x |
+---+ +---+ +-------+
| 1 | | 2 | | 2 | 2 |
| 2 | | 3 | | 3 | 3 |
| 3 | | 4 | +-------+
+---+ +---+
*/
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT * FROM A LEFT OUTER JOIN B ON A.x = B.x;
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT A.x, B.x FROM A LEFT OUTER JOIN B ON A.x = B.x;
/*
Table A Table B Result
+------+ +---+ +-------------+
| x | * | x | = | x | x |
+------+ +---+ +-------------+
| 1 | | 2 | | 1 | NULL |
| 2 | | 3 | | 2 | 2 |
| 3 | | 4 | | 3 | 3 |
| NULL | | 5 | | NULL | NULL |
+------+ +---+ +-------------+
*/
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT * FROM A FULL OUTER JOIN B ON A.x = B.x;
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT A.x, B.x FROM A FULL OUTER JOIN B ON A.x = B.x;
/*
Table A Table B Result
+------+ +---+ +-------------+
| x | * | x | = | x | x |
+------+ +---+ +-------------+
| 1 | | 2 | | 1 | NULL |
| 2 | | 3 | | 2 | 2 |
| 3 | | 4 | | 3 | 3 |
| NULL | | 5 | | NULL | NULL |
+------+ +---+ | NULL | 4 |
| NULL | 5 |
+-------------+
*/
USING
clause
USING ( column_name_list )
column_name_list:
column_name[, ...]
Description
When you are joining two tables, USING
performs an
equality comparison operation on the columns named in
column_name_list
. Each column name in column_name_list
must appear in both
input tables. For each pair of rows from the input tables, if the
equality comparisons all evaluate to TRUE
, one row is added to the resulting
column.
Definitions:
column_name_list
: A list of columns to include in the join condition.column_name
: The column that exists in both of the tables that you are joining.
Details:
A NULL
join condition evaluation is equivalent to a FALSE
evaluation.
If a column-order sensitive operation such as UNION
or SELECT *
is used
with the USING
join condition, the resulting table contains columns in this
order:
- The columns from
column_name_list
in the order they appear in theUSING
clause. - All other columns of the left-hand input in the order they appear in the input.
- All other columns of the right-hand input in the order they appear in the input.
A column name in the USING
clause must not be qualified by a
table name.
If the join is an INNER JOIN
or a LEFT OUTER JOIN
, the output
columns are populated from the values in the first table. If the
join is a RIGHT OUTER JOIN
, the output columns are populated from the values
in the second table. If the join is a FULL OUTER JOIN
, the output columns
are populated by coalescing the values from the left and right
tables in that order.
Examples
The following example shows how to use the USING
clause with one
column name in the column name list:
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 9 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 9 UNION ALL SELECT 9 UNION ALL SELECT 5)
SELECT * FROM A INNER JOIN B USING (x);
/*
Table A Table B Result
+------+ +---+ +---+
| x | * | x | = | x |
+------+ +---+ +---+
| 1 | | 2 | | 2 |
| 2 | | 9 | | 9 |
| 9 | | 9 | | 9 |
| NULL | | 5 | +---+
+------+ +---+
*/
The following example shows how to use the USING
clause with
multiple column names in the column name list:
WITH
A AS (
SELECT 1 as x, 15 as y UNION ALL
SELECT 2, 10 UNION ALL
SELECT 9, 16 UNION ALL
SELECT NULL, 12),
B AS (
SELECT 2 as x, 10 as y UNION ALL
SELECT 9, 17 UNION ALL
SELECT 9, 16 UNION ALL
SELECT 5, 15)
SELECT * FROM A INNER JOIN B USING (x, y);
/*
Table A Table B Result
+-----------+ +---------+ +---------+
| x | y | * | x | y | = | x | y |
+-----------+ +---------+ +---------+
| 1 | 15 | | 2 | 10 | | 2 | 10 |
| 2 | 10 | | 9 | 17 | | 9 | 16 |
| 9 | 16 | | 9 | 16 | +---------+
| NULL | 12 | | 5 | 15 |
+-----------+ +---------+
*/
The following examples show additional ways in which to use the USING
clause
with one column name in the column name list:
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 9 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 9 UNION ALL SELECT 9 UNION ALL SELECT 5)
SELECT x, A.x, B.x FROM A INNER JOIN B USING (x)
/*
Table A Table B Result
+------+ +---+ +--------------------+
| x | * | x | = | x | A.x | B.x |
+------+ +---+ +--------------------+
| 1 | | 2 | | 2 | 2 | 2 |
| 2 | | 9 | | 9 | 9 | 9 |
| 9 | | 9 | | 9 | 9 | 9 |
| NULL | | 5 | +--------------------+
+------+ +---+
*/
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 9 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 9 UNION ALL SELECT 9 UNION ALL SELECT 5)
SELECT x, A.x, B.x FROM A LEFT OUTER JOIN B USING (x)
/*
Table A Table B Result
+------+ +---+ +--------------------+
| x | * | x | = | x | A.x | B.x |
+------+ +---+ +--------------------+
| 1 | | 2 | | 1 | 1 | NULL |
| 2 | | 9 | | 2 | 2 | 2 |
| 9 | | 9 | | 9 | 9 | 9 |
| NULL | | 5 | | 9 | 9 | 9 |
+------+ +---+ | NULL | NULL | NULL |
+--------------------+
*/
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 2 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 9 UNION ALL SELECT 9 UNION ALL SELECT 5)
SELECT x, A.x, B.x FROM A RIGHT OUTER JOIN B USING (x)
/*
Table A Table B Result
+------+ +---+ +--------------------+
| x | * | x | = | x | A.x | B.x |
+------+ +---+ +--------------------+
| 1 | | 2 | | 2 | 2 | 2 |
| 2 | | 9 | | 2 | 2 | 2 |
| 2 | | 9 | | 9 | NULL | 9 |
| NULL | | 5 | | 9 | NULL | 9 |
+------+ +---+ | 5 | NULL | 5 |
+--------------------+
*/
WITH
A AS ( SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 2 UNION ALL SELECT NULL),
B AS ( SELECT 2 as x UNION ALL SELECT 9 UNION ALL SELECT 9 UNION ALL SELECT 5)
SELECT x, A.x, B.x FROM A FULL OUTER JOIN B USING (x);
/*
Table A Table B Result
+------+ +---+ +--------------------+
| x | * | x | = | x | A.x | B.x |
+------+ +---+ +--------------------+
| 1 | | 2 | | 1 | 1 | NULL |
| 2 | | 9 | | 2 | 2 | 2 |
| 2 | | 9 | | 2 | 2 | 2 |
| NULL | | 5 | | NULL | NULL | NULL |
+------+ +---+ | 9 | NULL | 9 |
| 9 | NULL | 9 |
| 5 | NULL | 5 |
+--------------------+
*/
The following example shows how to use the USING
clause with
only some column names in the column name list.
WITH
A AS (
SELECT 1 as x, 15 as y UNION ALL
SELECT 2, 10 UNION ALL
SELECT 9, 16 UNION ALL
SELECT NULL, 12),
B AS (
SELECT 2 as x, 10 as y UNION ALL
SELECT 9, 17 UNION ALL
SELECT 9, 16 UNION ALL
SELECT 5, 15)
SELECT * FROM A INNER JOIN B USING (x);
/*
Table A Table B Result
+-----------+ +---------+ +-----------------+
| x | y | * | x | y | = | x | A.y | B.y |
+-----------+ +---------+ +-----------------+
| 1 | 15 | | 2 | 10 | | 2 | 10 | 10 |
| 2 | 10 | | 9 | 17 | | 9 | 16 | 17 |
| 9 | 16 | | 9 | 16 | | 9 | 16 | 16 |
| NULL | 12 | | 5 | 15 | +-----------------+
+-----------+ +---------+
*/
The following query performs an INNER JOIN
on the
Roster
and TeamMascot
table.
The query returns the rows from Roster
and TeamMascot
where
Roster.SchoolID
is the same as TeamMascot.SchoolID
. The results include a
single SchoolID
column.
SELECT * FROM Roster INNER JOIN TeamMascot USING (SchoolID);
/*----------------------------------------*
| SchoolID | LastName | Mascot |
+----------------------------------------+
| 50 | Adams | Jaguars |
| 52 | Buchanan | Lakers |
| 52 | Coolidge | Lakers |
| 51 | Davis | Knights |
*----------------------------------------*/
ON
and USING
equivalency
The ON
and USING
join conditions are not
equivalent, but they share some rules and sometimes they can produce similar
results.
In the following examples, observe what is returned when all rows
are produced for inner and outer joins. Also, look at how
each join condition handles NULL
values.
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT * FROM A INNER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT * FROM A INNER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+---+ +---+ +-------+ +---+
| x | * | x | = | x | x | | x |
+---+ +---+ +-------+ +---+
| 1 | | 2 | | 2 | 2 | | 2 |
| 2 | | 3 | | 3 | 3 | | 3 |
| 3 | | 4 | +-------+ +---+
+---+ +---+
*/
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT * FROM A LEFT OUTER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT * FROM A LEFT OUTER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+------+ +---+ +-------------+ +------+
| x | * | x | = | x | x | | x |
+------+ +---+ +-------------+ +------+
| 1 | | 2 | | 1 | NULL | | 1 |
| 2 | | 3 | | 2 | 2 | | 2 |
| 3 | | 4 | | 3 | 3 | | 3 |
| NULL | | 5 | | NULL | NULL | | NULL |
+------+ +---+ +-------------+ +------+
*/
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT * FROM A FULL OUTER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT * FROM A FULL OUTER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+---+ +---+ +-------------+ +---+
| x | * | x | = | x | x | | x |
+---+ +---+ +-------------+ +---+
| 1 | | 2 | | 1 | NULL | | 1 |
| 2 | | 3 | | 2 | 2 | | 2 |
| 3 | | 4 | | 3 | 3 | | 3 |
+---+ +---+ | NULL | 4 | | 4 |
+-------------+ +---+
*/
Although ON
and USING
are not equivalent, they can return the same
results in some situations if you specify the columns you want to return.
In the following examples, observe what is returned when a specific row
is produced for inner and outer joins. Also, look at how each
join condition handles NULL
values.
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT A.x FROM A INNER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT x FROM A INNER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+------+ +---+ +---+ +---+
| x | * | x | = | x | | x |
+------+ +---+ +---+ +---+
| 1 | | 2 | | 2 | | 2 |
| 2 | | 3 | | 3 | | 3 |
| 3 | | 4 | +---+ +---+
| NULL | | 5 |
+------+ +---+
*/
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT A.x FROM A LEFT OUTER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT x FROM A LEFT OUTER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+------+ +---+ +------+ +------+
| x | * | x | = | x | | x |
+------+ +---+ +------+ +------+
| 1 | | 2 | | 1 | | 1 |
| 2 | | 3 | | 2 | | 2 |
| 3 | | 4 | | 3 | | 3 |
| NULL | | 5 | | NULL | | NULL |
+------+ +---+ +------+ +------+
*/
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT A.x FROM A FULL OUTER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT x FROM A FULL OUTER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+------+ +---+ +------+ +------+
| x | * | x | = | x | | x |
+------+ +---+ +------+ +------+
| 1 | | 2 | | 1 | | 1 |
| 2 | | 3 | | 2 | | 2 |
| 3 | | 4 | | 3 | | 3 |
| NULL | | 5 | | NULL | | NULL |
+------+ +---+ | NULL | | 4 |
| NULL | | 5 |
+------+ +------+
*/
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT B.x FROM A FULL OUTER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT x FROM A FULL OUTER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+------+ +---+ +------+ +------+
| x | * | x | = | x | | x |
+------+ +---+ +------+ +------+
| 1 | | 2 | | 2 | | 1 |
| 2 | | 3 | | 3 | | 2 |
| 3 | | 4 | | NULL | | 3 |
| NULL | | 5 | | NULL | | NULL |
+------+ +---+ | 4 | | 4 |
| 5 | | 5 |
+------+ +------+
*/
In the following example, observe what is returned when COALESCE
is used
with the ON
clause. It provides the same results as a query
with the USING
clause.
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT COALESCE(A.x, B.x) FROM A FULL OUTER JOIN B ON A.x = B.x;
WITH
A AS (SELECT 1 as x UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT NULL),
B AS (SELECT 2 as x UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5)
SELECT x FROM A FULL OUTER JOIN B USING (x);
/*
Table A Table B Result ON Result USING
+------+ +---+ +------+ +------+
| x | * | x | = | x | | x |
+------+ +---+ +------+ +------+
| 1 | | 2 | | 1 | | 1 |
| 2 | | 3 | | 2 | | 2 |
| 3 | | 4 | | 3 | | 3 |
| NULL | | 5 | | NULL | | NULL |
+------+ +---+ | 4 | | 4 |
| 5 | | 5 |
+------+ +------+
*/
Join operations in a sequence
The FROM
clause can contain multiple JOIN
operations in a sequence.
JOIN
s are bound from left to right. For example:
FROM A JOIN B USING (x) JOIN C USING (x)
-- A JOIN B USING (x) = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 = return value
You can also insert parentheses to group JOIN
s:
FROM ( (A JOIN B USING (x)) JOIN C USING (x) )
-- A JOIN B USING (x) = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 = return value
With parentheses, you can group JOIN
s so that they are bound in a different
order:
FROM ( A JOIN (B JOIN C USING (x)) USING (x) )
-- B JOIN C USING (x) = result_1
-- A JOIN result_1 = result_2
-- result_2 = return value
A FROM
clause can have multiple joins. Provided there are no comma cross joins
in the FROM
clause, joins don't require parenthesis, though parenthesis can
help readability:
FROM A JOIN B JOIN C JOIN D USING (w) ON B.x = C.y ON A.z = B.x
If your clause contains comma cross joins, you must use parentheses:
FROM A, B JOIN C JOIN D ON C.x = D.y ON B.z = C.x // INVALID
FROM A, B JOIN (C JOIN D ON C.x = D.y) ON B.z = C.x // VALID
When comma cross joins are present in a query with a sequence of JOINs, they
group from left to right like other JOIN
types:
FROM A JOIN B USING (x) JOIN C USING (x), D
-- A JOIN B USING (x) = result_1
-- result_1 JOIN C USING (x) = result_2
-- result_2 CROSS JOIN D = return value
There cannot be a RIGHT JOIN
or FULL JOIN
after a comma cross join unless it
is parenthesized:
FROM A, B RIGHT JOIN C ON TRUE // INVALID
FROM A, B FULL JOIN C ON TRUE // INVALID
FROM A, B JOIN C ON TRUE // VALID
FROM A, (B RIGHT JOIN C ON TRUE) // VALID
FROM A, (B FULL JOIN C ON TRUE) // VALID
Correlated join operation
A join operation is correlated when the right from_item
contains a
reference to at least one range variable or
column name introduced by the left from_item
.
In a correlated join operation, rows from the right from_item
are determined
by a row from the left from_item
. Consequently, RIGHT OUTER
and FULL OUTER
joins cannot be correlated because right from_item
rows cannot be determined
in the case when there is no row from the left from_item
.
All correlated join operations must reference an array in the right from_item
.
This is a conceptual example of a correlated join operation that includes a correlated subquery:
FROM A JOIN UNNEST(ARRAY(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)) AS C
- Left
from_item
:A
- Right
from_item
:UNNEST(...) AS C
- A correlated subquery:
(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)
This is another conceptual example of a correlated join operation.
array_of_IDs
is part of the left from_item
but is referenced in the
right from_item
.
FROM A JOIN UNNEST(A.array_of_IDs) AS C
The UNNEST
operator can be explicit or implicit.
These are both allowed:
FROM A JOIN UNNEST(A.array_of_IDs) AS IDs
FROM A JOIN A.array_of_IDs AS IDs
In a correlated join operation, the right from_item
is re-evaluated
against each distinct row from the left from_item
. In the following
conceptual example, the correlated join operation first
evaluates A
and B
, then A
and C
:
FROM
A
JOIN
UNNEST(ARRAY(SELECT AS STRUCT * FROM B WHERE A.ID = B.ID)) AS C
ON A.Name = C.Name
Caveats
- In a correlated
LEFT JOIN
, when the input table on the right side is empty for some row from the left side, it is as if no rows from the right side satisfied the join condition in a regularLEFT JOIN
. When there are no joining rows, a row withNULL
values for all columns on the right side is generated to join with the row from the left side. - In a correlated
CROSS JOIN
, when the input table on the right side is empty for some row from the left side, it is as if no rows from the right side satisfied the join condition in a regular correlatedINNER JOIN
. This means that the row is dropped from the results.
Examples
This is an example of a correlated join, using the Roster and PlayerStats tables:
SELECT *
FROM
Roster
JOIN
UNNEST(
ARRAY(
SELECT AS STRUCT *
FROM PlayerStats
WHERE PlayerStats.OpponentID = Roster.SchoolID
)) AS PlayerMatches
ON PlayerMatches.LastName = 'Buchanan'
/*------------+----------+----------+------------+--------------*
| LastName | SchoolID | LastName | OpponentID | PointsScored |
+------------+----------+----------+------------+--------------+
| Adams | 50 | Buchanan | 50 | 13 |
| Eisenhower | 77 | Buchanan | 77 | 0 |
*------------+----------+----------+------------+--------------*/
A common pattern for a correlated LEFT JOIN
is to have an UNNEST
operation
on the right side that references an array from some column introduced by
input on the left side. For rows where that array is empty or NULL
,
the UNNEST
operation produces no rows on the right input. In that case, a row
with a NULL
entry in each column of the right input is created to join with
the row from the left input. For example:
SELECT A.name, item, ARRAY_LENGTH(A.items) item_count_for_name
FROM
UNNEST(
[
STRUCT(
'first' AS name,
[1, 2, 3, 4] AS items),
STRUCT(
'second' AS name,
[] AS items)]) AS A
LEFT JOIN
A.items AS item;
/*--------+------+---------------------*
| name | item | item_count_for_name |
+--------+------+---------------------+
| first | 1 | 4 |
| first | 2 | 4 |
| first | 3 | 4 |
| first | 4 | 4 |
| second | NULL | 0 |
*--------+------+---------------------*/
In the case of a correlated CROSS JOIN
, when the input on the right side
is empty for some row from the left side, the final row is dropped from the
results. For example:
SELECT A.name, item
FROM
UNNEST(
[
STRUCT(
'first' AS name,
[1, 2, 3, 4] AS items),
STRUCT(
'second' AS name,
[] AS items)]) AS A
CROSS JOIN
A.items AS item;
/*-------+------*
| name | item |
+-------+------+
| first | 1 |
| first | 2 |
| first | 3 |
| first | 4 |
*-------+------*/
WHERE
clause
WHERE bool_expression
The WHERE
clause filters the results of the FROM
clause.
Only rows whose bool_expression
evaluates to TRUE
are included. Rows
whose bool_expression
evaluates to NULL
or FALSE
are
discarded.
The evaluation of a query with a WHERE
clause is typically completed in this
order:
FROM
WHERE
GROUP BY
and aggregationHAVING
WINDOW
QUALIFY
DISTINCT
ORDER BY
LIMIT
Evaluation order doesn't always match syntax order.
The WHERE
clause only references columns available via the FROM
clause;
it cannot reference SELECT
list aliases.
Examples
This query returns returns all rows from the Roster
table
where the SchoolID
column has the value 52
:
SELECT * FROM Roster
WHERE SchoolID = 52;
The bool_expression
can contain multiple sub-conditions:
SELECT * FROM Roster
WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac");
Expressions in an INNER JOIN
have an equivalent expression in the
WHERE
clause. For example, a query using INNER
JOIN
and ON
has an
equivalent expression using CROSS JOIN
and WHERE
. For example,
the following two queries are equivalent:
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster INNER JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
SELECT Roster.LastName, TeamMascot.Mascot
FROM Roster CROSS JOIN TeamMascot
WHERE Roster.SchoolID = TeamMascot.SchoolID;
GROUP BY
clause
GROUP BY group_by_specification group_by_specification: { groupable_items | ALL | grouping_sets_specification | rollup_specification | cube_specification | () }
Description
The GROUP BY
clause groups together rows in a table that share common values
for certain columns. For a group of rows in the source table with
non-distinct values, the GROUP BY
clause aggregates them into a single
combined row. This clause is commonly used when aggregate functions are
present in the SELECT
list, or to eliminate redundancy in the output.
Definitions
groupable_items
: Group rows in a table that share common values for certain columns. To learn more, see Group rows by groupable items.ALL
: Automatically group rows. To learn more, see Group rows automatically.grouping_sets_specification
: Group rows with theGROUP BY GROUPING SETS
clause. To learn more, see Group rows byGROUPING SETS
.rollup_specification
: Group rows with theGROUP BY ROLLUP
clause. To learn more, see Group rows byROLLUP
.cube_specification
: Group rows with theGROUP BY CUBE
clause. To learn more, see Group rows byCUBE
.()
: Group all rows and produce a grand total. Equivalent to nogroup_by_specification
.
Group rows by groupable items
GROUP BY groupable_item[, ...] groupable_item: { value | value_alias | column_ordinal }
Description
The GROUP BY
clause can include groupable expressions
and their ordinals.
Definitions
value
: An expression that represents a non-distinct, groupable value. To learn more, see Group rows by values.value_alias
: An alias forvalue
. To learn more, see Group rows by values.column_ordinal
: AnINT64
value that represents the ordinal assigned to a groupable expression in theSELECT
list. To learn more, see Group rows by column ordinals.
Group rows by values
The GROUP BY
clause can group rows in a table with non-distinct
values in the GROUP BY
clause. For example:
WITH PlayerStats AS (
SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 'Jie', 0 UNION ALL
SELECT 'Coolidge', 'Kiran', 1 UNION ALL
SELECT 'Adams', 'Noam', 4 UNION ALL
SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName
FROM PlayerStats
GROUP BY LastName;
/*--------------+----------+
| total_points | LastName |
+--------------+----------+
| 7 | Adams |
| 13 | Buchanan |
| 1 | Coolidge |
+--------------+----------*/
GROUP BY
clauses may also refer to aliases. If a query contains aliases in
the SELECT
clause, those aliases override names in the corresponding FROM
clause. For example:
WITH PlayerStats AS (
SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 'Jie', 0 UNION ALL
SELECT 'Coolidge', 'Kiran', 1 UNION ALL
SELECT 'Adams', 'Noam', 4 UNION ALL
SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName AS last_name
FROM PlayerStats
GROUP BY last_name;
/*--------------+-----------+
| total_points | last_name |
+--------------+-----------+
| 7 | Adams |
| 13 | Buchanan |
| 1 | Coolidge |
+--------------+-----------*/
You can use the GROUP BY
clause with arrays. The following query executes
because the array elements being grouped are the same length and group type:
WITH PlayerStats AS (
SELECT ['Coolidge', 'Adams'] as Name, 3 as PointsScored UNION ALL
SELECT ['Adams', 'Buchanan'], 0 UNION ALL
SELECT ['Coolidge', 'Adams'], 1 UNION ALL
SELECT ['Kiran', 'Noam'], 1)
SELECT SUM(PointsScored) AS total_points, name
FROM PlayerStats
GROUP BY Name;
/*--------------+------------------+
| total_points | name |
+--------------+------------------+
| 4 | [Coolidge,Adams] |
| 0 | [Adams,Buchanan] |
| 1 | [Kiran,Noam] |
+--------------+------------------*/
You can use the GROUP BY
clause with structs. The following query executes
because the struct fields being grouped have the same group types:
WITH
TeamStats AS (
SELECT
ARRAY<STRUCT<last_name STRING, first_name STRING, age INT64>>[
('Adams', 'Noam', 20), ('Buchanan', 'Jie', 19)] AS Team,
3 AS PointsScored
UNION ALL
SELECT [('Coolidge', 'Kiran', 21), ('Yang', 'Jason', 22)], 4
UNION ALL
SELECT [('Adams', 'Noam', 20), ('Buchanan', 'Jie', 19)], 10
UNION ALL
SELECT [('Coolidge', 'Kiran', 21), ('Yang', 'Jason', 22)], 7
)
SELECT
SUM(PointsScored) AS total_points,
Team
FROM TeamStats
GROUP BY Team;
/*--------------+--------------------------+
| total_points | teams |
+--------------+--------------------------+
| 13 | [{ |
| | last_name: "Adams", |
| | first_name: "Noam", |
| | age: 20 |
| | },{ |
| | last_name: "Buchanan",|
| | first_name: "Jie", |
| | age: 19 |
| | }] |
+-----------------------------------------+
| 11 | [{ |
| | last_name: "Coolidge",|
| | first_name: "Kiran", |
| | age: 21 |
| | },{ |
| | last_name: "Yang", |
| | first_name: "Jason", |
| | age: 22 |
| | }] |
+--------------+--------------------------*/
To learn more about the data types that are supported for values in the
GROUP BY
clause, see Groupable data types.
Group rows by column ordinals
The GROUP BY
clause can refer to expression names in the SELECT
list. The
GROUP BY
clause also allows ordinal references to expressions in the SELECT
list, using integer values. 1
refers to the first value in the
SELECT
list, 2
the second, and so forth. The value list can combine
ordinals and value names. The following queries are equivalent:
WITH PlayerStats AS (
SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 'Jie', 0 UNION ALL
SELECT 'Coolidge', 'Kiran', 1 UNION ALL
SELECT 'Adams', 'Noam', 4 UNION ALL
SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;
/*--------------+----------+-----------+
| total_points | LastName | FirstName |
+--------------+----------+-----------+
| 7 | Adams | Noam |
| 13 | Buchanan | Jie |
| 1 | Coolidge | Kiran |
+--------------+----------+-----------*/
WITH PlayerStats AS (
SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 'Jie', 0 UNION ALL
SELECT 'Coolidge', 'Kiran', 1 UNION ALL
SELECT 'Adams', 'Noam', 4 UNION ALL
SELECT 'Buchanan', 'Jie', 13)
SELECT SUM(PointsScored) AS total_points, LastName, FirstName
FROM PlayerStats
GROUP BY 2, 3;
/*--------------+----------+-----------+
| total_points | LastName | FirstName |
+--------------+----------+-----------+
| 7 | Adams | Noam |
| 13 | Buchanan | Jie |
| 1 | Coolidge | Kiran |
+--------------+----------+-----------*/
Group rows by ALL
GROUP BY ALL
Description
The GROUP BY ALL
clause groups rows by inferring grouping keys from the
SELECT
items.
The following SELECT
items are excluded from the GROUP BY ALL
clause:
- Expressions that include aggregate functions.
- Expressions that include window functions.
- Expressions that do not reference a name from the
FROM
clause. This includes:- Constants
- Query parameters
- Correlated column references
- Expressions that only reference
GROUP BY
keys inferred from otherSELECT
items.
After exclusions are applied, an error is produced if any remaining SELECT
item includes a volatile function or has a non-groupable type.
If the set of inferred grouping keys is empty after exclusions are applied, all
input rows are considered a single group for aggregation. This
behavior is equivalent to writing GROUP BY ()
.
Examples
In the following example, the query groups rows by first_name
and
last_name
. total_points
is excluded because it represents an
aggregate function.
WITH PlayerStats AS (
SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 'Jie', 0 UNION ALL
SELECT 'Coolidge', 'Kiran', 1 UNION ALL
SELECT 'Adams', 'Noam', 4 UNION ALL
SELECT 'Buchanan', 'Jie', 13)
SELECT
SUM(PointsScored) AS total_points,
FirstName AS first_name,
LastName AS last_name
FROM PlayerStats
GROUP BY ALL;
/*--------------+------------+-----------+
| total_points | first_name | last_name |
+--------------+------------+-----------+
| 7 | Noam | Adams |
| 13 | Jie | Buchanan |
| 1 | Kiran | Coolidge |
+--------------+------------+-----------*/
If the select list contains an analytic function, the query groups rows by
first_name
and last_name
. total_people
is excluded because it
contains a window function.
WITH PlayerStats AS (
SELECT 'Adams' as LastName, 'Noam' as FirstName, 3 as PointsScored UNION ALL
SELECT 'Buchanan', 'Jie', 0 UNION ALL
SELECT 'Coolidge', 'Kiran', 1 UNION ALL
SELECT 'Adams', 'Noam', 4 UNION ALL
SELECT 'Buchanan', 'Jie', 13)
SELECT
COUNT(*) OVER () AS total_people,
FirstName AS first_name,
LastName AS last_name
FROM PlayerStats
GROUP BY ALL;
/*--------------+------------+-----------+
| total_people | first_name | last_name |
+--------------+------------+-----------+
| 3 | Noam | Adams |
| 3 | Jie | Buchanan |
| 3 | Kiran | Coolidge |
+--------------+------------+-----------*/
If multiple SELECT
items reference the same FROM
item, and any of them is
a path expression prefix of another, only the prefix path is used for grouping.
In the following example, coordinates
is excluded because x_coordinate
and
y_coordinate
have already referenced Values.x
and Values.y
in the
FROM
clause, and they are prefixes of the path expression used in
x_coordinate
:
WITH Values AS (
SELECT 1 AS x, 2 AS y
UNION ALL SELECT 1 AS x, 4 AS y
UNION ALL SELECT 2 AS x, 5 AS y
)
SELECT
Values.x AS x_coordinate,
Values.y AS y_coordinate,
[Values.x, Values.y] AS coordinates
FROM Values
GROUP BY ALL
/*--------------+--------------+-------------+
| x_coordinate | y_coordinate | coordinates |
+--------------+--------------+-------------+
| 1 | 4 | [1, 4] |
| 1 | 2 | [1, 2] |
| 2 | 5 | [2, 5] |
+--------------+--------------+-------------*/
In the following example, the inferred set of grouping keys is empty. The query returns one row even when the input contains zero rows.
SELECT COUNT(*) AS num_rows
FROM UNNEST([])
GROUP BY ALL
/*----------+
| num_rows |
+----------+
| 0 |
+----------*/
Group rows by GROUPING SETS
GROUP BY GROUPING SETS ( grouping_list ) grouping_list: { rollup_specification | cube_specification | groupable_item | groupable_item_set }[, ...] groupable_item_set: ( [ groupable_item[, ...] ] )
Description
The GROUP BY GROUPING SETS
clause produces aggregated data for one or more
grouping sets. A grouping set is a group of columns by which rows can
be grouped together. This clause is helpful if you want to produce
aggregated data for sets of data without using the UNION
operation.
For example, GROUP BY GROUPING SETS(x,y)
is roughly equivalent to
GROUP BY x UNION ALL GROUP BY y
.
Definitions
grouping_list
: A list of items that you can add to theGROUPING SETS
clause. Grouping sets are generated based upon what is in this list.rollup_specification
: Group rows with theROLLUP
clause. Don't includeGROUP BY
if you use this inside theGROUPING SETS
clause. To learn more, see Group rows byROLLUP
.cube_specification
: Group rows with theCUBE
clause. Don't includeGROUP BY
if you use this inside theGROUPING SETS
clause. To learn more, see Group rows byCUBE
.groupable_item
: Group rows in a table that share common values for certain columns. To learn more, see Group rows by groupable items. AnonymousSTRUCT
values are not allowed.groupable_item_set
: Group rows by a set of groupable items. If the set contains no groupable items, group all rows and produce a grand total.
Details
GROUP BY GROUPING SETS
works by taking a grouping list, generating
grouping sets from it, and then producing a table as a union of queries
grouped by each grouping set.
For example, GROUP BY GROUPING SETS (a, b, c)
generates the
following grouping sets from the grouping list, a, b, c
, and
then produces aggregated rows for each of them:
(a)
(b)
(c)
Here is an example that includes groupable item sets in
GROUP BY GROUPING SETS (a, (b, c), d)
:
Conceptual grouping sets | Actual grouping sets |
---|---|
(a) |
(a) |
((b, c)) |
(b, c) |
(d) |
(d) |
GROUP BY GROUPING SETS
can include ROLLUP
and CUBE
operations, which
generate grouping sets. If ROLLUP
is added, it generates rolled up
grouping sets. If CUBE
is added, it generates grouping set permutations.
The following grouping sets are generated for
GROUP BY GROUPING SETS (a, ROLLUP(b, c), d)
.
Conceptual grouping sets | Actual grouping sets |
---|---|
(a) |
(a) |
((b, c)) |
(b, c) |
((b)) |
(b) |
(()) |
() |
(d) |
(d) |
The following grouping sets are generated for
GROUP BY GROUPING SETS (a, CUBE(b, c), d)
:
Conceptual grouping sets | Actual grouping sets |
---|---|
(a) |
(a) |
((b, c)) |
(b, c) |
((b)) |
(b) |
((c)) |
(c) |
(()) |
() |
(d) |
(d) |
When evaluating the results for a particular grouping set,
expressions that are not in the grouping set are aggregated and produce a
NULL
placeholder.
You can filter results for specific groupable items. To learn more, see the
GROUPING
function
GROUPING SETS
allows up to 4096 groupable items.
Examples
The following queries produce the same results, but
the first one uses GROUP BY GROUPING SETS
and the second one doesn't:
-- GROUP BY with GROUPING SETS
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY GROUPING SETS (product_type, product_name)
ORDER BY product_name
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| shirt | NULL | 36 |
| pants | NULL | 6 |
| NULL | jeans | 6 |
| NULL | polo | 25 |
| NULL | t-shirt | 11 |
+--------------+--------------+-------------*/
-- GROUP BY without GROUPING SETS
-- (produces the same results as GROUPING SETS)
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, NULL, SUM(product_count) AS product_sum
FROM Products
GROUP BY product_type
UNION ALL
SELECT NULL, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY product_name
ORDER BY product_name
You can include groupable item sets in a GROUP BY GROUPING SETS
clause.
In the example below, (product_type, product_name)
is a groupable item set.
-- GROUP BY with GROUPING SETS and a groupable item set
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY GROUPING SETS (
product_type,
(product_type, product_name))
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
-- GROUP BY with GROUPING SETS but without a groupable item set
-- (produces the same results as GROUPING SETS with a groupable item set)
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, NULL, SUM(product_count) AS product_sum
FROM Products
GROUP BY product_type
UNION ALL
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY product_type, product_name
ORDER BY product_type, product_name;
You can include ROLLUP
in a
GROUP BY GROUPING SETS
clause. For example:
-- GROUP BY with GROUPING SETS and ROLLUP
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY GROUPING SETS (
product_type,
ROLLUP (product_type, product_name))
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| pants | NULL | 6 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
-- GROUP BY with GROUPING SETS, but without ROLLUP
-- (produces the same results as GROUPING SETS with ROLLUP)
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY GROUPING SETS(
product_type,
(product_type, product_name),
product_type,
())
ORDER BY product_type, product_name;
You can include CUBE
in a GROUP BY GROUPING SETS
clause.
For example:
-- GROUP BY with GROUPING SETS and CUBE
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY GROUPING SETS (
product_type,
CUBE (product_type, product_name))
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| NULL | jeans | 6 |
| NULL | polo | 25 |
| NULL | t-shirt | 11 |
| pants | NULL | 6 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
-- GROUP BY with GROUPING SETS, but without CUBE
-- (produces the same results as GROUPING SETS with CUBE)
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY GROUPING SETS(
product_type,
(product_type, product_name),
product_type,
product_name,
())
ORDER BY product_type, product_name;
Group rows by ROLLUP
GROUP BY ROLLUP ( grouping_list ) grouping_list: { groupable_item | groupable_item_set }[, ...] groupable_item_set: ( groupable_item[, ...] )
Description
The GROUP BY ROLLUP
clause produces aggregated data for rolled up
grouping sets. A grouping set is a group of columns by which rows can
be grouped together. This clause is helpful if you need to roll up totals
in a set of data.
Definitions
grouping_list
: A list of items that you can add to theGROUPING SETS
clause. This is used to create a generated list of grouping sets when the query is run.groupable_item
: Group rows in a table that share common values for certain columns. To learn more, see Group rows by groupable items.anonymousSTRUCT
values are not allowed.groupable_item_set
: Group rows by a subset of groupable items.
Details
GROUP BY ROLLUP
works by taking a grouping list, generating
grouping sets from the prefixes inside this list, and then producing a
table as a union of queries grouped by each grouping set. The resulting
grouping sets include an empty grouping set. In the empty grouping set, all
rows are aggregated down to a single group.
For example, GROUP BY ROLLUP (a, b, c)
generates the
following grouping sets from the grouping list, a, b, c
, and then produces
aggregated rows for each of them:
(a, b, c)
(a, b)
(a)
()
Here is an example that includes groupable item sets in
GROUP BY ROLLUP (a, (b, c), d)
:
Conceptual grouping sets | Actual grouping sets |
---|---|
(a, (b, c), d) |
(a, b, c, d) |
(a, (b, c)) |
(a, b, c) |
(a) |
(a) |
() |
() |
When evaluating the results for a particular grouping set,
expressions that are not in the grouping set are aggregated and produce a
NULL
placeholder.
You can filter results by specific groupable items. To learn more, see the
GROUPING
function
ROLLUP
allows up to 4095 groupable items (equivalent to 4096 grouping sets).
Examples
The following queries produce the same subtotals and a grand total, but
the first one uses GROUP BY
with ROLLUP
and the second one doesn't:
-- GROUP BY with ROLLUP
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY ROLLUP (product_type, product_name)
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | t-shirt | 11 |
| shirt | polo | 25 |
+--------------+--------------+-------------*/
-- GROUP BY without ROLLUP (produces the same results as ROLLUP)
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY product_type, product_name
UNION ALL
SELECT product_type, NULL, SUM(product_count)
FROM Products
GROUP BY product_type
UNION ALL
SELECT NULL, NULL, SUM(product_count) FROM Products
ORDER BY product_type, product_name;
You can include groupable item sets in a GROUP BY ROLLUP
clause.
In the following example, (product_type, product_name)
is a
groupable item set.
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY ROLLUP (product_type, (product_type, product_name))
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
Group rows by CUBE
GROUP BY CUBE ( grouping_list ) grouping_list: { groupable_item | groupable_item_set }[, ...] groupable_item_set: ( groupable_item[, ...] )
Description
The GROUP BY CUBE
clause produces aggregated data for all grouping set
permutations. A grouping set is a group of columns by which rows
can be grouped together. This clause is helpful if you need to create a
contingency table to find interrelationships
between items in a set of data.
Definitions
grouping_list
: A list of items that you can add to theGROUPING SETS
clause. This is used to create a generated list of grouping sets when the query is run.groupable_item
: Group rows in a table that share common values for certain columns. To learn more, see Group rows by groupable items. AnonymousSTRUCT
values are not allowed.groupable_item_set
: Group rows by a set of groupable items.
Details
GROUP BY CUBE
is similar to GROUP BY ROLLUP
, except that it takes a
grouping list and generates grouping sets from all permutations in this
list, including an empty grouping set. In the empty grouping set, all rows
are aggregated down to a single group.
For example, GROUP BY CUBE (a, b, c)
generates the following
grouping sets from the grouping list, a, b, c
, and then produces
aggregated rows for each of them:
(a, b, c)
(a, b)
(a, c)
(a)
(b, c)
(b)
(c)
()
Here is an example that includes groupable item sets in
GROUP BY CUBE (a, (b, c), d)
:
Conceptual grouping sets | Actual grouping sets |
---|---|
(a, (b, c), d) |
(a, b, c, d) |
(a, (b, c)) |
(a, b, c) |
(a, d) |
(a, d) |
(a) |
(a) |
((b, c), d) |
(b, c, d) |
((b, c)) |
(b, c) |
(d) |
(d) |
() |
() |
When evaluating the results for a particular grouping set,
expressions that are not in the grouping set are aggregated and produce a
NULL
placeholder.
You can filter results by specific groupable items. To learn more, see the
GROUPING
function
CUBE
allows up to 12 groupable items (equivalent to 4096 grouping sets).
Examples
The following query groups rows by all combinations of product_type
and
product_name
to produce a contingency table:
-- GROUP BY with CUBE
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY CUBE (product_type, product_name)
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| NULL | jeans | 6 |
| NULL | polo | 25 |
| NULL | t-shirt | 11 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
You can include groupable item sets in a GROUP BY CUBE
clause.
In the following example, (product_type, product_name)
is a
groupable item set.
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT product_type, product_name, SUM(product_count) AS product_sum
FROM Products
GROUP BY CUBE (product_type, (product_type, product_name))
ORDER BY product_type, product_name;
/*--------------+--------------+-------------+
| product_type | product_name | product_sum |
+--------------+--------------+-------------+
| NULL | NULL | 42 |
| pants | NULL | 6 |
| pants | jeans | 6 |
| pants | jeans | 6 |
| shirt | NULL | 36 |
| shirt | polo | 25 |
| shirt | polo | 25 |
| shirt | t-shirt | 11 |
| shirt | t-shirt | 11 |
+--------------+--------------+-------------*/
HAVING
clause
HAVING bool_expression
The HAVING
clause filters the results produced by GROUP BY
or
aggregation. GROUP BY
or aggregation must be present in the query. If
aggregation is present, the HAVING
clause is evaluated once for every
aggregated row in the result set.
Only rows whose bool_expression
evaluates to TRUE
are included. Rows
whose bool_expression
evaluates to NULL
or FALSE
are
discarded.
The evaluation of a query with a HAVING
clause is typically completed in this
order:
FROM
WHERE
GROUP BY
and aggregationHAVING
WINDOW
QUALIFY
DISTINCT
ORDER BY
LIMIT
Evaluation order doesn't always match syntax order.
The HAVING
clause references columns available via the FROM
clause, as
well as SELECT
list aliases. Expressions referenced in the HAVING
clause
must either appear in the GROUP BY
clause or they must be the result of an
aggregate function:
SELECT LastName
FROM Roster
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
If a query contains aliases in the SELECT
clause, those aliases override names
in a FROM
clause.
SELECT LastName, SUM(PointsScored) AS ps
FROM Roster
GROUP BY LastName
HAVING ps > 0;
Mandatory aggregation
Aggregation doesn't have to be present in the HAVING
clause itself, but
aggregation must be present in at least one of the following forms:
Aggregation function in the SELECT
list.
SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
GROUP BY LastName
HAVING total > 15;
Aggregation function in the HAVING
clause.
SELECT LastName
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
Aggregation in both the SELECT
list and HAVING
clause.
When aggregation functions are present in both the SELECT
list and HAVING
clause, the aggregation functions and the columns they reference don't need
to be the same. In the example below, the two aggregation functions,
COUNT()
and SUM()
, are different and also use different columns.
SELECT LastName, COUNT(*)
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
ORDER BY
clause
ORDER BY expression [{ ASC | DESC }] [{ NULLS FIRST | NULLS LAST }] [, ...]
The ORDER BY
clause specifies a column or expression as the sort criterion for
the result set. If an ORDER BY
clause is not present, the order of the results
of a query is not defined. Column aliases from a FROM
clause or SELECT
list
are allowed. If a query contains aliases in the SELECT
clause, those aliases
override names in the corresponding FROM
clause. The data type of
expression
must be orderable.
Optional Clauses
NULLS FIRST | NULLS LAST
:NULLS FIRST
: Sort null values before non-null values.NULLS LAST
. Sort null values after non-null values.
ASC | DESC
: Sort the results in ascending or descending order ofexpression
values.ASC
is the default value. If null ordering is not specified withNULLS FIRST
orNULLS LAST
:NULLS FIRST
is applied by default if the sort order is ascending.NULLS LAST
is applied by default if the sort order is descending.
Examples
Use the default sort order (ascending).
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x;
/*------+-------*
| x | y |
+------+-------+
| NULL | false |
| 1 | true |
| 9 | true |
*------+-------*/
Use the default sort order (ascending), but return null values last.
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x NULLS LAST;
/*------+-------*
| x | y |
+------+-------+
| 1 | true |
| 9 | true |
| NULL | false |
*------+-------*/
Use descending sort order.
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x DESC;
/*------+-------*
| x | y |
+------+-------+
| 9 | true |
| 1 | true |
| NULL | false |
*------+-------*/
Use descending sort order, but return null values first.
SELECT x, y
FROM (SELECT 1 AS x, true AS y UNION ALL
SELECT 9, true UNION ALL
SELECT NULL, false)
ORDER BY x DESC NULLS FIRST;
/*------+-------*
| x | y |
+------+-------+
| NULL | false |
| 9 | true |
| 1 | true |
*------+-------*/
It is possible to order by multiple columns. In the example below, the result
set is ordered first by SchoolID
and then by LastName
:
SELECT LastName, PointsScored, OpponentID
FROM PlayerStats
ORDER BY SchoolID, LastName;
When used in conjunction with
set operators,
the ORDER BY
clause applies to the result set of the entire query; it doesn't
apply only to the closest SELECT
statement. For this reason, it can be helpful
(though it is not required) to use parentheses to show the scope of the ORDER
BY
.
This query without parentheses:
SELECT