Legacy SQL Functions and Operators
This document details legacy SQL functions and operators. The preferred query syntax for BigQuery is GoogleSQL. For information on GoogleSQL, see GoogleSQL Functions and Operators.
Supported functions and operators
Most SELECT
statement clauses support functions. Fields
referenced in a function don't need to be listed in any SELECT
clause. Therefore, the following query is valid, even though the
clicks
field is not displayed directly:
#legacySQL SELECT country, SUM(clicks) FROM table GROUP BY country;
Aggregate functions | |
---|---|
AVG() |
Returns the average of the values for a group of rows ... |
BIT_AND() |
Returns the result of a bitwise AND operation ... |
BIT_OR() |
Returns the result of a bitwise OR operation ... |
BIT_XOR() |
Returns the result of a bitwise XOR operation ... |
CORR() |
Returns the Pearson correlation coefficient of a set of number pairs. |
COUNT() |
Returns the total number of values ... |
COUNT([DISTINCT]) |
Returns the total number of non-NULL values ... |
COVAR_POP() |
Computes the population covariance of the values ... |
COVAR_SAMP() |
Computes the sample covariance of the values ... |
EXACT_COUNT_DISTINCT() |
Returns the exact number of non-NULL, distinct values for the specified field. |
FIRST() |
Returns the first sequential value in the scope of the function. |
GROUP_CONCAT() |
Concatenates multiple strings into a single string ... |
GROUP_CONCAT_UNQUOTED() |
Concatenates multiple strings into a single string ... will not add double quotes ... |
LAST() |
Returns the last sequential value ... |
MAX() |
Returns the maximum value ... |
MIN() |
Returns the minimum value ... |
NEST() |
Aggregates all values in the current aggregation scope into a repeated field. |
NTH() |
Returns the nth sequential value ... |
QUANTILES() |
Computes approximate minimum, maximum, and quantiles ... |
STDDEV() |
Returns the standard deviation ... |
STDDEV_POP() |
Computes the population standard deviation ... |
STDDEV_SAMP() |
Computes the sample standard deviation ... |
SUM() |
Returns the sum total of the values ... |
TOP() ... COUNT(*) |
Returns the top max_records records by frequency. |
UNIQUE() |
Returns the set of unique, non-NULL values ... |
VARIANCE() |
Computes the variance of the values ... |
VAR_POP() |
Computes the population variance of the values ... |
VAR_SAMP() |
Computes the sample variance of the values ... |
Arithmetic operators | |
---|---|
+ |
Addition |
- |
Subtraction |
* |
Multiplication |
/ |
Division |
% |
Modulo |
Bitwise functions | |
---|---|
& |
Bitwise AND |
| |
Bitwise OR |
^ |
Bitwise XOR |
<< |
Bitwise shift left |
>> |
Bitwise shift right |
~ |
Bitwise NOT |
BIT_COUNT() |
Returns the number of bits ... |
Casting functions | |
---|---|
BOOLEAN() |
Cast to boolean. |
BYTES() |
Cast to bytes. |
CAST(expr AS type) |
Converts expr into a variable of type type . |
FLOAT() |
Cast to double. |
HEX_STRING() |
Cast to hexadecimal string. |
INTEGER() |
Cast to integer. |
STRING() |
Cast to string. |
Comparison functions | |
---|---|
expr1 = expr2 |
Returns true if the expressions are equal. |
expr1 != expr2 expr1 <> expr2
|
Returns true if the expressions are not equal. |
expr1 > expr2 |
Returns true if expr1 is greater than expr2 . |
expr1 < expr2 |
Returns true if expr1 is less than expr2 . |
expr1 >= expr2 |
Returns true if expr1 is greater than or equal to expr2 . |
expr1 <= expr2 |
Returns true if expr1 is less than or equal to expr2 . |
expr1 BETWEEN expr2 AND expr3 |
Returns true if the value of expr1
is between expr2 and expr3 , inclusive. |
expr IS NULL |
Returns true if expr is NULL. |
expr IN() |
Returns true if expr matches
expr1 , expr2 , or any value in the parentheses. |
COALESCE() |
Returns the first argument that isn't NULL. |
GREATEST() |
Returns the largest numeric_expr parameter. |
IFNULL() |
If argument is not null, returns the argument. |
IS_INF() |
Returns true if positive or negative infinity. |
IS_NAN() |
Returns true if argument is NaN . |
IS_EXPLICITLY_DEFINED() |
deprecated: Use expr IS NOT NULL instead. |
LEAST() |
Returns the smallest argument numeric_expr parameter. |
NVL() |
If expr is not null, returns expr , otherwise returns null_default . |
Date and time functions | |
---|---|
CURRENT_DATE() |
Returns current date in the format %Y-%m-%d . |
CURRENT_TIME() |
Returns the server's current time in the format %H:%M:%S . |
CURRENT_TIMESTAMP() |
Returns the server's current time in the format %Y-%m-%d %H:%M:%S . |
DATE() |
Returns the date in the format %Y-%m-%d . |
DATE_ADD() |
Adds the specified interval to a TIMESTAMP data type. |
DATEDIFF() |
Returns the number of days between two TIMESTAMP data types. |
DAY() |
Returns the day of the month as an integer between 1 and 31. |
DAYOFWEEK() |
Returns the day of the week as an integer between 1 (Sunday) and 7 (Saturday). |
DAYOFYEAR() |
Returns the day of the year as an integer between 1 and 366. |
FORMAT_UTC_USEC() |
Returns a UNIX timestamp in the format YYYY-MM-DD HH:MM:SS.uuuuuu . |
HOUR() |
Returns the hour of a TIMESTAMP as an integer between 0 and 23. |
MINUTE() |
Returns the minutes of a TIMESTAMP as an integer between 0 and 59. |
MONTH() |
Returns the month of a TIMESTAMP as an integer between 1 and 12. |
MSEC_TO_TIMESTAMP() |
Converts a UNIX timestamp in milliseconds to a TIMESTAMP. |
NOW() |
Returns the current UNIX timestamp in microseconds. |
PARSE_UTC_USEC() |
Converts a date string to a UNIX timestamp in microseconds. |
QUARTER() |
Returns the quarter of the year of a TIMESTAMP as an integer between 1 and 4. |
SEC_TO_TIMESTAMP() |
Converts a UNIX timestamp in seconds to a TIMESTAMP. |
SECOND() |
Returns the seconds of a TIMESTAMP as an integer between 0 and 59. |
STRFTIME_UTC_USEC() |
Returns a date string in the format date_format_str. |
TIME() |
Returns a TIMESTAMP in the format %H:%M:%S . |
TIMESTAMP() |
Convert a date string to a TIMESTAMP. |
TIMESTAMP_TO_MSEC() |
Converts a TIMESTAMP to a UNIX timestamp in milliseconds. |
TIMESTAMP_TO_SEC() |
Converts a TIMESTAMP to a UNIX timestamp in seconds. |
TIMESTAMP_TO_USEC() |
Converts a TIMESTAMP to a UNIX timestamp in microseconds. |
USEC_TO_TIMESTAMP() |
Converts a UNIX timestamp in microseconds to a TIMESTAMP. |
UTC_USEC_TO_DAY() |
Shifts a UNIX timestamp in microseconds to the beginning of the day it occurs in. |
UTC_USEC_TO_HOUR() |
Shifts a UNIX timestamp in microseconds to the beginning of the hour it occurs in. |
UTC_USEC_TO_MONTH() |
Shifts a UNIX timestamp in microseconds to the beginning of the month it occurs in. |
UTC_USEC_TO_WEEK() |
Returns a UNIX timestamp in microseconds that represents a day in the week. |
UTC_USEC_TO_YEAR() |
Returns a UNIX timestamp in microseconds that represents the year. |
WEEK() |
Returns the week of a TIMESTAMP as an integer between 1 and 53. |
YEAR() |
Returns the year of a TIMESTAMP. |
IP functions | |
---|---|
FORMAT_IP() |
Converts 32 least significant bits of integer_value to human-readable IPv4 address string. |
PARSE_IP() |
Converts a string representing IPv4 address to unsigned integer value. |
FORMAT_PACKED_IP() |
Returns a human-readable IP address in the form
10.1.5.23 or 2620:0:1009:1:216:36ff:feef:3f . |
PARSE_PACKED_IP() |
Returns an IP address in BYTES. |
JSON functions | |
---|---|
JSON_EXTRACT() |
Selects a value according to the JSONPath expression and returns a JSON string. |
JSON_EXTRACT_SCALAR() |
Selects a value according to the JSONPath expression and returns a JSON scalar. |
Logical operators | |
---|---|
expr AND expr |
Returns true if both expressions are true. |
expr OR expr |
Returns true if one or both expressions are true. |
NOT expr |
Returns true if the expression is false. |
Mathematical functions | |
---|---|
ABS() |
Returns the absolute value of the argument. |
ACOS() |
Returns the arc cosine of the argument. |
ACOSH() |
Returns the arc hyperbolic cosine of the argument. |
ASIN() |
Returns the arc sine of the argument. |
ASINH() |
Returns the arc hyperbolic sine of the argument. |
ATAN() |
Returns the arc tangent of the argument. |
ATANH() |
Returns the arc hyperbolic tangent of the argument. |
ATAN2() |
Returns the arc tangent of the two arguments. |
CEIL() |
Rounds the argument up to the nearest whole number and returns the rounded value. |
COS() |
Returns the cosine of the argument. |
COSH() |
Returns the hyperbolic cosine of the argument. |
DEGREES() |
Converts from radians to degrees. |
EXP() |
Returns e to the power of the argument. |
FLOOR() |
Rounds the argument down to the nearest whole number. |
LN() LOG()
|
Returns the natural logarithm of the argument. |
LOG2() |
Returns the Base-2 logarithm of the argument. |
LOG10() |
Returns the Base-10 logarithm of the argument. |
PI() |
Returns the constant π. |
POW() |
Returns first argument to the power of the second argument. |
RADIANS() |
Converts from degrees to radians. |
RAND() |
Returns a random float value in the range 0.0 <= value < 1.0. |
ROUND() |
Rounds the argument either up or down to the nearest whole number. |
SIN() |
Returns the sine of the argument. |
SINH() |
Returns the hyperbolic sine of the argument. |
SQRT() |
Returns the square root of the expression. |
TAN() |
Returns the tangent of the argument. |
TANH() |
Returns the hyperbolic tangent of the argument. |
Regular expression functions | |
---|---|
REGEXP_MATCH() |
Returns true if the argument matches the regular expression. |
REGEXP_EXTRACT() |
Returns the portion of the argument that matches the capturing group within the regular expression. |
REGEXP_REPLACE() |
Replaces a substring that matches a regular expression. |
String functions | |
---|---|
CONCAT() |
Returns the concatenation of two or more strings, or NULL if any of the values are NULL. |
expr CONTAINS 'str' |
Returns true if expr contains the specified string argument. |
INSTR() |
Returns the one-based index of the first occurrence of a string. |
LEFT() |
Returns the leftmost characters of a string. |
LENGTH() |
Returns the length of the string. |
LOWER() |
Returns the original string with all characters in lower case. |
LPAD() |
Inserts characters to the left of a string. |
LTRIM() |
Removes characters from the left side of a string. |
REPLACE() |
Replaces all occurrences of a substring. |
RIGHT() |
Returns the rightmost characters of a string. |
RPAD() |
Inserts characters to the right side of a string. |
RTRIM() |
Removes trailing characters from the right side of a string. |
SPLIT() |
Splits a string into repeated substrings. |
SUBSTR() |
Returns a substring ... |
UPPER() |
Returns the original string with all characters in upper case. |
Table wildcard functions | |
---|---|
TABLE_DATE_RANGE() |
Queries multiple daily tables that span a date range. |
TABLE_DATE_RANGE_STRICT() |
Queries multiple daily tables that span a date range, with no missing dates. |
TABLE_QUERY() |
Queries tables whose names match a specified predicate. |
URL functions | |
---|---|
HOST() |
Given a URL, returns the host name as a string. |
DOMAIN() |
Given a URL, returns the domain as a string. |
TLD() |
Given a URL, returns the top level domain plus any country domain in the URL. |
Window functions | |
---|---|
AVG() COUNT(*) COUNT([DISTINCT]) MAX() MIN() STDDEV() SUM() |
The same operation as the corresponding Aggregate functions, but are computed over a window defined by the OVER clause. |
CUME_DIST() |
Returns a double that indicates the cumulative distribution of a value in a group of values ... |
DENSE_RANK() |
Returns the integer rank of a value in a group of values. |
FIRST_VALUE() |
Returns the first value of the specified field in the window. |
LAG() |
Enables you to read data from a previous row within a window. |
LAST_VALUE() |
Returns the last value of the specified field in the window. |
LEAD() |
Enables you to read data from a following row within a window. |
NTH_VALUE() |
Returns the value of <expr> at position
<n> of the window frame ...
|
NTILE() |
Divides the window into the specified number of buckets. |
PERCENT_RANK() |
Returns the rank of the current row, relative to the other rows in the partition. |
PERCENTILE_CONT() |
Returns an interpolated value that would map to the percentile argument with respect to the window ... |
PERCENTILE_DISC() |
Returns the value nearest the percentile of the argument over the window. |
RANK() |
Returns the integer rank of a value in a group of values. |
RATIO_TO_REPORT() |
Returns the ratio of each value to the sum of the values. |
ROW_NUMBER() |
Returns the current row number of the query result over the window. |
Other functions | |
---|---|
CASE WHEN ... THEN |
Use CASE to choose among two or more alternate expressions in your query. |
CURRENT_USER() |
Returns the email address of the user running the query. |
EVERY() |
Returns true if the argument is true for all of its inputs. |
FROM_BASE64() |
Converts the base-64 encoded input string into BYTES format. |
HASH() |
Computes and returns a 64-bit signed hash value ... |
FARM_FINGERPRINT() |
Computes and returns a 64-bit signed fingerprint value ... |
IF() |
If first argument is true, returns second argument; otherwise returns third argument. |
POSITION() |
Returns the one-based, sequential position of the argument. |
SHA1() |
Returns a SHA1 hash, in BYTES format. |
SOME() |
Returns true if argument is true for at least one of its inputs. |
TO_BASE64() |
Converts the BYTES argument to a base-64 encoded string. |
Query syntax
Note: Keywords are not case-sensitive. In this document, keywords such
as SELECT
are capitalized for illustration purposes.
SELECT clause
The SELECT
clause specifies a list of expressions to be computed. Expressions in the
SELECT
clause can contain field names, literals, and
function calls (including aggregate functions
and window functions) as well as combinations of the three. The
expression list is comma-separated.
Each expression can be given an alias by adding a space followed by an identifier after the
expression. The optional AS
keyword can be added between the expression and the alias
for improved readability. Aliases defined in a SELECT
clause can be referenced in the
GROUP BY
, HAVING
, and ORDER BY
clauses of the query, but
not by the FROM
, WHERE
, or OMIT RECORD IF
clauses nor by
other expressions in the same SELECT
clause.
Notes:
-
If you use an aggregate function in your
SELECT
clause, you must either use an aggregate function in all expressions or your query must have aGROUP BY
clause which includes all non-aggregated fields in yourSELECT
clause as grouping keys. For example:#legacySQL SELECT word, corpus, COUNT(word) FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th" GROUP BY word, corpus; /* Succeeds because all non-aggregated fields are group keys. */
#legacySQL SELECT word, corpus, COUNT(word) FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th" GROUP BY word; /* Fails because corpus is not aggregated nor is it a group key. */
-
You can use square brackets to escape
reserved words
so that you can use them as field name and aliases. For example, if you have a column named
"partition", which is a reserved word in BigQuery syntax, the queries referencing
that field fail with obscure error messages unless you escape it with square brackets:
SELECT [partition] FROM ...
Example
This example defines aliases in the SELECT
clause and then references one of them in
the ORDER BY
clause. Notice that the word column can not be referenced using
the word_alias in the WHERE
clause; it must be referenced by name. The
len alias also is not visible in the WHERE
clause. It would be visible to a
HAVING
clause.
#legacySQL SELECT word AS word_alias, LENGTH(word) AS len FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS 'th' ORDER BY len;
WITHIN modifier for aggregate functions
aggregate_function WITHIN RECORD [ [ AS ] alias ]
The WITHIN
keyword causes the aggregate function to aggregate across repeated values
within each record. For every input record, exactly one aggregated output will be produced. This
type of aggregation is referred to as scoped aggregation. Since scoped aggregation
produces output for every record, non-aggregated expressions can be selected alongside
scoped-aggregated expressions without using a GROUP BY
clause.
Most commonly you will use the RECORD
scope when using scoped aggregation. If you
have a very complex nested, repeated schema, you may find a need to perform aggregations within
sub-record scopes. This can be done by replacing the RECORD
keyword in the syntax
above with the name of the node in your schema where you want the aggregation to be performed.
For more information about that advanced behavior, see
Dealing with data.
Example
This example performs a scoped COUNT
aggregation and then filters and sorts the
records by the aggregated value.
#legacySQL SELECT repository.url, COUNT(payload.pages.page_name) WITHIN RECORD AS page_count FROM [bigquery-public-data:samples.github_nested] HAVING page_count > 80 ORDER BY page_count DESC;
FROM clause
FROM [project_name:]datasetId.tableId [ [ AS ] alias ] | (subquery) [ [ AS ] alias ] |JOIN
clause |FLATTEN
clause | table wildcard function
The FROM
clause specifies the source data to be queried. BigQuery queries
can execute directly over tables, over subqueries, over joined tables, and over tables modified by
special-purpose operators described below. Combinations of these data sources can be queried using
the comma, which is the UNION ALL
operator in
BigQuery.
Referencing tables
When referencing a table, both datasetId and tableId must be specified; project_name is optional. If project_name is not specified, BigQuery defaults to the current project. If your project name includes a dash, you must surround the entire table reference with brackets.
Example
[my-dashed-project:dataset1.tableName]
Tables can be given an alias by adding a space followed by an identifier after the table name. The
optional AS
keyword can be added between the tableId and the alias for
improved readability.
When referencing columns from a table, you can use the simple column name or you can prefix the column name with either the alias, if you specified one, or with the datasetId and tableId as long as no project_name was specified. The project_name cannot be included in the column prefix because the colon character is not allowed in field names.
Examples
This example references a column with no table prefix.
#legacySQL SELECT word FROM [bigquery-public-data:samples.shakespeare];
This example prefixes the column name with the datasetId and tableId. Notice that the project_name cannot be included in this example. This method will only work if the dataset is in your current default project.
#legacySQL SELECT samples.shakespeare.word FROM samples.shakespeare;
This example prefixes the column name with a table alias.
#legacySQL SELECT t.word FROM [bigquery-public-data:samples.shakespeare] AS t;
Using subqueries
A subquery is a nested SELECT
statement wrapped in parentheses. The
expressions computed in the SELECT
clause of the subquery are available to the outer
query just as columns of a table would be available.
Subqueries can be used to compute aggregations and other expressions. The full range of SQL operators are available in the subquery. This means a subquery can itself contain other subqueries, subqueries can perform joins and grouping aggregations, etc.
Comma as UNION ALL
Unlike GoogleSQL, legacy SQL uses the comma as a UNION ALL
operator rather
than a CROSS JOIN
operator. This is a legacy behavior that evolved because
historically BigQuery did not support CROSS JOIN
and BigQuery users regularly needed to write
UNION ALL
queries. In GoogleSQL, queries that perform unions are particularly
verbose. Using the comma as the union operator allows such queries to be written much more
efficiently. For example, this query can be used to run a single query over logs from multiple
days.
#legacySQL SELECT FORMAT_UTC_USEC(event.timestamp_in_usec) AS time, request_url FROM [applogs.events_20120501], [applogs.events_20120502], [applogs.events_20120503] WHERE event.username = 'root' AND NOT event.source_ip.is_internal;
Queries that union a large number of tables typically run more slowly than queries that process the same amount of data from a single table. The difference in performance can be up to 50 ms per additional table. A single query can union at most 1,000 tables.
Table wildcard functions
The term table wildcard function refers to a special type of function unique to BigQuery.
These functions are used in the FROM
clause to match a collection of table names
using one of several types of filters. For example, the TABLE_DATE_RANGE
function
can be used to query only a specific set of daily tables. For more information on these functions,
see Table wildcard functions.
FLATTEN operator
(FLATTEN([project_name:]datasetId.tableId, field_to_be_flattened)) (FLATTEN((subquery), field_to_be_flattened))
Unlike typical SQL-processing systems, BigQuery is designed to handle repeated data. Because of
this, BigQuery users sometimes need to write queries that manipulate the structure of repeated
records. One way to do this is by using the FLATTEN
operator.
FLATTEN
converts one node in the schema from repeated to optional. Given a record
with one or more values for a repeated field, FLATTEN
will create multiple records,
one for each value in the repeated field. All other fields selected from the record are duplicated
in each new output record. FLATTEN
can be applied repeatedly in order to remove
multiple levels of repetition.
For more information and examples, see Dealing with data.
JOIN operator
BigQuery supports multiple JOIN
operators in each FROM
clause.
Subsequent JOIN
operations use the results of the previous JOIN
operation as the left JOIN
input. Fields from any preceding JOIN
input
can be used as keys in the ON
clauses of subsequent JOIN
operators.
JOIN types
BigQuery supports INNER
, [FULL|RIGHT|LEFT] OUTER
and
CROSS JOIN
operations. If left unspecified, the default is INNER
.
CROSS JOIN
operations do not allow ON
clauses. CROSS JOIN
can return a large amount of data and might result in a slow and inefficient query or in a query
that exceeds the maximum allowed per-query resources. Such queries will fail with an error. When
possible, prefer queries that do not use CROSS JOIN
. For example, CROSS JOIN
is often used in places where window functions would
be more efficient.
EACH modifier
The EACH
modifier is a hint that tells BigQuery to execute the JOIN
using multiple partitions. This is particularly useful when you know that both sides of the
JOIN
are large. The EACH
modifier can't be used in
CROSS JOIN
clauses.
EACH
used to be encouraged in many cases, but this is no longer the case. When
possible, use JOIN
without the EACH
modifier for better performance.
Use JOIN EACH
when your query has failed with a resources exceeded error message.
Semi-join and Anti-join
In addition to supporting JOIN
in the FROM
clause, BigQuery also
supports two types of joins in the WHERE
clause: semi-join and anti-semi-join. A
semi-join is specified using the IN
keyword with a subquery; anti-join, using the
NOT IN
keywords.
Examples
The following query uses a semi-join to find ngrams where the first word in the ngram is also the second word in another ngram that has "AND" as the third word in the ngram.
#legacySQL SELECT ngram FROM [bigquery-public-data:samples.trigrams] WHERE first IN (SELECT second FROM [bigquery-public-data:samples.trigrams] WHERE third = "AND") LIMIT 10;
The following query uses a semi-join to return the number of women over age 50 who gave birth in the 10 states with the most births.
#legacySQL SELECT mother_age, COUNT(mother_age) total FROM [bigquery-public-data:samples.natality] WHERE state IN (SELECT state FROM (SELECT state, COUNT(state) total FROM [bigquery-public-data:samples.natality] GROUP BY state ORDER BY total DESC LIMIT 10)) AND mother_age > 50 GROUP BY mother_age ORDER BY mother_age DESC
To see the numbers for the other 40 states, you can use an anti-join. The following query is
nearly identical to the previous example, but uses NOT IN
instead of IN
to return the number of women over age 50 who gave birth in the 40 states with the least births.
#legacySQL SELECT mother_age, COUNT(mother_age) total FROM [bigquery-public-data:samples.natality] WHERE state NOT IN (SELECT state FROM (SELECT state, COUNT(state) total FROM [bigquery-public-data:samples.natality] GROUP BY state ORDER BY total DESC LIMIT 10)) AND mother_age > 50 GROUP BY mother_age ORDER BY mother_age DESC
Notes:
- BigQuery does not support correlated semi- or anti-semi-joins. The subquery can not reference any fields from the outer query.
- The subquery used in a semi- or anti-semi-join must select exactly one field.
-
The types of the selected field and the field being used from the outer query in the
WHERE
clause must match exactly. BigQuery will not do any type coercion for semi- or anti-semi-joins.
WHERE clause
The WHERE
clause, sometimes called the predicate, filters records produced by the
FROM
clause using a boolean expression. Multiple conditions can be joined by boolean
AND
and OR
clauses, optionally surrounded by parentheses—()—
to group them. The fields listed in a WHERE
clause do not need to be selected in the
corresponding SELECT
clause and the WHERE
clause expression cannot
reference expressions computed in the SELECT
clause of the query to which the
WHERE
clause belongs.
Note: Aggregate functions cannot be used in the WHERE
clause. Use a
HAVING
clause and an outer query if you need to filter on the
output of an aggregate function.
Example
The following example uses a disjunction of boolean expressions in the WHERE
clause—the two expressions joined by an OR
operator. An input record will pass
through the WHERE
filter if either of the expressions returns true
.
#legacySQL SELECT word FROM [bigquery-public-data:samples.shakespeare] WHERE (word CONTAINS 'prais' AND word CONTAINS 'ing') OR (word CONTAINS 'laugh' AND word CONTAINS 'ed');
OMIT RECORD IF clause
The OMIT RECORD IF
clause is a construct that is unique to BigQuery. It is
particularly useful for dealing with nested, repeated schemas. It is similar to a WHERE
clause, but different in two important ways. First, it uses an exclusionary condition,
which means that records are omitted if the expression returns true
, but kept if the
expression returns false
or null
. Second, the OMIT RECORD IF
clause can (and usually does) use scoped aggregate functions in its condition.
In addition to filtering full records, OMIT...IF
can specify a more narrow scope
to filter just portions of a record. This is done by using the name of a non-leaf node in your
schema rather than RECORD
in your OMIT...IF
clause. This functionality
is rarely used by BigQuery users. You can find more documentation about this advanced behavior
linked from the WITHIN
documentation above.
If you use OMIT...IF
to exclude a portion of a record in a repeating field, and the query also
selects other independently repeating fields, BigQuery omits a
portion of the other repeated records in the query. If you see the error
Cannot perform OMIT IF on repeated scope <scope> with independently repeating pass through field <field>,
we recommend that you switch to GoogleSQL. For information about migrating
OMIT...IF
statements to GoogleSQL, see
Migrating
to GoogleSQL.
Example
Referring back to the example used for the WITHIN
modifier, OMIT RECORD IF
can be used to accomplish the same thing WITHIN
and HAVING
were
used to do in that example.
#legacySQL SELECT repository.url FROM [bigquery-public-data:samples.github_nested] OMIT RECORD IF COUNT(payload.pages.page_name) <= 80;
GROUP BY clause
The GROUP BY
clause lets you group rows that have the same values for a given
field or set of fields so that you can compute aggregations of related fields. Grouping occurs
after the filtering performed in the WHERE
clause but before the expressions in the
SELECT
clause are computed. The expression results cannot be used as group keys in
the GROUP BY
clause.
Example
This query finds the top ten most common first words in the trigrams sample dataset.
In addition to demonstrating the use of the GROUP BY
clause, it demonstrates how
positional indexes can be used instead of field names in the GROUP BY
and
ORDER BY
clauses.
#legacySQL SELECT first, COUNT(ngram) FROM [bigquery-public-data:samples.trigrams] GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
Aggregation performed using a GROUP BY
clause is called grouped aggregation
. Unlike scoped aggregation, grouped aggregation is
common in most SQL processing systems.
The EACH
modifier
The EACH
modifier is a hint that tells BigQuery to execute the GROUP BY
using multiple partitions. This is particularly useful when you know that your dataset contains a
large number of distinct values for the group keys.
EACH
used to be encouraged in many cases, but this is no longer the case.
Using GROUP BY
without the EACH
modifier usually provides better performance.
Use GROUP EACH BY
when your query has failed with a resources exceeded error message.
The ROLLUP
function
When the ROLLUP
function is used, BigQuery adds extra rows to the query result that
represent rolled up aggregations. All fields listed after ROLLUP
must be
enclosed in a single set of parentheses. In rows added because of the ROLLUP
function, NULL
indicates the columns for which the aggregation is rolled up.
Example
This query generates per-year counts of male and female births from the sample natality dataset.
#legacySQL SELECT year, is_male, COUNT(1) as count FROM [bigquery-public-data:samples.natality] WHERE year >= 2000 AND year <= 2002 GROUP BY ROLLUP(year, is_male) ORDER BY year, is_male;
These are the results of the query. Notice that there are rows where one or both of the group keys
are NULL
. These rows are the rollup rows.
+------+---------+----------+ | year | is_male | count | +------+---------+----------+ | NULL | NULL | 12122730 | | 2000 | NULL | 4063823 | | 2000 | false | 1984255 | | 2000 | true | 2079568 | | 2001 | NULL | 4031531 | | 2001 | false | 1970770 | | 2001 | true | 2060761 | | 2002 | NULL | 4027376 | | 2002 | false | 1966519 | | 2002 | true | 2060857 | +------+---------+----------+
When using the ROLLUP
function, you can use the GROUPING
function
to distinguish between rows that were added because of the ROLLUP
function and rows
that actually have a NULL
value for the group key.
Example
This query adds the GROUPING
function to the previous example to better identify the
rows added because of the ROLLUP
function.
#legacySQL SELECT year, GROUPING(year) as rollup_year, is_male, GROUPING(is_male) as rollup_gender, COUNT(1) as count FROM [bigquery-public-data:samples.natality] WHERE year >= 2000 AND year <= 2002 GROUP BY ROLLUP(year, is_male) ORDER BY year, is_male;
These are the result the new query returns.
+------+-------------+---------+---------------+----------+ | year | rollup_year | is_male | rollup_gender | count | +------+-------------+---------+---------------+----------+ | NULL | 1 | NULL | 1 | 12122730 | | 2000 | 0 | NULL | 1 | 4063823 | | 2000 | 0 | false | 0 | 1984255 | | 2000 | 0 | true | 0 | 2079568 | | 2001 | 0 | NULL | 1 | 4031531 | | 2001 | 0 | false | 0 | 1970770 | | 2001 | 0 | true | 0 | 2060761 | | 2002 | 0 | NULL | 1 | 4027376 | | 2002 | 0 | false | 0 | 1966519 | | 2002 | 0 | true | 0 | 2060857 | +------+-------------+---------+---------------+----------+
Notes:
-
Non-aggregated fields in the
SELECT
clause must be listed in theGROUP BY
clause.#legacySQL SELECT word, corpus, COUNT(word) FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th" GROUP BY word, corpus; /* Succeeds because all non-aggregated fields are group keys. */
#legacySQL SELECT word, corpus, COUNT(word) FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th" GROUP BY word; /* Fails because corpus is not aggregated nor is it a group key. */
-
Expressions computed in the
SELECT
clause cannot be used in the correspondingGROUP BY
clause.#legacySQL SELECT word, corpus, COUNT(word) word_count FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th" GROUP BY word, corpus, word_count; /* Fails because word_count is not visible to this
GROUP BY
clause. */ - Grouping by float and double values is not supported, because the equality function for those types is not well-defined.
-
Because the system is interactive, queries that produce a large number of groups might fail. The
use of the
TOP
function instead ofGROUP BY
might solve some scaling problems.
HAVING clause
The HAVING
clause behaves exactly like the WHERE
clause except that it is evaluated after the SELECT
clause so the results of all
computed expressions are visible to the HAVING
clause. The HAVING clause can only
refer to outputs of the corresponding SELECT
clause.
Example
This query computes the most common first words in the ngram sample dataset that contain the letter a and occur at most 10,000 times.
#legacySQL SELECT first, COUNT(ngram) ngram_count FROM [bigquery-public-data:samples.trigrams] GROUP BY 1 HAVING first contains "a" AND ngram_count < 10000 ORDER BY 2 DESC LIMIT 10;
ORDER BY clause
The ORDER BY
clause sorts the results of a query in ascending or descending order
using one or more key fields. To sort by multiple fields or aliases, enter them as a
comma-separated list. The results are sorted on the fields in the order in which they are listed.
Use DESC
(descending) or ASC
(ascending) to specify the sort direction.
ASC
is the default. A different sort direction can be specified for each sort key.
The ORDER BY
clause is evaluated after the SELECT
clause so it can
reference the output of any expression computed in the SELECT
. If a field is given
an alias in the SELECT
clause, the alias must be used in the ORDER BY
clause.
LIMIT clause
The LIMIT
clause limits the number of rows in the returned result set. Since BigQuery
queries regularly operate over very large numbers of rows, LIMIT
is a good way to
avoid long-running queries by processing only a subset of the rows.
Notes:
-
The
LIMIT
clause will stop processing and return results when it satisfies your requirements. This can reduce processing time for some queries, but when you specify aggregate functions such as COUNT orORDER BY
clauses, the full result set must still be processed before returning results. TheLIMIT
clause is the last to be evaluated. -
A query with a
LIMIT
clause may still be non-deterministic if there is no operator in the query that guarantees the ordering of the output result set. This is because BigQuery executes using a large number of parallel workers. The order in which parallel jobs return is not guaranteed. -
The
LIMIT
clause cannot contain any functions; it takes only a numeric constant.
Query grammar
The individual clauses of BigQuery SELECT
statements are described in detail
above. Here we present the full grammar of SELECT
statements in a compact form with links back to the individual sections.
query: SELECT { * | field_path.* | expression } [ [ AS ] alias ] [ , ... ] [ FROM from_body [ WHERE bool_expression ] [ OMIT RECORD IF bool_expression] [ GROUP [ EACH ] BY [ ROLLUP ] { field_name_or_alias } [ , ... ] ] [ HAVING bool_expression ] [ ORDER BY field_name_or_alias [ { DESC | ASC } ] [, ... ] ] [ LIMIT n ] ]; from_body: { from_item [, ...] | # Warning: Comma means UNION ALL here from_item [ join_type ] JOIN [ EACH ] from_item [ ON join_predicate ] | (FLATTEN({ table_name | (query) }, field_name_or_alias)) | table_wildcard_function } from_item: { table_name | (query) } [ [ AS ] alias ] join_type: { INNER | [ FULL ] [ OUTER ] | RIGHT [ OUTER ] | LEFT [ OUTER ] | CROSS } join_predicate: field_from_one_side_of_the_join = field_from_the_other_side_of_the_join [ AND ...] expression: { literal_value | field_name_or_alias | function_call } bool_expression: { expression_which_results_in_a_boolean_value | bool_expression AND bool_expression | bool_expression OR bool_expression | NOT bool_expression }
Notation:
- Square brackets "[ ]" indicate optional clauses.
- Curly braces "{ }" enclose a set of options.
- The vertical bar "|" indicates a logical OR.
- A comma or keyword followed by an ellipsis within square brackets "[, ... ]" indicates that the preceding item can repeat in a list with the specified separator.
- Parentheses "( )" indicate literal parentheses.
Aggregate functions
Aggregate functions return values that represent summaries of larger sets of data, which makes these functions particularly useful for analyzing logs. An aggregate function operates against a collection of values and returns a single value per table, group, or scope:
- Table aggregation
Uses an aggregate function to summarize all qualifying rows in the table. For example:
SELECT COUNT(f1) FROM ds.Table;
- Group aggregation
Uses an aggregate function and a
GROUP BY
clause that specifies a non-aggregated field to summarize rows by group. For example:SELECT COUNT(f1) FROM ds.Table GROUP BY b1;
The TOP function represents a specialized case of group aggregation.
- Scoped aggregation
This feature applies only to tables that have nested fields.
Uses an aggregate function and theWITHIN
keyword to aggregate repeated values within a defined scope. For example:SELECT COUNT(m1.f2) WITHIN RECORD FROM Table;
The scope can be
RECORD
, which corresponds to entire row, or a node (repeated field in a row). Aggregation functions operate over the values within the scope and return aggregated results for each record or node.
You can apply a restriction to an aggregate function using one of the following options:
-
An alias in a subselect query. The restriction is specified in the outer
WHERE
clause.#legacySQL SELECT corpus, count_corpus_words FROM (SELECT corpus, count(word) AS count_corpus_words FROM [bigquery-public-data:samples.shakespeare] GROUP BY corpus) AS sub_shakespeare WHERE count_corpus_words > 4000
-
An alias in a HAVING clause.
#legacySQL SELECT corpus, count(word) AS count_corpus_words FROM [bigquery-public-data:samples.shakespeare] GROUP BY corpus HAVING count_corpus_words > 4000;
You can also refer to an alias in the GROUP BY
or ORDER BY
clauses.
Syntax
Aggregate functions | |
---|---|
AVG() |
Returns the average of the values for a group of rows ... |
BIT_AND() |
Returns the result of a bitwise AND operation ... |
BIT_OR() |
Returns the result of a bitwise OR operation ... |
BIT_XOR() |
Returns the result of a bitwise XOR operation ... |
CORR() |
Returns the Pearson correlation coefficient of a set of number pairs. |
COUNT() |
Returns the total number of values ... |
COUNT([DISTINCT]) |
Returns the total number of non-NULL values ... |
COVAR_POP() |
Computes the population covariance of the values ... |
COVAR_SAMP() |
Computes the sample covariance of the values ... |
EXACT_COUNT_DISTINCT() |
Returns the exact number of non-NULL, distinct values for the specified field. |
FIRST() |
Returns the first sequential value in the scope of the function. |
GROUP_CONCAT() |
Concatenates multiple strings into a single string ... |
GROUP_CONCAT_UNQUOTED() |
Concatenates multiple strings into a single string ... will not add double quotes ... |
LAST() |
Returns the last sequential value ... |
MAX() |
Returns the maximum value ... |
MIN() |
Returns the minimum value ... |
NEST() |
Aggregates all values in the current aggregation scope into a repeated field. |
NTH() |
Returns the nth sequential value ... |
QUANTILES() |
Computes approximate minimum, maximum, and quantiles ... |
STDDEV() |
Returns the standard deviation ... |
STDDEV_POP() |
Computes the population standard deviation ... |
STDDEV_SAMP() |
Computes the sample standard deviation ... |
SUM() |
Returns the sum total of the values ... |
TOP() ... COUNT(*) |
Returns the top max_records records by frequency. |
UNIQUE() |
Returns the set of unique, non-NULL values ... |
VARIANCE() |
Computes the variance of the values ... |
VAR_POP() |
Computes the population variance of the values ... |
VAR_SAMP() |
Computes the sample variance of the values ... |
AVG(numeric_expr)
- Returns the average of the values for a group of rows computed by
numeric_expr
. Rows with a NULL value are not included in the calculation. BIT_AND(numeric_expr)
- Returns the result of a bitwise
AND
operation between each instance ofnumeric_expr
across all rows.NULL
values are ignored. This function returnsNULL
if all instances ofnumeric_expr
evaluate toNULL
. BIT_OR(numeric_expr)
- Returns the result of a bitwise
OR
operation between each instance ofnumeric_expr
across all rows.NULL
values are ignored. This function returnsNULL
if all instances ofnumeric_expr
evaluate toNULL
. BIT_XOR(numeric_expr)
- Returns the result of a bitwise
XOR
operation between each instance ofnumeric_expr
across all rows.NULL
values are ignored. This function returnsNULL
if all instances ofnumeric_expr
evaluate toNULL
. CORR(numeric_expr, numeric_expr)
- Returns the Pearson correlation coefficient of a set of number pairs.
COUNT(*)
- Returns the total number of values (NULL and non-NULL) in the scope of the function. Unless you are using
COUNT(*)
with theTOP
function, it is better to explicitly specify the field to count. COUNT([DISTINCT] field [, n])
- Returns the total number of non-NULL values in the scope of the function.
If you use the
DISTINCT
keyword, the function returns the number of distinct values for the specified field. Note that the returned value forDISTINCT
is a statistical approximation and is not guaranteed to be exact.Use
EXACT_COUNT_DISTINCT()
for an exact answer.If you require greater accuracy from
, you can specify a second parameter,COUNT(DISTINCT)
n
, which gives the threshold below which exact results are guaranteed. By default,n
is 1000, but if you give a largern
, you will get exact results forCOUNT(DISTINCT)
up to that value ofn
. However, giving larger values ofn
will reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using
GROUP EACH BY
on the relevant field(s) and then applyingCOUNT(*)
. TheGROUP EACH BY
approach is more scalable but might incur a slight up-front performance penalty. COVAR_POP(numeric_expr1, numeric_expr2)
- Computes the population covariance of the values computed by
numeric_expr1
andnumeric_expr2
. COVAR_SAMP(numeric_expr1, numeric_expr2)
- Computes the sample covariance of the values computed by
numeric_expr1
andnumeric_expr2
. EXACT_COUNT_DISTINCT(field)
- Returns the exact number of non-NULL, distinct values for the specified field. For better scalability and performance, use COUNT(DISTINCT field).
FIRST(expr)
- Returns the first sequential value in the scope of the function.
GROUP_CONCAT('str' [, separator])
-
Concatenates multiple strings into a single string, where each value is separated by the optional
separator
parameter. Ifseparator
is omitted, BigQuery returns a comma-separated string.If a string in the source data contains a double quote character,
GROUP_CONCAT
returns the string with double quotes added. For example, the stringa"b
would return as"a""b"
. UseGROUP_CONCAT_UNQUOTED
if you prefer that these strings do not return with double quotes added.Example:
#legacySQL SELECT GROUP_CONCAT(x) FROM ( SELECT 'a"b' AS x), ( SELECT 'cd' AS x);
GROUP_CONCAT_UNQUOTED('str' [, separator])
-
Concatenates multiple strings into a single string, where each value is separated by the optional
separator
parameter. Ifseparator
is omitted, BigQuery returns a comma-separated string.Unlike
GROUP_CONCAT
, this function will not add double quotes to returned values that include a double quote character. For example, the stringa"b
would return asa"b
.Example:
#legacySQL SELECT GROUP_CONCAT_UNQUOTED(x) FROM ( SELECT 'a"b' AS x), ( SELECT 'cd' AS x);
LAST(field)
- Returns the last sequential value in the scope of the function.
MAX(field)
- Returns the maximum value in the scope of the function.
MIN(field)
- Returns the minimum value in the scope of the function.
NEST(expr)
-
Aggregates all values in the current aggregation scope into a repeated field. For example, the query
"SELECT x, NEST(y) FROM ... GROUP BY x"
returns one output record for each distinctx
value, and contains a repeated field for ally
values paired withx
in the query input. TheNEST
function requires aGROUP BY
clause.BigQuery automatically flattens query results, so if you use the
NEST
function on the top level query, the results won't contain repeated fields. Use theNEST
function when using a subselect that produces intermediate results for immediate use by the same query. NTH(n, field)
- Returns the
n
th sequential value in the scope of the function, wheren
is a constant. TheNTH
function starts counting at 1, so there is no zeroth term. If the scope of the function has less thann
values, the function returnsNULL
. QUANTILES(expr[, buckets])
-
Computes approximate minimum, maximum, and quantiles for the input expression.
NULL
input values are ignored. Empty or exclusively-NULL
input results inNULL
output. The number of quantiles computed is controlled with the optionalbuckets
parameter, which includes the minimum and maximum in the count. To compute approximate N-tiles, use N+1buckets
. The default value ofbuckets
is 100. (Note: The default of 100 does not estimate percentiles. To estimate percentiles, use 101buckets
at minimum.) If specified explicitly,buckets
must be at least 2.The fractional error per quantile is epsilon = 1 /
buckets
, which means that the error decreases as the number of buckets increases. For example:QUANTILES(<expr>, 2) # computes min and max with 50% error. QUANTILES(<expr>, 3) # computes min, median, and max with 33% error. QUANTILES(<expr>, 5) # computes quartiles with 25% error. QUANTILES(<expr>, 11) # computes deciles with 10% error. QUANTILES(<expr>, 21) # computes vigintiles with 5% error. QUANTILES(<expr>, 101) # computes percentiles with 1% error.
The
NTH
function can be used to pick a particular quantile, but remember thatNTH
is 1-based, and thatQUANTILES
returns the minimum ("0th" quantile) in the first position, and the maximum ("100th" percentile or "Nth" N-tile) in the last position. For example,NTH(11, QUANTILES(expr, 21))
estimates the median ofexpr
, whereasNTH(20, QUANTILES(expr, 21))
estimates the 19th vigintile (95th percentile) ofexpr
. Both estimates have a 5% margin of error.To improve accuracy, use more buckets. For example, to reduce the margin of error for the previous calculations from 5% to 0.1%, use 1001 buckets instead of 21, and adjust the argument to the
NTH
function accordingly. To calculate the median with 0.1% error, useNTH(501, QUANTILES(expr, 1001))
; for the 95th percentile with 0.1% error, useNTH(951, QUANTILES(expr, 1001))
. STDDEV(numeric_expr)
- Returns the standard deviation of the values computed by
numeric_expr
. Rows with a NULL value are not included in the calculation. TheSTDDEV
function is an alias forSTDDEV_SAMP
. STDDEV_POP(numeric_expr)
- Computes the population standard deviation of the value computed by
numeric_expr
. UseSTDDEV_POP()
to compute the standard deviation of a dataset that encompasses the entire population of interest. If your dataset comprises only a representative sample of the population, useSTDDEV_SAMP()
instead. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia. STDDEV_SAMP(numeric_expr)
- Computes the sample standard deviation of the value computed by
numeric_expr
. UseSTDDEV_SAMP()
to compute the standard deviation of an entire population based on a representative sample of the population. If your dataset comprises the entire population, useSTDDEV_POP()
instead. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia. SUM(field)
- Returns the sum total of the values in the scope of the function. For use with numerical data types only.
TOP(field|alias[, max_values][,multiplier]) ... COUNT(*)
- Returns the top max_records records by frequency. See the TOP description below for details.
UNIQUE(expr)
- Returns the set of unique, non-NULL values in the scope of the function in an undefined order. Similar to a large
GROUP BY
clause without theEACH
keyword, the query will fail with a "Resources Exceeded" error if there are too many distinct values. UnlikeGROUP BY
, however, theUNIQUE
function can be applied with scoped aggregation, allowing efficient operation on nested fields with a limited number of values. VARIANCE(numeric_expr)
- Computes the variance of the values computed by
numeric_expr
. Rows with a NULL value are not included in the calculation. TheVARIANCE
function is an alias forVAR_SAMP
. VAR_POP(numeric_expr)
- Computes the population variance of the values computed by
numeric_expr
. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia. VAR_SAMP(numeric_expr)
- Computes the sample variance of the values computed by
numeric_expr
. For more information about population versus sample standard deviation, see Standard deviation on Wikipedia.
TOP() function
TOP is a function that is an alternative to the GROUP BY clause. It is used as simplified syntax for GROUP BY ... ORDER BY ... LIMIT ...
. Generally, the TOP function performs faster than the full ... GROUP BY ... ORDER BY ... LIMIT ...
query, but may only return approximate results. The following is the syntax for the TOP function:
TOP(field|alias[, max_values][,multiplier]) ... COUNT(*)
When using TOP in a SELECT
clause, you must include COUNT(*)
as one of the fields.
A query that uses the TOP() function can return only two fields: the TOP field, and the COUNT(*) value.
field|alias
- The field or alias to return.
max_values
- [Optional] The maximum number of results to return. Default is 20.
multiplier
- A positive integer that increases the value(s) returned by
COUNT(*)
by the multiple specified.
TOP() examples
-
Basic example queries that use
TOP()
The following queries use
TOP()
to return 10 rows.Example 1:
#legacySQL SELECT TOP(word, 10) as word, COUNT(*) as cnt FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th";
Example 2:
#legacySQL SELECT word, left(word, 3) FROM (SELECT TOP(word, 10) AS word, COUNT(*) FROM [bigquery-public-data:samples.shakespeare] WHERE word CONTAINS "th");
-
Compare
TOP()
toGROUP BY...ORDER BY...LIMIT
The query returns, in order, the top 10 most frequently used words containing "th", and the number of documents the words was used in. The
TOP
query will execute much faster:Example without
TOP()
:#legacySQL SELECT word, COUNT(*) AS cnt FROM ds.Table WHERE word CONTAINS 'th' GROUP BY word ORDER BY cnt DESC LIMIT 10;
Example with
TOP()
:#legacySQL SELECT TOP(word, 10), COUNT(*) FROM ds.Table WHERE word contains 'th';
-
Using the
multiplier
parameter.The following queries show how the
multiplier
parameter affects the query result. The first query returns the number of births per month in Wyoming. The second query uses tomultiplier
parameter to multiply thecnt
values by 100.Example without the
multiplier
parameter:#legacySQL SELECT TOP(month,3) as month, COUNT(*) as cnt FROM [bigquery-public-data:samples.natality] WHERE state = "WY";
Returns:
+-------+-------+ | month | cnt | +-------+-------+ | 7 | 19594 | | 5 | 19038 | | 8 | 19030 | +-------+-------+
Example with the
multiplier
parameter:#legacySQL SELECT TOP(month,3,100) as month, COUNT(*) as cnt FROM [bigquery-public-data:samples.natality] WHERE state = "WY";
Returns:
+-------+---------+ | month | cnt | +-------+---------+ | 7 | 1959400 | | 5 | 1903800 | | 8 | 1903000 | +-------+---------+
Note: You must include COUNT(*)
in the SELECT
clause to use TOP
.