A BigQuery statement comprises a series of tokens. Tokens include identifiers, quoted identifiers, literals, keywords, operators, and special characters. You can separate tokens with whitespace (space, backspace, tab, newline) or comments.
Identifiers are names that are associated with columns, tables, and other database objects.
Identifiers must begin with a letter or an underscore. Subsequent characters can be letters, numbers, or underscores. Quoted identifiers are identifiers enclosed by backtick (`) characters and can contain any character, such as spaces or symbols. However, quoted identifiers cannot be empty. Reserved Keywords can only be used as identifiers if enclosed by backticks.
Syntax (defined here as a regular expression):
Customers5 _dataField1 ADGROUP
5Customers _dataField! GROUP
5Customers begins with a number, not a letter or underscore.
contains the special character "!" which is not a letter, number, or underscore.
GROUP is a reserved keyword, and therefore cannot be used as an identifier
without being enclosed by backtick characters.
Both identifiers and quoted identifiers are case insensitive, with some nuances. See Case Sensitivity for further details.
Quoted identifiers have the same escape sequences as string literals, defined below.
A literal represents a constant value of a built-in data type. Some, but not all, data types can be expressed as literals.
String and Bytes Literals
Both string and bytes literals must be quoted, either with single (
") quotation marks, or triple-quoted with groups of three single
''') or three double (
""") quotation marks.
||Quoted strings enclosed by single (
Quoted strings cannot contain newlines, even when preceded by a backslash (
||Embedded newlines and quotes are allowed without escaping - see fourth example.
A trailing unescaped backslash (
End the string with three unescaped quotes in a row that match the starting quotes.
||Quoted or triple-quoted literals that have the raw string literal prefix (
Backslash characters (
A raw string cannot end with an odd number of backslashes.
Raw strings are useful for constructing regular expressions.
Prefix characters (
B) are optional for quoted or triple-quoted strings, and indicate that the string is a raw/regex string or a byte sequence, respectively. For
b'''abc''' are both interpreted as type bytes. Prefix characters are case insensitive.
Quoted literals with prefixes:
||Quoted or triple-quoted literals that have the bytes literal prefix (
The table below lists all valid escape sequences for representing non-alphanumeric characters in string and byte literals. Any sequence not in this table produces an error.
||Question Mark (
||Double Quote (
||Single Quote (
||Octal escape, with exactly 3 digits (in the range 0–7). Decodes to a single Unicode character (in string literals) or byte (in bytes literals).|
||Hex escape, with exactly 2 hex digits (0–9 or A–F or a–f). Decodes to a single Unicode character (in string literals) or byte (in bytes literals). Examples:
||Unicode escape, with lowercase 'u' and exactly 4 hex digits. Valid only in string literals or identifiers.
Note that the range D800-DFFF is not allowed, as these are surrogate unicode values.
||Unicode escape, with uppercase 'U' and exactly 8 hex digits. Valid only in string literals or identifiers.
The range D800-DFFF is not allowed, as these values are surrogate unicode values. Also, values greater than 10FFFF are not allowed.
Integer literals are either a sequence of decimal digits (0–9) or a hexadecimal
value that is prefixed with "
0x". Integers can be prefixed by "
+" or "
to represent positive and negative values, respectively.
123 0xABC -123
An integer literal is interpreted as an
You can construct NUMERIC literals using the
NUMERIC keyword followed by a floating point value in quotes.
SELECT NUMERIC '0'; SELECT NUMERIC '123456'; SELECT NUMERIC '-3.14'; SELECT NUMERIC '-0.54321'; SELECT NUMERIC '1.23456e05'; SELECT NUMERIC '-9.876e-3';
Floating Point Literals
[+-]DIGITS.[DIGITS][e[+-]DIGITS] [DIGITS].DIGITS[e[+-]DIGITS] DIGITSe[+-]DIGITS
DIGITS represents one or more decimal numbers (0 through 9) and
e represents the exponent marker (e or E).
123.456e-67 .1E4 58. 4e2
Numeric literals that contain either a decimal point or an exponent marker are presumed to be type double.
Implicit coercion of floating point literals to float type is possible if the value is within the valid float range.
There is no literal representation of NaN or infinity, but the following case-insensitive strings can be explicitly cast to float:
- "inf" or "+inf"
Array literals are comma-separated lists of elements
enclosed in square brackets. The
ARRAY keyword is optional, and an explicit
element type T is also optional.
[1, 2, 3] ['x', 'y', 'xy'] ARRAY[1, 2, 3] ARRAY<string>['x', 'y', 'xy'] ARRAY<int64>
elem is an element in the struct.
elem must be a literal data type, not an expression or column name.
The output type is an anonymous struct type (structs are not named types) with anonymous fields with types matching the types of the input expressions.
Date literals contain the
DATE keyword followed by a string literal that conforms to the canonical date format, enclosed in single quotation marks. Date literals support a range between the
years 1 and 9999, inclusive. Dates outside of this range are invalid.
For example, the following date literal represents September 27, 2014:
String literals in canonical date format also implicitly coerce to DATE type when used where a DATE-type expression is expected. For example, in the query
SELECT * FROM foo WHERE date_col = "2014-09-27"
the string literal
"2014-09-27" will be coerced to a date literal.
TIME literals contain the
TIME keyword and a string literal that conforms to
the canonical time format, enclosed in single quotation marks.
For example, the following time represents 12:30 p.m.:
DATETIME 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]]'
DATETIME literals contain the
DATETIME keyword and a string literal that
conforms to the canonical DATETIME format, enclosed in single quotation marks.
For example, the following DATETIME represents 12:30 p.m. on September 27, 2014:
DATETIME '2014-09-27 12:30:00.45'
DATETIME literals support a range between the years 1 and 9999, inclusive. DATETIMEs outside of this range are invalid.
String literals with the canonical DATETIME format implicitly coerce to a DATETIME literal when used where a DATETIME expression is expected.
SELECT * FROM foo WHERE datetime_col = "2014-09-27 12:30:00.45"
In this query, the string literal
"2014-09-27 12:30:00.45" is coerced to a
TIMESTAMP 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]] [timezone]'
Timestamp literals contain the
TIMESTAMP keyword and a string literal that
conforms to the canonical timestamp format, enclosed in single quotation marks.
Timestamp literals support a range between the years 1 and 9999, inclusive. Timestamps outside of this range are invalid.
A timestamp literal can include a numerical suffix to indicate the time zone:
TIMESTAMP '2014-09-27 12:30:00.45-08'
If this suffix is absent, the default time zone, UTC, is used.
For example, the following timestamp represents 12:30 p.m. on September 27, 2014, using the UTC time zone:
TIMESTAMP '2014-09-27 12:30:00.45'
For more information about time zones, see Time zone.
String literals with the canonical timestamp format, including those with
time zone names, implicitly coerce to a timestamp literal when used where a
timestamp expression is expected. For example, in the following query, the
"2014-09-27 12:30:00.45 America/Los_Angeles" is coerced
to a timestamp literal.
SELECT * FROM foo WHERE timestamp_col = "2014-09-27 12:30:00.45 America/Los_Angeles"
Since timestamp literals must be mapped to a specific point in time, a time zone is necessary to correctly interpret a literal. If a time zone is not specified as part of the literal itself, then BigQuery uses the default time zone value, which the BigQuery implementation sets.
BigQuery represents time zones using strings in the following canonical format, which represents the offset from Coordinated Universal Time (UTC).
'-08:00' '-8:15' '+3:00' '+07:30' '-7'
Time zones can also be expressed using string time zone names from the
tz database. For a less comprehensive but
simpler reference, see the
List of tz database time zones
on Wikipedia. Canonical time zone names have the format
<continent/[region/]city>, such as
Note: Not all time zone names are interchangeable even if they do happen to
report the same time during a given part of the year. For example,
America/Los_Angeles reports the same time as
UTC-7:00 during Daylight Savings Time, but reports the same time as
UTC-8:00 outside of Daylight Savings Time.
TIMESTAMP '2014-09-27 12:30:00 America/Los_Angeles' TIMESTAMP '2014-09-27 12:30:00 America/Argentina/Buenos_Aires'
BigQuery follows these rules for case sensitivity:
|Aliases within a query||No|
|Regular expression matching||See Notes||Regular expression matching is case sensitive by default, unless the expression itself specifies that it should be case insensitive.|
Keywords are a group of tokens that have special meaning in the BigQuery language, and have the following characteristics:
- Keywords cannot be used as identifiers unless enclosed by backtick (`) characters.
- Keywords are case insensitive.
BigQuery has the following reserved keywords.
You can optionally use a terminating semicolon (
;) when you submit a query
string statement through an Application Programming Interface (API).
In a request containing multiple statements, you must separate statements with semicolons, but the semicolon is generally optional after the final statement. Some interactive tools require statements to have a terminating semicolon.
Comments are sequences of characters that the parser ignores. BigQuery supports the following types of comments.
Single-line comments are supported by prepending
-- before the
SELECT x FROM T; # x is a field and T is a table
Comment includes all characters from the '#' character to the end of the line.
SELECT x FROM T; --x is a field and T is a table
Comment includes all characters from the '
--' sequence to the end of the line. You can optionally add a space after the '
Multiline comments are supported by enclosing the comment using
/* <comment> */.
SELECT x FROM T /* x is a field and T is a table */ WHERE x = 3;
SELECT x FROM T /* comment starts here /* comment ends on this line */ this line is not considered a comment */ WHERE x = 3;
Comment includes all characters, including newlines, enclosed by the first
occurrence of '
/*' and the first subsequent occurrence of '
comments are not supported. The second example contains a nested comment that
renders the query invalid.