A GoogleSQL statement comprises a series of tokens. Tokens include identifiers, quoted identifiers, literals, keywords, operators, and special characters. You can separate tokens with comments or whitespace such as spaces, backspaces, tabs, or newlines.
Identifiers
Identifiers are names that are associated with columns, tables, fields, path expressions, and more. They can be unquoted or quoted and some are case-sensitive.
Unquoted identifiers
- Must begin with a letter or an underscore (_) character.
- Subsequent characters can be letters, numbers, or underscores (_).
Quoted identifiers
- Must be enclosed by backtick (`) characters.
- Can contain any characters, including spaces and symbols.
- Cannot be empty.
- Have the same escape sequences as string literals.
- If an identifier is the same as a reserved keyword, the
identifier must be quoted. For example, the identifier
FROM
must be quoted. Additional rules apply for path expressions and field names.
Identifier examples
Path expression examples:
-- Valid. _5abc and dataField are valid identifiers.
_5abc.dataField
-- Valid. `5abc` and dataField are valid identifiers.
`5abc`.dataField
-- Invalid. 5abc is an invalid identifier because it is unquoted and starts
-- with a number rather than a letter or underscore.
5abc.dataField
-- Valid. abc5 and dataField are valid identifiers.
abc5.dataField
-- Invalid. abc5! is an invalid identifier because it is unquoted and contains
-- a character that is not a letter, number, or underscore.
abc5!.dataField
-- Valid. `GROUP` and dataField are valid identifiers.
`GROUP`.dataField
-- Invalid. GROUP is an invalid identifier because it is unquoted and is a
-- stand-alone reserved keyword.
GROUP.dataField
-- Valid. abc5 and GROUP are valid identifiers.
abc5.GROUP
Function examples:
-- Valid. dataField is a valid identifier in a function called foo().
foo().dataField
Array access operation examples:
-- Valid. dataField is a valid identifier in an array called items.
items[OFFSET(3)].dataField
Named query parameter examples:
-- Valid. param and dataField are valid identifiers.
@param.dataField
Protocol buffer examples:
-- Valid. dataField is a valid identifier in a protocol buffer called foo.
(foo).dataField
Path expressions
A path expression describes how to navigate to an object in a graph of objects and generally follows this structure:
path:
[path_expression][. ...]
path_expression:
[first_part]/subsequent_part[ { / | : | - } subsequent_part ][...]
first_part:
{ unquoted_identifier | quoted_identifier }
subsequent_part:
{ unquoted_identifier | quoted_identifier | number }
path
: A graph of one or more objects.path_expression
: An object in a graph of objects.first_part
: A path expression can start with a quoted or unquoted identifier. If the path expressions starts with a reserved keyword, it must be a quoted identifier.subsequent_part
: Subsequent parts of a path expression can include non-identifiers, such as reserved keywords. If a subsequent part of a path expressions starts with a reserved keyword, it may be quoted or unquoted.
Examples:
foo.bar
foo.bar/25
foo/bar:25
foo/bar/25-31
/foo/bar
/25/foo/bar
Field names
A field name represents the name of a field inside a complex data type such as a struct, protocol buffer message, or JSON object.
- A field name can be a quoted identifier or an unquoted identifier.
Literals
A literal represents a constant value of a built-in data type. Some, but not all, data types can be expressed as literals.
String and bytes literals
A string literal represents a constant value of the string data type. A bytes literal represents a constant value of the bytes data type.
Both string and bytes literals must be quoted, either with single ('
) or
double ("
) quotation marks, or triple-quoted with groups of three single
('''
) or three double ("""
) quotation marks.
Formats for quoted literals
The following table lists all of the ways you can format a quoted literal.
Literal | Examples | Description |
---|---|---|
Quoted string |
|
Quoted strings enclosed by single (' ) quotes can contain unescaped double (" ) quotes, as well as the inverse. Backslashes ( \ ) introduce escape sequences. See the Escape Sequences table below.Quoted strings cannot contain newlines, even when preceded by a backslash ( \ ). |
Triple-quoted string |
|
Embedded newlines and quotes are allowed without escaping - see fourth example. Backslashes ( \ ) introduce escape sequences. See Escape Sequences table below.A trailing unescaped backslash ( \ ) at the end of a line is not allowed.End the string with three unescaped quotes in a row that match the starting quotes. |
Raw string |
|
Quoted or triple-quoted literals that have the raw string literal prefix (r or R ) are interpreted as raw strings (sometimes described as regex strings).Backslash characters ( \ ) do not act as escape characters. If a backslash followed by another character occurs inside the string literal, both characters are preserved.A raw string cannot end with an odd number of backslashes. Raw strings are useful for constructing regular expressions. The prefix is case-insensitive. |
Bytes |
|
Quoted or triple-quoted literals that have the bytes literal prefix (b or B ) are interpreted as bytes. |
Raw bytes |
|
A bytes literal can be interpreted as raw bytes if both the
r and b prefixes are present. These prefixes can be
combined in any order and are case-insensitive. For example,
rb'abc*' and rB'abc*' and br'abc*' are
all equivalent. See the description for raw string to learn more about
what you can do with a raw literal.
|
Escape sequences for string and bytes literals
The following table lists all valid escape sequences for representing non-alphanumeric characters in string and bytes literals. Any sequence not in this table produces an error.
Escape Sequence | Description |
---|---|
\a |
Bell |
\b |
Backspace |
\f |
Formfeed |
\n |
Newline |
\r |
Carriage Return |
\t |
Tab |
\v |
Vertical Tab |
\\ |
Backslash (\ ) |
\? |
Question Mark (? ) |
\" |
Double Quote (" ) |
\' |
Single Quote (' ) |
\` |
Backtick (` ) |
\ooo |
Octal escape, with exactly 3 digits (in the range 0–7). Decodes to a single Unicode character (in string literals) or byte (in bytes literals). |
\xhh or \Xhh |
Hex escape, with exactly 2 hex digits (0–9 or A–F or a–f). Decodes to a single Unicode character (in string literals) or byte (in bytes literals). Examples:
|
\uhhhh |
Unicode escape, with lowercase 'u' and exactly 4 hex digits. Valid only in string literals or identifiers. Note that the range D800-DFFF is not allowed, as these are surrogate unicode values. |
\Uhhhhhhhh |
Unicode escape, with uppercase 'U' and exactly 8 hex digits. Valid only in string literals or identifiers. The range D800-DFFF is not allowed, as these values are surrogate unicode values. Also, values greater than 10FFFF are not allowed. |
Integer literals
Integer literals are either a sequence of decimal digits (0–9) or a hexadecimal
value that is prefixed with "0x
" or "0X
". Integers can be prefixed by "+
"
or "-
" to represent positive and negative values, respectively.
Examples:
123
0xABC
-123
An integer literal is interpreted as an INT64
.
A integer literal represents a constant value of the integer data type.
NUMERIC
literals
You can construct NUMERIC
literals using the
NUMERIC
keyword followed by a floating point value in quotes.
Examples:
SELECT NUMERIC '0';
SELECT NUMERIC '123456';
SELECT NUMERIC '-3.14';
SELECT NUMERIC '-0.54321';
SELECT NUMERIC '1.23456e05';
SELECT NUMERIC '-9.876e-3';
A NUMERIC
literal represents a constant value of the
NUMERIC
data type.
Floating point literals
Syntax options:
[+-]DIGITS.[DIGITS][e[+-]DIGITS]
[+-][DIGITS].DIGITS[e[+-]DIGITS]
DIGITSe[+-]DIGITS
DIGITS
represents one or more decimal numbers (0 through 9) and e
represents
the exponent marker (e or E).
Examples:
123.456e-67
.1E4
58.
4e2
Numeric literals that contain either a decimal point or an exponent marker are presumed to be type double.
Implicit coercion of floating point literals to float type is possible if the value is within the valid float range.
There is no literal representation of NaN or infinity, but the following case-insensitive strings can be explicitly cast to float:
- "NaN"
- "inf" or "+inf"
- "-inf"
A floating-point literal represents a constant value of the floating-point data type.
Array literals
Array literals are comma-separated lists of elements
enclosed in square brackets. The ARRAY
keyword is optional, and an explicit
element type T is also optional.
Examples:
[1, 2, 3]
['x', 'y', 'xy']
ARRAY[1, 2, 3]
ARRAY<string>['x', 'y', 'xy']
ARRAY<int64>[]
An array literal represents a constant value of the array data type.
Struct literals
A struct literal is a struct whose fields are all literals. Struct literals can be written using any of the syntaxes for constructing a struct (tuple syntax, typeless struct syntax, or typed struct syntax).
Note that tuple syntax requires at least two fields, in order to distinguish it from an ordinary parenthesized expression. To write a struct literal with a single field, use typeless struct syntax or typed struct syntax.
Example | Output Type |
---|---|
(1, 2, 3) |
STRUCT<INT64, INT64, INT64> |
(1, 'abc') |
STRUCT<INT64, STRING> |
STRUCT(1 AS foo, 'abc' AS bar) |
STRUCT<foo INT64, bar STRING> |
STRUCT<INT64, STRING>(1, 'abc') |
STRUCT<INT64, STRING> |
STRUCT(1) |
STRUCT<INT64> |
STRUCT<INT64>(1) |
STRUCT<INT64> |
A struct literal represents a constant value of the struct data type.
Date literals
Syntax:
DATE 'date_canonical_format'
Date literals contain the DATE
keyword followed by
date_canonical_format
,
a string literal that conforms to the canonical date format, enclosed in single
quotation marks. Date literals support a range between the
years 1 and 9999, inclusive. Dates outside of this range are invalid.
For example, the following date literal represents September 27, 2014:
DATE '2014-09-27'
String literals in canonical date format also implicitly coerce to DATE type when used where a DATE-type expression is expected. For example, in the query
SELECT * FROM foo WHERE date_col = "2014-09-27"
the string literal "2014-09-27"
will be coerced to a date literal.
A date literal represents a constant value of the date data type.
Timestamp literals
Syntax:
TIMESTAMP 'timestamp_canonical_format'
Timestamp literals contain the TIMESTAMP
keyword and
timestamp_canonical_format
, a string literal that
conforms to the canonical timestamp format, enclosed in single quotation marks.
Timestamp literals support a range between the years 1 and 9999, inclusive. Timestamps outside of this range are invalid.
A timestamp literal can include a numerical suffix to indicate the time zone:
TIMESTAMP '2014-09-27 12:30:00.45-08'
If this suffix is absent, the default time zone, America/Los_Angeles, is used.
For example, the following timestamp represents 12:30 p.m. on September 27, 2014 in the default time zone, America/Los_Angeles:
TIMESTAMP '2014-09-27 12:30:00.45'
For more information about time zones, see Time zone.
String literals with the canonical timestamp format, including those with
time zone names, implicitly coerce to a timestamp literal when used where a
timestamp expression is expected. For example, in the following query, the
string literal "2014-09-27 12:30:00.45 America/Los_Angeles"
is coerced
to a timestamp literal.
SELECT * FROM foo
WHERE timestamp_col = "2014-09-27 12:30:00.45 America/Los_Angeles"
A timestamp literal can include these optional characters:
T
ort
Z
orz
If you use one of these characters, a space can't be included before or after it. These are valid:
TIMESTAMP '2017-01-18T12:34:56.123456Z'
TIMESTAMP '2017-01-18t12:34:56.123456'
TIMESTAMP '2017-01-18 12:34:56.123456z'
TIMESTAMP '2017-01-18 12:34:56.123456Z'
A timestamp literal represents a constant value of the timestamp data type.
Time zone
Since timestamp literals must be mapped to a specific point in time, a time zone is necessary to correctly interpret a literal. If a time zone is not specified as part of the literal itself, then GoogleSQL uses the default time zone value, which the GoogleSQL implementation sets.
GoogleSQL can represent a time zones using a string, which represents the offset from Coordinated Universal Time (UTC).
Examples:
'-08:00'
'-8:15'
'+3:00'
'+07:30'
'-7'
Time zones can also be expressed using string time zone names.
Examples:
TIMESTAMP '2014-09-27 12:30:00 America/Los_Angeles'
TIMESTAMP '2014-09-27 12:30:00 America/Argentina/Buenos_Aires'
Enum literals
There is no syntax for enum literals. Integer or string literals are coerced to the enum type when necessary, or explicitly cast to a specific enum type name. For more information, see Literal coercion.
An enum literal represents a constant value of the enum data type.
JSON literals
Syntax:
JSON 'json_formatted_data'
A JSON literal represents JSON-formatted data.
Example:
JSON '
{
"id": 10,
"type": "fruit",
"name": "apple",
"on_menu": true,
"recipes":
{
"salads":
[
{ "id": 2001, "type": "Walnut Apple Salad" },
{ "id": 2002, "type": "Apple Spinach Salad" }
],
"desserts":
[
{ "id": 3001, "type": "Apple Pie" },
{ "id": 3002, "type": "Apple Scones" },
{ "id": 3003, "type": "Apple Crumble" }
]
}
}
'
A JSON literal represents a constant value of the JSON data type.
Case sensitivity
GoogleSQL follows these rules for case sensitivity:
Category | Case-sensitive? | Notes |
---|---|---|
Keywords | No | |
Function names | No | |
Table names | See Notes | Table names are usually case-insensitive, but they might be case-sensitive when querying a database that uses case-sensitive table names. |
Column names | No | |
All type names except for protocol buffer type names | No | |
Protocol buffer type names | Yes | |
String values | Yes |
Any value of type STRING preserves its case. For example, the result of an expression that produces a STRING value or a column value that's of type STRING .
|
String comparisons | Yes | |
Aliases within a query | No | |
Regular expression matching | See Notes | Regular expression matching is case-sensitive by default, unless the expression itself specifies that it should be case-insensitive. |
LIKE matching |
Yes | |
Property graph names | No | |
Property graph label names | No | |
Property graph property names | No |
Reserved keywords
Keywords are a group of tokens that have special meaning in the GoogleSQL language, and have the following characteristics:
- Keywords cannot be used as identifiers unless enclosed by backtick (`) characters.
- Keywords are case-insensitive.
GoogleSQL has the following reserved keywords.
ALL AND ANY ARRAY AS ASC ASSERT_ROWS_MODIFIED AT BETWEEN BY CASE CAST COLLATE CONTAINS CREATE CROSS CUBE CURRENT DEFAULT DEFINE DESC DISTINCT ELSE END |
ENUM ESCAPE EXCEPT EXCLUDE EXISTS EXTRACT FALSE FETCH FOLLOWING FOR FROM FULL GRAPH_TABLE GROUP GROUPING GROUPS HASH HAVING IF IGNORE IN INNER INTERSECT INTERVAL INTO |
IS JOIN LATERAL LEFT LIKE LIMIT LOOKUP MERGE NATURAL NEW NO NOT NULL NULLS OF ON OR ORDER OUTER OVER PARTITION PRECEDING PROTO RANGE |
RECURSIVE RESPECT RIGHT ROLLUP ROWS SELECT SET SOME STRUCT TABLESAMPLE THEN TO TREAT TRUE UNBOUNDED UNION UNNEST USING WHEN WHERE WINDOW WITH WITHIN |
Terminating semicolons
You can optionally use a terminating semicolon (;
) when you submit a query
string statement through an Application Programming Interface (API).
In a request containing multiple statements, you must separate statements with semicolons, but the semicolon is generally optional after the final statement. Some interactive tools require statements to have a terminating semicolon.
Trailing commas
You can optionally use a trailing comma (,
) at the end of a column list in a
SELECT
statement. You might have a trailing comma as the result of
programmatically creating a column list.
Example
SELECT name, release_date, FROM Books
Query parameters
You can use query parameters to substitute arbitrary expressions. However, query parameters cannot be used to substitute identifiers, column names, table names, or other parts of the query itself. Query parameters are defined outside of the query statement.
Client APIs allow the binding of parameter names to values; the query engine substitutes a bound value for a parameter at execution time.
Parameterized queries have better query cache hit rates resulting in lower query latency and lower overall CPU usage.
For example, instead of using a query like the following:
SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, 'cat')
use the following syntax:
SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, @p)
Spanner runs the query optimizer on distinct SQL. The fewer distinct SQL instances the application uses, the fewer times the query optimization is invoked.
Named query parameters
Syntax:
@parameter_name
A named query parameter is denoted using an identifier
preceded by the @
character.
A named query parameter can start with an identifier or a reserved keyword. An identifier can be unquoted or quoted.
Example:
This example returns all rows where LastName
is equal to the value of the
named query parameter myparam
.
SELECT * FROM Roster WHERE LastName = @myparam
Hints
@{ hint [, ...] }
hint:
[engine_name.]hint_name = value
The purpose of a hint is to modify the execution strategy for a query without changing the result of the query. Hints generally do not affect query semantics, but may have performance implications. These hint types are available:
Hint syntax requires the @
character followed by curly braces.
You can create one hint or a group of hints. The optional engine_name.
prefix allows for multiple engines to define hints with the same hint_name
.
This is important if you need to suggest different engine-specific
execution strategies or different engines support different hints.
You can assign identifiers and literals to hints.
- Identifiers are useful for hints that are meant to act like enums.
You can use an identifier to avoid using a quoted string.
In the resolved AST, identifier hints are represented as string literals,
so
@{hint="abc"}
is the same as@{hint=abc}
. Identifier hints can also be used for hints that take a table name or column name as a single identifier. - NULL literals are allowed and are inferred as integers.
Hints are meant to apply only to the node they are attached to,
and not to a larger scope.
For example, a hint on a JOIN
in the middle of the
FROM
clause is meant to apply to that JOIN
only, and not other JOIN
s
in the FROM
clause.
Statement-level hints can be used for hints
that modify execution of an entire statement, for example an overall memory
budget or deadline.
Examples
In this example, a literal is assigned to a hint. This hint is only used
with two database engines called database_engine_a
and database_engine_b
.
The value for the hint is different for each database engine.
@{ database_engine_a.file_count=23, database_engine_b.file_count=10 }
In this example, an identifier is assigned to a hint. There are unique identifiers for each hint type. You can view a list of hint types at the beginning of this topic.
@{ JOIN_METHOD=HASH_JOIN }
Comments
Comments are sequences of characters that the parser ignores. GoogleSQL supports the following types of comments.
Single-line comments
Use a single-line comment if you want the comment to appear on a line by itself.
Examples
# this is a single-line comment
SELECT book FROM library;
-- this is a single-line comment
SELECT book FROM library;
/* this is a single-line comment */
SELECT book FROM library;
SELECT book FROM library
/* this is a single-line comment */
WHERE book = "Ulysses";
Inline comments
Use an inline comment if you want the comment to appear on the same line as
a statement. A comment that is prepended with #
or --
must appear to the
right of a statement.
Examples
SELECT book FROM library; # this is an inline comment
SELECT book FROM library; -- this is an inline comment
SELECT book FROM library; /* this is an inline comment */
SELECT book FROM library /* this is an inline comment */ WHERE book = "Ulysses";
Multiline comments
Use a multiline comment if you need the comment to span multiple lines. Nested multiline comments are not supported.
Examples
SELECT book FROM library
/*
This is a multiline comment
on multiple lines
*/
WHERE book = "Ulysses";
SELECT book FROM library
/* this is a multiline comment
on two lines */
WHERE book = "Ulysses";