Query syntax

Query statements scan one or more tables or expressions and return the computed result rows. This topic describes the syntax for SQL queries in GoogleSQL for Bigtable.

SQL syntax notation rules

The GoogleSQL documentation commonly uses the following syntax notation rules:

  • Square brackets [ ]: Optional clause.
  • Curly braces with vertical bars { a | b | c }: Logical OR. Select one option.
  • Ellipsis ...: Preceding item can repeat.
  • Double quotes ": Syntax wrapped in double quotes ("") is required.

SQL syntax

query_statement:
  query_expr

query_expr:
  { select | ( query_expr ) }
  [ ORDER BY expression [ ASC ] ]
  [ LIMIT count ]

select:
  SELECT
    select_list
  [ FROM from_clause[, ...] ]
  [ WHERE bool_expression ]

SELECT statement

SELECT
  select_list

select_list:
  { select_all | select_expression } [, ...]

select_all:
  [ expression. ]*
  [ EXCEPT ( column_name [, ...] ) ]
  [ REPLACE ( expression [ AS ] column_name [, ...] ) ]

select_expression:
  expression [ [ AS ] alias ]

The SELECT list defines the columns that the query will return. Expressions in the SELECT list can refer to columns in any of the from_items in its corresponding FROM clause.

Each item in the SELECT list is one of:

  • *
  • expression
  • expression.*

SELECT *

SELECT *, often referred to as select star, produces one output column for each column that is visible after executing the full query.

SELECT * FROM example_table;

/*-------------------------------------------------------------------------------+
 | _key | f1               | f2                                 |    f3          |
 +-------------------------------------------------------------------------------+
 | a#01 | {                | {                                  | {              |
 |      |   "c1": "xyz",   |   "col1": "def",                   |   "cl1": "{    |
 |      |   "c2": "nop"    |   "col2": "125",                   |     "n": "Def",|
 |      | }                |   "col3": "2008-12-25 15:30:00+00" |     "g": 40,   |
 |      |                  | }                                  |     "k": 3     |
 |      |                  |                                    |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | a#02 | {                | {                                  | {              |
 |      |   "c1": "zyx",   |   "col1": "cba",                   |   "cl1": "{    |
 |      | }                |   "col2": "123",                   |     "n": "Gih",|
 |      |                  |   "col3": "2018-10-15 15:30:00+00" |     "g": 20,   |
 |      |                  | }                                  |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | a#03 | {                | {                                  | {              |
 |      |   "c1": "abc",   |   "col1": "cba",                   |   "cl1": "{    |
 |      |   "c2": "def"    |   "col2": "543",                   |     "n": "Xyz",|
 |      | }                |   "col3": "2020-11-10 13:21:00+00" |     "g": 15,   |
 |      |                  | }                                  |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | b#01 | {                | {                                  | NULL           |
 |      |   "c1": "jkl",   |   "col1": "qrs",                   |                |
 |      |   "c2": "mno"    |   "col2": "654",                   |                |
 |      | }                |   "col3": "2021-10-10 14:18:03+07" |                |
 |      |                  | }                                  |                |
 +-------------------------------------------------------------------------------+
 | b#02 | {                | {                                  | NULL           |
 |      |   "c1": "hi",    |   "col1": "tu",                    |                |
 |      |   "c2": "no"     |   "col3": "2023-10-10 11:28:09+05",|                |
 |      | }                | }                                  |                |
 +-------------------------------------------------------------------------------+
 | c#03 | {                | {                                  | {              |
 |      |   "c1": "j",     |   "col1": "u",                     |   "cl1": "{    |
 |      |   "c2": "l"      |  }                                 |     "n": "T",  |
 |      | }                | }                                  |     "g": 22,   |
 |      |                  |                                    |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------*/

SELECT expression

Items in a SELECT list can be expressions. These expressions evaluate to a single value and produce one output column, with an optional explicit alias.

If the expression doesn't have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.

In the following example, the first expression has an explicit alias for f1 and the second expression has an implicit alias for f1:

SELECT example_table.f1 FROM example_table;
SELECT f1 FROM example_table;

/*-------------------------------------+
 | _key | f1                           |
 +-------------------------------------+
 | a#01 | { "c1": "xyz", "c2": "nop" } |
 | a#02 | { "c1": "zyx" }              |
 | a#03 | { "c1": "abc", "c2": "def" } |
 | b#01 | { "c1": "jkl", "c2": "mno" } |
 | b#02 | { "c1": "hi", "c2": "no" }   |
 | c#03 | { "c1": "j", "c2": "l" }     |
 +-------------------------------------*/

SELECT expression.*

An item in a SELECT list can also take the form of expression.*. This produces one output column for each column or top-level field of expression. The expression must either be a table alias or evaluate to a single value of a data type with fields, such as a STRUCT.

SELECT example_table.* FROM example_table;

/*-------------------------------------------------------------------------------+
 | _key | f1               | f2                                 |    f3          |
 +-------------------------------------------------------------------------------+
 | a#01 | {                | {                                  | {              |
 |      |   "c1": "xyz",   |   "col1": "def",                   |   "cl1": "{    |
 |      |   "c2": "nop"    |   "col2": "125",                   |     "n": "Def",|
 |      | }                |   "col3": "2008-12-25 15:30:00+00" |     "g": 40,   |
 |      |                  | }                                  |     "k": 3     |
 |      |                  |                                    |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | a#02 | {                | {                                  | {              |
 |      |   "c1": "zyx",   |   "col1": "cba",                   |   "cl1": "{    |
 |      | }                |   "col2": "123",                   |     "n": "Gih",|
 |      |                  |   "col3": "2018-10-15 15:30:00+00" |     "g": 20,   |
 |      |                  | }                                  |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | a#03 | {                | {                                  | {              |
 |      |   "c1": "abc",   |   "col1": "cba",                   |   "cl1": "{    |
 |      |   "c2": "def"    |   "col2": "543",                   |     "n": "Xyz",|
 |      | }                |   "col3": "2020-11-10 13:21:00+00" |     "g": 15,   |
 |      |                  | }                                  |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | b#01 | {                | {                                  | NULL           |
 |      |   "c1": "jkl",   |   "col1": "qrs",                   |                |
 |      |   "c2": "mno"    |   "col2": "654",                   |                |
 |      | }                |   "col3": "2021-10-10 14:18:03+07" |                |
 |      |                  | }                                  |                |
 +-------------------------------------------------------------------------------+
 | b#02 | {                | {                                  | NULL           |
 |      |   "c1": "hi",    |   "col1": "tu",                    |                |
 |      |   "c2": "no"     |   "col3": "2023-10-10 11:28:09+05",|                |
 |      | }                | }                                  |                |
 +-------------------------------------------------------------------------------+
 | c#03 | {                | {                                  | {              |
 |      |   "c1": "j",     |   "col1": "u",                     |   "cl1": "{    |
 |      |   "c2": "l"      |  }                                 |     "n": "T",  |
 |      | }                | }                                  |     "g": 22,   |
 |      |                  |                                    |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------*/

SELECT * EXCEPT

A SELECT * EXCEPT statement specifies the names of one or more columns to exclude from the result. All matching column names are omitted from the output.

SELECT * EXCEPT (f2) FROM example_table;

/*---------------------------------------------------------------------------------+
 | _key | f1                           | f3                                        |
 +---------------------------------------------------------------------------------+
 | a#01 | { "c1": "xyz", "c2": "nop" } | { "cl1": "{"n": "Def", "g": 40,"k": 3}" } |
 | a#02 | { "c1": "zyx" }              | { "cl1": "{"n": "Gih", "g": 20}" }        |
 | a#03 | { "c1": "abc", "c2": "def" } | { "cl1": "{"n": "Xyz", "g": 15}" }        |
 | b#01 | { "c1": "jkl", "c2": "mno" } | NULL                                      |
 | b#02 | { "c1": "hi", "c2": "no" }   | NULL                                      |
 | c#03 | { "c1": "j", "c2": "l" }     | { "cl1": "{"n": "T", "g": 22}" }          |
 +---------------------------------------------------------------------------------*/

SELECT * REPLACE

A SELECT * REPLACE statement specifies one or more expression AS identifier clauses. Each identifier must match a column name from the SELECT * statement. In the output column list, the column that matches the identifier in a REPLACE clause is replaced by the expression in that REPLACE clause.

A SELECT * REPLACE statement doesn't change the names or order of columns. However, it can change the value and the value type.

SELECT * REPLACE (f1 AS f2) FROM sample1;

/*----------------------------------------------------------------------------------------------------------------+
 | _key | f1                           | f2                           | f3                                        |
 +----------------------------------------------------------------------------------------------------------------+
 | a#01 | { "c1": "xyz", "c2": "nop" } | { "c1": "xyz", "c2": "nop" } | { "cl1": "{"n": "Def", "g": 40,"k": 3}" } |
 | a#02 | { "c1": "zyx" }              | { "c1": "zyx" }              | { "cl1": "{"n": "Gih", "g": 20}" }        |
 | a#03 | { "c1": "abc", "c2": "def" } | { "c1": "abc", "c2": "def" } | { "cl1": "{"n": "Xyz", "g": 15}" }        |
 | b#01 | { "c1": "jkl", "c2": "mno" } | { "c1": "jkl", "c2": "mno" } | NULL                                      |
 | b#02 | { "c1": "hi", "c2": "no" }   | { "c1": "hi", "c2": "no" }   | NULL                                      |
 | c#03 | { "c1": "j", "c2": "l" }     | { "c1": "j", "c2": "l" }     | { "cl1": "{"n": "T", "g": 22}" }          |
 +----------------------------------------------------------------------------------------------------------------*/

FROM clause

FROM from_clause[, ...]

from_clause:
  from_item

from_item:
  table_name [ as_alias ]

as_alias:
  [ AS ] alias

The FROM clause indicates the table or tables from which to retrieve rows, and specifies how to join those rows together to produce a single stream of rows for processing in the rest of the query.

table_name

The name of an existing table.

SELECT * FROM Roster;

WHERE clause

WHERE bool_expression

The WHERE clause filters the results of the FROM clause.

Only rows whose bool_expression evaluates to TRUE are included. Rows whose bool_expression evaluates to NULL or FALSE are discarded.

The evaluation of a query with a WHERE clause is typically completed in this order:

  • FROM
  • WHERE
  • ORDER BY
  • LIMIT

Evaluation order doesn't always match syntax order.

The WHERE clause only references columns available via the FROM clause; it cannot reference SELECT list aliases.

Examples

This query returns returns all rows from the example_table table where f2['col2'] starts with 1:

SELECT f2 FROM example_table
WHERE STARTS_WITH(CAST(f2['col2'] AS STRING), "1");

/*--------------------------------------------------------------------+
 | f2                                                                 |
 +--------------------------------------------------------------------+
 | { "col1": "def", "col2": "125", "col3": "2008-12-25 15:30:00+00" } |
 | { "col1": "cba", "col2": "123", "col3": "2018-10-15 15:30:00+00" } |
 +--------------------------------------------------------------------*/

The bool_expression can contain multiple sub-conditions:

SELECT f2 FROM example_table
WHERE
  STARTS_WITH(CAST(f2['col1'] AS STRING), "c") OR
  STARTS_WITH(CAST(f2['col2'] AS STRING), "1");

/*--------------------------------------------------------------------+
 | f2                                                                 |
 +--------------------------------------------------------------------+
 | { "col1": "def", "col2": "125", "col3": "2008-12-25 15:30:00+00" } |
 | { "col1": "cba", "col2": "123", "col3": "2018-10-15 15:30:00+00" } |
 | { "col1": "cba", "col2": "543", "col3": "2020-11-10 13:21:00+00" } |
 +--------------------------------------------------------------------*/

ORDER BY clause

ORDER BY _key [ ASC ]

The ORDER BY clause specifies a column or expression as the sort criterion for the result set. If an ORDER BY clause is not present, the order of the results of a query is not defined. Only the _key generated column can be used with the ORDER BY clause.

Optional Clauses

  • ASC: Sort the results in ascending order of _key values. ASC is the default value.

Examples

Use the default sort order (ascending).

SELECT _key, f1 FROM example_table ORDER BY _key;
SELECT _key, f1 FROM example_table ORDER BY _key ASC;

/*-------------------------------------+
 | _key | f1                           |
 +-------------------------------------+
 | a#01 | { "c1": "xyz", "c2": "nop" } |
 | a#02 | { "c1": "zyx" }              |
 | a#03 | { "c1": "abc", "c2": "def" } |
 | b#01 | { "c1": "jkl", "c2": "mno" } |
 | b#02 | { "c1": "hi", "c2": "no" }   |
 | c#03 | { "c1": "j", "c2": "l" }     |
 +-------------------------------------*/

LIMIT clause

LIMIT count

Limits the number of rows to return in a query.

Definitions

  • LIMIT: Limits the number of rows to produce.

    count is an INT64 constant expression that represents the non-negative, non-NULL limit. No more than count rows are produced. LIMIT 0 returns 0 rows.

Details

The rows that are returned by LIMIT have undefined order unless these clauses are used after ORDER BY.

A constant expression can be represented by a general expression, literal, or parameter value.

Examples

SELECT f1 FROM example_table LIMIT 2;

/*-------------------------------------+
 | _key | f1                           |
 +-------------------------------------+
 | a#01 | { "c1": "xyz", "c2": "nop" } |
 | a#02 | { "c1": "zyx" }              |
 +-------------------------------------*/

Using aliases

An alias is a temporary name given to a table, column, or expression present in a query. You can introduce explicit aliases in the SELECT list or FROM clause, or GoogleSQL will infer an implicit alias for some expressions. Expressions with neither an explicit nor implicit alias are anonymous and the query cannot reference them by name.

Explicit aliases

You can introduce explicit aliases in either the FROM clause or the SELECT list.

In a FROM clause, you can introduce explicit aliases for any item, including tables, arrays, using [AS] alias. The AS keyword is optional.

Example:

SELECT bar.f1
FROM example_table AS bar;

You can introduce explicit aliases for any expression in the SELECT list using [AS] alias. The AS keyword is optional.

Example:

SELECT f1 AS foo
FROM example_table;

Implicit aliases

In the SELECT list, if there is an expression that doesn't have an explicit alias, GoogleSQL assigns an implicit alias according to the following rules. There can be multiple columns with the same alias in the SELECT list.

  • For identifiers, the alias is the identifier. For example, SELECT abc implies AS abc.
  • For path expressions, the alias is the last identifier in the path. For example, SELECT abc.def.ghi implies AS ghi.
  • For field access using the "dot" member field access operator, the alias is the field name. For example, SELECT (struct_function()).fname implies AS fname.

In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.

In a FROM clause, from_items are not required to have an alias. The following rules apply:

  • If there is an expression that doesn't have an explicit alias, GoogleSQL assigns an implicit alias in these cases:
    • For identifiers, the alias is the identifier. For example, FROM abc implies AS abc.
    • For path expressions, the alias is the last identifier in the path. For example, FROM abc.def.ghi implies AS ghi
    • The column produced using WITH OFFSET has the implicit alias offset.

Alias visibility

After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of GoogleSQL name scoping rules.

Visibility in the FROM clause

GoogleSQL processes aliases in a FROM clause from left to right, and aliases are visible only to subsequent path expressions in a FROM clause.

You can use any column name from a table in the FROM as an alias anywhere in the query, with or without qualification with the table name.

Example:

SELECT f1, s.f2
FROM example_table AS s
WHERE STARTS_WITH(CAST(f2['col1'] AS STRING), "1");

If the FROM clause contains an explicit alias, you must use the explicit alias instead of the implicit alias for the remainder of the query (see Implicit Aliases). A table alias is useful for brevity or to eliminate ambiguity in cases such as self-joins, where the same table is scanned multiple times during query processing.

Example:

SELECT * FROM example_table AS s,
ORDER BY s._key

Invalid — ORDER BY doesn't use the table alias:

SELECT * FROM example_table AS s
ORDER BY example_table._key;  // INVALID.

Visibility in the SELECT list

Aliases in the SELECT list are not visible to any clauses.

Duplicate aliases

A SELECT list containing multiple explicit or implicit aliases of the same name is allowed, as long as the alias name is not referenced elsewhere in the query, since the reference would be ambiguous.

Example:

SELECT 1 AS a, 2 AS a;

/*---+---*
 | a | a |
 +---+---+
 | 1 | 2 |
 *---+---*/

Ambiguous aliases

GoogleSQL provides an error if accessing a name is ambiguous, meaning it can resolve to more than one unique object in the query or in a table schema, including the schema of a destination table.

A name is not ambiguous in ORDER BY if it is both a column name and a SELECT list alias, as long as the name resolves to the same underlying object.

Range variables

In GoogleSQL, a range variable is a table expression alias in the FROM clause. Sometimes a range variable is known as a table alias. A range variable lets you reference rows being scanned from a table expression. A table expression represents an item in the FROM clause that returns a table. Common items that this expression can represent include tables.

In general, a range variable provides a reference to the rows of a table expression. A range variable can be used to qualify a column reference and unambiguously identify the related table, for example range_variable.column_1.

When referencing a range variable on its own without a specified column suffix, the result of a table expression is the row type of the related table. Value tables have explicit row types, so for range variables related to value tables, the result type is the value table's row type. Other tables don't have explicit row types, and for those tables, the range variable type is a dynamically defined struct that includes all of the columns in the table.

Examples

The following example selects column f1 from range variable bar, which in effect selects column f1 from table example_table.

SELECT bar.f1 FROM example_table AS bar;

Appendix A: examples with sample data

These examples include statements which perform queries on the Roster and TeamMascot, and PlayerStats tables.

Sample table with column families

The following table called example_table is used to illustrate the behavior of different query clauses in this reference:

/*-------------------------------------------------+
 |              example_table                      |
 +-------------------------------------------------+
 | column family name | column names (BYTES)       |
 +-------------------------------------------------+
 | f1                 | { "c1", "c2" }             |
 | f2                 | { "col1", "col2", "col3" } |
 | f3                 | { "cl1" }                  |
 +-------------------------------------------------*/

When queried, example_table produces the following results:

/*-------------------------------------------------------------------------------+
 | _key | f1               | f2                                 |    f3          |
 +-------------------------------------------------------------------------------+
 | a#01 | {                | {                                  | {              |
 |      |   "c1": "xyz",   |   "col1": "def",                   |   "cl1": "{    |
 |      |   "c2": "nop"    |   "col2": "125",                   |     "n": "Def",|
 |      | }                |   "col3": "2008-12-25 15:30:00+00" |     "g": 40,   |
 |      |                  | }                                  |     "k": 3     |
 |      |                  |                                    |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | a#02 | {                | {                                  | {              |
 |      |   "c1": "zyx",   |   "col1": "cba",                   |   "cl1": "{    |
 |      | }                |   "col2": "123",                   |     "n": "Gih",|
 |      |                  |   "col3": "2018-10-15 15:30:00+00" |     "g": 20,   |
 |      |                  | }                                  |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | a#03 | {                | {                                  | {              |
 |      |   "c1": "abc",   |   "col1": "cba",                   |   "cl1": "{    |
 |      |   "c2": "def"    |   "col2": "543",                   |     "n": "Xyz",|
 |      | }                |   "col3": "2020-11-10 13:21:00+00" |     "g": 15,   |
 |      |                  | }                                  |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------+
 | b#01 | {                | {                                  | NULL           |
 |      |   "c1": "jkl",   |   "col1": "qrs",                   |                |
 |      |   "c2": "mno"    |   "col2": "654",                   |                |
 |      | }                |   "col3": "2021-10-10 14:18:03+07" |                |
 |      |                  | }                                  |                |
 +-------------------------------------------------------------------------------+
 | b#02 | {                | {                                  | NULL           |
 |      |   "c1": "hi",    |   "col1": "tu",                    |                |
 |      |   "c2": "no"     |   "col3": "2023-10-10 11:28:09+05",|                |
 |      | }                | }                                  |                |
 +-------------------------------------------------------------------------------+
 | c#03 | {                | {                                  | {              |
 |      |   "c1": "j",     |   "col1": "u",                     |   "cl1": "{    |
 |      |   "c2": "l"      |  }                                 |     "n": "T",  |
 |      | }                | }                                  |     "g": 22,   |
 |      |                  |                                    |   }"           |
 |      |                  |                                    | }              |
 +-------------------------------------------------------------------------------*/

In the preceding results, f1, f2, and f3 each represent a column family, and _key represents a key assigned to a group of column families. Each column family contains its own columns. For example, for key a#01, the column family f1 contains the columns c1 and c2. c1 contains the value "xyz" and c2 contains the value "nop".