This topic is a compilation of functions, operators, and conditional expressions.
To learn more about how to call functions, function call rules, the SAFE prefix, and special types of arguments, see Function calls.
OPERATORS AND CONDITIONALS
Operators
GoogleSQL for BigQuery supports operators. Operators are represented by special characters or keywords; they do not use function call syntax. An operator manipulates any number of data inputs, also called operands, and returns a result.
Common conventions:
- Unless otherwise specified, all operators return NULL when one of the operands is NULL.
- All operators throw an error if the computation result overflows.
- For all floating point operations, +/-inf and NaN may only be returned if one of the operands is +/-inf or NaN. In other cases, an error is returned.
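As a quick illustration of the NULL-propagation convention, an arithmetic expression with a NULL operand yields NULL rather than an error:

```sql
SELECT
  1 + NULL AS sum_with_null,     -- NULL: the NULL operand propagates
  NULL * 3 AS product_with_null  -- NULL
```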
Operator precedence
The following table lists all GoogleSQL operators from highest to lowest precedence, that is, the order in which they are evaluated within a statement.
Order of Precedence | Operator | Input Data Types | Name | Operator Arity |
---|---|---|---|---|
1 | Field access operator | STRUCT, JSON | Field access operator | Binary |
 | Array subscript operator | ARRAY | Array position. Must be used with OFFSET or ORDINAL; see Array Functions. | Binary |
 | JSON subscript operator | JSON | Field name or array position in JSON. | Binary |
2 | + | All numeric types | Unary plus | Unary |
 | - | All numeric types | Unary minus | Unary |
 | ~ | Integer or BYTES | Bitwise not | Unary |
3 | * | All numeric types | Multiplication | Binary |
 | / | All numeric types | Division | Binary |
 | || | STRING, BYTES, or ARRAY<T> | Concatenation operator | Binary |
4 | + | All numeric types, DATE with INT64, INTERVAL | Addition | Binary |
 | - | All numeric types, DATE with INT64, INTERVAL | Subtraction | Binary |
5 | << | Integer or BYTES | Bitwise left-shift | Binary |
 | >> | Integer or BYTES | Bitwise right-shift | Binary |
6 | & | Integer or BYTES | Bitwise and | Binary |
7 | ^ | Integer or BYTES | Bitwise xor | Binary |
8 | | | Integer or BYTES | Bitwise or | Binary |
9 (Comparison Operators) | = | Any comparable type. See Data Types for a complete list. | Equal | Binary |
 | < | Any comparable type. | Less than | Binary |
 | > | Any comparable type. | Greater than | Binary |
 | <= | Any comparable type. | Less than or equal to | Binary |
 | >= | Any comparable type. | Greater than or equal to | Binary |
 | !=, <> | Any comparable type. | Not equal | Binary |
 | [NOT] LIKE | STRING and BYTES | Value does [not] match the pattern specified | Binary |
 | Quantified LIKE | STRING and BYTES | Checks a search value for matches against several patterns. | Binary |
 | [NOT] BETWEEN | Any comparable types. | Value is [not] within the range specified | Binary |
 | [NOT] IN | Any comparable types. | Value is [not] in the set of values specified | Binary |
 | IS [NOT] NULL | All | Value is [not] NULL | Unary |
 | IS [NOT] TRUE | BOOL | Value is [not] TRUE | Unary |
 | IS [NOT] FALSE | BOOL | Value is [not] FALSE | Unary |
10 | NOT | BOOL | Logical NOT | Unary |
11 | AND | BOOL | Logical AND | Binary |
12 | OR | BOOL | Logical OR | Binary |
Operators with the same precedence are left associative. This means that those operators are grouped together starting from the left and moving right. For example, the expression:
x AND y AND z
is interpreted as
( ( x AND y ) AND z )
The expression:
x * y / z
is interpreted as:
( ( x * y ) / z )
All comparison operators have the same priority, but comparison operators are not associative. Therefore, parentheses are required in order to resolve ambiguity. For example:
(x < y) IS FALSE
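As a quick illustration of precedence (standard GoogleSQL behavior: * binds tighter than +, and AND binds tighter than OR):

```sql
SELECT
  2 + 3 * 4 AS a,               -- 14: multiplication is evaluated first
  TRUE OR FALSE AND FALSE AS b  -- TRUE: parsed as TRUE OR (FALSE AND FALSE)
```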
Operator list
Name | Summary |
---|---|
Field access operator | Gets the value of a field. |
Array subscript operator | Gets a value from an array at a specific position. |
Struct subscript operator | Gets the value of a field at a selected position in a struct. |
JSON subscript operator | Gets a value of an array element or field in a JSON expression. |
Arithmetic operators | Performs arithmetic operations. |
Date arithmetics operators | Performs arithmetic operations on dates. |
Datetime subtraction | Computes the difference between two datetimes as an interval. |
Interval arithmetic operators | Adds an interval to a datetime or subtracts an interval from a datetime. |
Bitwise operators | Performs bit manipulation. |
Logical operators | Tests for the truth of some condition and produces TRUE, FALSE, or NULL. |
Comparison operators | Compares operands and produces the results of the comparison as a BOOL value. |
EXISTS operator | Checks if a subquery produces one or more rows. |
IN operator | Checks for an equal value in a set of values. |
IS operators | Checks for the truth of a condition and produces either TRUE or FALSE. |
IS DISTINCT FROM operator | Checks if values are considered to be distinct from each other. |
LIKE operator | Checks if values are like or not like one another. |
Quantified LIKE operator | Checks a search value for matches against several patterns. |
Concatenation operator | Combines multiple values into one. |
Field access operator
expression.fieldname[. ...]
Description
Gets the value of a field. Alternatively known as the dot operator. Can be used to access nested fields. For example, expression.fieldname1.fieldname2.
Input values:
- STRUCT
- JSON
Return type
- For STRUCT: the SQL data type of fieldname. If a field is not found in the struct, an error is thrown.
- For JSON: JSON. If a field is not found in a JSON value, a SQL NULL is returned.
Example
In the following example, the field access operations are .address and .country.
SELECT
STRUCT(
STRUCT('Yonge Street' AS street, 'Canada' AS country)
AS address).address.country
/*---------*
| country |
+---------+
| Canada |
*---------*/
Array subscript operator
array_expression "[" array_subscript_specifier "]"
array_subscript_specifier:
{ index | position_keyword(index) }
position_keyword:
{ OFFSET | SAFE_OFFSET | ORDINAL | SAFE_ORDINAL }
Description
Gets a value from an array at a specific position.
Input values:
- array_expression: The input array.
- position_keyword(index): Determines where the index for the array should start and how out-of-range indexes are handled. The index is an integer that represents a specific position in the array.
  - OFFSET(index): The index starts at zero. Produces an error if the index is out of range. To produce NULL instead of an error, use SAFE_OFFSET(index). This position keyword produces the same result as index by itself.
  - SAFE_OFFSET(index): The index starts at zero. Returns NULL if the index is out of range.
  - ORDINAL(index): The index starts at one. Produces an error if the index is out of range. To produce NULL instead of an error, use SAFE_ORDINAL(index).
  - SAFE_ORDINAL(index): The index starts at one. Returns NULL if the index is out of range.
- index: An integer that represents a specific position in the array. If used by itself without a position keyword, the index starts at zero and produces an error if the index is out of range. To produce NULL instead of an error, use the SAFE_OFFSET(index) or SAFE_ORDINAL(index) position keyword.
Return type
T where array_expression is ARRAY<T>.
Examples
In the following query, the array subscript operator is used to return values at specific positions in item_array. This query also shows what happens when you reference an index (6) in an array that is out of range. If the SAFE prefix is included, NULL is returned; otherwise, an error is produced.
SELECT
["coffee", "tea", "milk"] AS item_array,
["coffee", "tea", "milk"][0] AS item_index,
["coffee", "tea", "milk"][OFFSET(0)] AS item_offset,
["coffee", "tea", "milk"][ORDINAL(1)] AS item_ordinal,
["coffee", "tea", "milk"][SAFE_OFFSET(6)] AS item_safe_offset
/*---------------------+------------+-------------+--------------+------------------*
| item_array | item_index | item_offset | item_ordinal | item_safe_offset |
+---------------------+------------+-------------+--------------+------------------+
| [coffee, tea, milk] | coffee | coffee | coffee | NULL |
*---------------------+------------+-------------+--------------+------------------*/
When you reference an index that is out of range in an array, and a position keyword that begins with SAFE is not included, an error is produced.
For example:
-- Error. Array index 6 is out of bounds.
SELECT ["coffee", "tea", "milk"][6] AS item_offset
-- Error. Array index 6 is out of bounds.
SELECT ["coffee", "tea", "milk"][OFFSET(6)] AS item_offset
Struct subscript operator
struct_expression "[" struct_subscript_specifier "]"
struct_subscript_specifier:
{ index | position_keyword(index) }
position_keyword:
{ OFFSET | ORDINAL }
Description
Gets the value of a field at a selected position in a struct.
Input types
- struct_expression: The input struct.
- position_keyword(index): Determines where the index for the struct should start and how out-of-range indexes are handled. The index is an integer literal or constant that represents a specific position in the struct.
  - OFFSET(index): The index starts at zero. Produces an error if the index is out of range. Produces the same result as index by itself.
  - ORDINAL(index): The index starts at one. Produces an error if the index is out of range.
- index: An integer literal or constant that represents a specific position in the struct. If used by itself without a position keyword, the index starts at zero and produces an error if the index is out of range.
Examples
In the following query, the struct subscript operator is used to return values at specific locations in item_struct using position keywords. This query also shows what happens when you reference an index (6) in a struct that is out of range.
SELECT
STRUCT<INT64, STRING, BOOL>(23, "tea", FALSE)[0] AS field_index,
STRUCT<INT64, STRING, BOOL>(23, "tea", FALSE)[OFFSET(0)] AS field_offset,
STRUCT<INT64, STRING, BOOL>(23, "tea", FALSE)[ORDINAL(1)] AS field_ordinal
/*-------------+--------------+---------------*
| field_index | field_offset | field_ordinal |
+-------------+--------------+---------------+
| 23 | 23 | 23 |
*-------------+--------------+---------------*/
When you reference an index that is out of range in a struct, an error is produced. For example:
-- Error: Field ordinal 6 is out of bounds in STRUCT
SELECT STRUCT<INT64, STRING, BOOL>(23, "tea", FALSE)[6] AS field_offset
-- Error: Field ordinal 6 is out of bounds in STRUCT
SELECT STRUCT<INT64, STRING, BOOL>(23, "tea", FALSE)[OFFSET(6)] AS field_offset
JSON subscript operator
json_expression "[" array_element_id "]"
json_expression "[" field_name "]"
Description
Gets a value of an array element or field in a JSON expression. Can be used to access nested data.
Input values:
- json_expression: The JSON expression that contains an array element or field to return.
- [array_element_id]: An INT64 expression that represents a zero-based index in the array. If a negative value is entered, or the value is greater than or equal to the size of the array, or the JSON expression doesn't represent a JSON array, a SQL NULL is returned.
- [field_name]: A STRING expression that represents the name of a field in JSON. If the field name is not found, or the JSON expression is not a JSON object, a SQL NULL is returned.
Return type
JSON
Example
In the following example:
- json_value is a JSON expression.
- .class is a JSON field access.
- .students is a JSON field access.
- [0] is a JSON subscript expression with an element offset that accesses the zeroth element of an array in the JSON value.
- ['name'] is a JSON subscript expression with a field name that accesses a field.
SELECT json_value.class.students[0]['name'] AS first_student
FROM
UNNEST(
[
JSON '{"class" : {"students" : [{"name" : "Jane"}]}}',
JSON '{"class" : {"students" : []}}',
JSON '{"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'])
AS json_value;
/*-----------------*
| first_student |
+-----------------+
| "Jane" |
| NULL |
| "John" |
*-----------------*/
Arithmetic operators
All arithmetic operators accept input of numeric type T, and the result type has type T unless otherwise indicated in the description below:
Name | Syntax |
---|---|
Addition | X + Y |
Subtraction | X - Y |
Multiplication | X * Y |
Division | X / Y |
Unary Plus | + X |
Unary Minus | - X |
NOTE: Divide by zero operations return an error. To return a different result, consider the IEEE_DIVIDE or SAFE_DIVIDE functions.
Result types for Addition, Subtraction and Multiplication:
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
Result types for Division:
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
Result types for Unary Plus:
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
Result types for Unary Minus:
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
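As a small illustration of these result types: integer division produces FLOAT64, and mixing INT64 with NUMERIC widens the result to NUMERIC (per the tables above):

```sql
SELECT
  5 / 2 AS int_division,               -- 2.5: INT64 / INT64 yields FLOAT64
  CAST(5 AS NUMERIC) + 2 AS mixed_sum  -- 7: the INT64 operand is coerced; result is NUMERIC
```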
Date arithmetics operators
Operators '+' and '-' can be used for arithmetic operations on dates.
date_expression + int64_expression
int64_expression + date_expression
date_expression - int64_expression
Description
Adds or subtracts int64_expression days to or from date_expression. This is equivalent to the DATE_ADD or DATE_SUB functions, when the interval is expressed in days.
Return Data Type
DATE
Example
SELECT DATE "2020-09-22" + 1 AS day_later, DATE "2020-09-22" - 7 AS week_ago
/*------------+------------*
| day_later | week_ago |
+------------+------------+
| 2020-09-23 | 2020-09-15 |
*------------+------------*/
Datetime subtraction
date_expression - date_expression
timestamp_expression - timestamp_expression
datetime_expression - datetime_expression
Description
Computes the difference between two datetime values as an interval.
Return Data Type
INTERVAL
Example
SELECT
DATE "2021-05-20" - DATE "2020-04-19" AS date_diff,
TIMESTAMP "2021-06-01 12:34:56.789" - TIMESTAMP "2021-05-31 00:00:00" AS time_diff
/*-------------------+------------------------*
| date_diff | time_diff |
+-------------------+------------------------+
| 0-0 396 0:0:0 | 0-0 0 36:34:56.789 |
*-------------------+------------------------*/
Interval arithmetic operators
Addition and subtraction
date_expression + interval_expression = DATETIME
date_expression - interval_expression = DATETIME
timestamp_expression + interval_expression = TIMESTAMP
timestamp_expression - interval_expression = TIMESTAMP
datetime_expression + interval_expression = DATETIME
datetime_expression - interval_expression = DATETIME
Description
Adds an interval to a datetime value or subtracts an interval from a datetime value.
Example
SELECT
DATE "2021-04-20" + INTERVAL 25 HOUR AS date_plus,
TIMESTAMP "2021-05-02 00:01:02.345" - INTERVAL 10 SECOND AS time_minus;
/*-------------------------+--------------------------------*
| date_plus | time_minus |
+-------------------------+--------------------------------+
| 2021-04-21 01:00:00 | 2021-05-02 00:00:52.345+00 |
*-------------------------+--------------------------------*/
Multiplication and division
interval_expression * integer_expression = INTERVAL
interval_expression / integer_expression = INTERVAL
Description
Multiplies or divides an interval value by an integer.
Example
SELECT
INTERVAL '1:2:3' HOUR TO SECOND * 10 AS mul1,
INTERVAL 35 SECOND * 4 AS mul2,
INTERVAL 10 YEAR / 3 AS div1,
INTERVAL 1 MONTH / 12 AS div2
/*----------------+--------------+-------------+--------------*
| mul1 | mul2 | div1 | div2 |
+----------------+--------------+-------------+--------------+
| 0-0 0 10:20:30 | 0-0 0 0:2:20 | 3-4 0 0:0:0 | 0-0 2 12:0:0 |
*----------------+--------------+-------------+--------------*/
Bitwise operators
All bitwise operators return the same type and the same length as the first operand.
Name | Syntax | Input Data Type | Description |
---|---|---|---|
Bitwise not | ~ X | Integer or BYTES | Performs logical negation on each bit, forming the ones' complement of the given binary value. |
Bitwise or | X | Y | X: Integer or BYTES. Y: Same type as X. | Takes two bit patterns of equal length and performs the logical inclusive OR operation on each pair of the corresponding bits. This operator throws an error if X and Y are bytes of different lengths. |
Bitwise xor | X ^ Y | X: Integer or BYTES. Y: Same type as X. | Takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of the corresponding bits. This operator throws an error if X and Y are bytes of different lengths. |
Bitwise and | X & Y | X: Integer or BYTES. Y: Same type as X. | Takes two bit patterns of equal length and performs the logical AND operation on each pair of the corresponding bits. This operator throws an error if X and Y are bytes of different lengths. |
Left shift | X << Y | X: Integer or BYTES. Y: INT64 | Shifts the first operand X to the left. This operator returns 0 or a byte sequence of b'\x00' if the second operand Y is greater than or equal to the bit length of the first operand X (for example, 64 if X has the type INT64). This operator throws an error if Y is negative. |
Right shift | X >> Y | X: Integer or BYTES. Y: INT64 | Shifts the first operand X to the right. This operator doesn't do sign bit extension with a signed type (that is, it fills vacant bits on the left with 0). This operator returns 0 or a byte sequence of b'\x00' if the second operand Y is greater than or equal to the bit length of the first operand X (for example, 64 if X has the type INT64). This operator throws an error if Y is negative. |
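A short illustration of the integer forms of these operators (standard two's-complement behavior):

```sql
SELECT
  5 & 3 AS b_and,    -- 1  (0101 AND 0011 = 0001)
  5 | 3 AS b_or,     -- 7  (0111)
  5 ^ 3 AS b_xor,    -- 6  (0110)
  ~5 AS b_not,       -- -6 (ones' complement of an INT64)
  1 << 3 AS lshift,  -- 8
  16 >> 2 AS rshift  -- 4
```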
Logical operators
GoogleSQL supports the AND, OR, and NOT logical operators. Logical operators allow only BOOL or NULL input and use three-valued logic to produce a result. The result can be TRUE, FALSE, or NULL:
x | y | x AND y | x OR y |
---|---|---|---|
TRUE | TRUE | TRUE | TRUE |
TRUE | FALSE | FALSE | TRUE |
TRUE | NULL | NULL | TRUE |
FALSE | TRUE | FALSE | TRUE |
FALSE | FALSE | FALSE | FALSE |
FALSE | NULL | FALSE | NULL |
NULL | TRUE | NULL | TRUE |
NULL | FALSE | FALSE | NULL |
NULL | NULL | NULL | NULL |

x | NOT x |
---|---|
TRUE | FALSE |
FALSE | TRUE |
NULL | NULL |
Examples
The examples in this section reference a table called entry_table
:
/*-------*
| entry |
+-------+
| a |
| b |
| c |
| NULL |
*-------*/
SELECT entry FROM entry_table WHERE entry = 'a'
-- a => 'a' = 'a' => TRUE
-- b => 'b' = 'a' => FALSE
-- NULL => NULL = 'a' => NULL
/*-------*
| entry |
+-------+
| a |
*-------*/
SELECT entry FROM entry_table WHERE NOT (entry = 'a')
-- a => NOT('a' = 'a') => NOT(TRUE) => FALSE
-- b => NOT('b' = 'a') => NOT(FALSE) => TRUE
-- NULL => NOT(NULL = 'a') => NOT(NULL) => NULL
/*-------*
| entry |
+-------+
| b |
| c |
*-------*/
SELECT entry FROM entry_table WHERE entry IS NULL
-- a => 'a' IS NULL => FALSE
-- b => 'b' IS NULL => FALSE
-- NULL => NULL IS NULL => TRUE
/*-------*
| entry |
+-------+
| NULL |
*-------*/
Comparison operators
Compares operands and produces the results of the comparison as a BOOL
value. These comparison operators are available:
Name | Syntax | Description |
---|---|---|
Less Than | X < Y | Returns TRUE if X is less than Y. This operator supports specifying collation. |
Less Than or Equal To | X <= Y | Returns TRUE if X is less than or equal to Y. This operator supports specifying collation. |
Greater Than | X > Y | Returns TRUE if X is greater than Y. This operator supports specifying collation. |
Greater Than or Equal To | X >= Y | Returns TRUE if X is greater than or equal to Y. This operator supports specifying collation. |
Equal | X = Y | Returns TRUE if X is equal to Y. This operator supports specifying collation. |
Not Equal | X != Y or X <> Y | Returns TRUE if X is not equal to Y. This operator supports specifying collation. |
BETWEEN | X [NOT] BETWEEN Y AND Z | Returns TRUE if X is [not] within the range specified. |
LIKE | X [NOT] LIKE Y | See the LIKE operator for details. |
IN | Multiple | See the IN operator for details. |
The following rules apply to operands in a comparison operator:
- The operands must be comparable.
- A comparison operator generally requires both operands to be of the same type.
- If the operands are of different types, and the values of those types can be converted to a common type without loss of precision, they are generally coerced to that common type for the comparison.
- A literal operand is generally coerced to the same data type of a non-literal operand that is part of the comparison.
- Struct operands support only these comparison operators: equal (=), not equal (!= and <>), and IN.
The following rules apply when comparing these data types:
- FLOAT64: All comparisons with NaN return FALSE, except for != and <>, which return TRUE.
- BOOL: FALSE is less than TRUE.
- STRING: Strings are compared codepoint-by-codepoint, which means that canonically equivalent strings are only guaranteed to compare as equal if they have been normalized first.
- JSON: You can't compare JSON, but you can compare the values inside of JSON if you convert the values to SQL values first. For more information, see JSON functions.
- NULL: Any operation with a NULL input returns NULL.
- STRUCT: When testing a struct for equality, it's possible that one or more fields are NULL. In such cases:
  - If all non-NULL field values are equal, the comparison returns NULL.
  - If any non-NULL field values are not equal, the comparison returns FALSE.

  The following table demonstrates how STRUCT data types are compared when they have fields that are NULL valued.

  Struct1 | Struct2 | Struct1 = Struct2 |
  ---|---|---|
  STRUCT(1, NULL) | STRUCT(1, NULL) | NULL |
  STRUCT(1, NULL) | STRUCT(2, NULL) | FALSE |
  STRUCT(1, 2) | STRUCT(1, NULL) | NULL |
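These struct comparison rules can be reproduced directly; for example:

```sql
SELECT
  STRUCT(1, NULL) = STRUCT(1, NULL) AS all_equal_or_null,  -- NULL: non-NULL fields are equal
  STRUCT(1, NULL) = STRUCT(2, NULL) AS unequal_field       -- FALSE: a non-NULL field differs
```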
EXISTS operator
EXISTS ( subquery )
Description
Returns TRUE if the subquery produces one or more rows. Returns FALSE if the subquery produces zero rows. Never returns NULL. To learn more about how you can use a subquery with EXISTS, see EXISTS subqueries.
Examples
In this example, the EXISTS operator returns FALSE because there are no rows in Words where the direction is south:
WITH Words AS (
SELECT 'Intend' as value, 'east' as direction UNION ALL
SELECT 'Secure', 'north' UNION ALL
SELECT 'Clarity', 'west'
)
SELECT EXISTS ( SELECT value FROM Words WHERE direction = 'south' ) as result;
/*--------*
| result |
+--------+
| FALSE |
*--------*/
IN operator
The IN operator supports the following syntax:
search_value [NOT] IN value_set
value_set:
{
(expression[, ...])
| (subquery)
| UNNEST(array_expression)
}
Description
Checks for an equal value in a set of values.
Semantic rules apply, but in general, IN returns TRUE if an equal value is found, FALSE if an equal value is excluded, otherwise NULL. NOT IN returns FALSE if an equal value is found, TRUE if an equal value is excluded, otherwise NULL.
- search_value: The expression that is compared to a set of values.
- value_set: One or more values to compare to a search value.
  - (expression[, ...]): A list of expressions.
  - (subquery): A subquery that returns a single column. The values in that column are the set of values. If no rows are produced, the set of values is empty.
  - UNNEST(array_expression): An UNNEST operator that returns a column of values from an array expression. This is equivalent to: IN (SELECT element FROM UNNEST(array_expression) AS element)
This operator supports collation, but these limitations apply:
- [NOT] IN UNNEST does not support collation.
- If collation is used with a list of expressions, there must be at least one item in the list.
Semantic rules
When using the IN operator, the following semantics apply in this order:
- Returns FALSE if value_set is empty.
- Returns NULL if search_value is NULL.
- Returns TRUE if value_set contains a value equal to search_value.
- Returns NULL if value_set contains a NULL.
- Returns FALSE.

When using the NOT IN operator, the following semantics apply in this order:
- Returns TRUE if value_set is empty.
- Returns NULL if search_value is NULL.
- Returns FALSE if value_set contains a value equal to search_value.
- Returns NULL if value_set contains a NULL.
- Returns TRUE.
The semantics of:
x IN (y, z, ...)
are defined as equivalent to:
(x = y) OR (x = z) OR ...
and the subquery and array forms are defined similarly.
x NOT IN ...
is equivalent to:
NOT(x IN ...)
The UNNEST form treats an array scan like UNNEST in the FROM clause:
x [NOT] IN UNNEST(<array expression>)
This form is often used with array parameters. For example:
x IN UNNEST(@array_parameter)
See the Arrays topic for more information on how to use this syntax.
IN can be used with multi-part keys by using the struct constructor syntax. For example:
(Key1, Key2) IN ( (12,34), (56,78) )
(Key1, Key2) IN ( SELECT (table.a, table.b) FROM table )
See the Struct Type topic for more information.
Return Data Type
BOOL
Examples
You can use these WITH clauses to emulate temporary tables for Words and Items in the following examples:
WITH Words AS (
SELECT 'Intend' as value UNION ALL
SELECT 'Secure' UNION ALL
SELECT 'Clarity' UNION ALL
SELECT 'Peace' UNION ALL
SELECT 'Intend'
)
SELECT * FROM Words;
/*----------*
| value |
+----------+
| Intend |
| Secure |
| Clarity |
| Peace |
| Intend |
*----------*/
WITH
Items AS (
SELECT STRUCT('blue' AS color, 'round' AS shape) AS info UNION ALL
SELECT STRUCT('blue', 'square') UNION ALL
SELECT STRUCT('red', 'round')
)
SELECT * FROM Items;
/*----------------------------*
| info |
+----------------------------+
| {blue color, round shape} |
| {blue color, square shape} |
| {red color, round shape} |
*----------------------------*/
Example with IN and an expression:
SELECT * FROM Words WHERE value IN ('Intend', 'Secure');
/*----------*
| value |
+----------+
| Intend |
| Secure |
| Intend |
*----------*/
Example with NOT IN and an expression:
SELECT * FROM Words WHERE value NOT IN ('Intend');
/*----------*
| value |
+----------+
| Secure |
| Clarity |
| Peace |
*----------*/
Example with IN, a scalar subquery, and an expression:
SELECT * FROM Words WHERE value IN ((SELECT 'Intend'), 'Clarity');
/*----------*
| value |
+----------+
| Intend |
| Clarity |
| Intend |
*----------*/
Example with IN and an UNNEST operation:
SELECT * FROM Words WHERE value IN UNNEST(['Secure', 'Clarity']);
/*----------*
| value |
+----------+
| Secure |
| Clarity |
*----------*/
Example with IN and a struct:
SELECT
(SELECT AS STRUCT Items.info) as item
FROM
Items
WHERE (info.shape, info.color) IN (('round', 'blue'));
/*------------------------------------*
| item |
+------------------------------------+
| { {blue color, round shape} info } |
*------------------------------------*/
IS operators
IS operators return TRUE or FALSE for the condition they are testing. They never return NULL, even for NULL inputs, unlike the IS_INF and IS_NAN functions defined in Mathematical Functions. If NOT is present, the output BOOL value is inverted.
Function Syntax | Input Data Type | Result Data Type | Description |
---|---|---|---|
X IS TRUE | BOOL | BOOL | Evaluates to TRUE if X evaluates to TRUE. Otherwise, evaluates to FALSE. |
X IS NOT TRUE | BOOL | BOOL | Evaluates to FALSE if X evaluates to TRUE. Otherwise, evaluates to TRUE. |
X IS FALSE | BOOL | BOOL | Evaluates to TRUE if X evaluates to FALSE. Otherwise, evaluates to FALSE. |
X IS NOT FALSE | BOOL | BOOL | Evaluates to FALSE if X evaluates to FALSE. Otherwise, evaluates to TRUE. |
X IS NULL | Any value type | BOOL | Evaluates to TRUE if X evaluates to NULL. Otherwise, evaluates to FALSE. |
X IS NOT NULL | Any value type | BOOL | Evaluates to FALSE if X evaluates to NULL. Otherwise, evaluates to TRUE. |
X IS UNKNOWN | BOOL | BOOL | Evaluates to TRUE if X evaluates to NULL. Otherwise, evaluates to FALSE. |
X IS NOT UNKNOWN | BOOL | BOOL | Evaluates to FALSE if X evaluates to NULL. Otherwise, evaluates to TRUE. |
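A short illustration of this never-NULL behavior:

```sql
SELECT
  NULL IS NULL AS a,     -- TRUE
  NULL IS FALSE AS b,    -- FALSE: IS never returns NULL, even for NULL input
  NULL IS UNKNOWN AS c,  -- TRUE
  NULL IS NOT TRUE AS d  -- TRUE
```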
IS DISTINCT FROM operator
expression_1 IS [NOT] DISTINCT FROM expression_2
Description
IS DISTINCT FROM returns TRUE if the input values are considered to be distinct from each other by the DISTINCT and GROUP BY clauses. Otherwise, returns FALSE.

a IS DISTINCT FROM b being TRUE is equivalent to:
- SELECT COUNT(DISTINCT x) FROM UNNEST([a,b]) x returning 2.
- SELECT * FROM UNNEST([a,b]) x GROUP BY x returning 2 rows.

a IS DISTINCT FROM b is equivalent to NOT (a = b), except for the following cases:
- This operator never returns NULL, so NULL values are considered to be distinct from non-NULL values, not other NULL values.
- NaN values are considered to be distinct from non-NaN values, but not other NaN values.
You can use this operation with fields in a complex data type, but not on the complex data types themselves. These complex data types cannot be compared directly:
- STRUCT
- ARRAY

Input values:
- expression_1: The first value to compare. This can be a groupable data type, NULL, or NaN.
- expression_2: The second value to compare. This can be a groupable data type, NULL, or NaN.
- NOT: If present, the output BOOL value is inverted.
Return type
BOOL
Examples
These return TRUE
:
SELECT 1 IS DISTINCT FROM 2
SELECT 1 IS DISTINCT FROM NULL
SELECT 1 IS NOT DISTINCT FROM 1
SELECT NULL IS NOT DISTINCT FROM NULL
These return FALSE
:
SELECT NULL IS DISTINCT FROM NULL
SELECT 1 IS DISTINCT FROM 1
SELECT 1 IS NOT DISTINCT FROM 2
SELECT 1 IS NOT DISTINCT FROM NULL
LIKE operator
expression_1 [NOT] LIKE expression_2
Description
LIKE returns TRUE if the string in the first operand expression_1 matches a pattern specified by the second operand expression_2; otherwise, it returns FALSE.
NOT LIKE returns TRUE if the string in the first operand expression_1 does not match a pattern specified by the second operand expression_2; otherwise, it returns FALSE.
Expressions can contain these characters:
- A percent sign (%) matches any number of characters or bytes.
- An underscore (_) matches a single character or byte.
- You can escape \, _, or % using two backslashes. For example, \\%. If you are using raw strings, only a single backslash is required. For example, r'\%'.
This operator supports collation, but caveats apply:
- Each
%
character inexpression_2
represents an arbitrary string specifier. An arbitrary string specifier can represent any sequence of0
or more characters. A character in the expression represents itself and is considered a single character specifier unless:
The character is a percent sign (
%
).The character is an underscore (
_
) and the collator is notund:ci
.
These additional rules apply to the underscore (
_
) character:If the collator is not
und:ci
, an error is produced when an underscore is not escaped inexpression_2
.If the collator is not
und:ci
, the underscore is not allowed when the operands have collation specified.Some compatibility composites, such as the fi-ligature (
fi
) and the telephone sign (℡
), will produce a match if they are compared to an underscore.A single underscore matches the idea of what a character is, based on an approximation known as a grapheme cluster.
For a contiguous sequence of single character specifiers, equality depends on the collator and its language tags and tailoring.
By default, the
und:ci
collator does not fully normalize a string. Some canonically equivalent strings are considered unequal for both the=
andLIKE
operators.The
LIKE
operator with collation has the same behavior as the=
operator when there are no wildcards in the strings.Character sequences with secondary or higher-weighted differences are considered unequal. This includes accent differences and some special cases.
For example, there are three ways to produce the German sharp s (ß): \u1E9E, \u00DF, and ss.
\u1E9E and \u00DF differ only at the tertiary level, so they are considered equal with und:ci collation, but both differ from ss, which has secondary differences.
Character sequences with tertiary or lower-weighted differences are considered equal. This includes case differences and kana subtype differences.
There are ignorable characters defined in Unicode. Ignorable characters are ignored in the pattern matching.
Return type
BOOL
Examples
The following examples illustrate how you can check to see if the string in the first operand matches a pattern specified by the second operand.
-- Returns TRUE
SELECT 'apple' LIKE 'a%';
-- Returns FALSE
SELECT '%a' LIKE 'apple';
-- Returns FALSE
SELECT 'apple' NOT LIKE 'a%';
-- Returns TRUE
SELECT '%a' NOT LIKE 'apple';
-- Returns NULL
SELECT NULL LIKE 'a%';
-- Returns NULL
SELECT 'apple' LIKE NULL;
The following example illustrates how to search multiple patterns in an array
to find a match with the LIKE
operator:
WITH Words AS
(SELECT 'Intend with clarity.' as value UNION ALL
SELECT 'Secure with intention.' UNION ALL
SELECT 'Clarity and security.')
SELECT value
FROM Words WHERE
EXISTS(
SELECT value FROM UNNEST(['%ity%', '%and%']) AS pattern
WHERE value LIKE pattern
);
/*------------------------+
| value |
+------------------------+
| Intend with clarity. |
| Clarity and security. |
+------------------------*/
The following examples illustrate how collation can be used with the LIKE
operator.
-- Returns FALSE
'Foo' LIKE '%foo%'
-- Returns TRUE
COLLATE('Foo', 'und:ci') LIKE COLLATE('%foo%', 'und:ci');
-- Returns TRUE
COLLATE('Foo', 'und:ci') = COLLATE('foo', 'und:ci');
-- Produces an error
COLLATE('Foo', 'und:ci') LIKE COLLATE('%foo%', 'binary');
-- Produces an error
COLLATE('Foo', 'und:ci') LIKE COLLATE('%f_o%', 'und:ci');
-- Returns TRUE
COLLATE('Foo_', 'und:ci') LIKE COLLATE('%foo\\_%', 'und:ci');
There are two capital forms of ß: either SS or ẞ can be used as the uppercase form. The difference between ß and ẞ is a case difference (tertiary difference), but the difference between the sharp s and ss is secondary, so they are considered not equal using the und:ci collator. For example:
-- Returns FALSE
'MASSE' LIKE 'Maße';
-- Returns FALSE
COLLATE('MASSE', 'und:ci') LIKE '%Maße%';
-- Returns FALSE
COLLATE('MASSE', 'und:ci') = COLLATE('Maße', 'und:ci');
The kana differences in Japanese are tertiary or quaternary differences, so they are considered equal by the und:ci collator, which uses secondary strength.
'\u3042'
is'あ'
(hiragana)'\u30A2'
is'ア'
(katakana)
For example:
-- Returns FALSE
'\u3042' LIKE '%\u30A2%';
-- Returns TRUE
COLLATE('\u3042', 'und:ci') LIKE COLLATE('%\u30A2%', 'und:ci');
-- Returns TRUE
COLLATE('\u3042', 'und:ci') = COLLATE('\u30A2', 'und:ci');
When comparing two strings, the und:ci collator compares collation units based on the specification of the collation. The two strings are considered equal when their collation units are the same, even if the number of code points differs.
'\u0041\u030A'
is'Å'
(two code points)'\u0061\u030A'
is'å'
(two code points)'\u00C5'
is'Å'
(one code point)
In the following examples, the difference between '\u0061\u030A'
and
'\u00C5'
is tertiary.
-- Returns FALSE
'\u0061\u030A' LIKE '%\u00C5%';
-- Returns TRUE
COLLATE('\u0061\u030A', 'und:ci') LIKE '%\u00C5%';
-- Returns TRUE
COLLATE('\u0061\u030A', 'und:ci') = COLLATE('\u00C5', 'und:ci');
In the following example, '\u0083'
is a NO BREAK HERE
character and
is ignored.
-- Returns FALSE
'\u0083' LIKE '';
-- Returns TRUE
COLLATE('\u0083', 'und:ci') LIKE '';
Quantified LIKE
operator
The quantified LIKE
operator supports the following syntax:
search_value [NOT] LIKE quantifier patterns
quantifier:
{ ANY | SOME | ALL }
patterns:
{
pattern_expression_list
| pattern_array
}
pattern_expression_list:
(expression[, ...])
pattern_array:
UNNEST(array_expression)
Description
Checks search_value
for matches against several patterns. Each comparison is
case-sensitive. Wildcard searches are supported.
Semantic rules apply, but in general, LIKE
returns TRUE
if a matching pattern is found, FALSE
if a matching pattern
is not found, or otherwise NULL
. NOT LIKE
returns FALSE
if a
matching pattern is found, TRUE
if a matching pattern is not found, or
otherwise NULL
.
search_value
: The value to search for matching patterns. This value can be aSTRING
orBYTES
type.patterns
: The patterns to look for in the search value. Each pattern must resolve to the same type assearch_value
.pattern_expression_list
: A list of one or more patterns that match thesearch_value
type.pattern_array
: AnUNNEST
operation that returns a column of values with the same type assearch_value
from an array expression.
The wildcards that are supported by the LIKE operator are also supported by patterns in the quantified LIKE operator.
: Condition for pattern matching.ANY
: Checks if the set of patterns contains at least one pattern that matches the search value.SOME
: Synonym forANY
.ALL
: Checks if every pattern in the set of patterns matches the search value.
Collation caveats
Collation is supported, but with the following caveats:
- The collation caveats that apply to the
LIKE
operator also apply to the quantifiedLIKE
operator. - If a collation-supported input contains no collation specification or an empty collation specification and another input contains an explicitly defined collation, the explicitly defined collation is used for all of the inputs.
- All inputs with a non-empty, explicitly defined collation specification must have the same type of collation specification, otherwise an error is thrown.
Semantic rules
When using the quantified LIKE
operator with ANY
or SOME
, the
following semantics apply in this order:
- Returns
FALSE
ifpatterns
is empty. - Returns
NULL
ifsearch_value
isNULL
. - Returns
TRUE
ifsearch_value
matches at least one value inpatterns
. - Returns
NULL
if a pattern inpatterns
isNULL
and other patterns inpatterns
don't match. - Returns
FALSE
.
When using the quantified LIKE
operator with ALL
, the following semantics
apply in this order:
- For
pattern_array
, returnsFALSE
ifpatterns
is empty. - Returns
NULL
ifsearch_value
isNULL
. - Returns
TRUE
ifsearch_value
matches all values inpatterns
. - Returns
NULL
if a pattern inpatterns
isNULL
and other patterns inpatterns
don't match. - Returns
FALSE
.
When using the quantified NOT LIKE
operator with ANY
or SOME
, the
following semantics apply in this order:
- For
pattern_array
, returnsTRUE
ifpatterns
is empty. - Returns
NULL
ifsearch_value
isNULL
. - Returns
TRUE
ifsearch_value
doesn't match at least one value inpatterns
. - Returns
NULL
if a pattern inpatterns
isNULL
and other patterns inpatterns
don't match. - Returns
FALSE
.
When using the quantified NOT LIKE
operator with ALL
, the following
semantics apply in this order:
- For
pattern_array
, returnsTRUE
ifpatterns
is empty. - Returns
NULL
ifsearch_value
isNULL
. - Returns
TRUE
ifsearch_value
matches none of the values inpatterns
. - Returns
NULL
if a pattern inpatterns
isNULL
and other patterns inpatterns
don't match. - Returns
FALSE
.
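The four rule sets above share one shape. As an illustration only (None stands in for NULL, and the pattern-matching function is left abstract), they can be sketched in Python; NOT LIKE ANY and NOT LIKE ALL behave like these functions with a negated match:

```python
def like_any(search_value, patterns, match):
    """LIKE ANY / SOME semantics sketch: TRUE on any match, NULL (None)
    if a NULL pattern is present and nothing matched, otherwise FALSE."""
    if not patterns:
        return False
    if search_value is None:
        return None
    saw_null = False
    for p in patterns:
        if p is None:
            saw_null = True
        elif match(search_value, p):
            return True
    return None if saw_null else False

def like_all(search_value, patterns, match):
    """LIKE ALL semantics sketch: FALSE on any failed match, NULL (None)
    if a NULL pattern is present and nothing failed, otherwise TRUE."""
    if not patterns:
        return False
    if search_value is None:
        return None
    saw_null = False
    for p in patterns:
        if p is None:
            saw_null = True
        elif not match(search_value, p):
            return False
    return None if saw_null else True

eq = lambda value, pattern: value == pattern  # stand-in for wildcard matching

print(like_any('a', ['a', None], eq))  # True
print(like_any('a', ['b', None], eq))  # None (NULL)
print(like_all('a', ['a', None], eq))  # None (NULL)
print(like_all('a', ['b', None], eq))  # False
```

These outputs line up with the quantified LIKE query results shown later in this section.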
Details
Some computation limitations apply. For more information, see Quotas and limits.
Return Data Type
BOOL
Examples
The following example checks to see if the Intend%
or %intention%
pattern exists in a value and produces that value if either pattern is found:
WITH Words AS
(SELECT 'Intend with clarity.' as value UNION ALL
SELECT 'Secure with intention.' UNION ALL
SELECT 'Clarity and security.')
SELECT * FROM Words WHERE value LIKE ANY ('Intend%', '%intention%');
/*------------------------+
| value |
+------------------------+
| Intend with clarity. |
| Secure with intention. |
+------------------------*/
The following example uses LIKE ALL to check if the %ity%
pattern exists in a value and produces that value if the pattern is found:
WITH Words AS
(SELECT 'Intend with clarity.' as value UNION ALL
SELECT 'Secure with intention.' UNION ALL
SELECT 'Clarity and security.')
SELECT * FROM Words WHERE value LIKE ALL ('%ity%');
/*-----------------------+
| value |
+-----------------------+
| Intend with clarity. |
| Clarity and security. |
+-----------------------*/
The following example checks to see if the %ity%
pattern exists in a value and produces that value if the pattern
is not found:
WITH Words AS
(SELECT 'Intend with clarity.' as value UNION ALL
SELECT 'Secure with intention.' UNION ALL
SELECT 'Clarity and security.')
SELECT * FROM Words WHERE value NOT LIKE ('%ity%');
/*------------------------+
| value |
+------------------------+
| Secure with intention. |
+------------------------*/
You can pass in an array for patterns
. For example:
WITH Words AS
(SELECT 'Intend with clarity.' as value UNION ALL
SELECT 'Secure with intention.' UNION ALL
SELECT 'Clarity and security.')
SELECT * FROM Words WHERE value LIKE ANY UNNEST(['%ion%', '%and%']);
/*------------------------+
| value |
+------------------------+
| Secure with intention. |
| Clarity and security. |
+------------------------*/
The following queries illustrate some of the semantic rules for the
quantified LIKE
operator:
SELECT
NULL LIKE ANY ('a', 'b'), -- NULL
'a' LIKE ANY ('a', 'c'), -- TRUE
'a' LIKE ANY ('b', 'c'), -- FALSE
'a' LIKE ANY ('a', NULL), -- TRUE
'a' LIKE ANY ('b', NULL), -- NULL
NULL NOT LIKE ANY ('a', 'b'), -- NULL
'a' NOT LIKE ANY ('a', 'b'), -- TRUE
'a' NOT LIKE ANY ('a', '%a%'), -- FALSE
'a' NOT LIKE ANY ('a', NULL), -- NULL
'a' NOT LIKE ANY ('b', NULL); -- TRUE
SELECT
NULL LIKE SOME ('a', 'b'), -- NULL
'a' LIKE SOME ('a', 'c'), -- TRUE
'a' LIKE SOME ('b', 'c'), -- FALSE
'a' LIKE SOME ('a', NULL), -- TRUE
'a' LIKE SOME ('b', NULL), -- NULL
NULL NOT LIKE SOME ('a', 'b'), -- NULL
'a' NOT LIKE SOME ('a', 'b'), -- TRUE
'a' NOT LIKE SOME ('a', '%a%'), -- FALSE
'a' NOT LIKE SOME ('a', NULL), -- NULL
'a' NOT LIKE SOME ('b', NULL); -- TRUE
SELECT
NULL LIKE ALL ('a', 'b'), -- NULL
'a' LIKE ALL ('a', '%a%'), -- TRUE
'a' LIKE ALL ('a', 'c'), -- FALSE
'a' LIKE ALL ('a', NULL), -- NULL
'a' LIKE ALL ('b', NULL), -- FALSE
NULL NOT LIKE ALL ('a', 'b'), -- NULL
'a' NOT LIKE ALL ('b', 'c'), -- TRUE
'a' NOT LIKE ALL ('a', 'c'), -- FALSE
'a' NOT LIKE ALL ('a', NULL), -- FALSE
'a' NOT LIKE ALL ('b', NULL); -- NULL
The following queries illustrate some of the semantic rules for the
quantified LIKE
operator and collation:
SELECT
COLLATE('a', 'und:ci') LIKE ALL ('a', 'A'), -- TRUE
'a' LIKE ALL (COLLATE('a', 'und:ci'), 'A'), -- TRUE
'a' LIKE ALL ('%A%', COLLATE('a', 'und:ci')); -- TRUE
-- ERROR: BYTES and STRING values can't be used together.
SELECT b'a' LIKE ALL (COLLATE('a', 'und:ci'), 'A');
Concatenation operator
The concatenation operator combines multiple values into one.
Function Syntax | Input Data Type | Result Data Type |
---|---|---|
STRING || STRING [ || ... ] | STRING | STRING |
BYTES || BYTES [ || ... ] | BYTES | BYTES |
ARRAY&lt;T&gt; || ARRAY&lt;T&gt; [ || ... ] | ARRAY&lt;T&gt; | ARRAY&lt;T&gt; |
Conditional expressions
GoogleSQL for BigQuery supports conditional expressions. Conditional expressions impose constraints on the evaluation order of their inputs. In essence, they are evaluated left to right, with short-circuiting, and only evaluate the output value that was chosen. In contrast, all inputs to regular functions are evaluated before calling the function. Short-circuiting in conditional expressions can be exploited for error handling or performance tuning.
Expression list
Name | Summary |
---|---|
CASE expr
|
Compares the given expression to each successive WHEN clause
and produces the first result where the values are equal.
|
CASE
|
Evaluates the condition of each successive WHEN clause and
produces the first result where the condition evaluates to
TRUE .
|
COALESCE
|
Produces the value of the first non-NULL expression, if any,
otherwise NULL .
|
IF
|
If an expression evaluates to TRUE , produces a specified
result, otherwise produces the evaluation for an else result.
|
IFNULL
|
If an expression evaluates to NULL , produces a specified
result, otherwise produces the expression.
|
NULLIF
|
Produces NULL if the first expression equals the second expression,
otherwise produces the first expression.
|
CASE expr
CASE expr
WHEN expr_to_match THEN result
[ ... ]
[ ELSE else_result ]
END
Description
Compares expr
to expr_to_match
of each successive WHEN
clause and returns
the first result where this comparison evaluates to TRUE
. The remaining WHEN
clauses and else_result
aren't evaluated.
If the expr = expr_to_match
comparison evaluates to FALSE
or NULL
for all
WHEN
clauses, returns the evaluation of else_result
if present; if
else_result
isn't present, then returns NULL
.
Consistent with equality comparisons elsewhere, if both
expr
and expr_to_match
are NULL
, then expr = expr_to_match
evaluates to
NULL
, which returns else_result
. If a CASE statement needs to distinguish a
NULL
value, then the alternate CASE syntax should be used.
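To make the NULL behavior concrete, here is a small Python sketch (None models NULL; the function name is illustrative, not part of GoogleSQL). Because a comparison against NULL never evaluates to TRUE, a NULL input always falls through to the ELSE result:

```python
def case_expr(expr, branches, else_result=None):
    """Sketch of CASE expr WHEN ... semantics: the comparison
    expr = expr_to_match is never TRUE when either side is NULL (None),
    so a NULL expr always produces else_result."""
    for expr_to_match, result in branches:
        if expr is not None and expr_to_match is not None and expr == expr_to_match:
            return result
    return else_result

print(case_expr(90, [(90, 'red'), (50, 'blue')], 'green'))  # 'red'
print(case_expr(None, [(None, 'red')], 'green'))            # 'green': NULL never matches
```

To branch on NULL explicitly, use the operand-free CASE form with a WHEN expr IS NULL condition.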
expr
and expr_to_match
can be any type. They must be implicitly
coercible to a common supertype; equality comparisons are
done on coerced values. There may be multiple result
types. result
and
else_result
expressions must be coercible to a common supertype.
This expression supports specifying collation.
Return Data Type
Supertype of result
[, ...] and else_result
.
Example
WITH Numbers AS (
SELECT 90 as A, 2 as B UNION ALL
SELECT 50, 8 UNION ALL
SELECT 60, 6 UNION ALL
SELECT 50, 10
)
SELECT
A,
B,
CASE A
WHEN 90 THEN 'red'
WHEN 50 THEN 'blue'
ELSE 'green'
END
AS result
FROM Numbers
/*------------------*
| A | B | result |
+------------------+
| 90 | 2 | red |
| 50 | 8 | blue |
| 60 | 6 | green |
| 50 | 10 | blue |
*------------------*/
CASE
CASE
WHEN condition THEN result
[ ... ]
[ ELSE else_result ]
END
Description
Evaluates the condition of each successive WHEN
clause and returns the
first result where the condition evaluates to TRUE
; any remaining WHEN
clauses and else_result
aren't evaluated.
If all conditions evaluate to FALSE
or NULL
, returns evaluation of
else_result
if present; if else_result
isn't present, then returns NULL
.
For additional rules on how values are evaluated, see the three-valued logic table in Logical operators.
condition
must be a boolean expression. There may be multiple result
types.
result
and else_result
expressions must be implicitly coercible to a common
supertype.
This expression supports specifying collation.
Return Data Type
Supertype of result
[, ...] and else_result
.
Example
WITH Numbers AS (
SELECT 90 as A, 2 as B UNION ALL
SELECT 50, 6 UNION ALL
SELECT 20, 10
)
SELECT
A,
B,
CASE
WHEN A > 60 THEN 'red'
WHEN B = 6 THEN 'blue'
ELSE 'green'
END
AS result
FROM Numbers
/*------------------*
| A | B | result |
+------------------+
| 90 | 2 | red |
| 50 | 6 | blue |
| 20 | 10 | green |
*------------------*/
COALESCE
COALESCE(expr[, ...])
Description
Returns the value of the first non-NULL
expression, if any, otherwise
NULL
. The remaining expressions aren't evaluated. An input expression can be
any type. There may be multiple input expression types.
All input expressions must be implicitly coercible to a common
supertype.
Return Data Type
Supertype of expr
[, ...].
Examples
SELECT COALESCE('A', 'B', 'C') as result
/*--------*
| result |
+--------+
| A |
*--------*/
SELECT COALESCE(NULL, 'B', 'C') as result
/*--------*
| result |
+--------+
| B |
*--------*/
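The short-circuit guarantee above ("the remaining expressions aren't evaluated") can be sketched in Python by passing the expressions as zero-argument callables (all names here are illustrative):

```python
def coalesce(*thunks):
    """Return the first non-None value; later thunks are never called
    once a non-None value is found, mirroring COALESCE short-circuiting."""
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value
    return None

calls = []
def expr(name, value):
    def thunk():
        calls.append(name)  # record which expressions were evaluated
        return value
    return thunk

print(coalesce(expr('a', None), expr('b', 'B'), expr('c', 'C')))  # 'B'
print(calls)  # ['a', 'b'] -- 'c' was never evaluated
```

The same short-circuiting applies to IF and IFNULL below, which only evaluate the branch they return.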
IF
IF(expr, true_result, else_result)
Description
If expr
evaluates to TRUE
, returns true_result
, else returns the
evaluation for else_result
. else_result
isn't evaluated if expr
evaluates
to TRUE
. true_result
isn't evaluated if expr
evaluates to FALSE
or
NULL
.
expr
must be a boolean expression. true_result
and else_result
must be coercible to a common supertype.
Return Data Type
Supertype of true_result
and else_result
.
Examples
SELECT
10 AS A,
20 AS B,
IF(10 < 20, 'true', 'false') AS result
/*------------------*
| A | B | result |
+------------------+
| 10 | 20 | true |
*------------------*/
SELECT
30 AS A,
20 AS B,
IF(30 < 20, 'true', 'false') AS result
/*------------------*
| A | B | result |
+------------------+
| 30 | 20 | false |
*------------------*/
IFNULL
IFNULL(expr, null_result)
Description
If expr
evaluates to NULL
, returns null_result
. Otherwise, returns
expr
. If expr
doesn't evaluate to NULL
, null_result
isn't evaluated.
expr
and null_result
can be any type and must be implicitly coercible to
a common supertype. Synonym for
COALESCE(expr, null_result)
.
Return Data Type
Supertype of expr
or null_result
.
Examples
SELECT IFNULL(NULL, 0) as result
/*--------*
| result |
+--------+
| 0 |
*--------*/
SELECT IFNULL(10, 0) as result
/*--------*
| result |
+--------+
| 10 |
*--------*/
NULLIF
NULLIF(expr, expr_to_match)
Description
Returns NULL
if expr = expr_to_match
evaluates to TRUE
, otherwise
returns expr
.
expr
and expr_to_match
must be implicitly coercible to a
common supertype, and must be comparable.
This expression supports specifying collation.
Return Data Type
Supertype of expr
and expr_to_match
.
Example
SELECT NULLIF(0, 0) as result
/*--------*
| result |
+--------+
| NULL |
*--------*/
SELECT NULLIF(10, 0) as result
/*--------*
| result |
+--------+
| 10 |
*--------*/
FUNCTIONS
AEAD encryption functions
GoogleSQL for BigQuery supports the following AEAD encryption functions. For a description of how the AEAD encryption functions work, see AEAD encryption concepts.
Function list
Name | Summary |
---|---|
AEAD.DECRYPT_BYTES
|
Uses the matching key from a keyset to decrypt a
BYTES ciphertext.
|
AEAD.DECRYPT_STRING
|
Uses the matching key from a keyset to decrypt a BYTES
ciphertext into a STRING plaintext.
|
AEAD.ENCRYPT
|
Encrypts STRING plaintext, using the primary cryptographic key
in a keyset.
|
DETERMINISTIC_DECRYPT_BYTES
|
Uses the matching key from a keyset to decrypt a BYTES
ciphertext, using deterministic AEAD.
|
DETERMINISTIC_DECRYPT_STRING
|
Uses the matching key from a keyset to decrypt a BYTES
ciphertext into a STRING plaintext, using deterministic AEAD.
|
DETERMINISTIC_ENCRYPT
|
Encrypts STRING plaintext, using the primary cryptographic key
in a keyset, using deterministic AEAD encryption.
|
KEYS.ADD_KEY_FROM_RAW_BYTES
|
Adds a key to a keyset, and returns the new keyset as a serialized
BYTES value.
|
KEYS.KEYSET_CHAIN
|
Produces a Tink keyset that is encrypted with a Cloud KMS key. |
KEYS.KEYSET_FROM_JSON
|
Converts a STRING JSON keyset to a serialized
BYTES value.
|
KEYS.KEYSET_LENGTH
|
Gets the number of keys in the provided keyset. |
KEYS.KEYSET_TO_JSON
|
Gets a JSON STRING representation of a keyset.
|
KEYS.NEW_KEYSET
|
Gets a serialized keyset containing a new key based on the key type. |
KEYS.NEW_WRAPPED_KEYSET
|
Creates a new keyset and encrypts it with a Cloud KMS key. |
KEYS.REWRAP_KEYSET
|
Re-encrypts a wrapped keyset with a new Cloud KMS key. |
KEYS.ROTATE_KEYSET
|
Adds a new primary cryptographic key to a keyset, based on the key type. |
KEYS.ROTATE_WRAPPED_KEYSET
|
Rewraps a keyset and rotates it. |
AEAD.DECRYPT_BYTES
AEAD.DECRYPT_BYTES(keyset, ciphertext, additional_data)
Description
Uses the matching key from keyset
to decrypt ciphertext
and verifies the
integrity of the data using additional_data
. Returns an error if decryption or
verification fails.
keyset
is a serialized BYTES
value returned by one of the
KEYS
functions or a STRUCT
returned by
KEYS.KEYSET_CHAIN
. keyset
must contain the key that was used to
encrypt ciphertext
, and the key must be in an 'ENABLED'
state, or else the
function returns an error. AEAD.DECRYPT_BYTES
identifies the matching key
in keyset
by finding the key with the key ID that matches the one encrypted in
ciphertext
.
ciphertext
is a BYTES
value that is the result of
a call to AEAD.ENCRYPT
where the input plaintext
was of type
BYTES
.
If ciphertext
includes an initialization vector (IV),
it should be the first bytes of ciphertext
. If ciphertext
includes an
authentication tag, it should be the last bytes of ciphertext
. If the
IV and authentication tag are combined (SIV), they should be the first bytes of
ciphertext
. The IV and authentication tag commonly require 16 bytes, but may
vary in size.
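As an illustration of the byte layout described above (the 16-byte IV and tag sizes are assumptions for a common case, and the function is not part of the GoogleSQL API), a ciphertext laid out as IV, then body, then tag could be split like this:

```python
def split_ciphertext(ciphertext: bytes, iv_len: int = 16, tag_len: int = 16):
    """Split an IV-prefixed, tag-suffixed ciphertext into its parts.
    Real IV and tag sizes vary by algorithm; 16 bytes is merely common."""
    if len(ciphertext) < iv_len + tag_len:
        raise ValueError('ciphertext too short for the assumed IV and tag sizes')
    iv = ciphertext[:iv_len]           # initialization vector at the front
    tag = ciphertext[-tag_len:]        # authentication tag at the end
    body = ciphertext[iv_len:-tag_len] # encrypted payload in between
    return iv, body, tag

iv, body, tag = split_ciphertext(bytes(16) + b'payload' + bytes(16))
print(len(iv), body, len(tag))  # 16 b'payload' 16
```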
additional_data
is a STRING
or BYTES
value that binds the ciphertext to
its context. This forces the ciphertext to be decrypted in the same context in
which it was encrypted. This function casts any
STRING
value to BYTES
.
This must be the same as the additional_data
provided to AEAD.ENCRYPT
to
encrypt ciphertext
, ignoring its type, or else the function returns an error.
Return Data Type
BYTES
Example
This example creates a table of unique IDs with associated plaintext values and
keysets. Then it uses these keysets to encrypt the plaintext values as
BYTES
and store them in a new table. Finally, it
uses AEAD.DECRYPT_BYTES
to decrypt the encrypted values and display them as
plaintext.
The following statement creates a table CustomerKeysets
containing a column of
unique IDs, a column of AEAD_AES_GCM_256
keysets, and a column of favorite
animals.
CREATE TABLE aead.CustomerKeysets AS
SELECT
1 AS customer_id,
KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset,
b'jaguar' AS favorite_animal
UNION ALL
SELECT
2 AS customer_id,
KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset,
b'zebra' AS favorite_animal
UNION ALL
SELECT
3 AS customer_id,
KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset,
b'nautilus' AS favorite_animal;
The following statement creates a table EncryptedCustomerData
containing a
column of unique IDs and a column of ciphertext. The statement encrypts the
plaintext favorite_animal
using the keyset value from CustomerKeysets
corresponding to each unique ID.
CREATE TABLE aead.EncryptedCustomerData AS
SELECT
customer_id,
AEAD.ENCRYPT(keyset, favorite_animal, CAST(CAST(customer_id AS STRING) AS BYTES))
AS encrypted_animal
FROM
aead.CustomerKeysets AS ck;
The following query uses the keysets in the CustomerKeysets
table to decrypt
data in the EncryptedCustomerData
table.
SELECT
ecd.customer_id,
AEAD.DECRYPT_BYTES(
(SELECT ck.keyset
FROM aead.CustomerKeysets AS ck
WHERE ecd.customer_id = ck.customer_id),
ecd.encrypted_animal,
CAST(CAST(customer_id AS STRING) AS BYTES)
) AS favorite_animal
FROM aead.EncryptedCustomerData AS ecd;
AEAD.DECRYPT_STRING
AEAD.DECRYPT_STRING(keyset, ciphertext, additional_data)
Description
Like AEAD.DECRYPT_BYTES
, but where plaintext
 is
of type STRING
.
Return Data Type
STRING
AEAD.ENCRYPT
AEAD.ENCRYPT(keyset, plaintext, additional_data)
Description
Encrypts plaintext
using the primary cryptographic key in keyset
. The
algorithm of the primary key must be AEAD_AES_GCM_256
. Binds the ciphertext to
the context defined by additional_data
. Returns NULL
if any input is NULL
.
keyset
is a serialized BYTES
value returned by one of the
KEYS
functions or a STRUCT
returned by
KEYS.KEYSET_CHAIN
.
plaintext
is the STRING
or
BYTES
value to be encrypted.
additional_data
is a STRING
or BYTES
value that binds the ciphertext to
its context. This forces the ciphertext to be decrypted in the same context in
which it was encrypted. plaintext
and additional_data
must be of the same
type. AEAD.ENCRYPT(keyset, string1, string2)
is equivalent to
AEAD.ENCRYPT(keyset, CAST(string1 AS BYTES), CAST(string2 AS BYTES))
.
The output is ciphertext BYTES
. The ciphertext contains a
Tink-specific prefix indicating the key used to perform the encryption.
Return Data Type
BYTES
Example
The following query uses the keysets for each customer_id
in the
CustomerKeysets
table to encrypt the value of the plaintext favorite_animal
in the PlaintextCustomerData
table corresponding to that customer_id
. The
output contains a column of customer_id
values and a column of
corresponding ciphertext output as BYTES
.
WITH CustomerKeysets AS (
SELECT 1 AS customer_id, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset UNION ALL
SELECT 2, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') UNION ALL
SELECT 3, KEYS.NEW_KEYSET('AEAD_AES_GCM_256')
), PlaintextCustomerData AS (
SELECT 1 AS customer_id, 'elephant' AS favorite_animal UNION ALL
SELECT 2, 'walrus' UNION ALL
SELECT 3, 'leopard'
)
SELECT
pcd.customer_id,
AEAD.ENCRYPT(
(SELECT keyset
FROM CustomerKeysets AS ck
WHERE ck.customer_id = pcd.customer_id),
pcd.favorite_animal,
CAST(pcd.customer_id AS STRING)
) AS encrypted_animal
FROM PlaintextCustomerData AS pcd;
DETERMINISTIC_DECRYPT_BYTES
DETERMINISTIC_DECRYPT_BYTES(keyset, ciphertext, additional_data)
Description
Uses the matching key from keyset
to decrypt ciphertext
and verifies the
integrity of the data using additional_data
. Returns an error if decryption
fails.
keyset
is a serialized BYTES
value or a STRUCT
value returned by one of the KEYS
functions. keyset
must contain
the key that was used to encrypt ciphertext
, the key must be in an 'ENABLED'
state, and the key must be of type DETERMINISTIC_AEAD_AES_SIV_CMAC_256
, or
else the function returns an error. DETERMINISTIC_DECRYPT_BYTES
identifies the
matching key in keyset
by finding the key with the key ID that matches the one
encrypted in ciphertext
.
ciphertext
is a BYTES
value that is the result of a call to
DETERMINISTIC_ENCRYPT
where the input plaintext
was of type BYTES
.
The ciphertext must follow Tink's wire format. The first
byte of ciphertext
should contain a Tink key version followed by a 4-byte key
hint. If ciphertext
includes an initialization vector (IV), it should be the
next bytes of ciphertext
. If ciphertext
includes an authentication tag, it
should be the last bytes of ciphertext
. If the IV and authentic tag are one
(SIV), it should be the first bytes of ciphertext
. The IV and authentication
tag commonly require 16 bytes, but may vary in size.
additional_data
is a STRING
or BYTES
value that binds the ciphertext to
its context. This forces the ciphertext to be decrypted in the same context in
which it was encrypted. This function casts any STRING
value to BYTES
. This
must be the same as the additional_data
provided to DETERMINISTIC_ENCRYPT
to
encrypt ciphertext
, ignoring its type, or else the function returns an error.
Return Data Type
BYTES
Example
This example creates a table of unique IDs with associated plaintext values and
keysets. Then it uses these keysets to encrypt the plaintext values as BYTES
and store them in a new table. Finally, it uses DETERMINISTIC_DECRYPT_BYTES
to
decrypt the encrypted values and display them as plaintext.
The following statement creates a table CustomerKeysets
containing a column of
unique IDs, a column of DETERMINISTIC_AEAD_AES_SIV_CMAC_256
keysets, and a
column of favorite animals.
CREATE TABLE deterministic.CustomerKeysets AS
SELECT
1 AS customer_id,
KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256') AS keyset,
b'jaguar' AS favorite_animal
UNION ALL
SELECT
2 AS customer_id,
KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256') AS keyset,
b'zebra' AS favorite_animal
UNION ALL
SELECT
3 AS customer_id,
KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256') AS keyset,
b'nautilus' AS favorite_animal;
The following statement creates a table EncryptedCustomerData
containing a
column of unique IDs and a column of ciphertext. The statement encrypts the
plaintext favorite_animal
using the keyset value from CustomerKeysets
corresponding to each unique ID.
CREATE TABLE deterministic.EncryptedCustomerData AS
SELECT
customer_id,
DETERMINISTIC_ENCRYPT(ck.keyset, favorite_animal, CAST(CAST(customer_id AS STRING) AS BYTES))
AS encrypted_animal
FROM
deterministic.CustomerKeysets AS ck;
The following query uses the keysets in the CustomerKeysets
table to decrypt
data in the EncryptedCustomerData
table.
SELECT
ecd.customer_id,
DETERMINISTIC_DECRYPT_BYTES(
(SELECT ck.keyset
FROM deterministic.CustomerKeysets AS ck
WHERE ecd.customer_id = ck.customer_id),
ecd.encrypted_animal,
CAST(CAST(ecd.customer_id AS STRING) AS BYTES)
) AS favorite_animal
FROM deterministic.EncryptedCustomerData AS ecd;
DETERMINISTIC_DECRYPT_STRING
DETERMINISTIC_DECRYPT_STRING(keyset, ciphertext, additional_data)
Description
Like DETERMINISTIC_DECRYPT_BYTES
, but where
plaintext
is of type STRING
.
Return Data Type
STRING
DETERMINISTIC_ENCRYPT
DETERMINISTIC_ENCRYPT(keyset, plaintext, additional_data)
Description
Encrypts plaintext
using the primary cryptographic key in keyset
using
deterministic AEAD. The algorithm of the primary key must
be DETERMINISTIC_AEAD_AES_SIV_CMAC_256
. Binds the ciphertext to the context
defined by additional_data
. Returns NULL
if any input is NULL
.
keyset
is a serialized BYTES
value or a STRUCT
value returned by one of the KEYS
functions.
plaintext
is the STRING
or BYTES
value to be encrypted.
additional_data
is a STRING
or BYTES
value that binds the ciphertext to
its context. This forces the ciphertext to be decrypted in the same context in
which it was encrypted. plaintext
and additional_data
must be of the same
type. DETERMINISTIC_ENCRYPT(keyset, string1, string2)
is equivalent to
DETERMINISTIC_ENCRYPT(keyset, CAST(string1 AS BYTES), CAST(string2 AS BYTES))
.
The output is ciphertext BYTES
. The ciphertext contains a
Tink-specific prefix indicating the key used to perform the encryption.
Given an identical keyset
and plaintext
, this function returns the same
ciphertext each time it is invoked (including across queries).
Return Data Type
BYTES
Example
The following query uses the keysets for each customer_id
in the
CustomerKeysets
table to encrypt the value of the plaintext favorite_animal
in the PlaintextCustomerData
table corresponding to that customer_id
. The
output contains a column of customer_id
values and a column of corresponding
ciphertext output as BYTES
.
WITH CustomerKeysets AS (
SELECT 1 AS customer_id,
KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256') AS keyset UNION ALL
SELECT 2, KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256') UNION ALL
SELECT 3, KEYS.NEW_KEYSET('DETERMINISTIC_AEAD_AES_SIV_CMAC_256')
), PlaintextCustomerData AS (
SELECT 1 AS customer_id, 'elephant' AS favorite_animal UNION ALL
SELECT 2, 'walrus' UNION ALL
SELECT 3, 'leopard'
)
SELECT
pcd.customer_id,
DETERMINISTIC_ENCRYPT(
(SELECT keyset
FROM CustomerKeysets AS ck
WHERE ck.customer_id = pcd.customer_id),
pcd.favorite_animal,
CAST(pcd.customer_id AS STRING)
) AS encrypted_animal
FROM PlaintextCustomerData AS pcd;
KEYS.ADD_KEY_FROM_RAW_BYTES
KEYS.ADD_KEY_FROM_RAW_BYTES(keyset, key_type, raw_key_bytes)
Description
Returns a serialized keyset as BYTES
with the
addition of a key to keyset
based on key_type
and raw_key_bytes
.
The primary cryptographic key remains the same as in keyset
. The expected
length of raw_key_bytes
depends on the value of key_type
. The following are
supported key_types
:
'AES_CBC_PKCS'
: Creates a key for AES decryption using cipher block chaining and PKCS padding.raw_key_bytes
is expected to be a raw keyBYTES
value of length 16, 24, or 32; these lengths have sizes of 128, 192, and 256 bits, respectively. GoogleSQL AEAD functions do not support keys of these types for encryption; instead, prefer'AEAD_AES_GCM_256'
or'AES_GCM'
keys.'AES_GCM'
: Creates a key for AES decryption or encryption using Galois/Counter Mode.raw_key_bytes
must be a raw keyBYTES
value of length 16 or 32; these lengths have sizes of 128 and 256 bits, respectively. When keys of this type are inputs toAEAD.ENCRYPT
, the output ciphertext does not have a Tink-specific prefix indicating which key was used as input.
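The raw key length rules above can be captured in a small validation sketch (the lookup table and function are illustrative, not part of the KEYS API):

```python
# Accepted raw key lengths in bytes for each supported key_type.
VALID_RAW_KEY_LENGTHS = {
    'AES_CBC_PKCS': {16, 24, 32},  # 128-, 192-, or 256-bit keys
    'AES_GCM': {16, 32},           # 128- or 256-bit keys
}

def check_raw_key(key_type: str, raw_key_bytes: bytes) -> None:
    """Raise ValueError if raw_key_bytes has an invalid length for key_type."""
    lengths = VALID_RAW_KEY_LENGTHS.get(key_type)
    if lengths is None:
        raise ValueError('unsupported key_type: ' + key_type)
    if len(raw_key_bytes) not in lengths:
        raise ValueError(
            key_type + ' expects a key of ' + str(sorted(lengths)) +
            ' bytes, got ' + str(len(raw_key_bytes)))

check_raw_key('AES_GCM', bytes(32))       # OK: 256-bit key
check_raw_key('AES_CBC_PKCS', bytes(24))  # OK: 192-bit key
```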
Return Data Type
BYTES
Example
The following query creates a table of customer IDs along with raw key bytes,
called CustomerRawKeys
, and a table of unique IDs, called CustomerIds
. It
creates a new 'AEAD_AES_GCM_256'
keyset for each customer_id
; then it adds a
new key to each keyset, using the raw_key_bytes
value corresponding to that
customer_id
. The output is a table where each row contains a customer_id
and
a keyset in BYTES
, which contains the raw key added
using KEYS.ADD_KEY_FROM_RAW_BYTES.
WITH CustomerRawKeys AS (
SELECT 1 AS customer_id, b'0123456789012345' AS raw_key_bytes UNION ALL
SELECT 2, b'9876543210543210' UNION ALL
SELECT 3, b'0123012301230123'
), CustomerIds AS (
SELECT 1 AS customer_id UNION ALL
SELECT 2 UNION ALL
SELECT 3
)
SELECT
ci.customer_id,
KEYS.ADD_KEY_FROM_RAW_BYTES(
KEYS.NEW_KEYSET('AEAD_AES_GCM_256'),
'AES_CBC_PKCS',
(SELECT raw_key_bytes FROM CustomerRawKeys AS crk
WHERE crk.customer_id = ci.customer_id)
) AS keyset
FROM CustomerIds AS ci;
The output keysets each contain two things: the primary cryptographic key
created using KEYS.NEW_KEYSET('AEAD_AES_GCM_256')
, and the raw key added using
KEYS.ADD_KEY_FROM_RAW_BYTES
. If a keyset in the output is used with
AEAD.ENCRYPT
, GoogleSQL uses the primary cryptographic key created
using KEYS.NEW_KEYSET('AEAD_AES_GCM_256')
to encrypt the input plaintext. If
the keyset is used with AEAD.DECRYPT_STRING
or AEAD.DECRYPT_BYTES
,
GoogleSQL returns the resulting plaintext if either key succeeds in
decrypting the ciphertext.
KEYS.KEYSET_CHAIN
KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset)
Description
Can be used in place of the keyset
argument to the AEAD
and deterministic
encryption functions to pass a Tink keyset that is encrypted
with a Cloud KMS key. This function lets you use
other AEAD functions without including plaintext keys in a query.
This function takes the following arguments:
- kms_resource_name: A STRING literal that contains the resource path to the Cloud KMS key that's used to decrypt first_level_keyset. This key must reside in the same Cloud region where this function is executed. A Cloud KMS key looks like this: gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key
- first_level_keyset: A BYTES literal that represents a keyset or wrapped keyset.
Return Data Type
STRUCT
Example
This example creates a table of example data, then shows how to encrypt that data using a wrapped (encrypted) keyset. Finally it shows how to query the encrypted version of the data.
The following statement creates a table RawCustomerData
containing a column of
customer IDs and a column of favorite animals.
CREATE TABLE aead.RawCustomerData AS
SELECT
1 AS customer_id,
b'jaguar' AS favorite_animal
UNION ALL
SELECT
2 AS customer_id,
b'zebra' AS favorite_animal
UNION ALL
SELECT
3 AS customer_id,
b'zebra' AS favorite_animal;
The following statement creates a table EncryptedCustomerData
containing a
column of unique IDs and a column of ciphertext. The statement encrypts the
plaintext favorite_animal
using the first_level_keyset provided.
DECLARE kms_resource_name STRING;
DECLARE first_level_keyset BYTES;
SET kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
SET first_level_keyset = b'\012\044\000\107\275\360\176\264\206\332\235\215\304...';
CREATE TABLE aead.EncryptedCustomerData AS
SELECT
customer_id,
AEAD.ENCRYPT(
KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
favorite_animal,
CAST(CAST(customer_id AS STRING) AS BYTES)
) AS encrypted_animal
FROM
aead.RawCustomerData;
The following query uses the first_level_keyset to decrypt data in the
EncryptedCustomerData
table.
DECLARE kms_resource_name STRING;
DECLARE first_level_keyset BYTES;
SET kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
SET first_level_keyset = b'\012\044\000\107\275\360\176\264\206\332\235\215\304...';
SELECT
customer_id,
AEAD.DECRYPT_BYTES(
KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
encrypted_animal,
CAST(CAST(customer_id AS STRING) AS BYTES)
) AS favorite_animal
FROM
aead.EncryptedCustomerData;
The previous two steps also work with the DETERMINISTIC_ENCRYPT
and
DETERMINISTIC_DECRYPT_BYTES
functions. The wrapped keyset must be created
using the DETERMINISTIC_AEAD_AES_SIV_CMAC_256
type.
The following statement creates a table EncryptedCustomerData
containing a
column of unique IDs and a column of ciphertext. The statement encrypts the
plaintext favorite_animal
using the first_level_keyset provided. You can see
that the ciphertext for favorite_animal
is the same for customers 2 and 3
since their plaintext favorite_animal
is the same.
DECLARE kms_resource_name STRING;
DECLARE first_level_keyset BYTES;
SET kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
SET first_level_keyset = b'\012\044\000\107\275\360\176\264\206\332\235\215\304...';
CREATE TABLE daead.EncryptedCustomerData AS
SELECT
customer_id,
DETERMINISTIC_ENCRYPT(
KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
favorite_animal,
CAST(CAST(customer_id AS STRING) AS BYTES)
) AS encrypted_animal
FROM
daead.RawCustomerData;
The following query uses the first_level_keyset to decrypt data in the
EncryptedCustomerData
table.
DECLARE kms_resource_name STRING;
DECLARE first_level_keyset BYTES;
SET kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
SET first_level_keyset = b'\012\044\000\107\275\360\176\264\206\332\235\215\304...';
SELECT
customer_id,
DETERMINISTIC_DECRYPT_BYTES(
KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
encrypted_animal,
CAST(CAST(customer_id AS STRING) AS BYTES)
) AS favorite_animal
FROM daead.EncryptedCustomerData;
KEYS.KEYSET_FROM_JSON
KEYS.KEYSET_FROM_JSON(json_keyset)
Description
Returns the input json_keyset
STRING
as
serialized BYTES
, which is a valid input for other
KEYS
and AEAD
functions. The JSON STRING
must
be compatible with the definition of the
google.crypto.tink.Keyset
protocol buffer message: the JSON keyset should be a JSON object containing
objects and name-value pairs corresponding to those in the "keyset" message in
the google.crypto.tink.Keyset definition. You can convert the output serialized
BYTES
representation back to a JSON
STRING
using KEYS.KEYSET_TO_JSON
.
Return Data Type
BYTES
Example
KEYS.KEYSET_FROM_JSON
takes JSON-formatted STRING
values like the following:
{
"key":[
{
"keyData":{
"keyMaterialType":"SYMMETRIC",
"typeUrl":"type.googleapis.com/google.crypto.tink.AesGcmKey",
"value":"GiD80Z8kL6AP3iSNHhqseZGAIvq7TVQzClT7FQy8YwK3OQ=="
},
"keyId":3101427138,
"outputPrefixType":"TINK",
"status":"ENABLED"
}
],
"primaryKeyId":3101427138
}
The following query creates a new keyset from a JSON-formatted
STRING
json_keyset
:
SELECT KEYS.KEYSET_FROM_JSON(json_keyset);
This returns the json_keyset
serialized as BYTES
, like the following:
\x08\x9d\x8e\x85\x82\x09\x12d\x0aX\x0a0
type.googleapis.com/google.crypto.tink.AesGcmKey\x12\"\x1a qX\xe4IG\x87\x1f\xde
\xe3)+e\x98\x0a\x1c}\xfe\x88<\x12\xeb\xc1t\xb8\x83\x1a\xcd\xa8\x97\x84g\x18\x01
\x10\x01\x18\x9d\x8e\x85\x82\x09 \x01
KEYS.KEYSET_LENGTH
KEYS.KEYSET_LENGTH(keyset)
Description
Returns the number of keys in the provided keyset.
Return Data Type
INT64
Example
This example references a JSON-formatted STRING
called json_keyset
that contains two keys:
{
"primaryKeyId":1354994251,
"key":[
{
"keyData":{
"keyMaterialType":"SYMMETRIC",
"typeUrl":"type.googleapis.com/google.crypto.tink.AesGcmKey",
"value":"GiD9sxQRgFj4aYN78vaIlxInjZkG/uvyWSY9a8GN+ELV2Q=="
},
"keyId":1354994251,
"outputPrefixType":"TINK",
"status":"ENABLED"
},
{
"keyData":{
"keyMaterialType":"SYMMETRIC",
"typeUrl":"type.googleapis.com/google.crypto.tink.AesGcmKey",
"value":"PRn76sxQRgFj4aYN00vaIlxInjZkG/uvyWSY9a2bLRm"
},
"keyId":852264701,
"outputPrefixType":"TINK",
"status":"DISABLED"
}
]
}
The following query converts json_keyset
to a keyset and then returns
the number of keys in the keyset:
SELECT KEYS.KEYSET_LENGTH(KEYS.KEYSET_FROM_JSON(json_keyset)) as key_count;
/*-----------*
| key_count |
+-----------+
| 2 |
*-----------*/
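Outside of SQL, the same count can be obtained by parsing the JSON form of a keyset directly, since the "key" field is an array. This sketch uses Python's standard json module on a keyset shaped like the example above (key material elided):

```python
import json

# A two-key Tink keyset in JSON form, shaped like the example above
# (the base64 key material is elided).
json_keyset = """
{
  "primaryKeyId": 1354994251,
  "key": [
    {"keyId": 1354994251, "status": "ENABLED", "outputPrefixType": "TINK",
     "keyData": {"keyMaterialType": "SYMMETRIC",
                 "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
                 "value": "..."}},
    {"keyId": 852264701, "status": "DISABLED", "outputPrefixType": "TINK",
     "keyData": {"keyMaterialType": "SYMMETRIC",
                 "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
                 "value": "..."}}
  ]
}
"""

keyset = json.loads(json_keyset)
print(len(keyset["key"]))  # 2, matching KEYS.KEYSET_LENGTH for this keyset
```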
KEYS.KEYSET_TO_JSON
KEYS.KEYSET_TO_JSON(keyset)
Description
Returns a JSON STRING
representation of the input
keyset
. The returned JSON STRING
is compatible
with the definition of the
google.crypto.tink.Keyset
protocol buffer message. You can convert the JSON
STRING
representation back to
BYTES
using KEYS.KEYSET_FROM_JSON
.
Return Data Type
STRING
Example
The following query returns a new 'AEAD_AES_GCM_256'
keyset as a
JSON-formatted STRING
.
SELECT KEYS.KEYSET_TO_JSON(KEYS.NEW_KEYSET('AEAD_AES_GCM_256'));
The result is a STRING
like the following.
{
"key":[
{
"keyData":{
"keyMaterialType":"SYMMETRIC",
"typeUrl":"type.googleapis.com/google.crypto.tink.AesGcmKey",
"value":"GiD80Z8kL6AP3iSNHhqseZGAIvq7TVQzClT7FQy8YwK3OQ=="
},
"keyId":3101427138,
"outputPrefixType":"TINK",
"status":"ENABLED"
}
],
"primaryKeyId":3101427138
}
KEYS.NEW_KEYSET
KEYS.NEW_KEYSET(key_type)
Description
Returns a serialized keyset containing a new key based on key_type
. The
returned keyset is a serialized BYTES
representation of
google.crypto.tink.Keyset
that contains a primary cryptographic key and no additional keys. You can use
the keyset with the AEAD.ENCRYPT
, AEAD.DECRYPT_BYTES
, and
AEAD.DECRYPT_STRING
functions for encryption and decryption, as well as with
the KEYS
group of key- and keyset-related functions.
key_type
is a STRING
literal representation of the type of key to create.
key_type
cannot be NULL
. key_type
can be:
- AEAD_AES_GCM_256: Creates a 256-bit key with the pseudo-random number generator provided by boringSSL. The key uses AES-GCM for encryption and decryption operations.
- DETERMINISTIC_AEAD_AES_SIV_CMAC_256: Creates a 512-bit AES-SIV-CMAC key, which contains a 256-bit AES-CTR key and a 256-bit AES-CMAC key. The AES-SIV-CMAC key is created with the pseudo-random number generator provided by boringSSL. The key uses AES-SIV for encryption and decryption operations.
Return Data Type
BYTES
Example
The following query creates a keyset for each row in CustomerIds
, which can
subsequently be used to encrypt data. Each keyset contains a single encryption
key with randomly-generated key data. Each row in the output contains a
customer_id
and an 'AEAD_AES_GCM_256'
key in
BYTES
.
SELECT customer_id, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset
FROM (
SELECT 1 AS customer_id UNION ALL
SELECT 2 UNION ALL
SELECT 3
) AS CustomerIds;
KEYS.NEW_WRAPPED_KEYSET
KEYS.NEW_WRAPPED_KEYSET(kms_resource_name, key_type)
Description
Creates a new keyset and encrypts it with a
Cloud KMS key.
Returns the wrapped keyset as a BYTES
representation of google.crypto.tink.Keyset
that contains a primary cryptographic key and no additional keys.
This function takes the following arguments:
- kms_resource_name: A STRING literal representation of the Cloud KMS key. kms_resource_name cannot be NULL. The Cloud KMS key must reside in the same Cloud region where this function is executed. A Cloud KMS key looks like this: gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key
- key_type: A STRING literal representation of the keyset type. key_type cannot be NULL and must be one of the following values:
  - AEAD_AES_GCM_256: Creates a 256-bit key with the pseudo-random number generator provided by boringSSL. The key uses AES-GCM for encryption and decryption operations.
  - DETERMINISTIC_AEAD_AES_SIV_CMAC_256: Creates a 512-bit AES-SIV-CMAC key, which contains a 256-bit AES-CTR key and a 256-bit AES-CMAC key. The AES-SIV-CMAC key is created with the pseudo-random number generator provided by boringSSL. The key uses AES-SIV for encryption and decryption operations.
Return Data Type
BYTES
Example
Put the following variables above each example query that you run:
DECLARE kms_resource_name STRING;
SET kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
The following query creates a wrapped keyset, which contains the ciphertext produced by encrypting a Tink keyset with the specified Cloud KMS key. If you run the query multiple times, it generates multiple wrapped keysets, and each wrapped keyset is unique to each query that is run.
SELECT KEYS.NEW_WRAPPED_KEYSET(kms_resource_name, 'AEAD_AES_GCM_256');
Multiple calls to this function with the same arguments in one query
return the same value. For example, the following query creates only one
wrapped keyset and returns it for each row in a table called my_table.
SELECT
  *,
  KEYS.NEW_WRAPPED_KEYSET(kms_resource_name, 'AEAD_AES_GCM_256')
FROM my_table;
KEYS.REWRAP_KEYSET
KEYS.REWRAP_KEYSET(source_kms_resource_name, target_kms_resource_name, wrapped_keyset)
Description
Re-encrypts a wrapped keyset with a new
Cloud KMS key. Returns the wrapped keyset as a
BYTES
representation of google.crypto.tink.Keyset
that contains a primary cryptographic key and no additional keys.
When this function is used, a wrapped keyset is decrypted by
source_kms_resource_name
and then re-encrypted by target_kms_resource_name
.
During this process, the decrypted keyset is never visible to customers.
This function takes the following arguments:
- source_kms_resource_name: A STRING literal representation of the Cloud KMS key that you want to replace. This key must reside in the same Cloud region where this function is executed. A Cloud KMS key looks like this: gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key
- target_kms_resource_name: A STRING literal representation of the new Cloud KMS key that you want to use.
- wrapped_keyset: A BYTES literal representation of the keyset that you want to re-encrypt.
Return Data Type
BYTES
Example
Put the following variables above each example query that you run:
DECLARE source_kms_resource_name STRING;
DECLARE target_kms_resource_name STRING;
DECLARE wrapped_keyset BYTES;
SET source_kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
SET target_kms_resource_name = 'gcp-kms://projects/my-project/locations/another-location/keyRings/my-key-ring/cryptoKeys/my-other-crypto-key';
SET wrapped_keyset = b'\012\044\000\107\275\360\176\264\206\332\235\215\304...';
The following query rewraps a wrapped keyset. If you run the query multiple times, it generates multiple wrapped keysets, and each wrapped keyset is unique to each query that is run.
SELECT KEYS.REWRAP_KEYSET(source_kms_resource_name, target_kms_resource_name, wrapped_keyset);
Multiple calls to this function with the same arguments in one query
return the same value. For example, the following query creates only one
wrapped keyset and returns it for each row in a table called my_table.
SELECT
  *,
  KEYS.REWRAP_KEYSET(source_kms_resource_name, target_kms_resource_name, wrapped_keyset)
FROM my_table;
KEYS.ROTATE_KEYSET
KEYS.ROTATE_KEYSET(keyset, key_type)
Description
Adds a new key to keyset
based on key_type
. This new key becomes the primary
cryptographic key of the new keyset. Returns the new keyset serialized as
BYTES
.
The old primary cryptographic key from the input keyset
remains an additional
key in the returned keyset.
The new key_type
must match the key type of existing keys in the keyset
.
Return Data Type
BYTES
Example
The following statement creates a table containing a column of unique
customer_id
values and 'AEAD_AES_GCM_256'
keysets. Then, it creates a new
primary cryptographic key within each keyset in the source table using
KEYS.ROTATE_KEYSET
. Each row in the output contains a customer_id
and an
'AEAD_AES_GCM_256'
keyset in BYTES
.
WITH ExistingKeysets AS (
SELECT 1 AS customer_id, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') AS keyset
UNION ALL
SELECT 2, KEYS.NEW_KEYSET('AEAD_AES_GCM_256') UNION ALL
SELECT 3, KEYS.NEW_KEYSET('AEAD_AES_GCM_256')
)
SELECT customer_id, KEYS.ROTATE_KEYSET(keyset, 'AEAD_AES_GCM_256') AS keyset
FROM ExistingKeysets;
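The rotation semantics — the new key becomes primary while the old primary remains available for decryption — can be modeled with a hypothetical in-memory keyset. The structure below is a simplified stand-in for google.crypto.tink.Keyset, not the actual serialized format.

```python
import itertools

# Monotonic counter standing in for random key-ID generation.
_key_ids = itertools.count(1)

def new_keyset():
    """Model of KEYS.NEW_KEYSET: one key, which is the primary."""
    key_id = next(_key_ids)
    return {"primaryKeyId": key_id,
            "key": [{"keyId": key_id, "status": "ENABLED"}]}

def rotate_keyset(keyset):
    """Model of KEYS.ROTATE_KEYSET: add a new key and promote it to primary."""
    new_id = next(_key_ids)
    return {
        "primaryKeyId": new_id,  # the new key becomes the primary
        "key": keyset["key"] + [{"keyId": new_id, "status": "ENABLED"}],
    }

ks = new_keyset()
rotated = rotate_keyset(ks)
print(rotated["primaryKeyId"])               # 2 -- the newly added key
print([k["keyId"] for k in rotated["key"]])  # [1, 2] -- the old key is retained
```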
KEYS.ROTATE_WRAPPED_KEYSET
KEYS.ROTATE_WRAPPED_KEYSET(kms_resource_name, wrapped_keyset, key_type)
Description
Takes an existing wrapped keyset and returns a rotated and
rewrapped keyset. The returned wrapped keyset is a BYTES
representation of google.crypto.tink.Keyset.
When this function is used, the wrapped keyset is decrypted,
the new key is added, and then the keyset is re-encrypted. The primary
cryptographic key from the input wrapped_keyset
remains as an
additional key in the returned keyset. During this rotation process,
the decrypted keyset is never visible to customers.
This function takes the following arguments:
- kms_resource_name: A STRING literal representation of the Cloud KMS key that was used to wrap the wrapped keyset. The Cloud KMS key must reside in the same Cloud region where this function is executed. A Cloud KMS key looks like this: gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key
- wrapped_keyset: A BYTES literal representation of the existing keyset that you want to work with.
- key_type: A STRING literal representation of the keyset type. This must match the key type of existing keys in wrapped_keyset.
Return Data Type
BYTES
Example
Put the following variables above each example query that you run:
DECLARE kms_resource_name STRING;
DECLARE wrapped_keyset BYTES;
SET kms_resource_name = 'gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key';
SET wrapped_keyset = b'\012\044\000\107\275\360\176\264\206\332\235\215\304...';
The following query rotates a wrapped keyset. If you run the query multiple times, it generates multiple wrapped keysets, and each wrapped keyset is unique to each query that is run.
SELECT KEYS.ROTATE_WRAPPED_KEYSET(kms_resource_name, wrapped_keyset, 'AEAD_AES_GCM_256');
Multiple calls to this function with the same arguments in one query
return the same value. For example, the following query creates only one
wrapped keyset and returns it for each row in a table called my_table.
SELECT
  *,
  KEYS.ROTATE_WRAPPED_KEYSET(kms_resource_name, wrapped_keyset, 'AEAD_AES_GCM_256')
FROM my_table;
Aggregate functions
GoogleSQL for BigQuery supports the following general aggregate functions. To learn about the syntax for aggregate function calls, see Aggregate function calls.
Function list
Name | Summary |
---|---|
ANY_VALUE
|
Gets an expression for some row. |
ARRAY_AGG
|
Gets an array of values. |
ARRAY_CONCAT_AGG
|
Concatenates arrays and returns a single array as a result. |
AVG
|
Gets the average of non-NULL values.
|
BIT_AND
|
Performs a bitwise AND operation on an expression. |
BIT_OR
|
Performs a bitwise OR operation on an expression. |
BIT_XOR
|
Performs a bitwise XOR operation on an expression. |
COUNT
|
Gets the number of rows in the input, or the number of rows with an
expression evaluated to any value other than NULL .
|
COUNTIF
|
Gets the count of TRUE values for an expression.
|
GROUPING
|
Checks if a groupable value in the GROUP BY clause is
aggregated.
|
LOGICAL_AND
|
Gets the logical AND of all non-NULL expressions.
|
LOGICAL_OR
|
Gets the logical OR of all non-NULL expressions.
|
MAX
|
Gets the maximum non-NULL value.
|
MAX_BY
|
Synonym for ANY_VALUE(x HAVING MAX y) .
|
MIN
|
Gets the minimum non-NULL value.
|
MIN_BY
|
Synonym for ANY_VALUE(x HAVING MIN y) .
|
STRING_AGG
|
Concatenates non-NULL STRING or
BYTES values.
|
SUM
|
Gets the sum of non-NULL values.
|
ANY_VALUE
ANY_VALUE(
expression
[ HAVING { MAX | MIN } expression2 ]
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns expression
for some row chosen from the group. Which row is chosen is
nondeterministic, not random. Returns NULL
when the input produces no
rows. Returns NULL
when expression
or expression2
is
NULL
for all rows in the group.
ANY_VALUE
behaves as if IGNORE NULLS
is specified;
rows for which expression
is NULL
are not considered and won't be
selected.
If the HAVING
clause is included in the ANY_VALUE
function, the
OVER
clause can't be used with this function.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
Any
Returned Data Types
Matches the input data type.
Examples
SELECT ANY_VALUE(fruit) as any_value
FROM UNNEST(["apple", "banana", "pear"]) as fruit;
/*-----------*
| any_value |
+-----------+
| apple |
*-----------*/
SELECT
fruit,
ANY_VALUE(fruit) OVER (ORDER BY LENGTH(fruit) ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS any_value
FROM UNNEST(["apple", "banana", "pear"]) as fruit;
/*--------+-----------*
| fruit | any_value |
+--------+-----------+
| pear | pear |
| apple | pear |
| banana | apple |
*--------+-----------*/
WITH
Store AS (
SELECT 20 AS sold, "apples" AS fruit
UNION ALL
SELECT 30 AS sold, "pears" AS fruit
UNION ALL
SELECT 30 AS sold, "bananas" AS fruit
UNION ALL
SELECT 10 AS sold, "oranges" AS fruit
)
SELECT ANY_VALUE(fruit HAVING MAX sold) AS a_highest_selling_fruit FROM Store;
/*-------------------------*
| a_highest_selling_fruit |
+-------------------------+
| pears |
*-------------------------*/
WITH
Store AS (
SELECT 20 AS sold, "apples" AS fruit
UNION ALL
SELECT 30 AS sold, "pears" AS fruit
UNION ALL
SELECT 30 AS sold, "bananas" AS fruit
UNION ALL
SELECT 10 AS sold, "oranges" AS fruit
)
SELECT ANY_VALUE(fruit HAVING MIN sold) AS a_lowest_selling_fruit FROM Store;
/*-------------------------*
| a_lowest_selling_fruit |
+-------------------------+
| oranges |
*-------------------------*/
ARRAY_AGG
ARRAY_AGG(
[ DISTINCT ]
expression
[ { IGNORE | RESPECT } NULLS ]
[ ORDER BY key [ { ASC | DESC } ] [, ... ] ]
[ LIMIT n ]
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns an ARRAY of expression
values.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
If this function is used with the OVER
clause, it's part of a
window function call. In a window function call,
aggregate function clauses can't be used.
To learn more about the OVER
clause and how to use it, see
Window function calls.
An error is raised if an array in the final query result contains a NULL
element.
Supported Argument Types
All data types except ARRAY.
Returned Data Types
ARRAY
If there are zero input rows, this function returns NULL
.
Examples
SELECT ARRAY_AGG(x) AS array_agg FROM UNNEST([2, 1,-2, 3, -2, 1, 2]) AS x;
/*-------------------------*
| array_agg |
+-------------------------+
| [2, 1, -2, 3, -2, 1, 2] |
*-------------------------*/
SELECT ARRAY_AGG(DISTINCT x) AS array_agg
FROM UNNEST([2, 1, -2, 3, -2, 1, 2]) AS x;
/*---------------*
| array_agg |
+---------------+
| [2, 1, -2, 3] |
*---------------*/
SELECT ARRAY_AGG(x IGNORE NULLS) AS array_agg
FROM UNNEST([NULL, 1, -2, 3, -2, 1, NULL]) AS x;
/*-------------------*
| array_agg |
+-------------------+
| [1, -2, 3, -2, 1] |
*-------------------*/
SELECT ARRAY_AGG(x ORDER BY ABS(x)) AS array_agg
FROM UNNEST([2, 1, -2, 3, -2, 1, 2]) AS x;
/*-------------------------*
| array_agg |
+-------------------------+
| [1, 1, 2, -2, -2, 2, 3] |
*-------------------------*/
SELECT ARRAY_AGG(x LIMIT 5) AS array_agg
FROM UNNEST([2, 1, -2, 3, -2, 1, 2]) AS x;
/*-------------------*
| array_agg |
+-------------------+
| [2, 1, -2, 3, -2] |
*-------------------*/
WITH vals AS
(
SELECT 1 x UNION ALL
SELECT -2 x UNION ALL
SELECT 3 x UNION ALL
SELECT -2 x UNION ALL
SELECT 1 x
)
SELECT ARRAY_AGG(DISTINCT x ORDER BY x) as array_agg
FROM vals;
/*------------*
| array_agg |
+------------+
| [-2, 1, 3] |
*------------*/
WITH vals AS
(
SELECT 1 x, 'a' y UNION ALL
SELECT 1 x, 'b' y UNION ALL
SELECT 2 x, 'a' y UNION ALL
SELECT 2 x, 'c' y
)
SELECT x, ARRAY_AGG(y) as array_agg
FROM vals
GROUP BY x;
/*---------------*
| x | array_agg |
+---------------+
| 1 | [a, b] |
| 2 | [a, c] |
*---------------*/
SELECT
x,
ARRAY_AGG(x) OVER (ORDER BY ABS(x)) AS array_agg
FROM UNNEST([2, 1, -2, 3, -2, 1, 2]) AS x;
/*----+-------------------------*
| x | array_agg |
+----+-------------------------+
| 1 | [1, 1] |
| 1 | [1, 1] |
| 2 | [1, 1, 2, -2, -2, 2] |
| -2 | [1, 1, 2, -2, -2, 2] |
| -2 | [1, 1, 2, -2, -2, 2] |
| 2 | [1, 1, 2, -2, -2, 2] |
| 3 | [1, 1, 2, -2, -2, 2, 3] |
*----+-------------------------*/
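The optional clauses compose like ordinary list operations. This Python sketch reproduces the non-window examples above; note that GoogleSQL's order among unordered rows (and among rows with equal ORDER BY keys) is nondeterministic, whereas these lists show one possible result:

```python
xs = [2, 1, -2, 3, -2, 1, 2]

distinct = list(dict.fromkeys(xs))   # DISTINCT: keep one copy of each value
ignore_nulls = [x for x in [None, 1, -2, 3, -2, 1, None] if x is not None]
ordered = sorted(xs, key=abs)        # ORDER BY ABS(x); Python's sort is stable
limited = xs[:5]                     # LIMIT 5

print(distinct)      # [2, 1, -2, 3]
print(ignore_nulls)  # [1, -2, 3, -2, 1]
print(ordered)       # [1, 1, 2, -2, -2, 2, 3]
print(limited)       # [2, 1, -2, 3, -2]
```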
ARRAY_CONCAT_AGG
ARRAY_CONCAT_AGG(
expression
[ ORDER BY key [ { ASC | DESC } ] [, ... ] ]
[ LIMIT n ]
)
Description
Concatenates elements from expression
of type ARRAY
, returning a single
array as a result.
This function ignores NULL
input arrays, but respects the NULL
elements in
non-NULL
input arrays. An
error is raised, however, if an array in the final query result contains a
NULL
element. Returns NULL
if there are zero input rows or
expression
evaluates to NULL
for all rows.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
Supported Argument Types
ARRAY
Returned Data Types
ARRAY
Examples
SELECT FORMAT("%T", ARRAY_CONCAT_AGG(x)) AS array_concat_agg FROM (
SELECT [NULL, 1, 2, 3, 4] AS x
UNION ALL SELECT NULL
UNION ALL SELECT [5, 6]
UNION ALL SELECT [7, 8, 9]
);
/*-----------------------------------*
| array_concat_agg |
+-----------------------------------+
| [NULL, 1, 2, 3, 4, 5, 6, 7, 8, 9] |
*-----------------------------------*/
SELECT FORMAT("%T", ARRAY_CONCAT_AGG(x ORDER BY ARRAY_LENGTH(x))) AS array_concat_agg FROM (
SELECT [1, 2, 3, 4] AS x
UNION ALL SELECT [5, 6]
UNION ALL SELECT [7, 8, 9]
);
/*-----------------------------------*
| array_concat_agg |
+-----------------------------------+
| [5, 6, 7, 8, 9, 1, 2, 3, 4] |
*-----------------------------------*/
SELECT FORMAT("%T", ARRAY_CONCAT_AGG(x LIMIT 2)) AS array_concat_agg FROM (
SELECT [1, 2, 3, 4] AS x
UNION ALL SELECT [5, 6]
UNION ALL SELECT [7, 8, 9]
);
/*--------------------------*
| array_concat_agg |
+--------------------------+
| [1, 2, 3, 4, 5, 6] |
*--------------------------*/
SELECT FORMAT("%T", ARRAY_CONCAT_AGG(x ORDER BY ARRAY_LENGTH(x) LIMIT 2)) AS array_concat_agg FROM (
SELECT [1, 2, 3, 4] AS x
UNION ALL SELECT [5, 6]
UNION ALL SELECT [7, 8, 9]
);
/*------------------*
| array_concat_agg |
+------------------+
| [5, 6, 7, 8, 9] |
*------------------*/
AVG
AVG(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the average of non-NULL
values in an aggregated group.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
If this function is used with the OVER
clause, it's part of a
window function call. In a window function call,
aggregate function clauses can't be used.
To learn more about the OVER
clause and how to use it, see
Window function calls.
AVG
can be used with differential privacy. For more information, see
Differentially private aggregate functions.
Caveats:
- If the aggregated group is empty or the argument is NULL for all rows in the group, returns NULL.
- If the argument is NaN for any row in the group, returns NaN.
- If the argument is [+|-]Infinity for any row in the group, returns either [+|-]Infinity or NaN.
- If there is numeric overflow, produces an error.
- If a floating-point type is returned, the result is non-deterministic, which means you might receive a different result each time you use this function.
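The NaN and Infinity caveats mirror ordinary IEEE-754 floating-point arithmetic, as a quick Python model of AVG shows (NULL modeled as None):

```python
import math

def avg(values):
    """Model of AVG: skip NULLs (None), then take the mean."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None

assert avg([0, 2, 4, 4, 5]) == 3
assert avg([None, None]) is None                        # all-NULL group -> NULL
assert math.isnan(avg([1.0, float('nan')]))             # NaN poisons the average
assert math.isinf(avg([1.0, float('inf')]))             # +inf dominates
assert math.isnan(avg([float('inf'), float('-inf')]))   # inf + -inf -> NaN
```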
Supported Argument Types
- Any numeric input type
INTERVAL
Returned Data Types
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 | INTERVAL |
---|---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 | INTERVAL |
Examples
SELECT AVG(x) as avg
FROM UNNEST([0, 2, 4, 4, 5]) as x;
/*-----*
| avg |
+-----+
| 3 |
*-----*/
SELECT AVG(DISTINCT x) AS avg
FROM UNNEST([0, 2, 4, 4, 5]) AS x;
/*------*
| avg |
+------+
| 2.75 |
*------*/
SELECT
x,
AVG(x) OVER (ORDER BY x ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS avg
FROM UNNEST([0, 2, NULL, 4, 4, 5]) AS x;
/*------+------*
| x | avg |
+------+------+
| NULL | NULL |
| 0 | 0 |
| 2 | 1 |
| 4 | 3 |
| 4 | 4 |
| 5 | 4.5 |
*------+------*/
BIT_AND
BIT_AND(
expression
)
Description
Performs a bitwise AND operation on expression
and returns the result.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
Supported Argument Types
- INT64
Returned Data Types
INT64
Examples
SELECT BIT_AND(x) as bit_and FROM UNNEST([0xF001, 0x00A1]) as x;
/*---------*
| bit_and |
+---------+
| 1 |
*---------*/
BIT_OR
BIT_OR(
expression
)
Description
Performs a bitwise OR operation on expression
and returns the result.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
Supported Argument Types
- INT64
Returned Data Types
INT64
Examples
SELECT BIT_OR(x) as bit_or FROM UNNEST([0xF001, 0x00A1]) as x;
/*--------*
| bit_or |
+--------+
| 61601 |
*--------*/
BIT_XOR
BIT_XOR(
[ DISTINCT ]
expression
)
Description
Performs a bitwise XOR operation on expression
and returns the result.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
Supported Argument Types
- INT64
Returned Data Types
INT64
Examples
SELECT BIT_XOR(x) AS bit_xor FROM UNNEST([5678, 1234]) AS x;
/*---------*
| bit_xor |
+---------+
| 4860 |
*---------*/
SELECT BIT_XOR(x) AS bit_xor FROM UNNEST([1234, 5678, 1234]) AS x;
/*---------*
| bit_xor |
+---------+
| 5678 |
*---------*/
SELECT BIT_XOR(DISTINCT x) AS bit_xor FROM UNNEST([1234, 5678, 1234]) AS x;
/*---------*
| bit_xor |
+---------+
| 4860 |
*---------*/
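These bitwise aggregates fold the operator across the input rows. The five example results above can be reproduced with functools.reduce:

```python
from functools import reduce
from operator import and_, or_, xor

bit_and = reduce(and_, [0xF001, 0x00A1])
bit_or = reduce(or_, [0xF001, 0x00A1])
bit_xor = reduce(xor, [5678, 1234])
bit_xor_dup = reduce(xor, [1234, 5678, 1234])
bit_xor_distinct = reduce(xor, {1234, 5678})  # like BIT_XOR(DISTINCT x)

print(bit_and)           # 1
print(bit_or)            # 61601
print(bit_xor)           # 4860
print(bit_xor_dup)       # 5678  (paired duplicates cancel under XOR)
print(bit_xor_distinct)  # 4860  (DISTINCT removes the duplicate first)
```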
COUNT
1.
COUNT(*)
[OVER over_clause]
2.
COUNT(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
- Returns the number of rows in the input.
- Returns the number of rows with expression evaluated to any value other than NULL.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
To learn more about the OVER
clause and how to use it, see
Window function calls.
This function with DISTINCT supports specifying collation.
COUNT
can be used with differential privacy. For more information, see
Differentially private aggregate functions.
Supported Argument Types
expression
can be any data type. If
DISTINCT
is present, expression
can only be a data type that is
groupable.
Return Data Types
INT64
Examples
You can use the COUNT
function to return the number of rows in a table or the
number of distinct values of an expression. For example:
SELECT
COUNT(*) AS count_star,
COUNT(DISTINCT x) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;
/*------------+--------------*
| count_star | count_dist_x |
+------------+--------------+
| 4 | 3 |
*------------+--------------*/
SELECT
x,
COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
COUNT(DISTINCT x) OVER (PARTITION BY MOD(x, 3)) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;
/*------+------------+--------------*
| x | count_star | count_dist_x |
+------+------------+--------------+
| 1 | 3 | 2 |
| 4 | 3 | 2 |
| 4 | 3 | 2 |
| 5 | 1 | 1 |
*------+------------+--------------*/
SELECT
x,
COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
COUNT(x) OVER (PARTITION BY MOD(x, 3)) AS count_x
FROM UNNEST([1, 4, NULL, 4, 5]) AS x;
/*------+------------+---------*
| x | count_star | count_x |
+------+------------+---------+
| NULL | 1 | 0 |
| 1 | 3 | 3 |
| 4 | 3 | 3 |
| 4 | 3 | 3 |
| 5 | 1 | 1 |
*------+------------+---------*/
If you want to count the number of distinct values of an expression for which a certain condition is satisfied, this is one recipe that you can use:
COUNT(DISTINCT IF(condition, expression, NULL))
Here, IF will return the value of expression if condition is TRUE, or NULL otherwise. The surrounding COUNT(DISTINCT ...) will ignore the NULL values, so it will count only the distinct values of expression for which condition is TRUE.
For example, to count the number of distinct positive values of x:
SELECT COUNT(DISTINCT IF(x > 0, x, NULL)) AS distinct_positive
FROM UNNEST([1, -2, 4, 1, -5, 4, 1, 3, -6, 1]) AS x;
/*-------------------*
| distinct_positive |
+-------------------+
| 3 |
*-------------------*/
Or to count the number of distinct dates on which a certain kind of event occurred:
WITH Events AS (
SELECT DATE '2021-01-01' AS event_date, 'SUCCESS' AS event_type
UNION ALL
SELECT DATE '2021-01-02' AS event_date, 'SUCCESS' AS event_type
UNION ALL
SELECT DATE '2021-01-02' AS event_date, 'FAILURE' AS event_type
UNION ALL
SELECT DATE '2021-01-03' AS event_date, 'SUCCESS' AS event_type
UNION ALL
SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
UNION ALL
SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
)
SELECT
COUNT(DISTINCT IF(event_type = 'FAILURE', event_date, NULL))
AS distinct_dates_with_failures
FROM Events;
/*------------------------------*
| distinct_dates_with_failures |
+------------------------------+
| 2 |
*------------------------------*/
COUNTIF
COUNTIF(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the count of TRUE values for expression. Returns 0 if there are zero input rows, or if expression evaluates to FALSE or NULL for all rows.
Since expression must be a BOOL, the form COUNTIF(DISTINCT ...) is generally not useful: there is only one distinct value of TRUE. So COUNTIF(DISTINCT ...) will return 1 if expression evaluates to TRUE for one or more input rows, or 0 otherwise.
Usually when someone wants to combine COUNTIF and DISTINCT, they want to count the number of distinct values of an expression for which a certain condition is satisfied. One recipe to achieve this is the following:
COUNT(DISTINCT IF(condition, expression, NULL))
Note that this uses COUNT, not COUNTIF; the IF part has been moved inside. To learn more, see the examples for COUNT.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the AGGREGATION_THRESHOLD clause.
To learn more about the OVER clause and how to use it, see Window function calls.
Supported Argument Types
BOOL
Return Data Types
INT64
Examples
SELECT COUNTIF(x<0) AS num_negative, COUNTIF(x>0) AS num_positive
FROM UNNEST([5, -2, 3, 6, -10, -7, 4, 0]) AS x;
/*--------------+--------------*
| num_negative | num_positive |
+--------------+--------------+
| 3 | 4 |
*--------------+--------------*/
SELECT
x,
COUNTIF(x<0) OVER (ORDER BY ABS(x) ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS num_negative
FROM UNNEST([5, -2, 3, 6, -10, NULL, -7, 4, 0]) AS x;
/*------+--------------*
| x | num_negative |
+------+--------------+
| NULL | 0 |
| 0 | 1 |
| -2 | 1 |
| 3 | 1 |
| 4 | 0 |
| 5 | 0 |
| 6 | 1 |
| -7 | 2 |
| -10 | 2 |
*------+--------------*/
GROUPING
GROUPING(groupable_value)
Description
If a groupable item in the GROUP BY clause is aggregated (and thus not grouped), this function returns 1. Otherwise, this function returns 0.
Definitions:
- groupable_value: An expression that represents a value that can be grouped in the GROUP BY clause.
Details:
The GROUPING function is helpful if you need to determine which rows are produced by which grouping sets. A grouping set is a group of columns by which rows can be grouped together. So, if you need to filter rows by a few specific grouping sets, you can use the GROUPING function to identify which grouping sets grouped which rows by creating a matrix of the results.
In addition, you can use the GROUPING function to determine the type of NULL produced by the GROUP BY clause. In some cases, the GROUP BY clause produces a NULL placeholder. This placeholder represents all groupable items that are aggregated (not grouped) in the current grouping set. This is different from a standard NULL, which can also be produced by a query.
For more information, see the following examples.
Returned Data Type
INT64
Examples
In the following example, it's difficult to determine which rows are grouped by the grouping value product_type or product_name. The GROUPING function makes this easier to determine.
Pay close attention to what's in the product_type_agg and product_name_agg column matrix. This determines how the rows are grouped.
product_type_agg | product_name_agg | Notes |
---|---|---|
1 | 0 | Rows are grouped by product_name. |
0 | 1 | Rows are grouped by product_type. |
0 | 0 | Rows are grouped by product_type and product_name. |
1 | 1 | Grand total row. |
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT 'shirt', 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT
product_type,
product_name,
SUM(product_count) AS product_sum,
GROUPING(product_type) AS product_type_agg,
GROUPING(product_name) AS product_name_agg,
FROM Products
GROUP BY GROUPING SETS(product_type, product_name, ())
ORDER BY product_name;
/*--------------+--------------+-------------+------------------+------------------+
| product_type | product_name | product_sum | product_type_agg | product_name_agg |
+--------------+--------------+-------------+------------------+------------------+
| NULL | NULL | 42 | 1 | 1 |
| shirt | NULL | 36 | 0 | 1 |
| pants | NULL | 6 | 0 | 1 |
| NULL | jeans | 6 | 1 | 0 |
| NULL | polo | 25 | 1 | 0 |
| NULL | t-shirt | 11 | 1 | 0 |
+--------------+--------------+-------------+------------------+------------------*/
In the following example, it's difficult to determine if NULL represents a NULL placeholder or a standard NULL value in the product_type column. The GROUPING function makes it easier to determine what type of NULL is being produced. If product_type_is_aggregated is 1, the NULL value for the product_type column is a NULL placeholder.
WITH
Products AS (
SELECT 'shirt' AS product_type, 't-shirt' AS product_name, 3 AS product_count UNION ALL
SELECT 'shirt', 't-shirt', 8 UNION ALL
SELECT NULL, 'polo', 25 UNION ALL
SELECT 'pants', 'jeans', 6
)
SELECT
product_type,
product_name,
SUM(product_count) AS product_sum,
GROUPING(product_type) AS product_type_is_aggregated
FROM Products
GROUP BY GROUPING SETS(product_type, product_name)
ORDER BY product_name;
/*--------------+--------------+-------------+----------------------------+
| product_type | product_name | product_sum | product_type_is_aggregated |
+--------------+--------------+-------------+----------------------------+
| shirt | NULL | 11 | 0 |
| NULL | NULL | 25 | 0 |
| pants | NULL | 6 | 0 |
| NULL | jeans | 6 | 1 |
| NULL | polo | 25 | 1 |
| NULL | t-shirt | 11 | 1 |
+--------------+--------------+-------------+----------------------------*/
LOGICAL_AND
LOGICAL_AND(
expression
)
Description
Returns the logical AND of all non-NULL expressions. Returns NULL if there are zero input rows or expression evaluates to NULL for all rows.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the AGGREGATION_THRESHOLD clause.
Supported Argument Types
BOOL
Return Data Types
BOOL
Examples
LOGICAL_AND returns FALSE because not all of the values in the array are less than 3.
SELECT LOGICAL_AND(x < 3) AS logical_and FROM UNNEST([1, 2, 4]) AS x;
/*-------------*
| logical_and |
+-------------+
| FALSE |
*-------------*/
LOGICAL_OR
LOGICAL_OR(
expression
)
Description
Returns the logical OR of all non-NULL expressions. Returns NULL if there are zero input rows or expression evaluates to NULL for all rows.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the AGGREGATION_THRESHOLD clause.
Supported Argument Types
BOOL
Return Data Types
BOOL
Examples
LOGICAL_OR returns TRUE because at least one of the values in the array is less than 3.
SELECT LOGICAL_OR(x < 3) AS logical_or FROM UNNEST([1, 2, 4]) AS x;
/*------------*
| logical_or |
+------------+
| TRUE |
*------------*/
MAX
MAX(
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the maximum non-NULL value in an aggregated group.
Caveats:
- If the aggregated group is empty or the argument is NULL for all rows in the group, returns NULL.
- If the argument is NaN for any row in the group, returns NaN.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
To learn more about the OVER clause and how to use it, see Window function calls.
This function supports specifying collation.
Supported Argument Types
Any orderable data type except for ARRAY.
Return Data Types
The data type of the input values.
Examples
SELECT MAX(x) AS max
FROM UNNEST([8, 37, 55, 4]) AS x;
/*-----*
| max |
+-----+
| 55 |
*-----*/
SELECT x, MAX(x) OVER (PARTITION BY MOD(x, 2)) AS max
FROM UNNEST([8, NULL, 37, 55, NULL, 4]) AS x;
/*------+------*
| x | max |
+------+------+
| NULL | NULL |
| NULL | NULL |
| 8 | 8 |
| 4 | 8 |
| 37 | 55 |
| 55 | 55 |
*------+------*/
MAX_BY
MAX_BY(
x, y
)
Description
Synonym for ANY_VALUE(x HAVING MAX y).
Return Data Types
Matches the input x data type.
Examples
WITH fruits AS (
SELECT "apple" fruit, 3.55 price UNION ALL
SELECT "banana" fruit, 2.10 price UNION ALL
SELECT "pear" fruit, 4.30 price
)
SELECT MAX_BY(fruit, price) as fruit
FROM fruits;
/*-------*
| fruit |
+-------+
| pear |
*-------*/
MIN
MIN(
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the minimum non-NULL value in an aggregated group.
Caveats:
- If the aggregated group is empty or the argument is NULL for all rows in the group, returns NULL.
- If the argument is NaN for any row in the group, returns NaN.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
To learn more about the OVER clause and how to use it, see Window function calls.
This function supports specifying collation.
Supported Argument Types
Any orderable data type except for ARRAY.
Return Data Types
The data type of the input values.
Examples
SELECT MIN(x) AS min
FROM UNNEST([8, 37, 4, 55]) AS x;
/*-----*
| min |
+-----+
| 4 |
*-----*/
SELECT x, MIN(x) OVER (PARTITION BY MOD(x, 2)) AS min
FROM UNNEST([8, NULL, 37, 4, NULL, 55]) AS x;
/*------+------*
| x | min |
+------+------+
| NULL | NULL |
| NULL | NULL |
| 8 | 4 |
| 4 | 4 |
| 37 | 37 |
| 55 | 37 |
*------+------*/
MIN_BY
MIN_BY(
x, y
)
Description
Synonym for ANY_VALUE(x HAVING MIN y).
Return Data Types
Matches the input x data type.
Examples
WITH fruits AS (
SELECT "apple" fruit, 3.55 price UNION ALL
SELECT "banana" fruit, 2.10 price UNION ALL
SELECT "pear" fruit, 4.30 price
)
SELECT MIN_BY(fruit, price) as fruit
FROM fruits;
/*--------*
| fruit |
+--------+
| banana |
*--------*/
STRING_AGG
STRING_AGG(
[ DISTINCT ]
expression [, delimiter]
[ ORDER BY key [ { ASC | DESC } ] [, ... ] ]
[ LIMIT n ]
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns a value (either STRING or BYTES) obtained by concatenating non-NULL values. Returns NULL if there are zero input rows or expression evaluates to NULL for all rows.
If a delimiter is specified, concatenated values are separated by that delimiter; otherwise, a comma is used as a delimiter.
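The NULL handling here can be sketched in Python. This is an illustration of the semantics only, not BigQuery's implementation; NULL inputs are skipped entirely, along with their delimiters.

```python
# Emulates STRING_AGG's NULL handling with the default "," delimiter.
fruits = ["apple", None, "pear", "banana", "pear"]  # None stands in for NULL
string_agg = ",".join(f for f in fruits if f is not None)
print(string_agg)  # apple,pear,banana,pear
```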
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
If this function is used with the OVER clause, it's part of a window function call. In a window function call, aggregate function clauses can't be used.
To learn more about the OVER clause and how to use it, see Window function calls.
Supported Argument Types
Either STRING or BYTES.
Return Data Types
Either STRING or BYTES.
Examples
SELECT STRING_AGG(fruit) AS string_agg
FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit;
/*------------------------*
| string_agg |
+------------------------+
| apple,pear,banana,pear |
*------------------------*/
SELECT STRING_AGG(fruit, " & ") AS string_agg
FROM UNNEST(["apple", "pear", "banana", "pear"]) AS fruit;
/*------------------------------*
| string_agg |
+------------------------------+
| apple & pear & banana & pear |
*------------------------------*/
SELECT STRING_AGG(DISTINCT fruit, " & ") AS string_agg
FROM UNNEST(["apple", "pear", "banana", "pear"]) AS fruit;
/*-----------------------*
| string_agg |
+-----------------------+
| apple & pear & banana |
*-----------------------*/
SELECT STRING_AGG(fruit, " & " ORDER BY LENGTH(fruit)) AS string_agg
FROM UNNEST(["apple", "pear", "banana", "pear"]) AS fruit;
/*------------------------------*
| string_agg |
+------------------------------+
| pear & pear & apple & banana |
*------------------------------*/
SELECT STRING_AGG(fruit, " & " LIMIT 2) AS string_agg
FROM UNNEST(["apple", "pear", "banana", "pear"]) AS fruit;
/*--------------*
| string_agg |
+--------------+
| apple & pear |
*--------------*/
SELECT STRING_AGG(DISTINCT fruit, " & " ORDER BY fruit DESC LIMIT 2) AS string_agg
FROM UNNEST(["apple", "pear", "banana", "pear"]) AS fruit;
/*---------------*
| string_agg |
+---------------+
| pear & banana |
*---------------*/
SELECT
fruit,
STRING_AGG(fruit, " & ") OVER (ORDER BY LENGTH(fruit)) AS string_agg
FROM UNNEST(["apple", NULL, "pear", "banana", "pear"]) AS fruit;
/*--------+------------------------------*
| fruit | string_agg |
+--------+------------------------------+
| NULL | NULL |
| pear | pear & pear |
| pear | pear & pear |
| apple | pear & pear & apple |
| banana | pear & pear & apple & banana |
*--------+------------------------------*/
SUM
SUM(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the sum of non-NULL
values in an aggregated group.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the AGGREGATION_THRESHOLD clause.
To learn more about the OVER clause and how to use it, see Window function calls.
SUM can be used with differential privacy. For more information, see Differentially private aggregate functions.
Caveats:
- If the aggregated group is empty or the argument is NULL for all rows in the group, returns NULL.
- If the argument is NaN for any row in the group, returns NaN.
- If the argument is [+|-]Infinity for any row in the group, returns either [+|-]Infinity or NaN.
- If there is numeric overflow, produces an error.
- If a floating-point type is returned, the result is non-deterministic, which means you might receive a different result each time you use this function.
Supported Argument Types
- Any supported numeric data type
- INTERVAL
Return Data Types
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 | INTERVAL |
---|---|---|---|---|---|
OUTPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 | INTERVAL |
Examples
SELECT SUM(x) AS sum
FROM UNNEST([1, 2, 3, 4, 5, 4, 3, 2, 1]) AS x;
/*-----*
| sum |
+-----+
| 25 |
*-----*/
SELECT SUM(DISTINCT x) AS sum
FROM UNNEST([1, 2, 3, 4, 5, 4, 3, 2, 1]) AS x;
/*-----*
| sum |
+-----+
| 15 |
*-----*/
SELECT
x,
SUM(x) OVER (PARTITION BY MOD(x, 3)) AS sum
FROM UNNEST([1, 2, 3, 4, 5, 4, 3, 2, 1]) AS x;
/*---+-----*
| x | sum |
+---+-----+
| 3 | 6 |
| 3 | 6 |
| 1 | 10 |
| 4 | 10 |
| 4 | 10 |
| 1 | 10 |
| 2 | 9 |
| 5 | 9 |
| 2 | 9 |
*---+-----*/
SELECT
x,
SUM(DISTINCT x) OVER (PARTITION BY MOD(x, 3)) AS sum
FROM UNNEST([1, 2, 3, 4, 5, 4, 3, 2, 1]) AS x;
/*---+-----*
| x | sum |
+---+-----+
| 3 | 3 |
| 3 | 3 |
| 1 | 5 |
| 4 | 5 |
| 4 | 5 |
| 1 | 5 |
| 2 | 7 |
| 5 | 7 |
| 2 | 7 |
*---+-----*/
SELECT SUM(x) AS sum
FROM UNNEST([]) AS x;
/*------*
| sum |
+------+
| NULL |
*------*/
Approximate aggregate functions
GoogleSQL for BigQuery supports approximate aggregate functions. To learn about the syntax for aggregate function calls, see Aggregate function calls.
Approximate aggregate functions are scalable in terms of memory usage and time,
but produce approximate results instead of exact results. These functions
typically require less memory than exact aggregation functions
like COUNT(DISTINCT ...)
, but also introduce statistical uncertainty.
This makes approximate aggregation appropriate for large data streams for
which linear memory usage is impractical, as well as for data that is
already approximate.
The approximate aggregate functions in this section work directly on the input data, rather than an intermediate estimation of the data. These functions do not allow users to specify the precision for the estimation with sketches. If you would like to specify precision with sketches, see:
- HyperLogLog++ functions to estimate cardinality.
Function list
Name | Summary |
---|---|
APPROX_COUNT_DISTINCT | Gets the approximate result for COUNT(DISTINCT expression). |
APPROX_QUANTILES | Gets the approximate quantile boundaries. |
APPROX_TOP_COUNT | Gets the approximate top elements and their approximate count. |
APPROX_TOP_SUM | Gets the approximate top elements and sum, based on the approximate sum of an assigned weight. |
APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT(
expression
)
Description
Returns the approximate result for COUNT(DISTINCT expression). The value returned is a statistical estimate, not necessarily the actual value.
This function is less accurate than COUNT(DISTINCT expression), but performs better on huge input.
Supported Argument Types
Any data type except:
- ARRAY
- STRUCT
- INTERVAL
Returned Data Types
INT64
Examples
SELECT APPROX_COUNT_DISTINCT(x) as approx_distinct
FROM UNNEST([0, 1, 1, 2, 3, 5]) as x;
/*-----------------*
| approx_distinct |
+-----------------+
| 5 |
*-----------------*/
APPROX_QUANTILES
APPROX_QUANTILES(
[ DISTINCT ]
expression, number
[ { IGNORE | RESPECT } NULLS ]
)
Description
Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, sorted in ascending order, where the first element is the approximate minimum and the last element is the approximate maximum.
Returns NULL if there are zero input rows or expression evaluates to NULL for all rows.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
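For intuition, the exact quantile boundaries that APPROX_QUANTILES estimates can be sketched in Python. This uses one simple exact scheme (evenly spaced ranks over the sorted data, rounding down); BigQuery's sketch-based estimator may return slightly different boundaries on large inputs.

```python
# Exact version of the boundaries APPROX_QUANTILES estimates:
# number + 1 values from the minimum to the maximum, in ascending order.
def quantile_boundaries(values, number):
    data = sorted(values)
    n = len(data)
    # Pick number + 1 evenly spaced positions over the sorted data.
    return [data[min(n - 1, (i * (n - 1)) // number)] for i in range(number + 1)]

print(quantile_boundaries([1, 1, 1, 4, 5, 6, 7, 8, 9, 10], 2))  # [1, 5, 10]
```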
Supported Argument Types
- expression: Any supported data type except: ARRAY, STRUCT, INTERVAL.
- number: INT64 literal or query parameter.
Returned Data Types
ARRAY<T> where T is the type specified by expression.
Examples
SELECT APPROX_QUANTILES(x, 2) AS approx_quantiles
FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;
/*------------------*
| approx_quantiles |
+------------------+
| [1, 5, 10] |
*------------------*/
SELECT APPROX_QUANTILES(x, 100)[OFFSET(90)] AS percentile_90
FROM UNNEST([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) AS x;
/*---------------*
| percentile_90 |
+---------------+
| 9 |
*---------------*/
SELECT APPROX_QUANTILES(DISTINCT x, 2) AS approx_quantiles
FROM UNNEST([1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;
/*------------------*
| approx_quantiles |
+------------------+
| [1, 6, 10] |
*------------------*/
SELECT FORMAT("%T", APPROX_QUANTILES(x, 2 RESPECT NULLS)) AS approx_quantiles
FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;
/*------------------*
| approx_quantiles |
+------------------+
| [NULL, 4, 10] |
*------------------*/
SELECT FORMAT("%T", APPROX_QUANTILES(DISTINCT x, 2 RESPECT NULLS)) AS approx_quantiles
FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;
/*------------------*
| approx_quantiles |
+------------------+
| [NULL, 6, 10] |
*------------------*/
APPROX_TOP_COUNT
APPROX_TOP_COUNT(
expression, number
)
Description
Returns the approximate top elements of expression as an array of STRUCTs. The number parameter specifies the number of elements returned.
Each STRUCT contains two fields. The first field (named value) contains an input value. The second field (named count) contains an INT64 specifying the number of times the value was returned.
Returns NULL if there are zero input rows.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
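For intuition, the exact computation that APPROX_TOP_COUNT approximates can be sketched in Python. This is an illustration only; BigQuery uses a memory-bounded estimator rather than an exact count.

```python
from collections import Counter

# Exact analogue of APPROX_TOP_COUNT: the top-N values with their counts.
# Data matches the SQL example below; NULLs would be counted like any value.
x = ["apple", "apple", "pear", "pear", "pear", "banana"]
top2 = Counter(x).most_common(2)
print(top2)  # [('pear', 3), ('apple', 2)]
```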
Supported Argument Types
- expression: Any data type that the GROUP BY clause supports.
- number: INT64 literal or query parameter.
Returned Data Types
ARRAY<STRUCT>
Examples
SELECT APPROX_TOP_COUNT(x, 2) as approx_top_count
FROM UNNEST(["apple", "apple", "pear", "pear", "pear", "banana"]) as x;
/*-------------------------*
| approx_top_count |
+-------------------------+
| [{pear, 3}, {apple, 2}] |
*-------------------------*/
NULL handling
APPROX_TOP_COUNT does not ignore NULLs in the input. For example:
SELECT APPROX_TOP_COUNT(x, 2) as approx_top_count
FROM UNNEST([NULL, "pear", "pear", "pear", "apple", NULL]) as x;
/*------------------------*
| approx_top_count |
+------------------------+
| [{pear, 3}, {NULL, 2}] |
*------------------------*/
APPROX_TOP_SUM
APPROX_TOP_SUM(
expression, weight, number
)
Description
Returns the approximate top elements of expression, ordered by the sum of the weight values provided for each unique value of expression. The number parameter specifies the number of elements returned.
If the weight input is negative or NaN, this function returns an error.
The elements are returned as an array of STRUCTs. Each STRUCT contains two fields: value and sum. The value field contains the value of the input expression. The sum field is the same type as weight, and is the approximate sum of the input weight associated with the value field.
Returns NULL if there are zero input rows.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
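For intuition, the exact computation that APPROX_TOP_SUM approximates can be sketched in Python: group the weights by value, then take the top N by total weight. This is an illustration only, using the data from the SQL example below.

```python
from collections import defaultdict

# Exact analogue of APPROX_TOP_SUM: sum weights per value, take top N by sum.
rows = [("apple", 3), ("pear", 2), ("apple", 0), ("banana", 5), ("pear", 4)]
sums = defaultdict(int)
for value, weight in rows:
    sums[value] += weight
top2 = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top2)  # [('pear', 6), ('banana', 5)]
```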
Supported Argument Types
- expression: Any data type that the GROUP BY clause supports.
- weight: One of the following: INT64, NUMERIC, BIGNUMERIC, FLOAT64.
- number: INT64 literal or query parameter.
Returned Data Types
ARRAY<STRUCT>
Examples
SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM
UNNEST([
STRUCT("apple" AS x, 3 AS weight),
("pear", 2),
("apple", 0),
("banana", 5),
("pear", 4)
]);
/*--------------------------*
| approx_top_sum |
+--------------------------+
| [{pear, 6}, {banana, 5}] |
*--------------------------*/
NULL handling
APPROX_TOP_SUM does not ignore NULL values for the expression and weight parameters.
SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM
UNNEST([STRUCT("apple" AS x, NULL AS weight), ("pear", 0), ("pear", NULL)]);
/*----------------------------*
| approx_top_sum |
+----------------------------+
| [{pear, 0}, {apple, NULL}] |
*----------------------------*/
SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM
UNNEST([STRUCT("apple" AS x, 0 AS weight), (NULL, 2)]);
/*-------------------------*
| approx_top_sum |
+-------------------------+
| [{NULL, 2}, {apple, 0}] |
*-------------------------*/
SELECT APPROX_TOP_SUM(x, weight, 2) AS approx_top_sum FROM
UNNEST([STRUCT("apple" AS x, 0 AS weight), (NULL, NULL)]);
/*----------------------------*
| approx_top_sum |
+----------------------------+
| [{apple, 0}, {NULL, NULL}] |
*----------------------------*/
Array functions
GoogleSQL for BigQuery supports the following array functions.
Function list
Name | Summary |
---|---|
ARRAY | Produces an array with one element for each row in a subquery. |
ARRAY_CONCAT | Concatenates one or more arrays with the same element type into a single array. |
ARRAY_LENGTH | Gets the number of elements in an array. |
ARRAY_REVERSE | Reverses the order of elements in an array. |
ARRAY_TO_STRING | Produces a concatenation of the elements in an array as a STRING value. |
GENERATE_ARRAY | Generates an array of values in a range. |
GENERATE_DATE_ARRAY | Generates an array of dates in a range. |
GENERATE_TIMESTAMP_ARRAY | Generates an array of timestamps in a range. |
ARRAY
ARRAY(subquery)
Description
The ARRAY function returns an ARRAY with one element for each row in a subquery.
If subquery produces a SQL table, the table must have exactly one column. Each element in the output ARRAY is the value of the single column of a row in the table.
If subquery produces a value table, then each element in the output ARRAY is the entire corresponding row of the value table.
Constraints
- Subqueries are unordered, so the elements of the output ARRAY are not guaranteed to preserve any order in the source table for the subquery. However, if the subquery includes an ORDER BY clause, the ARRAY function will return an ARRAY that honors that clause.
- If the subquery returns more than one column, the ARRAY function returns an error.
- If the subquery returns an ARRAY typed column or ARRAY typed rows, the ARRAY function returns an error that GoogleSQL does not support ARRAYs with elements of type ARRAY.
- If the subquery returns zero rows, the ARRAY function returns an empty ARRAY. It never returns a NULL ARRAY.
Return type
ARRAY
Examples
SELECT ARRAY
(SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3) AS new_array;
/*-----------*
| new_array |
+-----------+
| [1, 2, 3] |
*-----------*/
To construct an ARRAY from a subquery that contains multiple columns, change the subquery to use SELECT AS STRUCT. Now the ARRAY function will return an ARRAY of STRUCTs. The ARRAY will contain one STRUCT for each row in the subquery, and each of these STRUCTs will contain a field for each column in that row.
SELECT
ARRAY
(SELECT AS STRUCT 1, 2, 3
UNION ALL SELECT AS STRUCT 4, 5, 6) AS new_array;
/*------------------------*
| new_array |
+------------------------+
| [{1, 2, 3}, {4, 5, 6}] |
*------------------------*/
Similarly, to construct an ARRAY from a subquery that contains one or more ARRAYs, change the subquery to use SELECT AS STRUCT.
SELECT ARRAY
(SELECT AS STRUCT [1, 2, 3] UNION ALL
SELECT AS STRUCT [4, 5, 6]) AS new_array;
/*----------------------------*
| new_array |
+----------------------------+
| [{[1, 2, 3]}, {[4, 5, 6]}] |
*----------------------------*/
ARRAY_CONCAT
ARRAY_CONCAT(array_expression[, ...])
Description
Concatenates one or more arrays with the same element type into a single array. The function returns NULL if any input argument is NULL.
Return type
ARRAY
Examples
SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six;
/*--------------------------------------------------*
| count_to_six |
+--------------------------------------------------+
| [1, 2, 3, 4, 5, 6] |
*--------------------------------------------------*/
ARRAY_LENGTH
ARRAY_LENGTH(array_expression)
Description
Returns the size of the array. Returns 0 for an empty array. Returns NULL if the array_expression is NULL.
Return type
INT64
Examples
SELECT
ARRAY_LENGTH(["coffee", NULL, "milk" ]) AS size_a,
ARRAY_LENGTH(["cake", "pie"]) AS size_b;
/*--------+--------*
| size_a | size_b |
+--------+--------+
| 3 | 2 |
*--------+--------*/
ARRAY_REVERSE
ARRAY_REVERSE(value)
Description
Returns the input ARRAY with elements in reverse order.
Return type
ARRAY
Examples
SELECT ARRAY_REVERSE([1, 2, 3]) AS reverse_arr
/*-------------*
| reverse_arr |
+-------------+
| [3, 2, 1] |
*-------------*/
ARRAY_TO_STRING
ARRAY_TO_STRING(array_expression, delimiter[, null_text])
Description
Returns a concatenation of the elements in array_expression as a STRING. The value for array_expression can either be an array of STRING or BYTES data types.
If the null_text parameter is used, the function replaces any NULL values in the array with the value of null_text.
If the null_text parameter is not used, the function omits the NULL value and its preceding delimiter.
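The two NULL behaviors can be sketched in Python. This is an illustration of the semantics only, not BigQuery's implementation.

```python
# Emulates ARRAY_TO_STRING's two NULL behaviors (None stands in for NULL).
def array_to_string(arr, delimiter, null_text=None):
    if null_text is not None:
        # With null_text: NULL elements are replaced with null_text.
        return delimiter.join(null_text if e is None else e for e in arr)
    # Without null_text: NULL elements (and their delimiter) are omitted.
    return delimiter.join(e for e in arr if e is not None)

print(array_to_string(["coffee", "tea", "milk", None], "--", "MISSING"))  # coffee--tea--milk--MISSING
print(array_to_string(["coffee", "tea", "milk", None], "--"))             # coffee--tea--milk
```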
Return type
STRING
Examples
SELECT ARRAY_TO_STRING(['coffee', 'tea', 'milk', NULL], '--', 'MISSING') AS text
/*--------------------------------*
| text |
+--------------------------------+
| coffee--tea--milk--MISSING |
*--------------------------------*/
SELECT ARRAY_TO_STRING(['cake', 'pie', NULL], '--', 'MISSING') AS text
/*--------------------------------*
| text |
+--------------------------------+
| cake--pie--MISSING |
*--------------------------------*/
GENERATE_ARRAY
GENERATE_ARRAY(start_expression, end_expression[, step_expression])
Description
Returns an array of values. The start_expression and end_expression parameters determine the inclusive start and end of the array.
The GENERATE_ARRAY function accepts the following data types as inputs:
- INT64
- NUMERIC
- BIGNUMERIC
- FLOAT64
The step_expression parameter determines the increment used to generate array values. The default value for this parameter is 1.
This function returns an error if step_expression is set to 0, or if any input is NaN.
If any argument is NULL, the function will return a NULL array.
Return Data Type
ARRAY
Examples
The following returns an array of integers, with a default step of 1.
SELECT GENERATE_ARRAY(1, 5) AS example_array;
/*-----------------*
| example_array |
+-----------------+
| [1, 2, 3, 4, 5] |
*-----------------*/
The following returns an array using a user-specified step size.
SELECT GENERATE_ARRAY(0, 10, 3) AS example_array;
/*---------------*
| example_array |
+---------------+
| [0, 3, 6, 9] |
*---------------*/
The following returns an array using a negative value, -3, for its step size.
SELECT GENERATE_ARRAY(10, 0, -3) AS example_array;
/*---------------*
| example_array |
+---------------+
| [10, 7, 4, 1] |
*---------------*/
The following returns an array using the same value for the start_expression and end_expression.
SELECT GENERATE_ARRAY(4, 4, 10) AS example_array;
/*---------------*
| example_array |
+---------------+
| [4] |
*---------------*/
The following returns an empty array, because the start_expression is greater than the end_expression, and the step_expression value is positive.
SELECT GENERATE_ARRAY(10, 0, 3) AS example_array;
/*---------------*
| example_array |
+---------------+
| [] |
*---------------*/
The following returns a NULL array because end_expression is NULL.
SELECT GENERATE_ARRAY(5, NULL, 1) AS example_array;
/*---------------*
| example_array |
+---------------+
| NULL |
*---------------*/
The following returns multiple arrays.
SELECT GENERATE_ARRAY(start, 5) AS example_array
FROM UNNEST([3, 4, 5]) AS start;
/*---------------*
| example_array |
+---------------+
| [3, 4, 5] |
| [4, 5] |
| [5] |
+---------------*/
GENERATE_DATE_ARRAY
GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part])
Description
Returns an array of dates. The start_date and end_date parameters determine the inclusive start and end of the array.
The GENERATE_DATE_ARRAY function accepts the following data types as inputs:
- start_date must be a DATE.
- end_date must be a DATE.
- INT64_expr must be an INT64.
- date_part must be either DAY, WEEK, MONTH, QUARTER, or YEAR.
The INT64_expr parameter determines the increment used to generate dates. The default value for this parameter is 1 day.
This function returns an error if INT64_expr is set to 0.
Return Data Type
ARRAY
containing 0 or more DATE
values.
Examples
The following returns an array of dates, with a default step of 1.
SELECT GENERATE_DATE_ARRAY('2016-10-05', '2016-10-08') AS example;
/*--------------------------------------------------*
| example |
+--------------------------------------------------+
| [2016-10-05, 2016-10-06, 2016-10-07, 2016-10-08] |
*--------------------------------------------------*/
The following returns an array using a user-specified step size.
SELECT GENERATE_DATE_ARRAY(
'2016-10-05', '2016-10-09', INTERVAL 2 DAY) AS example;
/*--------------------------------------*
| example |
+--------------------------------------+
| [2016-10-05, 2016-10-07, 2016-10-09] |
*--------------------------------------*/
The following returns an array using a negative value, `-3`, for its step size.
SELECT GENERATE_DATE_ARRAY('2016-10-05',
'2016-10-01', INTERVAL -3 DAY) AS example;
/*--------------------------*
| example |
+--------------------------+
| [2016-10-05, 2016-10-02] |
*--------------------------*/
The following returns an array using the same value for the `start_date` and `end_date`.
SELECT GENERATE_DATE_ARRAY('2016-10-05',
'2016-10-05', INTERVAL 8 DAY) AS example;
/*--------------*
| example |
+--------------+
| [2016-10-05] |
*--------------*/
The following returns an empty array, because the `start_date` is greater than the `end_date`, and the `step` value is positive.
SELECT GENERATE_DATE_ARRAY('2016-10-05',
'2016-10-01', INTERVAL 1 DAY) AS example;
/*---------*
| example |
+---------+
| [] |
*---------*/
The following returns a `NULL` array, because one of its inputs is `NULL`.
SELECT GENERATE_DATE_ARRAY('2016-10-05', NULL) AS example;
/*---------*
| example |
+---------+
| NULL |
*---------*/
The following returns an array of dates, using `MONTH` as the `date_part` interval:
SELECT GENERATE_DATE_ARRAY('2016-01-01',
'2016-12-31', INTERVAL 2 MONTH) AS example;
/*--------------------------------------------------------------------------*
| example |
+--------------------------------------------------------------------------+
| [2016-01-01, 2016-03-01, 2016-05-01, 2016-07-01, 2016-09-01, 2016-11-01] |
*--------------------------------------------------------------------------*/
The following uses non-constant dates to generate an array.
SELECT GENERATE_DATE_ARRAY(date_start, date_end, INTERVAL 1 WEEK) AS date_range
FROM (
SELECT DATE '2016-01-01' AS date_start, DATE '2016-01-31' AS date_end
UNION ALL SELECT DATE "2016-04-01", DATE "2016-04-30"
UNION ALL SELECT DATE "2016-07-01", DATE "2016-07-31"
UNION ALL SELECT DATE "2016-10-01", DATE "2016-10-31"
) AS items;
/*--------------------------------------------------------------*
| date_range |
+--------------------------------------------------------------+
| [2016-01-01, 2016-01-08, 2016-01-15, 2016-01-22, 2016-01-29] |
| [2016-04-01, 2016-04-08, 2016-04-15, 2016-04-22, 2016-04-29] |
| [2016-07-01, 2016-07-08, 2016-07-15, 2016-07-22, 2016-07-29] |
| [2016-10-01, 2016-10-08, 2016-10-15, 2016-10-22, 2016-10-29] |
*--------------------------------------------------------------*/
GENERATE_TIMESTAMP_ARRAY
GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp,
INTERVAL step_expression date_part)
Description
Returns an `ARRAY` of `TIMESTAMP` values separated by a given interval. The `start_timestamp` and `end_timestamp` parameters determine the inclusive lower and upper bounds of the `ARRAY`.
The `GENERATE_TIMESTAMP_ARRAY` function accepts the following data types as inputs:
- `start_timestamp`: `TIMESTAMP`
- `end_timestamp`: `TIMESTAMP`
- `step_expression`: `INT64`
- Allowed `date_part` values are: `MICROSECOND`, `MILLISECOND`, `SECOND`, `MINUTE`, `HOUR`, or `DAY`.
The `step_expression` parameter determines the increment used to generate timestamps.
Return Data Type
An `ARRAY` containing 0 or more `TIMESTAMP` values.
Examples
The following example returns an `ARRAY` of `TIMESTAMP` values at intervals of 1 day.
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-07 00:00:00',
INTERVAL 1 DAY) AS timestamp_array;
/*--------------------------------------------------------------------------*
| timestamp_array |
+--------------------------------------------------------------------------+
| [2016-10-05 00:00:00+00, 2016-10-06 00:00:00+00, 2016-10-07 00:00:00+00] |
*--------------------------------------------------------------------------*/
The following example returns an `ARRAY` of `TIMESTAMP` values at intervals of 1 second.
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-05 00:00:02',
INTERVAL 1 SECOND) AS timestamp_array;
/*--------------------------------------------------------------------------*
| timestamp_array |
+--------------------------------------------------------------------------+
| [2016-10-05 00:00:00+00, 2016-10-05 00:00:01+00, 2016-10-05 00:00:02+00] |
*--------------------------------------------------------------------------*/
The following example returns an `ARRAY` of `TIMESTAMP` values with a negative interval.
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-06 00:00:00', '2016-10-01 00:00:00',
INTERVAL -2 DAY) AS timestamp_array;
/*--------------------------------------------------------------------------*
| timestamp_array |
+--------------------------------------------------------------------------+
| [2016-10-06 00:00:00+00, 2016-10-04 00:00:00+00, 2016-10-02 00:00:00+00] |
*--------------------------------------------------------------------------*/
The following example returns an `ARRAY` with a single element, because `start_timestamp` and `end_timestamp` have the same value.
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', '2016-10-05 00:00:00',
INTERVAL 1 HOUR) AS timestamp_array;
/*--------------------------*
| timestamp_array |
+--------------------------+
| [2016-10-05 00:00:00+00] |
*--------------------------*/
The following example returns an empty `ARRAY`, because `start_timestamp` is later than `end_timestamp`.
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-06 00:00:00', '2016-10-05 00:00:00',
INTERVAL 1 HOUR) AS timestamp_array;
/*-----------------*
| timestamp_array |
+-----------------+
| [] |
*-----------------*/
The following example returns a `NULL` `ARRAY`, because one of the inputs is `NULL`.
SELECT GENERATE_TIMESTAMP_ARRAY('2016-10-05 00:00:00', NULL, INTERVAL 1 HOUR)
AS timestamp_array;
/*-----------------*
| timestamp_array |
+-----------------+
| NULL |
*-----------------*/
The following example generates `ARRAY`s of `TIMESTAMP` values from columns containing values for `start_timestamp` and `end_timestamp`.
SELECT GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, INTERVAL 1 HOUR)
AS timestamp_array
FROM
(SELECT
TIMESTAMP '2016-10-05 00:00:00' AS start_timestamp,
TIMESTAMP '2016-10-05 02:00:00' AS end_timestamp
UNION ALL
SELECT
TIMESTAMP '2016-10-05 12:00:00' AS start_timestamp,
TIMESTAMP '2016-10-05 14:00:00' AS end_timestamp
UNION ALL
SELECT
TIMESTAMP '2016-10-05 23:59:00' AS start_timestamp,
TIMESTAMP '2016-10-06 01:59:00' AS end_timestamp);
/*--------------------------------------------------------------------------*
| timestamp_array |
+--------------------------------------------------------------------------+
| [2016-10-05 00:00:00+00, 2016-10-05 01:00:00+00, 2016-10-05 02:00:00+00] |
| [2016-10-05 12:00:00+00, 2016-10-05 13:00:00+00, 2016-10-05 14:00:00+00] |
| [2016-10-05 23:59:00+00, 2016-10-06 00:59:00+00, 2016-10-06 01:59:00+00] |
*--------------------------------------------------------------------------*/
Supplemental materials
OFFSET and ORDINAL
For information about using `OFFSET` and `ORDINAL` with arrays, see Array subscript operator and Accessing array elements.
Bit functions
GoogleSQL for BigQuery supports the following bit functions.
Function list
Name | Summary |
---|---|
BIT_COUNT | Gets the number of bits that are set in an input expression. |
BIT_COUNT
BIT_COUNT(expression)
Description
The input, `expression`, must be an integer or `BYTES`.
Returns the number of bits that are set in the input `expression`. For signed integers, this is the number of bits in two's complement form.
Return Data Type
INT64
Example
SELECT a, BIT_COUNT(a) AS a_bits, FORMAT("%T", b) as b, BIT_COUNT(b) AS b_bits
FROM UNNEST([
STRUCT(0 AS a, b'' AS b), (0, b'\x00'), (5, b'\x05'), (8, b'\x00\x08'),
(0xFFFF, b'\xFF\xFF'), (-2, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFE'),
(-1, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF'),
(NULL, b'\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF')
]) AS x;
/*-------+--------+---------------------------------------------+--------*
| a | a_bits | b | b_bits |
+-------+--------+---------------------------------------------+--------+
| 0 | 0 | b"" | 0 |
| 0 | 0 | b"\x00" | 0 |
| 5 | 2 | b"\x05" | 2 |
| 8 | 1 | b"\x00\x08" | 1 |
| 65535 | 16 | b"\xff\xff" | 16 |
| -2 | 63 | b"\xff\xff\xff\xff\xff\xff\xff\xfe" | 63 |
| -1 | 64 | b"\xff\xff\xff\xff\xff\xff\xff\xff" | 64 |
| NULL | NULL | b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff" | 80 |
*-------+--------+---------------------------------------------+--------*/
Conversion functions
GoogleSQL for BigQuery supports conversion functions. These data type conversions are explicit, but some conversions can happen implicitly. For details, see the implicit and explicit conversion rules.
Function list
Name | Summary |
---|---|
CAST | Converts the results of an expression to the given type. |
PARSE_BIGNUMERIC | Converts a STRING value to a BIGNUMERIC value. |
PARSE_NUMERIC | Converts a STRING value to a NUMERIC value. |
SAFE_CAST | Similar to the CAST function, but returns NULL when a runtime error is produced. |
CAST
CAST(expression AS typename [format_clause])
Description
Cast syntax is used in a query to indicate that the result type of an expression should be converted to some other type.
When using `CAST`, a query can fail if GoogleSQL is unable to perform the cast. If you want to protect your queries from these types of errors, you can use `SAFE_CAST`.
Casts between supported types that do not successfully map from the original value to the target domain produce runtime errors. For example, casting `BYTES` to `STRING` where the byte sequence is not valid UTF-8 results in a runtime error.
Some casts can include a format clause, which provides instructions for how to conduct the cast. For example, you could instruct a cast to convert a sequence of bytes to a BASE64-encoded string instead of a UTF-8-encoded string.
The structure of the format clause is unique to each type of cast and more information is available in the section for that cast.
Examples
The following query results in `"true"` if `x` is `1`, `"false"` for any other non-`NULL` value, and `NULL` if `x` is `NULL`.
CAST(x=1 AS STRING)
CAST AS ARRAY
CAST(expression AS ARRAY<element_type>)
Description
GoogleSQL supports casting to `ARRAY`. The `expression` parameter can represent an expression for these data types:
- `ARRAY`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
ARRAY | ARRAY | Must be the exact same array type. |
CAST AS BIGNUMERIC
CAST(expression AS BIGNUMERIC)
Description
GoogleSQL supports casting to `BIGNUMERIC`. The `expression` parameter can represent an expression for these data types:
- `INT64`
- `FLOAT64`
- `NUMERIC`
- `BIGNUMERIC`
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
FLOAT64 | BIGNUMERIC | The floating point number rounds half away from zero. Casting a NaN, +inf, or -inf returns an error. Casting a value outside the range of BIGNUMERIC returns an overflow error. |
STRING | BIGNUMERIC | The numeric literal contained in the string must not exceed the maximum precision or range of the BIGNUMERIC type, or an error occurs. If the number of digits after the decimal point exceeds 38, then the resulting BIGNUMERIC value rounds half away from zero to have 38 digits after the decimal point. |
CAST AS BOOL
CAST(expression AS BOOL)
Description
GoogleSQL supports casting to `BOOL`. The `expression` parameter can represent an expression for these data types:
- `INT64`
- `BOOL`
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
INT64 | BOOL | Returns FALSE if x is 0, TRUE otherwise. |
STRING | BOOL | Returns TRUE if x is "true" and FALSE if x is "false". All other values of x are invalid and throw an error instead of casting to a boolean. The string is case-insensitive when converting to a boolean. |
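As an illustrative sketch of these rules (not one of the reference's own examples), the following casts a nonzero integer and a mixed-case string to `BOOL`:
SELECT CAST(1 AS BOOL) AS int_to_bool, CAST('TRUE' AS BOOL) AS string_to_bool;
/*-------------+----------------*
| int_to_bool | string_to_bool |
+-------------+----------------+
| true        | true           |
*-------------+----------------*/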
CAST AS BYTES
CAST(expression AS BYTES [format_clause])
Description
GoogleSQL supports casting to `BYTES`. The `expression` parameter can represent an expression for these data types:
- `BYTES`
- `STRING`
Format clause
When an expression of one type is cast to another type, you can use the format clause to provide instructions for how to conduct the cast. You can use the format clause in this section if `expression` is a `STRING`.
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | BYTES | Strings are cast to bytes using UTF-8 encoding. For example, the string "©", when cast to bytes, would become a 2-byte sequence with the hex values C2 and A9. |
CAST AS DATE
CAST(expression AS DATE [format_clause])
Description
GoogleSQL supports casting to `DATE`. The `expression` parameter can represent an expression for these data types:
- `STRING`
- `DATETIME`
- `TIMESTAMP`
Format clause
When an expression of one type is cast to another type, you can use the format clause to provide instructions for how to conduct the cast. You can use the format clause in this section if `expression` is a `STRING`.
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | DATE | When casting from string to date, the string must conform to the supported date literal format, and is independent of time zone. If the string expression is invalid or represents a date that is outside of the supported min/max range, then an error is produced. |
TIMESTAMP | DATE | Casting from a timestamp to date effectively truncates the timestamp as of the default time zone. |
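For example, a string in the supported date literal format casts directly to a date:
SELECT CAST('2016-12-25' AS DATE) AS date_value;
/*------------*
| date_value |
+------------+
| 2016-12-25 |
*------------*/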
CAST AS DATETIME
CAST(expression AS DATETIME [format_clause])
Description
GoogleSQL supports casting to `DATETIME`. The `expression` parameter can represent an expression for these data types:
- `STRING`
- `DATETIME`
- `TIMESTAMP`
Format clause
When an expression of one type is cast to another type, you can use the format clause to provide instructions for how to conduct the cast. You can use the format clause in this section if `expression` is a `STRING`.
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | DATETIME | When casting from string to datetime, the string must conform to the supported datetime literal format, and is independent of time zone. If the string expression is invalid or represents a datetime that is outside of the supported min/max range, then an error is produced. |
TIMESTAMP | DATETIME | Casting from a timestamp to datetime effectively truncates the timestamp as of the default time zone. |
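For example, a string in the supported datetime literal format casts directly to a datetime (result shown in the documentation's display format; client rendering may vary):
SELECT CAST('2016-12-25 05:30:00' AS DATETIME) AS datetime_value;
/*---------------------*
| datetime_value      |
+---------------------+
| 2016-12-25 05:30:00 |
*---------------------*/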
CAST AS FLOAT64
CAST(expression AS FLOAT64)
Description
GoogleSQL supports casting to floating point types. The `expression` parameter can represent an expression for these data types:
- `INT64`
- `FLOAT64`
- `NUMERIC`
- `BIGNUMERIC`
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
INT64 | FLOAT64 | Returns a close but potentially not exact floating point value. |
NUMERIC | FLOAT64 | NUMERIC converts to the closest floating point number with a possible loss of precision. |
BIGNUMERIC | FLOAT64 | BIGNUMERIC converts to the closest floating point number with a possible loss of precision. |
STRING | FLOAT64 | Returns x as a floating point value, interpreting it as having the same form as a valid floating point literal. Also supports casts from "[+,-]inf" to [,-]Infinity, "[+,-]infinity" to [,-]Infinity, and "[+,-]nan" to NaN. Conversions are case-insensitive. |
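As a sketch of the string rules above, the following casts a numeric literal string and a special value (output rendering of infinity may vary by client):
SELECT CAST('123.45' AS FLOAT64) AS from_string, CAST('inf' AS FLOAT64) AS inf_value;
/*-------------+-----------*
| from_string | inf_value |
+-------------+-----------+
| 123.45      | Infinity  |
*-------------+-----------*/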
CAST AS INT64
CAST(expression AS INT64)
Description
GoogleSQL supports casting to integer types. The `expression` parameter can represent an expression for these data types:
- `INT64`
- `FLOAT64`
- `NUMERIC`
- `BIGNUMERIC`
- `BOOL`
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
FLOAT64 | INT64 | Returns the closest integer value. Halfway cases such as 1.5 or -0.5 round away from zero. |
BOOL | INT64 | Returns 1 if x is TRUE, 0 otherwise. |
STRING | INT64 | A hex string can be cast to an integer. For example, 0x123 to 291 or -0x123 to -291. |
Examples
If you are working with hex strings (`0x123`), you can cast those strings as integers:
SELECT '0x123' as hex_value, CAST('0x123' as INT64) as hex_to_int;
/*-----------+------------*
| hex_value | hex_to_int |
+-----------+------------+
| 0x123 | 291 |
*-----------+------------*/
SELECT '-0x123' as hex_value, CAST('-0x123' as INT64) as hex_to_int;
/*-----------+------------*
| hex_value | hex_to_int |
+-----------+------------+
| -0x123 | -291 |
*-----------+------------*/
CAST AS INTERVAL
CAST(expression AS INTERVAL)
Description
GoogleSQL supports casting to `INTERVAL`. The `expression` parameter can represent an expression for these data types:
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | INTERVAL | When casting from string to interval, the string must conform to either the ISO 8601 Duration standard or to the interval literal format 'Y-M D H:M:S.F'. Partial interval literal formats are also accepted when they are not ambiguous, for example 'H:M:S'. If the string expression is invalid or represents an interval that is outside of the supported min/max range, then an error is produced. |
Examples
SELECT input, CAST(input AS INTERVAL) AS output
FROM UNNEST([
'1-2 3 10:20:30.456',
'1-2',
'10:20:30',
'P1Y2M3D',
'PT10H20M30,456S'
]) input
/*--------------------+--------------------*
| input | output |
+--------------------+--------------------+
| 1-2 3 10:20:30.456 | 1-2 3 10:20:30.456 |
| 1-2 | 1-2 0 0:0:0 |
| 10:20:30 | 0-0 0 10:20:30 |
| P1Y2M3D | 1-2 3 0:0:0 |
| PT10H20M30,456S | 0-0 0 10:20:30.456 |
*--------------------+--------------------*/
CAST AS NUMERIC
CAST(expression AS NUMERIC)
Description
GoogleSQL supports casting to `NUMERIC`. The `expression` parameter can represent an expression for these data types:
- `INT64`
- `FLOAT64`
- `NUMERIC`
- `BIGNUMERIC`
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
FLOAT64 | NUMERIC | The floating point number rounds half away from zero. Casting a NaN, +inf, or -inf returns an error. Casting a value outside the range of NUMERIC returns an overflow error. |
STRING | NUMERIC | The numeric literal contained in the string must not exceed the maximum precision or range of the NUMERIC type, or an error occurs. If the number of digits after the decimal point exceeds nine, then the resulting NUMERIC value rounds half away from zero to have nine digits after the decimal point. |
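For example, a string with more than nine digits after the decimal point rounds half away from zero (the same rounding shown in the PARSE_NUMERIC examples later in this topic):
SELECT CAST('1.0123456789' AS NUMERIC) AS parsed;
/*-------------*
| parsed      |
+-------------+
| 1.012345679 |
*-------------*/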
CAST AS RANGE
CAST(expression AS RANGE)
Description
GoogleSQL supports casting to `RANGE`. The `expression` parameter can represent an expression for these data types:
- `STRING`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | RANGE | When casting from string to range, the string must conform to the supported range literal format. If the string expression is invalid or represents a range that is outside of the supported subtype min/max range, then an error is produced. |
Examples
SELECT CAST(
'[2020-01-01, 2020-01-02)'
AS RANGE<DATE>) AS string_to_range
/*----------------------------------------*
| string_to_range |
+----------------------------------------+
| [DATE '2020-01-01', DATE '2020-01-02') |
*----------------------------------------*/
SELECT CAST(
'[2014-09-27 12:30:00.45, 2016-10-17 11:15:00.33)'
AS RANGE<DATETIME>) AS string_to_range
/*------------------------------------------------------------------------*
| string_to_range |
+------------------------------------------------------------------------+
| [DATETIME '2014-09-27 12:30:00.45', DATETIME '2016-10-17 11:15:00.33') |
*------------------------------------------------------------------------*/
SELECT CAST(
'[2014-09-27 12:30:00+08, 2016-10-17 11:15:00+08)'
AS RANGE<TIMESTAMP>) AS string_to_range
-- Results depend upon where this query was executed.
/*---------------------------------------------------------------------------*
| string_to_range |
+---------------------------------------------------------------------------+
| [TIMESTAMP '2014-09-27 12:30:00+08', TIMESTAMP '2016-10-17 11:15:00 UTC') |
*---------------------------------------------------------------------------*/
SELECT CAST(
'[UNBOUNDED, 2020-01-02)'
AS RANGE<DATE>) AS string_to_range
/*--------------------------------*
| string_to_range |
+--------------------------------+
| [UNBOUNDED, DATE '2020-01-02') |
*--------------------------------*/
SELECT CAST(
'[2020-01-01, NULL)'
AS RANGE<DATE>) AS string_to_range
/*--------------------------------*
| string_to_range |
+--------------------------------+
| [DATE '2020-01-01', UNBOUNDED) |
*--------------------------------*/
CAST AS STRING
CAST(expression AS STRING [format_clause [AT TIME ZONE timezone_expr]])
Description
GoogleSQL supports casting to `STRING`. The `expression` parameter can represent an expression for these data types:
- `INT64`
- `FLOAT64`
- `NUMERIC`
- `BIGNUMERIC`
- `BOOL`
- `BYTES`
- `TIME`
- `DATE`
- `DATETIME`
- `TIMESTAMP`
- `RANGE`
- `INTERVAL`
- `STRING`
Format clause
When an expression of one type is cast to another type, you can use the format clause to provide instructions for how to conduct the cast. You can use the format clause in this section if `expression` is one of these data types:
- `INT64`
- `FLOAT64`
- `NUMERIC`
- `BIGNUMERIC`
- `BYTES`
- `TIME`
- `DATE`
- `DATETIME`
- `TIMESTAMP`
The format clause for `STRING` has an additional optional clause called `AT TIME ZONE timezone_expr`, which you can use to specify a specific time zone to use during formatting of a `TIMESTAMP`. If this optional clause is not included when formatting a `TIMESTAMP`, your current time zone is used.
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
FLOAT64 | STRING | Returns an approximate string representation. A returned NaN or 0 will not be signed. |
BOOL | STRING | Returns "true" if x is TRUE, "false" otherwise. |
BYTES | STRING | Returns x interpreted as a UTF-8 string. For example, the bytes literal b'\xc2\xa9', when cast to a string, is interpreted as UTF-8 and becomes the unicode character "©". An error occurs if x is not valid UTF-8. |
TIME | STRING | Casting from a time type to a string is independent of time zone and is of the form HH:MM:SS. |
DATE | STRING | Casting from a date type to a string is independent of time zone and is of the form YYYY-MM-DD. |
DATETIME | STRING | Casting from a datetime type to a string is independent of time zone and is of the form YYYY-MM-DD HH:MM:SS. |
TIMESTAMP | STRING | When casting from timestamp types to string, the timestamp is interpreted using the default time zone, UTC. The number of subsecond digits produced depends on the number of trailing zeroes in the subsecond part: the CAST function will truncate zero, three, or six digits. |
INTERVAL | STRING | Casting from an interval to a string is of the form Y-M D H:M:S. |
Examples
SELECT CAST(CURRENT_DATE() AS STRING) AS current_date
/*---------------*
| current_date |
+---------------+
| 2021-03-09 |
*---------------*/
SELECT CAST(CURRENT_DATE() AS STRING FORMAT 'DAY') AS current_day
/*-------------*
| current_day |
+-------------+
| MONDAY |
*-------------*/
SELECT CAST(
TIMESTAMP '2008-12-25 00:00:00+00:00'
AS STRING FORMAT 'YYYY-MM-DD HH24:MI:SS TZH:TZM') AS date_time_to_string
-- Results depend upon where this query was executed.
/*------------------------------*
| date_time_to_string |
+------------------------------+
| 2008-12-24 16:00:00 -08:00 |
*------------------------------*/
SELECT CAST(
TIMESTAMP '2008-12-25 00:00:00+00:00'
AS STRING FORMAT 'YYYY-MM-DD HH24:MI:SS TZH:TZM'
AT TIME ZONE 'Asia/Kolkata') AS date_time_to_string
-- Because the time zone is specified, the result is always the same.
/*------------------------------*
| date_time_to_string |
+------------------------------+
| 2008-12-25 05:30:00 +05:30 |
*------------------------------*/
SELECT CAST(INTERVAL 3 DAY AS STRING) AS interval_to_string
/*--------------------*
| interval_to_string |
+--------------------+
| 0-0 3 0:0:0 |
*--------------------*/
SELECT CAST(
INTERVAL "1-2 3 4:5:6.789" YEAR TO SECOND
AS STRING) AS interval_to_string
/*--------------------*
| interval_to_string |
+--------------------+
| 1-2 3 4:5:6.789 |
*--------------------*/
CAST AS STRUCT
CAST(expression AS STRUCT)
Description
GoogleSQL supports casting to `STRUCT`. The `expression` parameter can represent an expression for these data types:
- `STRUCT`
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRUCT | STRUCT | Allowed if the following conditions are met: 1) The two structs have the same number of fields. 2) The original struct field types can be explicitly cast to the corresponding target struct field types. |
CAST AS TIME
CAST(expression AS TIME [format_clause])
Description
GoogleSQL supports casting to `TIME`. The `expression` parameter can represent an expression for these data types:
- `STRING`
- `TIME`
- `DATETIME`
- `TIMESTAMP`
Format clause
When an expression of one type is cast to another type, you can use the format clause to provide instructions for how to conduct the cast. You can use the format clause in this section if `expression` is a `STRING`.
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | TIME | When casting from string to time, the string must conform to the supported time literal format, and is independent of time zone. If the string expression is invalid or represents a time that is outside of the supported min/max range, then an error is produced. |
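For example, a string in the supported time literal format casts directly to a time:
SELECT CAST('15:30:00' AS TIME) AS time_value;
/*------------*
| time_value |
+------------+
| 15:30:00   |
*------------*/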
CAST AS TIMESTAMP
CAST(expression AS TIMESTAMP [format_clause [AT TIME ZONE timezone_expr]])
Description
GoogleSQL supports casting to `TIMESTAMP`. The `expression` parameter can represent an expression for these data types:
- `STRING`
- `DATETIME`
- `TIMESTAMP`
Format clause
When an expression of one type is cast to another type, you can use the format clause to provide instructions for how to conduct the cast. You can use the format clause in this section if `expression` is a `STRING`.
The format clause for `TIMESTAMP` has an additional optional clause called `AT TIME ZONE timezone_expr`, which you can use to specify a specific time zone to use during formatting. If this optional clause is not included, your current time zone is used.
Conversion rules
From | To | Rule(s) when casting x |
---|---|---|
STRING | TIMESTAMP | When casting from string to a timestamp, string_expression must conform to the supported timestamp literal formats, or else a runtime error occurs. The string_expression may itself contain a time zone. If there is a time zone in the string_expression, that time zone is used for conversion, otherwise the default time zone, UTC, is used. If the string has fewer than six digits, then it is implicitly widened. An error is produced if the string_expression is invalid, has more than six subsecond digits (i.e., precision greater than microseconds), or represents a time outside of the supported timestamp range. |
DATE | TIMESTAMP | Casting from a date to a timestamp interprets date_expression as of midnight (start of the day) in the default time zone, UTC. |
DATETIME | TIMESTAMP | Casting from a datetime to a timestamp interprets datetime_expression in the default time zone, UTC. Most valid datetime values have exactly one corresponding timestamp in each time zone. However, there are certain combinations of valid datetime values and time zones that have zero or two corresponding timestamp values. This happens in a time zone when clocks are set forward or set back, such as for Daylight Saving Time. When there are two valid timestamps, the earlier one is used. When there is no valid timestamp, the length of the gap in time (typically one hour) is added to the datetime. |
Examples
The following example casts a string-formatted timestamp as a timestamp:
SELECT CAST("2020-06-02 17:00:53.110+00:00" AS TIMESTAMP) AS as_timestamp
-- Results depend upon where this query was executed.
/*-----------------------------*
| as_timestamp |
+-----------------------------+
| 2020-06-03 00:00:53.110 UTC |
*-----------------------------*/
The following examples cast a string-formatted date and time as a timestamp. These examples return the same output as the previous example.
SELECT CAST('06/02/2020 17:00:53.110' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3' AT TIME ZONE 'UTC') AS as_timestamp
SELECT CAST('06/02/2020 17:00:53.110' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3' AT TIME ZONE '+00') AS as_timestamp
SELECT CAST('06/02/2020 17:00:53.110 +00' AS TIMESTAMP FORMAT 'MM/DD/YYYY HH24:MI:SS.FF3 TZH') AS as_timestamp
PARSE_BIGNUMERIC
PARSE_BIGNUMERIC(string_expression)
Description
Converts a `STRING` to a `BIGNUMERIC` value.
The numeric literal contained in the string must not exceed the maximum precision or range of the `BIGNUMERIC` type, or an error occurs. If the number of digits after the decimal point exceeds 38, then the resulting `BIGNUMERIC` value rounds half away from zero to have 38 digits after the decimal point.
-- This example shows how a string with a decimal point is parsed.
SELECT PARSE_BIGNUMERIC("123.45") AS parsed;
/*--------*
| parsed |
+--------+
| 123.45 |
*--------*/
-- This example shows how a string with an exponent is parsed.
SELECT PARSE_BIGNUMERIC("123.456E37") AS parsed;
/*-----------------------------------------*
| parsed |
+-----------------------------------------+
| 123400000000000000000000000000000000000 |
*-----------------------------------------*/
-- This example shows the rounding when digits after the decimal point exceeds 38.
SELECT PARSE_BIGNUMERIC("1.123456789012345678901234567890123456789") as parsed;
/*------------------------------------------*
| parsed |
+------------------------------------------+
| 1.12345678901234567890123456789012345679 |
*------------------------------------------*/
This function is similar to using the `CAST AS BIGNUMERIC` function except that the `PARSE_BIGNUMERIC` function only accepts string inputs and allows the following in the string:
- Spaces between the sign (+/-) and the number
- Signs (+/-) after the number
Rules for valid input strings:
Rule | Example Input | Output |
---|---|---|
The string can only contain digits, commas, decimal points and signs. | "- 12,34567,89.0" | -123456789 |
Whitespaces are allowed anywhere except between digits. | " - 12.345 " | -12.345 |
Only digits and commas are allowed before the decimal point. | " 12,345,678" | 12345678 |
Only digits are allowed after the decimal point. | "1.234 " | 1.234 |
Use E or e for exponents. After the e, digits and a leading sign indicator are allowed. | " 123.45e-1" | 12.345 |
If the integer part is not empty, then it must contain at least one digit. | " 0,.12 -" | -0.12 |
If the string contains a decimal point, then it must contain at least one digit. | " .1" | 0.1 |
The string cannot contain more than one sign. | " 0.5 +" | 0.5 |
Return Data Type
BIGNUMERIC
Examples
This example shows an input with spaces before, after, and between the sign and the number:
SELECT PARSE_BIGNUMERIC(" - 12.34 ") as parsed;
/*--------*
| parsed |
+--------+
| -12.34 |
*--------*/
This example shows an input with an exponent as well as the sign after the number:
SELECT PARSE_BIGNUMERIC("12.34e-1-") as parsed;
/*--------*
| parsed |
+--------+
| -1.234 |
*--------*/
This example shows an input with multiple commas in the integer part of the number:
SELECT PARSE_BIGNUMERIC(" 1,2,,3,.45 + ") as parsed;
/*--------*
| parsed |
+--------+
| 123.45 |
*--------*/
This example shows an input with a decimal point and no digits in the whole number part:
SELECT PARSE_BIGNUMERIC(".1234 ") as parsed;
/*--------*
| parsed |
+--------+
| 0.1234 |
*--------*/
Examples of invalid inputs
This example is invalid because the whole number part contains no digits:
SELECT PARSE_BIGNUMERIC(",,,.1234 ") as parsed;
This example is invalid because there are whitespaces between digits:
SELECT PARSE_BIGNUMERIC("1 23.4 5 ") as parsed;
This example is invalid because the number is empty except for an exponent:
SELECT PARSE_BIGNUMERIC(" e1 ") as parsed;
This example is invalid because the string contains multiple signs:
SELECT PARSE_BIGNUMERIC(" - 12.3 - ") as parsed;
This example is invalid because the value of the number falls outside the range of BIGNUMERIC:
SELECT PARSE_BIGNUMERIC("12.34E100 ") as parsed;
This example is invalid because the string contains invalid characters:
SELECT PARSE_BIGNUMERIC("$12.34") as parsed;
PARSE_NUMERIC
PARSE_NUMERIC(string_expression)
Description
Converts a STRING to a NUMERIC value.
The numeric literal contained in the string must not exceed the maximum precision or range of the NUMERIC type, or an error occurs. If the number of digits after the decimal point exceeds nine, then the resulting NUMERIC value is rounded half away from zero to nine digits after the decimal point.
-- This example shows how a string with a decimal point is parsed.
SELECT PARSE_NUMERIC("123.45") AS parsed;
/*--------*
| parsed |
+--------+
| 123.45 |
*--------*/
-- This example shows how a string with an exponent is parsed.
SELECT PARSE_NUMERIC("12.34E27") as parsed;
/*-------------------------------*
| parsed |
+-------------------------------+
| 12340000000000000000000000000 |
*-------------------------------*/
-- This example shows rounding when the number of digits after the decimal point exceeds 9.
SELECT PARSE_NUMERIC("1.0123456789") as parsed;
/*-------------*
| parsed |
+-------------+
| 1.012345679 |
*-------------*/
This function is similar to the CAST AS NUMERIC function, except that PARSE_NUMERIC only accepts string inputs and additionally allows the following in the string:
- Spaces between the sign (+/-) and the number
- Signs (+/-) after the number
Rules for valid input strings:
Rule | Example Input | Output |
---|---|---|
The string can only contain digits, commas, decimal points and signs. | "- 12,34567,89.0" | -123456789 |
Whitespaces are allowed anywhere except between digits. | " - 12.345 " | -12.345 |
Only digits and commas are allowed before the decimal point. | " 12,345,678" | 12345678 |
Only digits are allowed after the decimal point. | "1.234 " | 1.234 |
Use E or e for exponents. After the e, digits and a leading sign indicator are allowed. | " 123.45e-1" | 12.345 |
If the integer part is not empty, then it must contain at least one digit. | " 0,.12 -" | -0.12 |
If the string contains a decimal point, then it must contain at least one digit. | " .1" | 0.1 |
The string cannot contain more than one sign. | " 0.5 +" | 0.5 |
Return Data Type
NUMERIC
Examples
This example shows an input with spaces before, after, and between the sign and the number:
SELECT PARSE_NUMERIC(" - 12.34 ") as parsed;
/*--------*
| parsed |
+--------+
| -12.34 |
*--------*/
This example shows an input with an exponent as well as the sign after the number:
SELECT PARSE_NUMERIC("12.34e-1-") as parsed;
/*--------*
| parsed |
+--------+
| -1.234 |
*--------*/
This example shows an input with multiple commas in the integer part of the number:
SELECT PARSE_NUMERIC(" 1,2,,3,.45 + ") as parsed;
/*--------*
| parsed |
+--------+
| 123.45 |
*--------*/
This example shows an input with a decimal point and no digits in the whole number part:
SELECT PARSE_NUMERIC(".1234 ") as parsed;
/*--------*
| parsed |
+--------+
| 0.1234 |
*--------*/
Examples of invalid inputs
This example is invalid because the whole number part contains no digits:
SELECT PARSE_NUMERIC(",,,.1234 ") as parsed;
This example is invalid because there are whitespaces between digits:
SELECT PARSE_NUMERIC("1 23.4 5 ") as parsed;
This example is invalid because the number is empty except for an exponent:
SELECT PARSE_NUMERIC(" e1 ") as parsed;
This example is invalid because the string contains multiple signs:
SELECT PARSE_NUMERIC(" - 12.3 - ") as parsed;
This example is invalid because the value of the number falls outside the range of NUMERIC:
SELECT PARSE_NUMERIC("12.34E100 ") as parsed;
This example is invalid because the string contains invalid characters:
SELECT PARSE_NUMERIC("$12.34") as parsed;
SAFE_CAST
SAFE_CAST(expression AS typename [format_clause])
Description
When using CAST, a query can fail if GoogleSQL is unable to perform the cast. For example, the following query generates an error:
SELECT CAST("apple" AS INT64) AS not_a_number;
If you want to protect your queries from these types of errors, you can use SAFE_CAST. SAFE_CAST replaces runtime errors with NULL. However, during static analysis, impossible casts between two non-castable types still produce an error because the query is invalid.
SELECT SAFE_CAST("apple" AS INT64) AS not_a_number;
/*--------------*
| not_a_number |
+--------------+
| NULL |
*--------------*/
Some casts can include a format clause, which provides instructions for how to conduct the cast. For example, you could instruct a cast to convert a sequence of bytes to a BASE64-encoded string instead of a UTF-8-encoded string.
The structure of the format clause is unique to each type of cast and more information is available in the section for that cast.
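For example, the format clause for a bytes-to-string cast lets you choose the encoding. The following query is an illustrative sketch of the case described above, casting the same bytes with and without a BASE64 format clause:

```sql
-- Without a format clause, the bytes are interpreted as UTF-8.
-- With FORMAT 'BASE64', the bytes are BASE64-encoded instead.
SELECT
  CAST(b'Hello' AS STRING) AS utf8_string,
  CAST(b'Hello' AS STRING FORMAT 'BASE64') AS base64_string;

/*-------------+---------------*
 | utf8_string | base64_string |
 +-------------+---------------+
 | Hello       | SGVsbG8=      |
 *-------------+---------------*/
```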
If you are casting from bytes to strings, you can also use the SAFE_CONVERT_BYTES_TO_STRING function. Any invalid UTF-8 characters are replaced with the Unicode replacement character, U+FFFD.
Supplemental materials
Other conversion functions
You can learn more about these conversion functions elsewhere in the documentation:
Conversion function | From | To |
---|---|---|
ARRAY_TO_STRING | ARRAY | STRING |
BOOL | JSON | BOOL |
DATE | Various data types | DATE |
DATE_FROM_UNIX_DATE | INT64 | DATE |
DATETIME | Various data types | DATETIME |
FLOAT64 | JSON | FLOAT64 |
FROM_BASE32 | STRING | BYTES |
FROM_BASE64 | STRING | BYTES |
FROM_HEX | STRING | BYTES |
INT64 | JSON | INT64 |
PARSE_DATE | STRING | DATE |
PARSE_DATETIME | STRING | DATETIME |
PARSE_JSON | STRING | JSON |
PARSE_TIME | STRING | TIME |
PARSE_TIMESTAMP | STRING | TIMESTAMP |
SAFE_CONVERT_BYTES_TO_STRING | BYTES | STRING |
STRING | TIMESTAMP | STRING |
STRING | JSON | STRING |
TIME | Various data types | TIME |
TIMESTAMP | Various data types | TIMESTAMP |
TIMESTAMP_MICROS | INT64 | TIMESTAMP |
TIMESTAMP_MILLIS | INT64 | TIMESTAMP |
TIMESTAMP_SECONDS | INT64 | TIMESTAMP |
TO_BASE32 | BYTES | STRING |
TO_BASE64 | BYTES | STRING |
TO_HEX | BYTES | STRING |
TO_JSON | All data types | JSON |
TO_JSON_STRING | All data types | STRING |
Date functions
GoogleSQL for BigQuery supports the following date functions.
Function list
Name | Summary |
---|---|
CURRENT_DATE | Returns the current date as a DATE value. |
DATE | Constructs a DATE value. |
DATE_ADD | Adds a specified time interval to a DATE value. |
DATE_DIFF | Gets the number of unit boundaries between two DATE values at a particular time granularity. |
DATE_FROM_UNIX_DATE | Interprets an INT64 expression as the number of days since 1970-01-01. |
DATE_SUB | Subtracts a specified time interval from a DATE value. |
DATE_TRUNC | Truncates a DATE, DATETIME, or TIMESTAMP value at a particular granularity. |
EXTRACT | Extracts part of a date from a DATE value. |
FORMAT_DATE | Formats a DATE value according to a specified format string. |
LAST_DAY | Gets the last day in a specified time period that contains a DATE value. |
PARSE_DATE | Converts a STRING value to a DATE value. |
UNIX_DATE | Converts a DATE value to the number of days since 1970-01-01. |
CURRENT_DATE
CURRENT_DATE()
CURRENT_DATE(time_zone_expression)
CURRENT_DATE
Description
Returns the current date as a DATE object. Parentheses are optional when called with no arguments.
This function supports the following arguments:
- time_zone_expression: A STRING expression that represents a time zone. If no time zone is specified, the default time zone, UTC, is used. If this expression is used and it evaluates to NULL, this function returns NULL.
The current date is recorded at the start of the query statement which contains this function, not when this specific function is evaluated.
Return Data Type
DATE
Examples
The following query produces the current date in the default time zone:
SELECT CURRENT_DATE() AS the_date;
/*--------------*
| the_date |
+--------------+
| 2016-12-25 |
*--------------*/
The following queries produce the current date in a specified time zone:
SELECT CURRENT_DATE('America/Los_Angeles') AS the_date;
/*--------------*
| the_date |
+--------------+
| 2016-12-25 |
*--------------*/
SELECT CURRENT_DATE('-08') AS the_date;
/*--------------*
| the_date |
+--------------+
| 2016-12-25 |
*--------------*/
The following query produces the current date in the default time zone. Parentheses are not needed if the function has no arguments.
SELECT CURRENT_DATE AS the_date;
/*--------------*
| the_date |
+--------------+
| 2016-12-25 |
*--------------*/
DATE
DATE(year, month, day)
DATE(timestamp_expression)
DATE(timestamp_expression, time_zone_expression)
DATE(datetime_expression)
Description
Constructs or extracts a date.
This function supports the following arguments:
- year: The INT64 value for year.
- month: The INT64 value for month.
- day: The INT64 value for day.
- timestamp_expression: A TIMESTAMP expression that contains the date.
- time_zone_expression: A STRING expression that represents a time zone. If no time zone is specified with timestamp_expression, the default time zone, UTC, is used.
- datetime_expression: A DATETIME expression that contains the date.
Return Data Type
DATE
Example
SELECT
DATE(2016, 12, 25) AS date_ymd,
DATE(DATETIME '2016-12-25 23:59:59') AS date_dt,
DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz;
/*------------+------------+------------*
| date_ymd | date_dt | date_tstz |
+------------+------------+------------+
| 2016-12-25 | 2016-12-25 | 2016-12-24 |
*------------+------------+------------*/
DATE_ADD
DATE_ADD(date_expression, INTERVAL int64_expression date_part)
Description
Adds a specified time interval to a DATE.
DATE_ADD supports the following date_part values:
- DAY
- WEEK. Equivalent to 7 DAYs.
- MONTH
- QUARTER
- YEAR
Special handling is required for MONTH, QUARTER, and YEAR parts when the date is at (or near) the last day of the month. If the resulting month has fewer days than the original date's day, then the resulting date is the last date of that month.
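As an illustrative sketch of this rule, adding one month to a month-end date clamps the result to the last day of the shorter month:

```sql
-- February 2008 has 29 days, so 2008-01-31 + 1 MONTH is clamped
-- to the last day of February.
SELECT DATE_ADD(DATE '2008-01-31', INTERVAL 1 MONTH) AS one_month_later;

/*-----------------*
 | one_month_later |
 +-----------------+
 | 2008-02-29      |
 *-----------------*/
```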
Return Data Type
DATE
Example
SELECT DATE_ADD(DATE '2008-12-25', INTERVAL 5 DAY) AS five_days_later;
/*--------------------*
| five_days_later |
+--------------------+
| 2008-12-30 |
*--------------------*/
DATE_DIFF
DATE_DIFF(end_date, start_date, granularity)
Description
Gets the number of unit boundaries between two DATE values (end_date - start_date) at a particular time granularity.
Definitions
- start_date: The starting DATE value.
- end_date: The ending DATE value.
- granularity: The date part that represents the granularity. If you have passed in DATE values for the first arguments, granularity can be:
  - DAY
  - WEEK: This date part begins on Sunday.
  - WEEK(<WEEKDAY>): This date part begins on WEEKDAY. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
  - ISOWEEK: Uses ISO 8601 week boundaries. ISO weeks begin on Monday.
  - MONTH
  - QUARTER
  - YEAR
  - ISOYEAR: Uses the ISO 8601 week-numbering year boundary. The ISO year boundary is the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Details
If end_date is earlier than start_date, the output is negative.
Return Data Type
INT64
Example
SELECT DATE_DIFF(DATE '2010-07-07', DATE '2008-12-25', DAY) AS days_diff;
/*-----------*
| days_diff |
+-----------+
| 559 |
*-----------*/
SELECT
DATE_DIFF(DATE '2017-10-15', DATE '2017-10-14', DAY) AS days_diff,
DATE_DIFF(DATE '2017-10-15', DATE '2017-10-14', WEEK) AS weeks_diff;
/*-----------+------------*
| days_diff | weeks_diff |
+-----------+------------+
| 1 | 1 |
*-----------+------------*/
The example above shows the result of DATE_DIFF for two days in succession. DATE_DIFF with the date part WEEK returns 1 because DATE_DIFF counts the number of date part boundaries in this range of dates. Each WEEK begins on Sunday, so there is one date part boundary between Saturday, 2017-10-14 and Sunday, 2017-10-15.
The following example shows the result of DATE_DIFF for two dates in different years. DATE_DIFF with the date part YEAR returns 3 because it counts the number of Gregorian calendar year boundaries between the two dates. DATE_DIFF with the date part ISOYEAR returns 2 because the second date belongs to the ISO year 2015. The first Thursday of the 2015 calendar year was 2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29.
SELECT
DATE_DIFF('2017-12-30', '2014-12-30', YEAR) AS year_diff,
DATE_DIFF('2017-12-30', '2014-12-30', ISOYEAR) AS isoyear_diff;
/*-----------+--------------*
| year_diff | isoyear_diff |
+-----------+--------------+
| 3 | 2 |
*-----------+--------------*/
The following example shows the result of DATE_DIFF for two days in succession. The first date falls on a Monday and the second date falls on a Sunday. DATE_DIFF with the date part WEEK returns 0 because this date part uses weeks that begin on Sunday. DATE_DIFF with the date part WEEK(MONDAY) returns 1. DATE_DIFF with the date part ISOWEEK also returns 1 because ISO weeks begin on Monday.
SELECT
DATE_DIFF('2017-12-18', '2017-12-17', WEEK) AS week_diff,
DATE_DIFF('2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff,
DATE_DIFF('2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff;
/*-----------+-------------------+--------------*
| week_diff | week_weekday_diff | isoweek_diff |
+-----------+-------------------+--------------+
| 0 | 1 | 1 |
*-----------+-------------------+--------------*/
DATE_FROM_UNIX_DATE
DATE_FROM_UNIX_DATE(int64_expression)
Description
Interprets int64_expression as the number of days since 1970-01-01.
Return Data Type
DATE
Example
SELECT DATE_FROM_UNIX_DATE(14238) AS date_from_epoch;
/*-----------------*
| date_from_epoch |
+-----------------+
| 2008-12-25 |
 *-----------------*/
DATE_SUB
DATE_SUB(date_expression, INTERVAL int64_expression date_part)
Description
Subtracts a specified time interval from a DATE.
DATE_SUB supports the following date_part values:
- DAY
- WEEK. Equivalent to 7 DAYs.
- MONTH
- QUARTER
- YEAR
Special handling is required for MONTH, QUARTER, and YEAR parts when the date is at (or near) the last day of the month. If the resulting month has fewer days than the original date's day, then the resulting date is the last date of that month.
Return Data Type
DATE
Example
SELECT DATE_SUB(DATE '2008-12-25', INTERVAL 5 DAY) AS five_days_ago;
/*---------------*
| five_days_ago |
+---------------+
| 2008-12-20 |
*---------------*/
DATE_TRUNC
DATE_TRUNC(date_value, date_granularity)
DATE_TRUNC(datetime_value, datetime_granularity)
DATE_TRUNC(timestamp_value, timestamp_granularity[, time_zone])
Description
Truncates a DATE, DATETIME, or TIMESTAMP value at a particular granularity.
Definitions
- date_value: A DATE value to truncate.
- date_granularity: The truncation granularity for a DATE value. Date granularities can be used.
- datetime_value: A DATETIME value to truncate.
- datetime_granularity: The truncation granularity for a DATETIME value. Date granularities and time granularities can be used.
- timestamp_value: A TIMESTAMP value to truncate.
- timestamp_granularity: The truncation granularity for a TIMESTAMP value. Date granularities and time granularities can be used.
- time_zone: A time zone to use with the TIMESTAMP value. Time zone parts can be used. Use this argument if you want to use a time zone other than the default time zone, UTC, as part of the truncate operation.
Date granularity definitions
- DAY: The day in the Gregorian calendar year that contains the value to truncate.
- WEEK: The first day in the week that contains the value to truncate. Weeks begin on Sundays. WEEK is equivalent to WEEK(SUNDAY).
- WEEK(WEEKDAY): The first day in the week that contains the value to truncate. Weeks begin on WEEKDAY. WEEKDAY must be one of the following: SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, or SATURDAY.
- ISOWEEK: The first day in the ISO 8601 week that contains the value to truncate. The ISO week begins on Monday. The first ISO week of each ISO year contains the first Thursday of the corresponding Gregorian calendar year.
- MONTH: The first day in the month that contains the value to truncate.
- QUARTER: The first day in the quarter that contains the value to truncate.
- YEAR: The first day in the year that contains the value to truncate.
- ISOYEAR: The first day in the ISO 8601 week-numbering year that contains the value to truncate. The ISO year begins on the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Time granularity definitions
- MICROSECOND: If used, nothing is truncated from the value.
- MILLISECOND: The nearest lesser-than-or-equal millisecond.
- SECOND: The nearest lesser-than-or-equal second.
- MINUTE: The nearest lesser-than-or-equal minute.
- HOUR: The nearest lesser-than-or-equal hour.
Time zone part definitions
MINUTE
HOUR
DAY
WEEK
WEEK(<WEEKDAY>)
ISOWEEK
MONTH
QUARTER
YEAR
ISOYEAR
Details
The resulting value is always rounded to the beginning of granularity.
Return Data Type
The same data type as the first argument passed into this function.
Examples
SELECT DATE_TRUNC(DATE '2008-12-25', MONTH) AS month;
/*------------*
| month |
+------------+
| 2008-12-01 |
*------------*/
In the following example, the original date falls on a Sunday. Because the date_part is WEEK(MONDAY), DATE_TRUNC returns the DATE for the preceding Monday.
SELECT date AS original, DATE_TRUNC(date, WEEK(MONDAY)) AS truncated
FROM (SELECT DATE('2017-11-05') AS date);
/*------------+------------*
| original | truncated |
+------------+------------+
| 2017-11-05 | 2017-10-30 |
*------------+------------*/
In the following example, the original date_expression is in the Gregorian calendar year 2015. However, DATE_TRUNC with the ISOYEAR date part truncates the date_expression to the beginning of the ISO year, not the Gregorian calendar year. The first Thursday of the 2015 calendar year was 2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. Therefore the ISO year boundary preceding the date_expression 2015-06-15 is 2014-12-29.
SELECT
DATE_TRUNC('2015-06-15', ISOYEAR) AS isoyear_boundary,
EXTRACT(ISOYEAR FROM DATE '2015-06-15') AS isoyear_number;
/*------------------+----------------*
| isoyear_boundary | isoyear_number |
+------------------+----------------+
| 2014-12-29 | 2015 |
*------------------+----------------*/
EXTRACT
EXTRACT(part FROM date_expression)
Description
Returns the value corresponding to the specified date part. The part must be one of:
- DAYOFWEEK: Returns values in the range [1,7] with Sunday as the first day of the week.
- DAY
- DAYOFYEAR
- WEEK: Returns the week number of the date in the range [0, 53]. Weeks begin with Sunday, and dates prior to the first Sunday of the year are in week 0.
- WEEK(<WEEKDAY>): Returns the week number of the date in the range [0, 53]. Weeks begin on WEEKDAY. Dates prior to the first WEEKDAY of the year are in week 0. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
- ISOWEEK: Returns the ISO 8601 week number of the date_expression. ISOWEEKs begin on Monday. Return values are in the range [1, 53]. The first ISOWEEK of each ISO year begins on the Monday before the first Thursday of the Gregorian calendar year.
- MONTH
- QUARTER: Returns values in the range [1,4].
- YEAR
- ISOYEAR: Returns the ISO 8601 week-numbering year, which is the Gregorian calendar year containing the Thursday of the week to which date_expression belongs.
Return Data Type
INT64
Examples
In the following example, EXTRACT returns a value corresponding to the DAY date part.
SELECT EXTRACT(DAY FROM DATE '2013-12-25') AS the_day;
/*---------*
| the_day |
+---------+
| 25 |
*---------*/
In the following example, EXTRACT returns values corresponding to different date parts from a column of dates near the end of the year.
SELECT
date,
EXTRACT(ISOYEAR FROM date) AS isoyear,
EXTRACT(ISOWEEK FROM date) AS isoweek,
EXTRACT(YEAR FROM date) AS year,
EXTRACT(WEEK FROM date) AS week
FROM UNNEST(GENERATE_DATE_ARRAY('2015-12-23', '2016-01-09')) AS date
ORDER BY date;
/*------------+---------+---------+------+------*
| date | isoyear | isoweek | year | week |
+------------+---------+---------+------+------+
| 2015-12-23 | 2015 | 52 | 2015 | 51 |
| 2015-12-24 | 2015 | 52 | 2015 | 51 |
| 2015-12-25 | 2015 | 52 | 2015 | 51 |
| 2015-12-26 | 2015 | 52 | 2015 | 51 |
| 2015-12-27 | 2015 | 52 | 2015 | 52 |
| 2015-12-28 | 2015 | 53 | 2015 | 52 |
| 2015-12-29 | 2015 | 53 | 2015 | 52 |
| 2015-12-30 | 2015 | 53 | 2015 | 52 |
| 2015-12-31 | 2015 | 53 | 2015 | 52 |
| 2016-01-01 | 2015 | 53 | 2016 | 0 |
| 2016-01-02 | 2015 | 53 | 2016 | 0 |
| 2016-01-03 | 2015 | 53 | 2016 | 1 |
| 2016-01-04 | 2016 | 1 | 2016 | 1 |
| 2016-01-05 | 2016 | 1 | 2016 | 1 |
| 2016-01-06 | 2016 | 1 | 2016 | 1 |
| 2016-01-07 | 2016 | 1 | 2016 | 1 |
| 2016-01-08 | 2016 | 1 | 2016 | 1 |
| 2016-01-09 | 2016 | 1 | 2016 | 1 |
*------------+---------+---------+------+------*/
In the following example, date_expression falls on a Sunday. EXTRACT calculates the first column using weeks that begin on Sunday, and it calculates the second column using weeks that begin on Monday.
WITH table AS (SELECT DATE('2017-11-05') AS date)
SELECT
date,
EXTRACT(WEEK(SUNDAY) FROM date) AS week_sunday,
EXTRACT(WEEK(MONDAY) FROM date) AS week_monday FROM table;
/*------------+-------------+-------------*
| date | week_sunday | week_monday |
+------------+-------------+-------------+
| 2017-11-05 | 45 | 44 |
*------------+-------------+-------------*/
FORMAT_DATE
FORMAT_DATE(format_string, date_expr)
Description
Formats a DATE value according to a specified format string.
Definitions
- format_string: A STRING value that contains the format elements to use with date_expr.
- date_expr: A DATE value that represents the date to format.
Return Data Type
STRING
Examples
SELECT FORMAT_DATE('%x', DATE '2008-12-25') AS US_format;
/*------------*
| US_format |
+------------+
| 12/25/08 |
*------------*/
SELECT FORMAT_DATE('%b-%d-%Y', DATE '2008-12-25') AS formatted;
/*-------------*
| formatted |
+-------------+
| Dec-25-2008 |
*-------------*/
SELECT FORMAT_DATE('%b %Y', DATE '2008-12-25') AS formatted;
/*-------------*
| formatted |
+-------------+
| Dec 2008 |
*-------------*/
LAST_DAY
LAST_DAY(date_expression[, date_part])
Description
Returns the last day from a date expression. This is commonly used to return the last day of the month.
You can optionally specify the date part for which the last day is returned. If this parameter is not used, the default value is MONTH.
LAST_DAY supports the following values for date_part:
- YEAR
- QUARTER
- MONTH
- WEEK. Equivalent to 7 DAYs.
- WEEK(<WEEKDAY>). <WEEKDAY> represents the starting day of the week. Valid values are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
- ISOWEEK. Uses ISO 8601 week boundaries. ISO weeks begin on Monday.
- ISOYEAR. Uses the ISO 8601 week-numbering year boundary. The ISO year boundary is the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Return Data Type
DATE
Example
These both return the last day of the month:
SELECT LAST_DAY(DATE '2008-11-25', MONTH) AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-30 |
*------------*/
SELECT LAST_DAY(DATE '2008-11-25') AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-30 |
*------------*/
This returns the last day of the year:
SELECT LAST_DAY(DATE '2008-11-25', YEAR) AS last_day
/*------------*
| last_day |
+------------+
| 2008-12-31 |
*------------*/
This returns the last day of the week for a week that starts on a Sunday:
SELECT LAST_DAY(DATE '2008-11-10', WEEK(SUNDAY)) AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-15 |
*------------*/
This returns the last day of the week for a week that starts on a Monday:
SELECT LAST_DAY(DATE '2008-11-10', WEEK(MONDAY)) AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-16 |
*------------*/
PARSE_DATE
PARSE_DATE(format_string, date_string)
Description
Converts a STRING value to a DATE value.
Definitions
- format_string: A STRING value that contains the format elements to use with date_string.
- date_string: A STRING value that represents the date to parse.
Details
Each element in date_string must have a corresponding element in format_string. The location of each element in format_string must match the location of each element in date_string.
-- This works because elements on both sides match.
SELECT PARSE_DATE('%A %b %e %Y', 'Thursday Dec 25 2008');
-- This produces an error because the year element is in different locations.
SELECT PARSE_DATE('%Y %A %b %e', 'Thursday Dec 25 2008');
-- This produces an error because one of the year elements is missing.
SELECT PARSE_DATE('%A %b %e', 'Thursday Dec 25 2008');
-- This works because %F can find all matching elements in date_string.
SELECT PARSE_DATE('%F', '2000-12-30');
When using PARSE_DATE, keep the following in mind:
- Unspecified fields. Any unspecified field is initialized from 1970-01-01.
- Case insensitivity. Names, such as Monday, February, and so on, are case insensitive.
- Whitespace. One or more consecutive white spaces in the format string matches zero or more consecutive white spaces in the date string. In addition, leading and trailing white spaces in the date string are always allowed, even if they are not in the format string.
- Format precedence. When two (or more) format elements have overlapping information (for example, both %F and %Y affect the year), the last one generally overrides any earlier ones.
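As an illustrative sketch of the format precedence rule, the following query supplies the date twice; per the rule above, the later %F element generally overrides the earlier %Y-%m-%d elements:

```sql
-- %Y-%m-%d parses 2001-02-03 first, then %F parses 2000-12-31
-- and overrides the year, month, and day.
SELECT PARSE_DATE('%Y-%m-%d %F', '2001-02-03 2000-12-31') AS parsed;

/*------------*
 | parsed     |
 +------------+
 | 2000-12-31 |
 *------------*/
```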
Return Data Type
DATE
Examples
This example converts a MM/DD/YY formatted string to a DATE object:
SELECT PARSE_DATE('%x', '12/25/08') AS parsed;
/*------------*
| parsed |
+------------+
| 2008-12-25 |
*------------*/
This example converts a YYYYMMDD formatted string to a DATE object:
SELECT PARSE_DATE('%Y%m%d', '20081225') AS parsed;
/*------------*
| parsed |
+------------+
| 2008-12-25 |
*------------*/
UNIX_DATE
UNIX_DATE(date_expression)
Description
Returns the number of days since 1970-01-01.
Return Data Type
INT64
Example
SELECT UNIX_DATE(DATE '2008-12-25') AS days_from_epoch;
/*-----------------*
| days_from_epoch |
+-----------------+
| 14238 |
*-----------------*/
Datetime functions
GoogleSQL for BigQuery supports the following datetime functions.
All outputs are automatically formatted as per ISO 8601, separating date and time with a T.
Function list
Name | Summary |
---|---|
CURRENT_DATETIME | Returns the current date and time as a DATETIME value. |
DATETIME | Constructs a DATETIME value. |
DATETIME_ADD | Adds a specified time interval to a DATETIME value. |
DATETIME_DIFF | Gets the number of unit boundaries between two DATETIME values at a particular time granularity. |
DATETIME_SUB | Subtracts a specified time interval from a DATETIME value. |
DATETIME_TRUNC | Truncates a DATETIME or TIMESTAMP value at a particular granularity. |
EXTRACT | Extracts part of a date and time from a DATETIME value. |
FORMAT_DATETIME | Formats a DATETIME value according to a specified format string. |
LAST_DAY | Gets the last day in a specified time period that contains a DATETIME value. |
PARSE_DATETIME | Converts a STRING value to a DATETIME value. |
CURRENT_DATETIME
CURRENT_DATETIME([time_zone])
CURRENT_DATETIME
Description
Returns the current time as a DATETIME object. Parentheses are optional when called with no arguments.
This function supports an optional time_zone parameter. See Time zone definitions for information on how to specify a time zone.
The current date and time is recorded at the start of the query statement which contains this function, not when this specific function is evaluated.
Return Data Type
DATETIME
Example
SELECT CURRENT_DATETIME() as now;
/*----------------------------*
| now |
+----------------------------+
| 2016-05-19T10:38:47.046465 |
*----------------------------*/
DATETIME
1. DATETIME(year, month, day, hour, minute, second)
2. DATETIME(date_expression[, time_expression])
3. DATETIME(timestamp_expression [, time_zone])
Description
1. Constructs a DATETIME object using INT64 values representing the year, month, day, hour, minute, and second.
2. Constructs a DATETIME object using a DATE object and an optional TIME object.
3. Constructs a DATETIME object using a TIMESTAMP object. It supports an optional parameter to specify a time zone. If no time zone is specified, the default time zone, UTC, is used.
Return Data Type
DATETIME
Example
SELECT
DATETIME(2008, 12, 25, 05, 30, 00) as datetime_ymdhms,
DATETIME(TIMESTAMP "2008-12-25 05:30:00+00", "America/Los_Angeles") as datetime_tstz;
/*---------------------+---------------------*
| datetime_ymdhms | datetime_tstz |
+---------------------+---------------------+
| 2008-12-25T05:30:00 | 2008-12-24T21:30:00 |
*---------------------+---------------------*/
DATETIME_ADD
DATETIME_ADD(datetime_expression, INTERVAL int64_expression part)
Description
Adds int64_expression units of part to the DATETIME object.
DATETIME_ADD supports the following values for part:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAY
- WEEK. Equivalent to 7 DAYs.
- MONTH
- QUARTER
- YEAR
Special handling is required for MONTH, QUARTER, and YEAR parts when the date is at (or near) the last day of the month. If the resulting month has fewer days than the original DATETIME's day, then the result day is the last day of the new month.
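As an illustrative sketch of this rule, the clamping applies to the date part while the time of day is preserved:

```sql
-- February 2016 has 29 days, so the day is clamped to 29
-- and the time of day is unchanged.
SELECT
  DATETIME_ADD(DATETIME '2016-01-31 12:00:00', INTERVAL 1 MONTH) AS one_month_later;

/*---------------------*
 | one_month_later     |
 +---------------------+
 | 2016-02-29T12:00:00 |
 *---------------------*/
```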
Return Data Type
DATETIME
Example
SELECT
DATETIME "2008-12-25 15:30:00" as original_date,
DATETIME_ADD(DATETIME "2008-12-25 15:30:00", INTERVAL 10 MINUTE) as later;
/*-----------------------------+------------------------*
| original_date | later |
+-----------------------------+------------------------+
| 2008-12-25T15:30:00 | 2008-12-25T15:40:00 |
*-----------------------------+------------------------*/
DATETIME_DIFF
DATETIME_DIFF(end_datetime, start_datetime, granularity)
Description
Gets the number of unit boundaries between two DATETIME values (end_datetime - start_datetime) at a particular time granularity.
Definitions
- start_datetime: The starting DATETIME value.
- end_datetime: The ending DATETIME value.
- granularity: The datetime part that represents the granularity. If you have passed in DATETIME values for the first arguments, granularity can be:
  - MICROSECOND
  - MILLISECOND
  - SECOND
  - MINUTE
  - HOUR
  - DAY
  - WEEK: This date part begins on Sunday.
  - WEEK(<WEEKDAY>): This date part begins on WEEKDAY. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
  - ISOWEEK: Uses ISO 8601 week boundaries. ISO weeks begin on Monday.
  - MONTH
  - QUARTER
  - YEAR
  - ISOYEAR: Uses the ISO 8601 week-numbering year boundary. The ISO year boundary is the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Details
If end_datetime is earlier than start_datetime, the output is negative. Produces an error if the computation overflows, such as if the difference in microseconds between the two DATETIME values overflows.
Return Data Type
INT64
Example
SELECT
DATETIME "2010-07-07 10:20:00" as first_datetime,
DATETIME "2008-12-25 15:30:00" as second_datetime,
DATETIME_DIFF(DATETIME "2010-07-07 10:20:00",
DATETIME "2008-12-25 15:30:00", DAY) as difference;
/*----------------------------+------------------------+------------------------*
| first_datetime | second_datetime | difference |
+----------------------------+------------------------+------------------------+
| 2010-07-07T10:20:00 | 2008-12-25T15:30:00 | 559 |
*----------------------------+------------------------+------------------------*/
SELECT
DATETIME_DIFF(DATETIME '2017-10-15 00:00:00',
DATETIME '2017-10-14 00:00:00', DAY) as days_diff,
DATETIME_DIFF(DATETIME '2017-10-15 00:00:00',
DATETIME '2017-10-14 00:00:00', WEEK) as weeks_diff;
/*-----------+------------*
| days_diff | weeks_diff |
+-----------+------------+
| 1 | 1 |
*-----------+------------*/
The example above shows the result of DATETIME_DIFF for two DATETIME values that are 24 hours apart. DATETIME_DIFF with the part WEEK returns 1 because DATETIME_DIFF counts the number of part boundaries in this range of DATETIME values. Each WEEK begins on Sunday, so there is one part boundary between Saturday, 2017-10-14 00:00:00 and Sunday, 2017-10-15 00:00:00.
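The boundary-counting behavior can be sketched outside SQL. The following Python snippet (an illustrative sketch, not part of GoogleSQL) counts Sunday boundaries the same way DATETIME_DIFF counts WEEK boundaries:

```python
from datetime import date, timedelta

def week_boundaries(start: date, end: date) -> int:
    """Count the Sundays in (start, end], i.e. the WEEK boundaries crossed.
    Sketch of DATETIME_DIFF(end, start, WEEK) for end >= start."""
    assert end >= start
    count = 0
    d = start + timedelta(days=1)
    while d <= end:
        # weekday() == 6 is Sunday in Python's calendar
        if d.weekday() == 6:
            count += 1
        d += timedelta(days=1)
    return count

# One Sunday boundary lies between Saturday 2017-10-14 and Sunday 2017-10-15.
print(week_boundaries(date(2017, 10, 14), date(2017, 10, 15)))  # 1
```

Because only boundary crossings are counted, two values six days apart within the same Sunday-to-Saturday week yield 0.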
The following example shows the result of DATETIME_DIFF for two dates in different years. DATETIME_DIFF with the date part YEAR returns 3 because it counts the number of Gregorian calendar year boundaries between the two DATETIME values. DATETIME_DIFF with the date part ISOYEAR returns 2 because the second DATETIME belongs to the ISO year 2015. The first Thursday of the 2015 calendar year was 2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29.
SELECT
DATETIME_DIFF('2017-12-30 00:00:00',
'2014-12-30 00:00:00', YEAR) AS year_diff,
DATETIME_DIFF('2017-12-30 00:00:00',
'2014-12-30 00:00:00', ISOYEAR) AS isoyear_diff;
/*-----------+--------------*
| year_diff | isoyear_diff |
+-----------+--------------+
| 3 | 2 |
*-----------+--------------*/
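Python's date.isocalendar() follows the same ISO 8601 week-numbering convention, so the result above can be cross-checked with a short sketch (illustrative only, not GoogleSQL):

```python
from datetime import date

# The ISO year of a date is the year of the ISO week that contains it;
# 2014-12-30 falls in ISO week 1 of 2015 because that week's Thursday
# (2015-01-01) belongs to Gregorian year 2015.
d1 = date(2014, 12, 30)
d2 = date(2017, 12, 30)

print(d1.isocalendar()[0])  # 2015
print(d2.isocalendar()[0])  # 2017
print(d2.isocalendar()[0] - d1.isocalendar()[0])  # 2, matching isoyear_diff
```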
The following example shows the result of DATETIME_DIFF for two days in succession. The first date falls on a Monday and the second date falls on a Sunday. DATETIME_DIFF with the date part WEEK returns 0 because this time part uses weeks that begin on Sunday. DATETIME_DIFF with the date part WEEK(MONDAY) returns 1. DATETIME_DIFF with the date part ISOWEEK also returns 1 because ISO weeks begin on Monday.
SELECT
DATETIME_DIFF('2017-12-18', '2017-12-17', WEEK) AS week_diff,
DATETIME_DIFF('2017-12-18', '2017-12-17', WEEK(MONDAY)) AS week_weekday_diff,
DATETIME_DIFF('2017-12-18', '2017-12-17', ISOWEEK) AS isoweek_diff;
/*-----------+-------------------+--------------*
| week_diff | week_weekday_diff | isoweek_diff |
+-----------+-------------------+--------------+
| 0 | 1 | 1 |
*-----------+-------------------+--------------*/
DATETIME_SUB
DATETIME_SUB(datetime_expression, INTERVAL int64_expression part)
Description
Subtracts int64_expression units of part from the DATETIME value.
DATETIME_SUB supports the following values for part:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAY
- WEEK: Equivalent to 7 DAYs.
- MONTH
- QUARTER
- YEAR
Special handling is required for MONTH, QUARTER, and YEAR parts when the date is at (or near) the last day of the month. If the resulting month has fewer days than the original DATETIME's day, then the result day is the last day of the new month.
Return Data Type
DATETIME
Example
SELECT
DATETIME "2008-12-25 15:30:00" as original_date,
DATETIME_SUB(DATETIME "2008-12-25 15:30:00", INTERVAL 10 MINUTE) as earlier;
/*-----------------------------+------------------------*
| original_date | earlier |
+-----------------------------+------------------------+
| 2008-12-25T15:30:00 | 2008-12-25T15:20:00 |
*-----------------------------+------------------------*/
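The month-end clamping rule described above can be sketched in Python. The helper below is hypothetical (not part of any library), built from the stdlib calendar module to illustrate the documented behavior:

```python
import calendar
from datetime import date

def sub_months(d: date, n: int) -> date:
    """Subtract n months, clamping to the last day of the resulting month,
    as DATETIME_SUB does for MONTH/QUARTER/YEAR parts (illustrative sketch)."""
    # Count months from year 0, then step back n months.
    total = d.year * 12 + (d.month - 1) - n
    year, month = divmod(total, 12)
    month += 1
    # Clamp the day if the target month is shorter than the original day.
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last))

print(sub_months(date(2008, 3, 31), 1))  # 2008-02-29: Feb 31 doesn't exist
```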
DATETIME_TRUNC
DATETIME_TRUNC(datetime_value, datetime_granularity)
DATETIME_TRUNC(timestamp_value, timestamp_granularity[, time_zone])
Description
Truncates a DATETIME
or TIMESTAMP
value at a particular granularity.
Definitions
- datetime_value: A DATETIME value to truncate.
- datetime_granularity: The truncation granularity for a DATETIME value. Date granularities and time granularities can be used.
- timestamp_value: A TIMESTAMP value to truncate.
- timestamp_granularity: The truncation granularity for a TIMESTAMP value. Date granularities and time granularities can be used.
- time_zone: A time zone to use with the TIMESTAMP value. Time zone parts can be used. Use this argument if you want to use a time zone other than the default time zone, UTC, as part of the truncate operation.
Date granularity definitions
- DAY: The day in the Gregorian calendar year that contains the value to truncate.
- WEEK: The first day in the week that contains the value to truncate. Weeks begin on Sundays. WEEK is equivalent to WEEK(SUNDAY).
- WEEK(WEEKDAY): The first day in the week that contains the value to truncate. Weeks begin on WEEKDAY. WEEKDAY must be one of the following: SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, or SATURDAY.
- ISOWEEK: The first day in the ISO 8601 week that contains the value to truncate. The ISO week begins on Monday. The first ISO week of each ISO year contains the first Thursday of the corresponding Gregorian calendar year.
- MONTH: The first day in the month that contains the value to truncate.
- QUARTER: The first day in the quarter that contains the value to truncate.
- YEAR: The first day in the year that contains the value to truncate.
- ISOYEAR: The first day in the ISO 8601 week-numbering year that contains the value to truncate. The ISO year boundary is the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Time granularity definitions
- MICROSECOND: If used, nothing is truncated from the value.
- MILLISECOND: The nearest lesser than or equal millisecond.
- SECOND: The nearest lesser than or equal second.
- MINUTE: The nearest lesser than or equal minute.
- HOUR: The nearest lesser than or equal hour.
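Time granularities always round down. A short Python sketch of this behavior (an illustrative model, not GoogleSQL itself):

```python
from datetime import datetime

def trunc(dt: datetime, granularity: str) -> datetime:
    """Round a datetime down to the given time granularity, mirroring
    DATETIME_TRUNC's time-granularity behavior (illustrative sketch)."""
    # Zero out every field finer than the requested granularity.
    zero = {
        "HOUR": dict(minute=0, second=0, microsecond=0),
        "MINUTE": dict(second=0, microsecond=0),
        "SECOND": dict(microsecond=0),
        "MILLISECOND": dict(microsecond=dt.microsecond - dt.microsecond % 1000),
        "MICROSECOND": {},  # nothing is truncated
    }
    return dt.replace(**zero[granularity])

dt = datetime(2008, 12, 25, 15, 30, 42, 123456)
print(trunc(dt, "HOUR"))         # 2008-12-25 15:00:00
print(trunc(dt, "MILLISECOND"))  # 2008-12-25 15:30:42.123000
```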
Time zone part definitions
- MINUTE
- HOUR
- DAY
- WEEK
- WEEK(<WEEKDAY>)
- ISOWEEK
- MONTH
- QUARTER
- YEAR
- ISOYEAR
Details
The resulting value is always rounded to the beginning of granularity.
Return Data Type
The same data type as the first argument passed into this function.
Examples
SELECT
DATETIME "2008-12-25 15:30:00" as original,
DATETIME_TRUNC(DATETIME "2008-12-25 15:30:00", DAY) as truncated;
/*----------------------------+------------------------*
| original | truncated |
+----------------------------+------------------------+
| 2008-12-25T15:30:00 | 2008-12-25T00:00:00 |
*----------------------------+------------------------*/
In the following example, the original DATETIME falls on a Sunday. Because the granularity is WEEK(MONDAY), DATETIME_TRUNC returns the DATETIME for the preceding Monday.
SELECT
datetime AS original,
DATETIME_TRUNC(datetime, WEEK(MONDAY)) AS truncated
FROM (SELECT DATETIME(TIMESTAMP "2017-11-05 00:00:00+00", "UTC") AS datetime);
/*---------------------+---------------------*
| original | truncated |
+---------------------+---------------------+
| 2017-11-05T00:00:00 | 2017-10-30T00:00:00 |
*---------------------+---------------------*/
In the following example, the original datetime_expression is in the Gregorian calendar year 2015. However, DATETIME_TRUNC with the ISOYEAR date part truncates the datetime_expression to the beginning of the ISO year, not the Gregorian calendar year. The first Thursday of the 2015 calendar year was 2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29. Therefore the ISO year boundary preceding the datetime_expression 2015-06-15 00:00:00 is 2014-12-29.
SELECT
DATETIME_TRUNC('2015-06-15 00:00:00', ISOYEAR) AS isoyear_boundary,
EXTRACT(ISOYEAR FROM DATETIME '2015-06-15 00:00:00') AS isoyear_number;
/*---------------------+----------------*
| isoyear_boundary | isoyear_number |
+---------------------+----------------+
| 2014-12-29T00:00:00 | 2015 |
*---------------------+----------------*/
EXTRACT
EXTRACT(part FROM datetime_expression)
Description
Returns a value that corresponds to the specified part from a supplied datetime_expression.
Allowed part values are:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAYOFWEEK: Returns values in the range [1,7] with Sunday as the first day of the week.
- DAY
- DAYOFYEAR
- WEEK: Returns the week number of the date in the range [0, 53]. Weeks begin with Sunday, and dates prior to the first Sunday of the year are in week 0.
- WEEK(<WEEKDAY>): Returns the week number of datetime_expression in the range [0, 53]. Weeks begin on WEEKDAY. Datetimes prior to the first WEEKDAY of the year are in week 0. Valid values for WEEKDAY are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
- ISOWEEK: Returns the ISO 8601 week number of the datetime_expression. ISOWEEKs begin on Monday. Return values are in the range [1, 53]. The first ISOWEEK of each ISO year begins on the Monday before the first Thursday of the Gregorian calendar year.
- MONTH
- QUARTER
- YEAR
- ISOYEAR: Returns the ISO 8601 week-numbering year, which is the Gregorian calendar year containing the Thursday of the week to which datetime_expression belongs.
- DATE
- TIME
Returned values truncate lower order time periods. For example, when extracting seconds, EXTRACT truncates the millisecond and microsecond values.
Return Data Type
INT64, except in the following cases:
- If part is DATE, returns a DATE object.
- If part is TIME, returns a TIME object.
Examples
In the following example, EXTRACT
returns a value corresponding to the HOUR
time part.
SELECT EXTRACT(HOUR FROM DATETIME(2008, 12, 25, 15, 30, 00)) as hour;
/*------------------*
| hour |
+------------------+
| 15 |
*------------------*/
In the following example, EXTRACT
returns values corresponding to different
time parts from a column of datetimes.
WITH Datetimes AS (
SELECT DATETIME '2005-01-03 12:34:56' AS datetime UNION ALL
SELECT DATETIME '2007-12-31' UNION ALL
SELECT DATETIME '2009-01-01' UNION ALL
SELECT DATETIME '2009-12-31' UNION ALL
SELECT DATETIME '2017-01-02' UNION ALL
SELECT DATETIME '2017-05-26'
)
SELECT
datetime,
EXTRACT(ISOYEAR FROM datetime) AS isoyear,
EXTRACT(ISOWEEK FROM datetime) AS isoweek,
EXTRACT(YEAR FROM datetime) AS year,
EXTRACT(WEEK FROM datetime) AS week
FROM Datetimes
ORDER BY datetime;
/*---------------------+---------+---------+------+------*
| datetime | isoyear | isoweek | year | week |
+---------------------+---------+---------+------+------+
| 2005-01-03T12:34:56 | 2005 | 1 | 2005 | 1 |
| 2007-12-31T00:00:00 | 2008 | 1 | 2007 | 52 |
| 2009-01-01T00:00:00 | 2009 | 1 | 2009 | 0 |
| 2009-12-31T00:00:00 | 2009 | 53 | 2009 | 52 |
| 2017-01-02T00:00:00 | 2017 | 1 | 2017 | 1 |
| 2017-05-26T00:00:00 | 2017 | 21 | 2017 | 21 |
*---------------------+---------+---------+------+------*/
In the following example, datetime_expression
falls on a Sunday. EXTRACT
calculates the first column using weeks that begin on Sunday, and it calculates
the second column using weeks that begin on Monday.
WITH table AS (SELECT DATETIME(TIMESTAMP "2017-11-05 00:00:00+00", "UTC") AS datetime)
SELECT
datetime,
EXTRACT(WEEK(SUNDAY) FROM datetime) AS week_sunday,
EXTRACT(WEEK(MONDAY) FROM datetime) AS week_monday
FROM table;
/*---------------------+-------------+---------------*
| datetime | week_sunday | week_monday |
+---------------------+-------------+---------------+
| 2017-11-05T00:00:00 | 45 | 44 |
*---------------------+-------------+---------------*/
FORMAT_DATETIME
FORMAT_DATETIME(format_string, datetime_expr)
Description
Formats a DATETIME
value according to a specified format string.
Definitions
- format_string: A STRING value that contains the format elements to use with datetime_expr.
- datetime_expr: A DATETIME value that represents the date and time to format.
Return Data Type
STRING
Examples
SELECT
FORMAT_DATETIME("%c", DATETIME "2008-12-25 15:30:00")
AS formatted;
/*--------------------------*
| formatted |
+--------------------------+
| Thu Dec 25 15:30:00 2008 |
*--------------------------*/
SELECT
FORMAT_DATETIME("%b-%d-%Y", DATETIME "2008-12-25 15:30:00")
AS formatted;
/*-------------*
| formatted |
+-------------+
| Dec-25-2008 |
*-------------*/
SELECT
FORMAT_DATETIME("%b %Y", DATETIME "2008-12-25 15:30:00")
AS formatted;
/*-------------*
| formatted |
+-------------+
| Dec 2008 |
*-------------*/
LAST_DAY
LAST_DAY(datetime_expression[, date_part])
Description
Returns the last day from a datetime expression that contains the date. This is commonly used to return the last day of the month.
You can optionally specify the date part for which the last day is returned.
If this parameter is not used, the default value is MONTH
.
LAST_DAY supports the following values for date_part:
- YEAR
- QUARTER
- MONTH
- WEEK: Equivalent to 7 DAYs.
- WEEK(<WEEKDAY>): <WEEKDAY> represents the starting day of the week. Valid values are SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
- ISOWEEK: Uses ISO 8601 week boundaries. ISO weeks begin on Monday.
- ISOYEAR: Uses the ISO 8601 week-numbering year boundary. The ISO year boundary is the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Return Data Type
DATE
Example
These both return the last day of the month:
SELECT LAST_DAY(DATETIME '2008-11-25', MONTH) AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-30 |
*------------*/
SELECT LAST_DAY(DATETIME '2008-11-25') AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-30 |
*------------*/
This returns the last day of the year:
SELECT LAST_DAY(DATETIME '2008-11-25 15:30:00', YEAR) AS last_day
/*------------*
| last_day |
+------------+
| 2008-12-31 |
*------------*/
This returns the last day of the week for a week that starts on a Sunday:
SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(SUNDAY)) AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-15 |
*------------*/
This returns the last day of the week for a week that starts on a Monday:
SELECT LAST_DAY(DATETIME '2008-11-10 15:30:00', WEEK(MONDAY)) AS last_day
/*------------*
| last_day |
+------------+
| 2008-11-16 |
*------------*/
PARSE_DATETIME
PARSE_DATETIME(format_string, datetime_string)
Description
Converts a STRING value to a DATETIME value.
Definitions
- format_string: A STRING value that contains the format elements to use with datetime_string.
- datetime_string: A STRING value that represents the date and time to parse.
Details
Each element in datetime_string must have a corresponding element in format_string. The location of each element in format_string must match the location of each element in datetime_string.
-- This works because elements on both sides match.
SELECT PARSE_DATETIME("%a %b %e %I:%M:%S %Y", "Thu Dec 25 07:30:00 2008");
-- This produces an error because the year element is in different locations.
SELECT PARSE_DATETIME("%a %b %e %Y %I:%M:%S", "Thu Dec 25 07:30:00 2008");
-- This produces an error because one of the year elements is missing.
SELECT PARSE_DATETIME("%a %b %e %I:%M:%S", "Thu Dec 25 07:30:00 2008");
-- This works because %c can find all matching elements in datetime_string.
SELECT PARSE_DATETIME("%c", "Thu Dec 25 07:30:00 2008");
The format string fully supports most format elements, except for %P.
PARSE_DATETIME parses datetime_string according to the following rules:
- Unspecified fields. Any unspecified field is initialized from 1970-01-01 00:00:00.0. For example, if the year is unspecified then it defaults to 1970.
- Case insensitivity. Names, such as Monday and February, are case insensitive.
- Whitespace. One or more consecutive white spaces in the format string matches zero or more consecutive white spaces in the DATETIME string. Leading and trailing white spaces in the DATETIME string are always allowed, even if they aren't in the format string.
- Format precedence. When two or more format elements have overlapping information, the last one generally overrides any earlier ones, with some exceptions. For example, both %F and %Y affect the year, so the later element overrides the earlier. See the descriptions of %s, %C, and %y in Supported Format Elements For DATETIME.
- Format divergence. %p can be used with am, AM, pm, and PM.
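For comparison, Python's strptime applies similar rules but with different conventions; the sketch below (illustrative only, not GoogleSQL) highlights one difference in how unspecified fields are filled:

```python
from datetime import datetime

# GoogleSQL's PARSE_DATETIME takes (format, value); Python's strptime
# takes (value, format). Both fill unspecified fields with defaults,
# but the default years differ: 1970 in GoogleSQL, 1900 in Python.
dt = datetime.strptime("1998-10-18 13:45:55", "%Y-%m-%d %H:%M:%S")
print(dt)  # 1998-10-18 13:45:55

partial = datetime.strptime("12-25", "%m-%d")
print(partial)  # 1900-12-25 00:00:00 (GoogleSQL would default to 1970)
```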
Return Data Type
DATETIME
Examples
The following examples parse a STRING
literal as a
DATETIME
.
SELECT PARSE_DATETIME('%Y-%m-%d %H:%M:%S', '1998-10-18 13:45:55') AS datetime;
/*---------------------*
| datetime |
+---------------------+
| 1998-10-18T13:45:55 |
*---------------------*/
SELECT PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', '8/30/2018 2:23:38 pm') AS datetime;
/*---------------------*
| datetime |
+---------------------+
| 2018-08-30T14:23:38 |
*---------------------*/
The following example parses a STRING
literal
containing a date in a natural language format as a
DATETIME
.
SELECT PARSE_DATETIME('%A, %B %e, %Y','Wednesday, December 19, 2018')
AS datetime;
/*---------------------*
| datetime |
+---------------------+
| 2018-12-19T00:00:00 |
*---------------------*/
Debugging functions
GoogleSQL for BigQuery supports the following debugging functions.
Function list
Name | Summary
---|---
ERROR | Produces an error with a custom error message.
ERROR
ERROR(error_message)
Description
Returns an error.
Definitions
- error_message: A STRING value that represents the error message to produce. Any whitespace characters beyond a single space are trimmed from the results.
Details
ERROR
is treated like any other expression that may
result in an error: there is no special guarantee of evaluation order.
Return Data Type
GoogleSQL infers the return type in context.
Examples
In the following example, the query produces an error message:
-- ERROR: Show this error message (while evaluating error("Show this error message"))
SELECT ERROR('Show this error message')
In the following example, the query returns an error message if the value of the row does not match one of two defined values.
SELECT
CASE
WHEN value = 'foo' THEN 'Value is foo.'
WHEN value = 'bar' THEN 'Value is bar.'
ELSE ERROR(CONCAT('Found unexpected value: ', value))
END AS new_value
FROM (
SELECT 'foo' AS value UNION ALL
SELECT 'bar' AS value UNION ALL
SELECT 'baz' AS value);
-- Found unexpected value: baz
In the following example, GoogleSQL may evaluate the ERROR function before or after the x > 0 condition in the WHERE clause, because there are no special evaluation-order guarantees for the ERROR function.
SELECT *
FROM (SELECT -1 AS x)
WHERE x > 0 AND ERROR('Example error');
In the next example, the WHERE
clause evaluates an IF
condition, which
ensures that GoogleSQL only evaluates the ERROR
function if the
condition fails.
SELECT *
FROM (SELECT -1 AS x)
WHERE IF(x > 0, true, ERROR(FORMAT('Error: x must be positive but is %t', x)));
-- Error: x must be positive but is -1
Differentially private aggregate functions
GoogleSQL for BigQuery supports differentially private aggregate functions. For an explanation of how aggregate functions work, see Aggregate function calls.
You can only use differentially private aggregate functions with differentially private queries in a differential privacy clause.
Function list
Name | Summary
---|---
AVG | DIFFERENTIAL_PRIVACY-supported AVG. Gets the differentially-private average of non-NULL, non-NaN values in a query with a DIFFERENTIAL_PRIVACY clause.
COUNT | DIFFERENTIAL_PRIVACY-supported COUNT. Signature 1: Gets the differentially-private count of rows in a query with a DIFFERENTIAL_PRIVACY clause. Signature 2: Gets the differentially-private count of rows with a non-NULL expression in a query with a DIFFERENTIAL_PRIVACY clause.
PERCENTILE_CONT | DIFFERENTIAL_PRIVACY-supported PERCENTILE_CONT. Computes a differentially-private percentile across privacy unit columns in a query with a DIFFERENTIAL_PRIVACY clause.
SUM | DIFFERENTIAL_PRIVACY-supported SUM. Gets the differentially-private sum of non-NULL, non-NaN values in a query with a DIFFERENTIAL_PRIVACY clause.
AVG
(DIFFERENTIAL_PRIVACY
)
WITH DIFFERENTIAL_PRIVACY ...
AVG(
expression,
[ contribution_bounds_per_group => (lower_bound, upper_bound) ]
)
Description
Returns the average of non-NULL
, non-NaN
values in the expression.
This function first computes the average per privacy unit column, and then
computes the final result by averaging these averages.
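The average-of-averages computation can be sketched in Python (an illustrative sketch that ignores the noise and clamping the real function applies):

```python
from collections import defaultdict

def avg_of_avgs(rows):
    """rows: (privacy_unit_id, value) pairs. Compute the mean per privacy
    unit, then the mean of those means (noise and clamping omitted)."""
    by_unit = defaultdict(list)
    for unit, value in rows:
        by_unit[unit].append(value)
    unit_means = [sum(v) / len(v) for v in by_unit.values()]
    return sum(unit_means) / len(unit_means)

rows = [(1, 0), (1, 10), (2, 20)]
print(avg_of_avgs(rows))            # 12.5: mean of per-unit means {5, 20}
print(sum(v for _, v in rows) / 3)  # 10.0: a plain AVG over all rows
```

Note that the result differs from a plain average: each privacy unit contributes equally, regardless of how many rows it has.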
This function must be used with the DIFFERENTIAL_PRIVACY
clause
and can support the following arguments:
- expression: The input expression. This can be any numeric input type, such as INT64.
- contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.
Return type
FLOAT64
Examples
The following differentially private query gets the average number of each item
requested per professor. Smaller aggregations might not be included. This query
references a table called professors
.
-- With noise, using the epsilon parameter.
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+------------------*
| item | average_quantity |
+----------+------------------+
| pencil | 38.5038356810269 |
| pen | 13.4725028762032 |
*----------+------------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
AVG(quantity) average_quantity
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
/*----------+------------------*
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 40 |
| pen | 18.5 |
*----------+------------------*/
COUNT
(DIFFERENTIAL_PRIVACY
)
- Signature 1: Returns the number of rows in a differentially private FROM clause.
- Signature 2: Returns the number of non-NULL values in an expression.
Signature 1
WITH DIFFERENTIAL_PRIVACY ...
COUNT(
*,
[ contribution_bounds_per_group => (lower_bound, upper_bound) ]
)
Description
Returns the number of rows in the
differentially private FROM
clause. The final result
is an aggregation across a privacy unit column.
This function must be used with the DIFFERENTIAL_PRIVACY
clause
and can support the following argument:
- contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.
Return type
INT64
Examples
The following differentially private query counts the number of requests for
each item. This query references a table called
professors
.
-- With noise, using the epsilon parameter.
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
| item | times_requested |
+----------+-----------------+
| pencil | 5 |
| pen | 2 |
*----------+-----------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(*, contribution_bounds_per_group=>(0, 100)) times_requested
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
/*----------+-----------------*
| item | times_requested |
+----------+-----------------+
| scissors | 1 |
| pencil | 4 |
| pen | 3 |
*----------+-----------------*/
Signature 2
WITH DIFFERENTIAL_PRIVACY ...
COUNT(
expression,
[contribution_bounds_per_group => (lower_bound, upper_bound)]
)
Description
Returns the number of non-NULL
expression values. The final result is an
aggregation across a privacy unit column.
This function must be used with the DIFFERENTIAL_PRIVACY
clause
and can support these arguments:
- expression: The input expression. This expression can be any numeric input type, such as INT64.
- contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.
Return type
INT64
Examples
The following differentially private query counts the number of requests made
for each type of item. This query references a table called
professors
.
-- With noise, using the epsilon parameter.
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(item, contribution_bounds_per_group => (0,100)) times_requested
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------------*
| item | times_requested |
+----------+-----------------+
| pencil | 5 |
| pen | 2 |
*----------+-----------------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(item, contribution_bounds_per_group => (0,100)) times_requested
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
/*----------+-----------------*
| item | times_requested |
+----------+-----------------+
| scissors | 1 |
| pencil | 4 |
| pen | 3 |
*----------+-----------------*/
PERCENTILE_CONT
(DIFFERENTIAL_PRIVACY
)
WITH DIFFERENTIAL_PRIVACY ...
PERCENTILE_CONT(
expression,
percentile,
contribution_bounds_per_row => (lower_bound, upper_bound)
)
Description
Takes an expression and computes a percentile for it. The final result is an aggregation across privacy unit columns.
This function must be used with the DIFFERENTIAL_PRIVACY
clause
and can support these arguments:
- expression: The input expression. This can be most numeric input types, such as INT64. NULL values are always ignored.
- percentile: The percentile to compute. The percentile must be a literal in the range [0, 1].
- contribution_bounds_per_row: A named argument with a contribution bound. Performs clamping for each row separately before performing intermediate grouping on the privacy unit column.
NUMERIC and BIGNUMERIC arguments are not allowed. If you need them, cast them as the FLOAT64 data type first.
Return type
FLOAT64
Examples
The following differentially private query gets the percentile of items
requested. Smaller aggregations might not be included. This query references a
view called professors
.
-- With noise, using the epsilon parameter.
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+----------------------*
| item | percentile_requested |
+----------+----------------------+
| pencil | 72.00011444091797 |
| scissors | 8.000175476074219 |
| pen | 23.001075744628906 |
*----------+----------------------*/
SUM
(DIFFERENTIAL_PRIVACY
)
WITH DIFFERENTIAL_PRIVACY ...
SUM(
expression,
[ contribution_bounds_per_group => (lower_bound, upper_bound) ]
)
Description
Returns the sum of non-NULL
, non-NaN
values in the expression. The final
result is an aggregation across privacy unit columns.
This function must be used with the DIFFERENTIAL_PRIVACY
clause
and can support these arguments:
- expression: The input expression. This can be any numeric input type, such as INT64. NULL values are always ignored.
- contribution_bounds_per_group: A named argument with a contribution bound. Performs clamping for each group separately before performing intermediate grouping on the privacy unit column.
Return type
One of the following supertypes:
- INT64
- FLOAT64
Examples
The following differentially private query gets the sum of items requested.
Smaller aggregations might not be included. This query references a view called
professors
.
-- With noise, using the epsilon parameter.
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
SUM(quantity, contribution_bounds_per_group => (0,100)) quantity
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations might be removed.
/*----------+-----------*
| item | quantity |
+----------+-----------+
| pencil | 143 |
| pen | 59 |
*----------+-----------*/
-- Without noise, using the epsilon parameter.
-- (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
SUM(quantity) quantity
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
/*----------+----------*
| item | quantity |
+----------+----------+
| scissors | 8 |
| pencil | 144 |
| pen | 58 |
*----------+----------*/
Supplemental materials
Clamp values in a differentially private aggregate function
In differentially private queries, aggregation clamping is used to limit the contribution of outliers. You can clamp explicitly or implicitly as follows:
- Clamp explicitly in the DIFFERENTIAL_PRIVACY clause.
- Clamp implicitly in the DIFFERENTIAL_PRIVACY clause.
Implicitly clamp values
If you don't include the contribution bounds named argument with the
DIFFERENTIAL_PRIVACY
clause, clamping is implicit, which
means bounds are derived from the data itself in a differentially private way.
Implicit bounding works best when computed using large datasets. For more information, see Implicit bounding limitations for small datasets.
Details
In differentially private aggregate functions, explicit clamping is optional. If you don't include this clause, clamping is implicit, which means bounds are derived from the data itself in a differentially private way. The process is somewhat random, so aggregations with identical ranges can have different bounds.
Implicit bounds are determined for each aggregation. So if some aggregations have a wide range of values, and others have a narrow range of values, implicit bounding can identify different bounds for different aggregations as appropriate. Implicit bounds might be an advantage or a disadvantage depending on your use case. Different bounds for different aggregations can result in lower error. Different bounds also means that different aggregations have different levels of uncertainty, which might not be directly comparable. Explicit bounds, on the other hand, apply uniformly to all aggregations and should be derived from public information.
When clamping is implicit, part of the total epsilon is spent picking bounds. This leaves less epsilon for aggregations, so these aggregations are noisier.
Example
The following anonymized query clamps each aggregate contribution for each differential privacy ID to within a range derived from the data itself. As long as all or most values fall within this range, your results will be accurate. This query references a view called view_on_professors.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
OPTIONS (
epsilon = 1e20,
delta = .01,
privacy_unit_column=id
)
item,
AVG(quantity) average_quantity
FROM view_on_professors
GROUP BY item;
/*----------+------------------*
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 72 |
| pen | 18.5 |
*----------+------------------*/
Explicitly clamp values
contribution_bounds_per_group => (lower_bound,upper_bound)
contribution_bounds_per_row => (lower_bound,upper_bound)
Use the contribution bounds named argument to explicitly clamp
values per group or per row between a lower and upper bound in a
DIFFERENTIAL_PRIVACY
clause.
Input values:
- contribution_bounds_per_row: Contributions per privacy unit are clamped on a per-row (per-record) basis. This means the following:
  - Upper and lower bounds are applied to column values in individual rows produced by the input subquery independently.
  - The maximum possible contribution per privacy unit (and per grouping set) is the product of the per-row contribution limit and the max_groups_contributed differential privacy parameter.
- contribution_bounds_per_group: Contributions per privacy unit are clamped on a unique set of entity-specified GROUP BY keys. The upper and lower bounds are applied to values per group after the values are aggregated per privacy unit.
- lower_bound: Numeric literal that represents the smallest value to include in an aggregation.
- upper_bound: Numeric literal that represents the largest value to include in an aggregation.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
Details
In differentially private aggregate functions, clamping explicitly clamps the total contribution from each privacy unit column to within a specified range.
Explicit bounds are uniformly applied to all aggregations. So even if some aggregations have a wide range of values, and others have a narrow range of values, the same bounds are applied to all of them. On the other hand, when implicit bounds are inferred from the data, the bounds applied to each aggregation can be different.
Explicit bounds should be chosen to reflect public information. For example, bounding ages between 0 and 100 reflects public information because the age of most people generally falls within this range.
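To build intuition for how explicit clamping behaves before any noise is added, here is a small illustrative Python sketch (not GoogleSQL; the data and bounds are made up). It clamps each privacy unit's aggregated per-group contribution to the bounds and then averages, mirroring the effect of `AVG(x, contribution_bounds_per_group => (lower, upper))` without the noise:

```python
def clamp(value, lower, upper):
    """Clamp a single aggregated contribution to [lower, upper]."""
    return max(lower, min(upper, value))

def clamped_average(contributions, lower, upper):
    """Average per-privacy-unit contributions after clamping each one.

    Mirrors AVG(x, contribution_bounds_per_group => (lower, upper)),
    minus the noise that a real differentially private query adds.
    """
    clamped = [clamp(v, lower, upper) for v in contributions]
    return sum(clamped) / len(clamped)

# Hypothetical per-professor quantities for one item group.
quantities = [2, 120, 8]

# Wide bounds keep most values intact: 2 and 8 pass through, 120 -> 100.
print(clamped_average(quantities, 0, 100))

# Bounds that exclude most values distort the result, as in the
# (50, 100) example below, where every average drifts toward 50.
print(clamped_average(quantities, 50, 100))
```

Note how the second call pulls every low value up to the lower bound, which is why narrow or badly placed bounds skew the result even before noise is considered.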
Examples
The following anonymized query clamps each aggregate contribution for each
differential privacy ID to a specified range (0 to 100).
As long as all or most values fall within this range, your results
will be accurate. This query references a view called view_on_professors.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
OPTIONS (
epsilon = 1e20,
delta = .01,
privacy_unit_column=id
)
item,
AVG(quantity, contribution_bounds_per_group=>(0,100)) AS average_quantity
FROM view_on_professors
GROUP BY item;
/*----------+------------------*
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 40 |
| pen | 18.5 |
*----------+------------------*/
Notice what happens when most or all values fall outside of the clamped range. To get accurate results, ensure that the difference between the upper and lower bound is as small as possible, and that most inputs are between the upper and lower bound.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
OPTIONS (
epsilon = 1e20,
delta = .01,
privacy_unit_column=id
)
item,
AVG(quantity, contribution_bounds_per_group=>(50,100)) AS average_quantity
FROM view_on_professors
GROUP BY item;
/*----------+------------------*
| item | average_quantity |
+----------+------------------+
| scissors | 54 |
| pencil | 58 |
| pen | 51 |
*----------+------------------*/
DLP encryption functions
GoogleSQL for BigQuery supports the following DLP functions, which allow interoperable encryption and decryption between BigQuery and Cloud Data Loss Prevention (Cloud DLP) using AES-SIV. To use DLP functions, you first create a cryptographic key and then use that key to get a wrapped key.
Function list
Name | Summary |
---|---|
DLP_DETERMINISTIC_ENCRYPT
|
Encrypts data with a DLP compatible algorithm. |
DLP_DETERMINISTIC_DECRYPT
|
Decrypts DLP-encrypted data. |
DLP_KEY_CHAIN
|
Gets a data encryption key that is wrapped by Cloud Key Management Service. |
DLP_DETERMINISTIC_ENCRYPT
DLP_DETERMINISTIC_ENCRYPT(key, plaintext, surrogate)
DLP_DETERMINISTIC_ENCRYPT(key, plaintext, surrogate, context)
Description
This function derives a data encryption key from key and context, and then
encrypts plaintext. You can use surrogate to prepend the encryption result.
To use DLP functions, you first create a cryptographic key and then use that
key to get a wrapped key.
Definitions
- key: A serialized BYTES value that is returned by DLP_KEY_CHAIN. key must be set to ENABLED in Cloud KMS. For information about how to generate a wrapped key, see gcloud kms encrypt.
- plaintext: The STRING value to encrypt.
- surrogate: A STRING value that you can prepend to the output. If you don't want to use surrogate, pass an empty string ("").
- context: A user-provided STRING value that is used with a Cloud KMS key to derive a data encryption key. For more information, see CryptoDeterministicConfig:context.
Return data type
STRING
Examples
In the following query, the wrapped key is presented in a BYTES
literal format:
SELECT
DLP_DETERMINISTIC_ENCRYPT(
DLP_KEY_CHAIN(
'gcp-kms://projects/myproject/locations/us/keyRings/kms-test/cryptoKeys/test-KEK',
b'\012\044\000\325\155\264\153\246\071\172\130\372\305\103\047\342\356\061\077\014\030\126\147\041\126\150\012\036\020\202\215\044\267\310\331\014\116\233\022\071\000\363\344\230\067\274\007\340\273\016\212\151\226\064\200\377\303\207\103\147\052\267\035\350\004\147\365\251\271\133\062\251\246\152\177\017\005\270\044\141\211\116\337\043\035\263\122\340\110\333\266\220\377\247\204\215\233'),
'Plaintext',
'',
'aad') AS results;
/*--------------------------------------*
| results |
+--------------------------------------+
| AWDeSznl9C7+NzTaCgiqiEAZ8Y55fZSuvCQ= |
*--------------------------------------*/
In the following query, the wrapped key is presented in the base64 format:
DECLARE DLP_KEY_VALUE BYTES;
SET DLP_KEY_VALUE =
FROM_BASE64(
'CiQA1W20a6Y5elj6xUMn4u4xPwwYVmchVmgKHhCCjSS3yNkMTpsSOQDz5Jg3vAfguw6KaZY0gP/Dh0NnKrcd6ARn9am5WzKppmp/DwW4JGGJTt8jHbNS4EjbtpD/p4SNmw==');
SELECT
DLP_DETERMINISTIC_ENCRYPT(
DLP_KEY_CHAIN(
'gcp-kms://projects/myproject/locations/us/keyRings/kms-test/cryptoKeys/test-Kek',
DLP_KEY_VALUE),
'Plaintext',
'your_surrogate',
'aad') AS results;
/*---------------------------------------------------------*
| results |
+---------------------------------------------------------+
| your_surrogate(36):AWDeSznl9C7+NzTaCgiqiEAZ8Y55fZSuvCQ= |
*---------------------------------------------------------*/
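The BYTES literal in the first example and the base64 string passed to FROM_BASE64 in the second encode the same wrapped key, which is why both queries can produce the same ciphertext. A quick illustrative Python check of that equivalence (using the sample key from these examples):

```python
import base64

# The sample wrapped key as a bytes literal with octal escapes,
# as it appears in the first example.
key_bytes = (
    b'\012\044\000\325\155\264\153\246\071\172\130\372\305\103\047\342'
    b'\356\061\077\014\030\126\147\041\126\150\012\036\020\202\215\044'
    b'\267\310\331\014\116\233\022\071\000\363\344\230\067\274\007\340'
    b'\273\016\212\151\226\064\200\377\303\207\103\147\052\267\035\350'
    b'\004\147\365\251\271\133\062\251\246\152\177\017\005\270\044\141'
    b'\211\116\337\043\035\263\122\340\110\333\266\220\377\247\204\215'
    b'\233'
)

# The same key in base64, as passed to FROM_BASE64 in the second example.
key_b64 = (
    'CiQA1W20a6Y5elj6xUMn4u4xPwwYVmchVmgKHhCCjSS3yNkMTpsSOQDz5Jg3vAfg'
    'uw6KaZY0gP/Dh0NnKrcd6ARn9am5WzKppmp/DwW4JGGJTt8jHbNS4EjbtpD/p4SNmw=='
)

# FROM_BASE64 in BigQuery performs the same decoding as base64.b64decode here.
assert base64.b64decode(key_b64) == key_bytes
print(len(key_bytes))  # 97-byte serialized wrapped key
```

Also note the second example's output format: the surrogate prefix is `your_surrogate(36):`, where 36 is the length of the base64 ciphertext that follows it.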
DLP_DETERMINISTIC_DECRYPT
DLP_DETERMINISTIC_DECRYPT(key, ciphertext, surrogate)
DLP_DETERMINISTIC_DECRYPT(key, ciphertext, surrogate, context)
Description
This function decrypts ciphertext using an encryption key derived from key
and context. You can use surrogate to prepend the decryption result.
To use DLP functions, you first create a cryptographic key and then use that
key to get a wrapped key.
Definitions
- key: A serialized BYTES value that is returned by DLP_KEY_CHAIN. key must be set to ENABLED in Cloud KMS. For information about how to generate a wrapped key, see gcloud kms encrypt.
- ciphertext: The STRING value to decrypt.
- surrogate: A STRING value that you can prepend to the output. If you don't want to use surrogate, pass an empty string ("").
- context: A STRING value that is used with a Cloud KMS key to derive a data encryption key. For more information, see CryptoDeterministicConfig:context.
Return data type
STRING
Examples
In the following query, the wrapped key is presented in a BYTES
literal format:
SELECT
DLP_DETERMINISTIC_DECRYPT(
DLP_KEY_CHAIN(
'gcp-kms://projects/myproject/locations/us/keyRings/kms-test/cryptoKeys/test-Kek',
b'\012\044\000\325\155\264\153\246\071\172\130\372\305\103\047\342\356\061\077\014\030\126\147\041\126\150\012\036\020\202\215\044\267\310\331\014\116\233\022\071\000\363\344\230\067\274\007\340\273\016\212\151\226\064\200\377\303\207\103\147\052\267\035\350\004\147\365\251\271\133\062\251\246\152\177\017\005\270\044\141\211\116\337\043\035\263\122\340\110\333\266\220\377\247\204\215\233'),
'AWDeSznl9C7+NzTaCgiqiEAZ8Y55fZSuvCQ=',
'',
'aad') AS results;
/*--------------------------------------*
| results |
+--------------------------------------+
| Plaintext |
*--------------------------------------*/
In the following query, the wrapped key is presented in the base64 format:
DECLARE DLP_KEY_VALUE BYTES;
SET DLP_KEY_VALUE =
FROM_BASE64(
'CiQA1W20a6Y5elj6xUMn4u4xPwwYVmchVmgKHhCCjSS3yNkMTpsSOQDz5Jg3vAfguw6KaZY0gP/Dh0NnKrcd6ARn9am5WzKppmp/DwW4JGGJTt8jHbNS4EjbtpD/p4SNmw==');
SELECT
DLP_DETERMINISTIC_DECRYPT(
DLP_KEY_CHAIN(
'gcp-kms://projects/myproject/locations/us/keyRings/kms-test/cryptoKeys/test-Kek',
DLP_KEY_VALUE),
'your_surrogate(36):AWDeSznl9C7+NzTaCgiqiEAZ8Y55fZSuvCQ=',
'your_surrogate',
'aad') AS results;
/*--------------------------------------*
| results |
+--------------------------------------+
| Plaintext |
*--------------------------------------*/
DLP_KEY_CHAIN
DLP_KEY_CHAIN(kms_resource_name, wrapped_key)
Description
You can use this function instead of the key argument for the
DLP deterministic encryption functions. This function lets you use the
AES-SIV encryption functions without including plaintext keys in a query.
To use DLP functions, you first create a cryptographic key and then use that
key to get a wrapped key.
Definitions
- kms_resource_name: A STRING literal that contains the resource path to the Cloud KMS key. kms_resource_name cannot be NULL and must reside in the same Cloud region where this function is executed. This argument is used to derive the data encryption key in the DLP_DETERMINISTIC_DECRYPT and DLP_DETERMINISTIC_ENCRYPT functions. A Cloud KMS key looks like this: gcp-kms://projects/my-project/locations/us/keyRings/my-key-ring/cryptoKeys/my-crypto-key
- wrapped_key: A BYTES literal that represents a secret text chosen by the user. This secret text can be 16, 24, or 32 bytes. For information about how to generate a wrapped key, see gcloud kms encrypt.
Return data type
STRUCT
Examples
In the following query, the wrapped key is presented in a BYTES
literal format:
SELECT
DLP_DETERMINISTIC_ENCRYPT(
DLP_KEY_CHAIN(
'gcp-kms://projects/myproject/locations/us/keyRings/kms-test/cryptoKeys/test-Kek',
b'\012\044\000\325\155\264\153\246\071\172\130\372\305\103\047\342\356\061\077\014\030\126\147\041\126\150\012\036\020\202\215\044\267\310\331\014\116\233\022\071\000\363\344\230\067\274\007\340\273\016\212\151\226\064\200\377\303\207\103\147\052\267\035\350\004\147\365\251\271\133\062\251\246\152\177\017\005\270\044\141\211\116\337\043\035\263\122\340\110\333\266\220\377\247\204\215\233'),
'Plaintext',
'',
'aad') AS results;
/*--------------------------------------*
| results |
+--------------------------------------+
| AWDeSznl9C7+NzTaCgiqiEAZ8Y55fZSuvCQ= |
*--------------------------------------*/
In the following query, the wrapped key is presented in the base64 format:
DECLARE DLP_KEY_VALUE BYTES;
SET DLP_KEY_VALUE =
FROM_BASE64(
'CiQA1W20a6Y5elj6xUMn4u4xPwwYVmchVmgKHhCCjSS3yNkMTpsSOQDz5Jg3vAfguw6KaZY0gP/Dh0NnKrcd6ARn9am5WzKppmp/DwW4JGGJTt8jHbNS4EjbtpD/p4SNmw==');
SELECT
DLP_DETERMINISTIC_ENCRYPT(
DLP_KEY_CHAIN(
'gcp-kms://projects/myproject/locations/us/keyRings/kms-test/cryptoKeys/test-Kek',
DLP_KEY_VALUE),
'Plaintext',
'',
'aad') AS results;
/*--------------------------------------*
| results |
+--------------------------------------+
| AWDeSznl9C7+NzTaCgiqiEAZ8Y55fZSuvCQ= |
*--------------------------------------*/
Geography functions
GoogleSQL for BigQuery supports geography functions.
Geography functions operate on or generate GoogleSQL
GEOGRAPHY
values. The signature of most geography
functions starts with ST_
. GoogleSQL for BigQuery supports the following functions
that can be used to analyze geographical data, determine spatial relationships
between geographical features, and construct or manipulate
GEOGRAPHY
s.
All GoogleSQL geography functions return NULL
if any input argument
is NULL
.
Categories
The geography functions are grouped into the following categories based on their behavior:
Category | Functions | Description |
Constructors |
ST_GEOGPOINT ST_MAKELINE ST_MAKEPOLYGON ST_MAKEPOLYGONORIENTED
|
Functions that build new geography values from coordinates or existing geographies. |
Parsers |
ST_GEOGFROM ST_GEOGFROMGEOJSON ST_GEOGFROMTEXT ST_GEOGFROMWKB ST_GEOGPOINTFROMGEOHASH |
Functions that create geographies from an external format such as WKT and GeoJSON. |
Formatters |
ST_ASBINARY ST_ASGEOJSON ST_ASTEXT ST_GEOHASH
|
Functions that export geographies to an external format such as WKT. |
Transformations |
ST_BOUNDARY ST_BUFFER ST_BUFFERWITHTOLERANCE ST_CENTROID ST_CENTROID_AGG (Aggregate)ST_CLOSESTPOINT ST_CONVEXHULL ST_DIFFERENCE ST_EXTERIORRING ST_INTERIORRINGS ST_INTERSECTION ST_LINEINTERPOLATEPOINT ST_LINESUBSTRING ST_SIMPLIFY ST_SNAPTOGRID ST_UNION ST_UNION_AGG (Aggregate) |
Functions that generate a new geography based on input. |
Accessors |
ST_DIMENSION ST_DUMP ST_ENDPOINT ST_GEOMETRYTYPE ST_ISCLOSED ST_ISCOLLECTION ST_ISEMPTY ST_ISRING ST_NPOINTS ST_NUMGEOMETRIES ST_NUMPOINTS ST_POINTN ST_STARTPOINT ST_X ST_Y |
Functions that provide access to properties of a geography without side-effects. |
Predicates |
ST_CONTAINS ST_COVEREDBY ST_COVERS ST_DISJOINT ST_DWITHIN ST_EQUALS ST_INTERSECTS ST_INTERSECTSBOX ST_TOUCHES ST_WITHIN |
Functions that return TRUE or
FALSE for some spatial
relationship between two
geographies or some property of
a geography. These functions
are commonly used in filter
clauses.
|
Measures |
ST_ANGLE ST_AREA ST_AZIMUTH ST_BOUNDINGBOX ST_DISTANCE ST_EXTENT (Aggregate)ST_HAUSDORFFDISTANCE ST_LINELOCATEPOINT ST_LENGTH ST_MAXDISTANCE ST_PERIMETER |
Functions that compute measurements of one or more geographies. |
Clustering |
ST_CLUSTERDBSCAN
|
Functions that perform clustering on geographies. |
S2 functions |
S2_CELLIDFROMPOINT S2_COVERINGCELLIDS |
Functions for working with S2 cell coverings of GEOGRAPHY. |
Function list
Name | Summary |
---|---|
S2_CELLIDFROMPOINT
|
Gets the S2 cell ID covering a point GEOGRAPHY value.
|
S2_COVERINGCELLIDS
|
Gets an array of S2 cell IDs that cover a GEOGRAPHY value.
|
ST_ANGLE
|
Takes three point GEOGRAPHY values, which represent two
intersecting lines, and returns the angle between these lines.
|
ST_AREA
|
Gets the area covered by the polygons in a GEOGRAPHY value.
|
ST_ASBINARY
|
Converts a GEOGRAPHY value to a
BYTES WKB geography value.
|
ST_ASGEOJSON
|
Converts a GEOGRAPHY value to a STRING
GeoJSON geography value.
|
ST_ASTEXT
|
Converts a GEOGRAPHY value to a
STRING WKT geography value.
|
ST_AZIMUTH
|
Gets the azimuth of a line segment formed by two
point GEOGRAPHY values.
|
ST_BOUNDARY
|
Gets the union of component boundaries in a
GEOGRAPHY value.
|
ST_BOUNDINGBOX
|
Gets the bounding box for a GEOGRAPHY value.
|
ST_BUFFER
|
Gets the buffer around a GEOGRAPHY value, using a specific
number of segments.
|
ST_BUFFERWITHTOLERANCE
|
Gets the buffer around a GEOGRAPHY value, using tolerance.
|
ST_CENTROID
|
Gets the centroid of a GEOGRAPHY value.
|
ST_CENTROID_AGG
|
Gets the centroid of a set of GEOGRAPHY values.
|
ST_CLOSESTPOINT
|
Gets the point on a GEOGRAPHY value which is closest to any
point in a second GEOGRAPHY value.
|
ST_CLUSTERDBSCAN
|
Performs DBSCAN clustering on a group of GEOGRAPHY values and
produces a 0-based cluster number for this row.
|
ST_CONTAINS
|
Checks if one GEOGRAPHY value contains another
GEOGRAPHY value.
|
ST_CONVEXHULL
|
Returns the convex hull for a GEOGRAPHY value.
|
ST_COVEREDBY
|
Checks if all points of a GEOGRAPHY value are on the boundary
or interior of another GEOGRAPHY value.
|
ST_COVERS
|
Checks if all points of a GEOGRAPHY value are on the boundary
or interior of another GEOGRAPHY value.
|
ST_DIFFERENCE
|
Gets the point set difference between two GEOGRAPHY values.
|
ST_DIMENSION
|
Gets the dimension of the highest-dimensional element in a
GEOGRAPHY value.
|
ST_DISJOINT
|
Checks if two GEOGRAPHY values are disjoint (do not intersect).
|
ST_DISTANCE
|
Gets the shortest distance in meters between two GEOGRAPHY
values.
|
ST_DUMP
|
Returns an array of simple GEOGRAPHY components in a
GEOGRAPHY value.
|
ST_DWITHIN
|
Checks if any points in two GEOGRAPHY values are within a given
distance.
|
ST_ENDPOINT
|
Gets the last point of a linestring GEOGRAPHY value.
|
ST_EQUALS
|
Checks if two GEOGRAPHY values represent the same
GEOGRAPHY value.
|
ST_EXTENT
|
Gets the bounding box for a group of GEOGRAPHY values.
|
ST_EXTERIORRING
|
Returns a linestring GEOGRAPHY value that corresponds to the
outermost ring of a polygon GEOGRAPHY value.
|
ST_GEOGFROM
|
Converts a STRING or BYTES value
into a GEOGRAPHY value.
|
ST_GEOGFROMGEOJSON
|
Converts a STRING GeoJSON geometry value into a
GEOGRAPHY value.
|
ST_GEOGFROMTEXT
|
Converts a STRING WKT geometry value into a
GEOGRAPHY value.
|
ST_GEOGFROMWKB
|
Converts a BYTES or hexadecimal-text STRING WKB
geometry value into a GEOGRAPHY value.
|
ST_GEOGPOINT
|
Creates a point GEOGRAPHY value for a given longitude and
latitude.
|
ST_GEOGPOINTFROMGEOHASH
|
Gets a point GEOGRAPHY value that is in the middle of a
bounding box defined in a STRING GeoHash value.
|
ST_GEOHASH
|
Converts a point GEOGRAPHY value to a STRING
GeoHash value.
|
ST_GEOMETRYTYPE
|
Gets the Open Geospatial Consortium (OGC) geometry type for a
GEOGRAPHY value.
|
ST_HAUSDORFFDISTANCE
|
Gets the discrete Hausdorff distance between two geometries. |
ST_INTERIORRINGS
|
Gets the interior rings of a polygon GEOGRAPHY value.
|
ST_INTERSECTION
|
Gets the point set intersection of two GEOGRAPHY values.
|
ST_INTERSECTS
|
Checks if at least one point appears in two GEOGRAPHY
values.
|
ST_INTERSECTSBOX
|
Checks if a GEOGRAPHY value intersects a rectangle.
|
ST_ISCLOSED
|
Checks if all components in a GEOGRAPHY value are closed.
|
ST_ISCOLLECTION
|
Checks if the total number of points, linestrings, and polygons is
greater than one in a GEOGRAPHY value.
|
ST_ISEMPTY
|
Checks if a GEOGRAPHY value is empty.
|
ST_ISRING
|
Checks if a GEOGRAPHY value is a closed, simple
linestring.
|
ST_LENGTH
|
Gets the total length of lines in a GEOGRAPHY value.
|
ST_LINEINTERPOLATEPOINT
|
Gets a point at a specific fraction in a linestring GEOGRAPHY
value.
|
ST_LINELOCATEPOINT
|
Gets the fraction of a linestring GEOGRAPHY value's length between its
start point and the closest point to a point GEOGRAPHY value.
|
ST_LINESUBSTRING
|
Gets a segment of a single linestring at a specific starting and ending fraction. |
ST_MAKELINE
|
Creates a linestring GEOGRAPHY value by concatenating the point
and linestring vertices of GEOGRAPHY values.
|
ST_MAKEPOLYGON
|
Constructs a polygon GEOGRAPHY value by combining
a polygon shell with polygon holes.
|
ST_MAKEPOLYGONORIENTED
|
Constructs a polygon GEOGRAPHY value, using an array of
linestring GEOGRAPHY values. The vertex ordering of each
linestring determines the orientation of each polygon ring.
|
ST_MAXDISTANCE
|
Gets the longest distance between two non-empty
GEOGRAPHY values.
|
ST_NPOINTS
|
An alias of ST_NUMPOINTS .
|
ST_NUMGEOMETRIES
|
Gets the number of geometries in a GEOGRAPHY value.
|
ST_NUMPOINTS
|
Gets the number of vertices in a GEOGRAPHY value.
|
ST_PERIMETER
|
Gets the length of the boundary of the polygons in a
GEOGRAPHY value.
|
ST_POINTN
|
Gets the point at a specific index of a linestring GEOGRAPHY
value.
|
ST_SIMPLIFY
|
Converts a GEOGRAPHY value into a simplified
GEOGRAPHY value, using tolerance.
|
ST_SNAPTOGRID
|
Produces a GEOGRAPHY value, where each vertex has
been snapped to a longitude/latitude grid.
|
ST_STARTPOINT
|
Gets the first point of a linestring GEOGRAPHY value.
|
ST_TOUCHES
|
Checks if two GEOGRAPHY values intersect and their interiors
have no elements in common.
|
ST_UNION
|
Gets the point set union of multiple GEOGRAPHY values.
|
ST_UNION_AGG
|
Aggregates over GEOGRAPHY values and gets their
point set union.
|
ST_WITHIN
|
Checks if one GEOGRAPHY value contains another
GEOGRAPHY value.
|
ST_X
|
Gets the longitude from a point GEOGRAPHY value.
|
ST_Y
|
Gets the latitude from a point GEOGRAPHY value.
|
S2_CELLIDFROMPOINT
S2_CELLIDFROMPOINT(point_geography[, level => cell_level])
Description
Returns the S2 cell ID covering a point GEOGRAPHY.
- The optional INT64 parameter level specifies the S2 cell level for the returned cell. Naming this argument is optional.
This is advanced functionality for interoperability with systems utilizing the S2 Geometry Library.
Constraints
- Returns the cell ID as a signed INT64 that is bit-equivalent to the unsigned 64-bit integer representation.
- Can return negative cell IDs.
- Valid S2 cell levels are 0 to 30.
- level defaults to 30 if not explicitly specified.
- The function only supports a single point GEOGRAPHY. Use the SAFE prefix if the input can be multipoint, linestring, polygon, or an empty GEOGRAPHY.
- To compute the covering of a complex GEOGRAPHY, use S2_COVERINGCELLIDS.
Return type
INT64
Example
WITH data AS (
SELECT 1 AS id, ST_GEOGPOINT(-122, 47) AS geo
UNION ALL
-- empty geography is not supported
SELECT 2 AS id, ST_GEOGFROMTEXT('POINT EMPTY') AS geo
UNION ALL
-- only points are supported
SELECT 3 AS id, ST_GEOGFROMTEXT('LINESTRING(1 2, 3 4)') AS geo
)
SELECT id,
SAFE.S2_CELLIDFROMPOINT(geo) cell30,
SAFE.S2_CELLIDFROMPOINT(geo, level => 10) cell10
FROM data;
/*----+---------------------+---------------------*
| id | cell30 | cell10 |
+----+---------------------+---------------------+
| 1 | 6093613931972369317 | 6093613287902019584 |
| 2 | NULL | NULL |
| 3 | NULL | NULL |
*----+---------------------+---------------------*/
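The signed INT64 representation noted in the constraints means that cell IDs in the upper half of the unsigned 64-bit range come back as negative numbers: the bits are unchanged, only the interpretation differs. An illustrative Python sketch of that reinterpretation (the second cell ID below is a made-up value chosen to land in the upper half of the range):

```python
import struct

def to_signed_int64(unsigned_cell_id):
    """Reinterpret an unsigned 64-bit S2 cell ID as the signed INT64
    value that BigQuery returns (same bits, two's complement)."""
    return struct.unpack('<q', struct.pack('<Q', unsigned_cell_id))[0]

# Cell IDs below 2^63 are unchanged (this one is from the example above).
print(to_signed_int64(6093613931972369317))   # 6093613931972369317

# A hypothetical cell ID in the upper half of the range becomes negative.
print(to_signed_int64(0xA000000000000000))    # -6917529027641081856
```

The same reinterpretation applies to each element of the arrays returned by S2_COVERINGCELLIDS.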
S2_COVERINGCELLIDS
S2_COVERINGCELLIDS(
geography
[, min_level => cell_level]
[, max_level => cell_level]
[, max_cells => max_cells]
[, buffer => buffer])
Description
Returns an array of S2 cell IDs that cover the input GEOGRAPHY. The function
returns at most max_cells cells. The optional arguments min_level and
max_level specify the minimum and maximum levels for the returned S2 cells.
The optional buffer argument specifies a buffering factor in meters; the
region being covered is expanded from the extent of the input geography by
this amount.
This is advanced functionality for interoperability with systems utilizing the S2 Geometry Library.
Constraints
- Returns the cell IDs as signed INT64 values that are bit-equivalent to the unsigned 64-bit integer representation.
- Can return negative cell IDs.
- Valid S2 cell levels are 0 to 30.
- max_cells defaults to 8 if not explicitly specified.
- buffer should be nonnegative. It defaults to 0.0 meters if not explicitly specified.
Return type
ARRAY<INT64>
Example
WITH data AS (
SELECT 1 AS id, ST_GEOGPOINT(-122, 47) AS geo
UNION ALL
SELECT 2 AS id, ST_GEOGFROMTEXT('POINT EMPTY') AS geo
UNION ALL
SELECT 3 AS id, ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)') AS geo
)
SELECT id, S2_COVERINGCELLIDS(geo, min_level => 12) cells
FROM data;
/*----+--------------------------------------------------------------------------------------*
| id | cells |
+----+--------------------------------------------------------------------------------------+
| 1 | [6093613931972369317] |
| 2 | [] |
| 3 | [6093384954555662336, 6093390709811838976, 6093390735581642752, 6093390740145045504, |
| | 6093390791416217600, 6093390812891054080, 6093390817187069952, 6093496378892222464] |
*----+--------------------------------------------------------------------------------------*/
ST_ANGLE
ST_ANGLE(point_geography_1, point_geography_2, point_geography_3)
Description
Takes three point GEOGRAPHY
values, which represent two intersecting lines.
Returns the angle between these lines. Point 2 and point 1 represent the first
line and point 2 and point 3 represent the second line. The angle between
these lines is in radians, in the range [0, 2pi)
. The angle is measured
clockwise from the first line to the second line.
ST_ANGLE has the following edge cases:
- If points 2 and 3 are the same, returns NULL.
- If points 2 and 1 are the same, returns NULL.
- If points 2 and 3 are exactly antipodal, returns NULL.
- If points 2 and 1 are exactly antipodal, returns NULL.
- If any of the input geographies are not single points or are the empty geography, then throws an error.
Return type
FLOAT64
Example
WITH geos AS (
SELECT 1 id, ST_GEOGPOINT(1, 0) geo1, ST_GEOGPOINT(0, 0) geo2, ST_GEOGPOINT(0, 1) geo3 UNION ALL
SELECT 2 id, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0), ST_GEOGPOINT(0, 1) UNION ALL
SELECT 3 id, ST_GEOGPOINT(1, 0), ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0) UNION ALL
SELECT 4 id, ST_GEOGPOINT(1, 0) geo1, ST_GEOGPOINT(0, 0) geo2, ST_GEOGPOINT(0, 0) geo3 UNION ALL
SELECT 5 id, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(-30, 0), ST_GEOGPOINT(150, 0) UNION ALL
SELECT 6 id, ST_GEOGPOINT(0, 0), NULL, NULL UNION ALL
SELECT 7 id, NULL, ST_GEOGPOINT(0, 0), NULL UNION ALL
SELECT 8 id, NULL, NULL, ST_GEOGPOINT(0, 0))
SELECT ST_ANGLE(geo1,geo2,geo3) AS angle FROM geos ORDER BY id;
/*---------------------*
| angle |
+---------------------+
| 4.71238898038469 |
| 0.78547432161873854 |
| 0 |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
*---------------------*/
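The first rows of this example can be reproduced with standard spherical trigonometry: the angle is the clockwise difference between the azimuths of the two lines at point 2. The following Python sketch is an illustrative approximation of that spherical math, not the function's actual implementation:

```python
import math

def azimuth(lon1, lat1, lon2, lat2):
    """Great-circle bearing from (lon1, lat1) to (lon2, lat2), in radians,
    measured clockwise from true North and normalized to [0, 2*pi)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.atan2(x, y) % (2 * math.pi)

def angle(pt1, pt2, pt3):
    """Clockwise angle at pt2 from line (pt2, pt1) to line (pt2, pt3).
    Points are (longitude, latitude) pairs."""
    az1 = azimuth(*pt2, *pt1)
    az2 = azimuth(*pt2, *pt3)
    return (az2 - az1) % (2 * math.pi)

# Rows 1 and 2 of the ST_ANGLE example above.
print(angle((1, 0), (0, 0), (0, 1)))  # ~4.71238898038469
print(angle((0, 0), (1, 0), (0, 1)))  # ~0.7854743...
```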
ST_AREA
ST_AREA(geography_expression[, use_spheroid])
Description
Returns the area in square meters covered by the polygons in the input
GEOGRAPHY
.
If geography_expression
is a point or a line, returns zero. If
geography_expression
is a collection, returns the area of the polygons in the
collection; if the collection does not contain polygons, returns zero.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere.
The use_spheroid
parameter currently only supports
the value FALSE
. The default value of use_spheroid
is FALSE
.
Return type
FLOAT64
ST_ASBINARY
ST_ASBINARY(geography_expression)
Description
Returns the WKB representation of an input
GEOGRAPHY
.
See ST_GEOGFROMWKB
to construct a
GEOGRAPHY
from WKB.
Return type
BYTES
ST_ASGEOJSON
ST_ASGEOJSON(geography_expression)
Description
Returns the RFC 7946 compliant GeoJSON
representation of the input GEOGRAPHY
.
A GoogleSQL GEOGRAPHY
has spherical
geodesic edges, whereas a GeoJSON Geometry
object explicitly has planar edges.
To convert between these two types of edges, GoogleSQL adds additional
points to the line where necessary so that the resulting sequence of edges
remains within 10 meters of the original edge.
See ST_GEOGFROMGEOJSON
to construct a
GEOGRAPHY
from GeoJSON.
Return type
STRING
ST_ASTEXT
ST_ASTEXT(geography_expression)
Description
Returns the WKT representation of an input
GEOGRAPHY
.
See ST_GEOGFROMTEXT
to construct a
GEOGRAPHY
from WKT.
Return type
STRING
ST_AZIMUTH
ST_AZIMUTH(point_geography_1, point_geography_2)
Description
Takes two point GEOGRAPHY
values, and returns the azimuth of the line segment
formed by points 1 and 2. The azimuth is the angle in radians measured between
the line from point 1 facing true North to the line segment from point 1 to
point 2.
The positive angle is measured clockwise on the surface of a sphere. For example, the azimuth for a line segment:
- Pointing North is 0
- Pointing East is PI/2
- Pointing South is PI
- Pointing West is 3PI/2
ST_AZIMUTH has the following edge cases:
- If the two input points are the same, returns NULL.
- If the two input points are exactly antipodal, returns NULL.
- If either of the input geographies are not single points or are the empty geography, throws an error.
Return type
FLOAT64
Example
WITH geos AS (
SELECT 1 id, ST_GEOGPOINT(1, 0) AS geo1, ST_GEOGPOINT(0, 0) AS geo2 UNION ALL
SELECT 2, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 0) UNION ALL
SELECT 3, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 1) UNION ALL
-- identical
SELECT 4, ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 0) UNION ALL
-- antipode
SELECT 5, ST_GEOGPOINT(-30, 0), ST_GEOGPOINT(150, 0) UNION ALL
-- nulls
SELECT 6, ST_GEOGPOINT(0, 0), NULL UNION ALL
SELECT 7, NULL, ST_GEOGPOINT(0, 0))
SELECT ST_AZIMUTH(geo1, geo2) AS azimuth FROM geos ORDER BY id;
/*--------------------*
| azimuth |
+--------------------+
| 4.71238898038469 |
| 1.5707963267948966 |
| 0 |
| NULL |
| NULL |
| NULL |
| NULL |
*--------------------*/
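The non-NULL rows of this example can be approximated with the standard great-circle bearing formula, normalized to [0, 2pi). The Python sketch below is illustrative only; it approximates the spherical math rather than reproducing the function's implementation:

```python
import math

def azimuth(lon1, lat1, lon2, lat2):
    """Great-circle bearing from point 1 to point 2, in radians,
    measured clockwise from true North and normalized to [0, 2*pi)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.atan2(x, y) % (2 * math.pi)

# First three rows of the ST_AZIMUTH example (arguments are lon, lat pairs).
print(azimuth(1, 0, 0, 0))  # ~4.71238898038469 (pointing West, 3*PI/2)
print(azimuth(0, 0, 1, 0))  # ~1.5707963267948966 (pointing East, PI/2)
print(azimuth(0, 0, 0, 1))  # 0.0 (pointing North)
```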
ST_BOUNDARY
ST_BOUNDARY(geography_expression)
Description
Returns a single GEOGRAPHY
that contains the union
of the boundaries of each component in the given input
GEOGRAPHY
.
The boundary of each component of a GEOGRAPHY
is
defined as follows:
- The boundary of a point is empty.
- The boundary of a linestring consists of the endpoints of the linestring.
- The boundary of a polygon consists of the linestrings that form the polygon shell and each of the polygon's holes.
Return type
GEOGRAPHY
ST_BOUNDINGBOX
ST_BOUNDINGBOX(geography_expression)
Description
Returns a STRUCT
that represents the bounding box for the specified geography.
The bounding box is the minimal rectangle that encloses the geography. The edges
of the rectangle follow constant lines of longitude and latitude.
Caveats:
- Returns NULL if the input is NULL or an empty geography.
- The bounding box might cross the antimeridian if this allows for a smaller rectangle. In this case, the bounding box has one of its longitudinal bounds outside of the [-180, 180] range, so that xmin is smaller than the eastmost value xmax.
Return type
STRUCT<xmin FLOAT64, ymin FLOAT64, xmax FLOAT64, ymax FLOAT64>
.
Bounding box parts:
- xmin: The westmost constant longitude line that bounds the rectangle.
- xmax: The eastmost constant longitude line that bounds the rectangle.
- ymin: The minimum constant latitude line that bounds the rectangle.
- ymax: The maximum constant latitude line that bounds the rectangle.
Example
WITH data AS (
SELECT 1 id, ST_GEOGFROMTEXT('POLYGON((-125 48, -124 46, -117 46, -117 49, -125 48))') g
UNION ALL
SELECT 2 id, ST_GEOGFROMTEXT('POLYGON((172 53, -130 55, -141 70, 172 53))') g
UNION ALL
SELECT 3 id, ST_GEOGFROMTEXT('POINT EMPTY') g
UNION ALL
SELECT 4 id, ST_GEOGFROMTEXT('POLYGON((172 53, -141 70, -130 55, 172 53))', oriented => TRUE)
)
SELECT id, ST_BOUNDINGBOX(g) AS box
FROM data
/*----+------------------------------------------*
| id | box |
+----+------------------------------------------+
| 1 | {xmin:-125, ymin:46, xmax:-117, ymax:49} |
| 2 | {xmin:172, ymin:53, xmax:230, ymax:70} |
| 3 | NULL |
| 4 | {xmin:-180, ymin:-90, xmax:180, ymax:90} |
*----+------------------------------------------*/
See ST_EXTENT
for the aggregate version of ST_BOUNDINGBOX
.
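For geographies that don't cross the antimeridian, the bounding box is simply the minimum and maximum of the vertex longitudes and latitudes. The following naive Python sketch reproduces row 1 of the example above; it deliberately ignores the antimeridian wrap-around and geodesic-edge effects that ST_BOUNDINGBOX itself handles (rows 2 and 4 need that smarter logic):

```python
def naive_bounding_box(vertices):
    """Min/max longitude and latitude of a list of (lon, lat) vertices.
    No antimeridian handling and no geodesic-edge bulge: only valid for
    small shapes far from the antimeridian."""
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    return {'xmin': min(lons), 'ymin': min(lats),
            'xmax': max(lons), 'ymax': max(lats)}

# Vertices of the polygon with id 1 in the example above.
polygon1 = [(-125, 48), (-124, 46), (-117, 46), (-117, 49), (-125, 48)]
print(naive_bounding_box(polygon1))
# {'xmin': -125, 'ymin': 46, 'xmax': -117, 'ymax': 49}
```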
ST_BUFFER
ST_BUFFER(
geography,
buffer_radius
[, num_seg_quarter_circle => num_segments]
[, use_spheroid => boolean_expression]
[, endcap => endcap_style]
[, side => line_side])
Description
Returns a GEOGRAPHY
that represents the buffer around the input GEOGRAPHY
.
This function is similar to ST_BUFFERWITHTOLERANCE
,
but you specify the number of segments instead of providing tolerance to
determine how much the resulting geography can deviate from the ideal
buffer radius.
- geography: The input GEOGRAPHY to encircle with the buffer radius.
- buffer_radius: FLOAT64 that represents the radius of the buffer around the input geography. The radius is in meters. Note that polygons contract when buffered with a negative buffer_radius. Polygon shells and holes that are contracted to a point are discarded.
- num_seg_quarter_circle: (Optional) FLOAT64 that specifies the number of segments that are used to approximate a quarter circle. The default value is 8.0. Naming this argument is optional.
- endcap: (Optional) STRING that allows you to specify one of two endcap styles: ROUND and FLAT. The default value is ROUND. This option only affects the endcaps of buffered linestrings.
- side: (Optional) STRING that allows you to specify one of three possibilities for lines: BOTH, LEFT, and RIGHT. The default is BOTH. This option only affects how linestrings are buffered.
- use_spheroid: (Optional) BOOL that determines how this function measures distance. If use_spheroid is FALSE, the function measures distance on the surface of a perfect sphere. The use_spheroid parameter currently only supports the value FALSE. The default value of use_spheroid is FALSE.
Return type
Polygon GEOGRAPHY
Example
The following example shows the result of ST_BUFFER
on a point. A buffered
point is an approximated circle. When num_seg_quarter_circle = 2
, there are
two line segments in a quarter circle, and therefore the buffered circle has
eight sides and ST_NUMPOINTS
returns nine vertices. When
num_seg_quarter_circle = 8
, there are eight line segments in a quarter circle,
and therefore the buffered circle has thirty-two sides and
ST_NUMPOINTS
returns thirty-three vertices.
SELECT
-- num_seg_quarter_circle=2
ST_NUMPOINTS(ST_BUFFER(ST_GEOGFROMTEXT('POINT(1 2)'), 50, 2)) AS eight_sides,
-- num_seg_quarter_circle=8, since 8 is the default
ST_NUMPOINTS(ST_BUFFER(ST_GEOGFROMTEXT('POINT(100 2)'), 50)) AS thirty_two_sides;
/*-------------+------------------*
| eight_sides | thirty_two_sides |
+-------------+------------------+
| 9 | 33 |
*-------------+------------------*/
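The relationship between num_seg_quarter_circle and the vertex count can be sketched directly: a buffered point is a ring of 4 × num_seg_quarter_circle segments, closed by repeating the first vertex. The following Python sketch is a planar approximation for counting vertices, not the function's spherical implementation:

```python
import math

def buffered_point_ring(cx, cy, radius, num_seg_quarter_circle=8):
    """Approximate the ring of a buffered point: 4 * n segments around
    the center, with the first vertex repeated to close the ring."""
    n = 4 * int(num_seg_quarter_circle)
    ring = [(cx + radius * math.cos(2 * math.pi * i / n),
             cy + radius * math.sin(2 * math.pi * i / n))
            for i in range(n)]
    ring.append(ring[0])  # close the ring
    return ring

# Mirrors the ST_NUMPOINTS results in the example above:
# 4 * 2 + 1 = 9 vertices, and 4 * 8 + 1 = 33 vertices.
print(len(buffered_point_ring(1, 2, 50, num_seg_quarter_circle=2)))  # 9
print(len(buffered_point_ring(100, 2, 50)))                          # 33
```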
ST_BUFFERWITHTOLERANCE
ST_BUFFERWITHTOLERANCE(
geography,
buffer_radius,
tolerance_meters => tolerance
[, use_spheroid => boolean_expression]
[, endcap => endcap_style]
[, side => line_side])
Returns a GEOGRAPHY
that represents the buffer around the input GEOGRAPHY
.
This function is similar to ST_BUFFER
,
but you provide tolerance instead of segments to determine how much the
resulting geography can deviate from the ideal buffer radius.
- geography: The input GEOGRAPHY to encircle with the buffer radius.
- buffer_radius: FLOAT64 that represents the radius of the buffer around the input geography. The radius is in meters. Note that polygons contract when buffered with a negative buffer_radius. Polygon shells and holes that are contracted to a point are discarded.
- tolerance_meters: FLOAT64 specifies a tolerance in meters with which the shape is approximated. Tolerance determines how much a polygon can deviate from the ideal radius. Naming this argument is optional.
- endcap: (Optional) STRING allows you to specify one of two endcap styles: ROUND and FLAT. The default value is ROUND. This option only affects the endcaps of buffered linestrings.
- side: (Optional) STRING allows you to specify one of three possible line styles: BOTH, LEFT, and RIGHT. The default is BOTH. This option only affects how linestrings are buffered.
- use_spheroid: (Optional) BOOL determines how this function measures distance. If use_spheroid is FALSE, the function measures distance on the surface of a perfect sphere. The use_spheroid parameter currently only supports the value FALSE. The default value of use_spheroid is FALSE.
Return type
Polygon GEOGRAPHY
Example
The following example shows the results of ST_BUFFERWITHTOLERANCE
on a point,
given two different values for tolerance but with the same buffer radius of
100
. A buffered point is an approximated circle. When tolerance_meters=25
,
the tolerance is a large percentage of the buffer radius, and therefore only
five segments are used to approximate a circle around the input point. When
tolerance_meters=1
, the tolerance is a much smaller percentage of the buffer
radius, and therefore twenty-four edges are used to approximate a circle around
the input point.
SELECT
-- tolerance_meters=25, or 25% of the buffer radius.
ST_NUMPOINTS(ST_BUFFERWITHTOLERANCE(ST_GEOGFROMTEXT('POINT(1 2)'), 100, 25)) AS five_sides,
-- tolerance_meters=1, or 1% of the buffer radius.
ST_NUMPOINTS(ST_BUFFERWITHTOLERANCE(ST_GEOGFROMTEXT('POINT(100 2)'), 100, 1)) AS twenty_four_sides;
/*------------+-------------------*
| five_sides | twenty_four_sides |
+------------+-------------------+
| 6 | 24 |
*------------+-------------------*/
ST_CENTROID
ST_CENTROID(geography_expression)
Description
Returns the centroid of the input GEOGRAPHY
as a single point GEOGRAPHY
.
The centroid of a GEOGRAPHY
is the weighted average of the centroids of the
highest-dimensional components in the GEOGRAPHY
. The centroid for components
in each dimension is defined as follows:
- The centroid of points is the arithmetic mean of the input coordinates.
- The centroid of linestrings is the centroid of all the edges weighted by length. The centroid of each edge is the geodesic midpoint of the edge.
- The centroid of a polygon is its center of mass.
If the input GEOGRAPHY
is empty, an empty GEOGRAPHY
is returned.
Constraints
In the unlikely event that the centroid of a GEOGRAPHY
cannot be defined by a
single point on the surface of the Earth, a deterministic but otherwise
arbitrary point is returned. This can only happen if the centroid is exactly at
the center of the Earth, such as the centroid for a pair of antipodal points,
and the likelihood of this happening is vanishingly small.
Return type
Point GEOGRAPHY
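Example
The following query (an illustrative sketch, not part of the original reference) computes the centroid of a set of points. Per the definition above, the centroid of points is the arithmetic mean of the input coordinates.

```sql
SELECT ST_CENTROID(ST_GEOGFROMTEXT('MULTIPOINT(0 0, 0 2, 0 4)')) AS centroid;
-- Expected per the definition above: a point at the arithmetic mean of the
-- inputs, approximately POINT(0 2).
```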
ST_CENTROID_AGG
ST_CENTROID_AGG(geography)
Description
Computes the centroid of the set of input GEOGRAPHY
s as a single point
GEOGRAPHY
.
The centroid over the set of input GEOGRAPHY
s is the weighted average of the
centroid of each individual GEOGRAPHY
. Only the GEOGRAPHY
s with the highest
dimension present in the input contribute to the centroid of the entire set. For
example, if the input contains both GEOGRAPHY
s with lines and GEOGRAPHY
s
with only points, ST_CENTROID_AGG
returns the weighted average of the
GEOGRAPHY
s with lines, since a line has more dimensions than a point. In this
example, ST_CENTROID_AGG
ignores GEOGRAPHY
s with only points when
calculating the aggregate centroid.
ST_CENTROID_AGG
ignores NULL
input GEOGRAPHY
values.
See ST_CENTROID
for the non-aggregate version of
ST_CENTROID_AGG
and the definition of centroid for an individual GEOGRAPHY
value.
Return type
Point GEOGRAPHY
Example
The following queries compute the aggregate centroid over a set of
GEOGRAPHY
values. The input to the first query
contains only points, and therefore each value contributes to the aggregate
centroid. Also notice that ST_CENTROID_AGG
is not equivalent to calling
ST_CENTROID
on the result of ST_UNION_AGG
; duplicates are removed by the
union, unlike ST_CENTROID_AGG
. The input to the second query has mixed
dimensions, and only values with the highest dimension in the set, the lines,
affect the aggregate centroid.
SELECT ST_CENTROID_AGG(points) AS st_centroid_agg,
ST_CENTROID(ST_UNION_AGG(points)) AS centroid_of_union
FROM UNNEST([ST_GEOGPOINT(1, 5),
ST_GEOGPOINT(1, 2),
ST_GEOGPOINT(1, -1),
ST_GEOGPOINT(1, -1)]) points;
/*---------------------------+-------------------*
| st_centroid_agg | centroid_of_union |
+---------------------------+-------------------+
| POINT(1 1.24961422620969) | POINT(1 2) |
*---------------------------+-------------------*/
SELECT ST_CENTROID_AGG(points) AS st_centroid_agg
FROM UNNEST([ST_GEOGPOINT(50, 26),
ST_GEOGPOINT(34, 33.3),
ST_GEOGFROMTEXT('LINESTRING(0 -1, 0 1)'),
ST_GEOGFROMTEXT('LINESTRING(0 1, 0 3)')]) points;
/*-----------------*
| st_centroid_agg |
+-----------------+
| POINT(0 1) |
*-----------------*/
ST_CLOSESTPOINT
ST_CLOSESTPOINT(geography_1, geography_2[, use_spheroid])
Description
Returns a GEOGRAPHY
containing a point on
geography_1
with the smallest possible distance to geography_2
. This implies
that the distance between the point returned by ST_CLOSESTPOINT
and
geography_2
is less than or equal to the distance between any other point on
geography_1
and geography_2
.
If either of the input GEOGRAPHY
s is empty, ST_CLOSESTPOINT
returns NULL
.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere.
The use_spheroid
parameter currently only supports
the value FALSE
. The default value of use_spheroid
is FALSE
.
Return type
Point GEOGRAPHY
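Example
The following query (an illustrative sketch, not part of the original reference) finds the point on a meridian-aligned linestring that is closest to a nearby point.

```sql
SELECT
  ST_CLOSESTPOINT(
    ST_GEOGFROMTEXT('LINESTRING(0 0, 0 10)'),
    ST_GEOGPOINT(5, 5)) AS closest;
-- Returns a point on the linestring near POINT(0 5); the exact coordinates
-- depend on the geodesic computation.
```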
ST_CLUSTERDBSCAN
ST_CLUSTERDBSCAN(geography_column, epsilon, minimum_geographies)
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
Performs DBSCAN clustering on a column of geographies. Returns a 0-based cluster number.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Input parameters
- geography_column: A column of GEOGRAPHYs that is clustered.
- epsilon: The epsilon that specifies the radius, measured in meters, around a core value. A non-negative FLOAT64 value.
- minimum_geographies: Specifies the minimum number of geographies in a single cluster. Only dense input forms a cluster; anything else is classified as noise. A non-negative INT64 value.
Geography types and the DBSCAN algorithm
The DBSCAN algorithm identifies high-density clusters of data and marks outliers
in low-density areas of noise. Geographies passed in through geography_column
are classified in one of three ways by the DBSCAN algorithm:
- Core value: A geography is a core value if it is within epsilon distance of minimum_geographies geographies, including itself. The core value starts a new cluster, or is added to the same cluster as a core value within epsilon distance. Core values are grouped in a cluster together with all other core and border values that are within epsilon distance.
- Border value: A geography is a border value if it is within epsilon distance of a core value. It is added to the same cluster as a core value within epsilon distance. A border value may be within epsilon distance of more than one cluster. In this case, it may be arbitrarily assigned to either cluster and the function will produce the same result in subsequent calls.
- Noise: A geography is noise if it is neither a core nor a border value. Noise values are assigned to a NULL cluster. An empty GEOGRAPHY is always classified as noise.
Constraints
- The argument minimum_geographies is a non-negative INT64, and epsilon is a non-negative FLOAT64.
- An empty geography cannot join any cluster.
- Multiple clustering assignments could be possible for a border value. If a geography is a border value, ST_CLUSTERDBSCAN will assign it to an arbitrary valid cluster.
Return type
INT64
for each geography in the geography column.
Examples
This example performs DBSCAN clustering with a radius of 100,000 meters with a
minimum_geographies
argument of 1. The geographies being analyzed are a
mixture of points, lines, and polygons.
WITH Geos as
(SELECT 1 as row_id, ST_GEOGFROMTEXT('POINT EMPTY') as geo UNION ALL
SELECT 2, ST_GEOGFROMTEXT('MULTIPOINT(1 1, 2 2, 4 4, 5 2)') UNION ALL
SELECT 3, ST_GEOGFROMTEXT('POINT(14 15)') UNION ALL
SELECT 4, ST_GEOGFROMTEXT('LINESTRING(40 1, 42 34, 44 39)') UNION ALL
SELECT 5, ST_GEOGFROMTEXT('POLYGON((40 2, 40 1, 41 2, 40 2))'))
SELECT row_id, geo, ST_CLUSTERDBSCAN(geo, 1e5, 1) OVER () AS cluster_num FROM
Geos ORDER BY row_id
/*--------+-----------------------------------+-------------*
| row_id | geo | cluster_num |
+--------+-----------------------------------+-------------+
| 1 | GEOMETRYCOLLECTION EMPTY | NULL |
| 2 | MULTIPOINT(1 1, 2 2, 5 2, 4 4) | 0 |
| 3 | POINT(14 15) | 1 |
| 4 | LINESTRING(40 1, 42 34, 44 39) | 2 |
| 5 | POLYGON((40 2, 40 1, 41 2, 40 2)) | 2 |
*--------+-----------------------------------+-------------*/
ST_CONTAINS
ST_CONTAINS(geography_1, geography_2)
Description
Returns TRUE
if no point of geography_2
is outside geography_1
, and
the interiors intersect; returns FALSE
otherwise.
NOTE: A GEOGRAPHY
does not contain its own
boundary. Compare with ST_COVERS
.
Return type
BOOL
Example
The following query tests whether the polygon POLYGON((1 1, 20 1, 10 20, 1 1))
contains each of the three points (0, 0)
, (1, 1)
, and (10, 10)
, which lie
on the exterior, the boundary, and the interior of the polygon respectively.
SELECT
ST_GEOGPOINT(i, i) AS p,
ST_CONTAINS(ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))'),
ST_GEOGPOINT(i, i)) AS `contains`
FROM UNNEST([0, 1, 10]) AS i;
/*--------------+----------*
| p | contains |
+--------------+----------+
| POINT(0 0) | FALSE |
| POINT(1 1) | FALSE |
| POINT(10 10) | TRUE |
*--------------+----------*/
ST_CONVEXHULL
ST_CONVEXHULL(geography_expression)
Description
Returns the convex hull for the input GEOGRAPHY
. The convex hull is the
smallest convex GEOGRAPHY
that covers the input. A GEOGRAPHY
is convex if
for every pair of points in the GEOGRAPHY
, the geodesic edge connecting the
points is also contained in the same GEOGRAPHY
.
In most cases, the convex hull consists of a single polygon. Notable edge cases include the following:
- The convex hull of a single point is also a point.
- The convex hull of two or more collinear points is a linestring as long as that linestring is convex.
- If the input GEOGRAPHY spans more than a hemisphere, the convex hull is the full globe. This includes any input that contains a pair of antipodal points.

ST_CONVEXHULL returns NULL if the input is either NULL or the empty GEOGRAPHY.
Return type
GEOGRAPHY
Examples
The convex hull returned by ST_CONVEXHULL
can be a point, linestring, or a
polygon, depending on the input.
WITH Geographies AS
(SELECT ST_GEOGFROMTEXT('POINT(1 1)') AS g UNION ALL
SELECT ST_GEOGFROMTEXT('LINESTRING(1 1, 2 2)') AS g UNION ALL
SELECT ST_GEOGFROMTEXT('MULTIPOINT(2 11, 4 12, 0 15, 1 9, 1 12)') AS g)
SELECT
g AS input_geography,
ST_CONVEXHULL(g) AS convex_hull
FROM Geographies;
/*-----------------------------------------+--------------------------------------------------------*
| input_geography | convex_hull |
+-----------------------------------------+--------------------------------------------------------+
| POINT(1 1) | POINT(0.999999999999943 1) |
| LINESTRING(1 1, 2 2) | LINESTRING(2 2, 1.49988573656168 1.5000570914792, 1 1) |
| MULTIPOINT(1 9, 4 12, 2 11, 1 12, 0 15) | POLYGON((1 9, 4 12, 0 15, 1 9)) |
*-----------------------------------------+--------------------------------------------------------*/
ST_COVEREDBY
ST_COVEREDBY(geography_1, geography_2)
Description
Returns FALSE
if geography_1
or geography_2
is empty. Returns TRUE
if no
points of geography_1
lie in the exterior of geography_2
.
Given two GEOGRAPHY
s a
and b
,
ST_COVEREDBY(a, b)
returns the same result as
ST_COVERS
(b, a)
. Note the opposite order of arguments.
Return type
BOOL
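Example
The following query (an illustrative sketch mirroring the ST_COVERS example in this reference) tests whether each of three points is covered by the polygon POLYGON((1 1, 20 1, 10 20, 1 1)). Because ST_COVEREDBY(a, b) returns the same result as ST_COVERS(b, a), the boundary point (1, 1) is covered, unlike with ST_CONTAINS.

```sql
SELECT
  ST_GEOGPOINT(i, i) AS p,
  ST_COVEREDBY(
    ST_GEOGPOINT(i, i),
    ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))')) AS covered_by
FROM UNNEST([0, 1, 10]) AS i;

/*--------------+------------*
 | p            | covered_by |
 +--------------+------------+
 | POINT(0 0)   | FALSE      |
 | POINT(1 1)   | TRUE      |
 | POINT(10 10) | TRUE      |
 *--------------+------------*/
```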
ST_COVERS
ST_COVERS(geography_1, geography_2)
Description
Returns FALSE
if geography_1
or geography_2
is empty.
Returns TRUE
if no points of geography_2
lie in the exterior of
geography_1
.
Return type
BOOL
Example
The following query tests whether the polygon POLYGON((1 1, 20 1, 10 20, 1 1))
covers each of the three points (0, 0)
, (1, 1)
, and (10, 10)
, which lie
on the exterior, the boundary, and the interior of the polygon respectively.
SELECT
ST_GEOGPOINT(i, i) AS p,
ST_COVERS(ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))'),
ST_GEOGPOINT(i, i)) AS `covers`
FROM UNNEST([0, 1, 10]) AS i;
/*--------------+--------*
| p | covers |
+--------------+--------+
| POINT(0 0) | FALSE |
| POINT(1 1) | TRUE |
| POINT(10 10) | TRUE |
*--------------+--------*/
ST_DIFFERENCE
ST_DIFFERENCE(geography_1, geography_2)
Description
Returns a GEOGRAPHY
that represents the point set
difference of geography_1
and geography_2
. Therefore, the result consists of
the part of geography_1
that does not intersect with geography_2
.
If geometry_1
is completely contained in geometry_2
, then ST_DIFFERENCE
returns an empty GEOGRAPHY
.
Constraints
The underlying geometric objects that a GoogleSQL
GEOGRAPHY
represents correspond to a closed point
set. Therefore, ST_DIFFERENCE
is the closure of the point set difference of
geography_1
and geography_2
. This implies that if geography_1
and
geography_2
intersect, then a portion of the boundary of geography_2
could
be in the difference.
Return type
GEOGRAPHY
Example
The following query illustrates the difference between geog1
, a larger polygon
POLYGON((0 0, 10 0, 10 10, 0 0))
and geog2
, a smaller polygon
POLYGON((4 2, 6 2, 8 6, 4 2))
that intersects with geog1
. The result is
geog1
with a hole where geog2
intersects with it.
SELECT
ST_DIFFERENCE(
ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 0))'),
ST_GEOGFROMTEXT('POLYGON((4 2, 6 2, 8 6, 4 2))')
);
/*--------------------------------------------------------*
| difference_of_geog1_and_geog2 |
+--------------------------------------------------------+
| POLYGON((0 0, 10 0, 10 10, 0 0), (8 6, 6 2, 4 2, 8 6)) |
*--------------------------------------------------------*/
ST_DIMENSION
ST_DIMENSION(geography_expression)
Description
Returns the dimension of the highest-dimensional element in the input
GEOGRAPHY
.
The dimension of each possible element is as follows:
- The dimension of a point is
0
. - The dimension of a linestring is
1
. - The dimension of a polygon is
2
.
If the input GEOGRAPHY
is empty, ST_DIMENSION
returns -1
.
Return type
INT64
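Example
The following query (an illustrative sketch, not part of the original reference) shows the dimension reported for each element type and for an empty input.

```sql
SELECT
  ST_DIMENSION(ST_GEOGFROMTEXT('POINT(0 0)')) AS point_dim,
  ST_DIMENSION(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1)')) AS line_dim,
  ST_DIMENSION(ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 1 2, 0 0))')) AS poly_dim,
  ST_DIMENSION(ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) AS empty_dim;

/*-----------+----------+----------+-----------*
 | point_dim | line_dim | poly_dim | empty_dim |
 +-----------+----------+----------+-----------+
 | 0         | 1        | 2        | -1        |
 *-----------+----------+----------+-----------*/
```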
ST_DISJOINT
ST_DISJOINT(geography_1, geography_2)
Description
Returns TRUE
if the intersection of geography_1
and geography_2
is empty,
that is, no point in geography_1
also appears in geography_2
.
ST_DISJOINT
is the logical negation of ST_INTERSECTS
.
Return type
BOOL
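Example
The following query (an illustrative sketch, not part of the original reference) compares two distant points, which are disjoint, and two crossing linestrings, which are not.

```sql
SELECT
  ST_DISJOINT(ST_GEOGPOINT(0, 0), ST_GEOGPOINT(10, 10)) AS distant_points,
  ST_DISJOINT(
    ST_GEOGFROMTEXT('LINESTRING(0 0, 2 2)'),
    ST_GEOGFROMTEXT('LINESTRING(0 2, 2 0)')) AS crossing_lines;

/*----------------+----------------*
 | distant_points | crossing_lines |
 +----------------+----------------+
 | TRUE           | FALSE          |
 *----------------+----------------*/
```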
ST_DISTANCE
ST_DISTANCE(geography_1, geography_2[, use_spheroid])
Description
Returns the shortest distance in meters between two non-empty
GEOGRAPHY
s.
If either of the input GEOGRAPHY
s is empty,
ST_DISTANCE
returns NULL
.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere. If use_spheroid
is TRUE
, the function measures
distance on the surface of the WGS84 spheroid. The default value
of use_spheroid
is FALSE
.
Return type
FLOAT64
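Example
The following query (an illustrative sketch, not part of the original reference) measures one degree of latitude along the prime meridian.

```sql
SELECT ST_DISTANCE(ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 1)) AS meters;
-- Returns roughly 111195 meters (one degree of latitude on the sphere).
```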
ST_DUMP
ST_DUMP(geography[, dimension])
Description
Returns an ARRAY
of simple
GEOGRAPHY
s where each element is a component of
the input GEOGRAPHY
. A simple
GEOGRAPHY
consists of a single point, linestring,
or polygon. If the input GEOGRAPHY
is simple, the
result is a single element. When the input
GEOGRAPHY
is a collection, ST_DUMP
returns an
ARRAY
with one simple
GEOGRAPHY
for each component in the collection.
If dimension
is provided, the function only returns
GEOGRAPHY
s of the corresponding dimension. A
dimension of -1 is equivalent to omitting dimension
.
Return Type
ARRAY<GEOGRAPHY>
Examples
The following example shows how ST_DUMP
returns the simple geographies within
a complex geography.
WITH example AS (
SELECT ST_GEOGFROMTEXT('POINT(0 0)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 0, 1 1)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))'))
SELECT
geography AS original_geography,
ST_DUMP(geography) AS dumped_geographies
FROM example
/*-------------------------------------+------------------------------------*
| original_geography | dumped_geographies |
+-------------------------------------+------------------------------------+
| POINT(0 0) | [POINT(0 0)] |
| MULTIPOINT(0 0, 1 1) | [POINT(0 0), POINT(1 1)] |
| GEOMETRYCOLLECTION(POINT(0 0), | [POINT(0 0), LINESTRING(1 2, 2 1)] |
| LINESTRING(1 2, 2 1)) | |
*-------------------------------------+------------------------------------*/
The following example shows how ST_DUMP
with the dimension argument only
returns simple geographies of the given dimension.
WITH example AS (
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography)
SELECT
geography AS original_geography,
ST_DUMP(geography, 1) AS dumped_geographies
FROM example
/*-------------------------------------+------------------------------*
| original_geography | dumped_geographies |
+-------------------------------------+------------------------------+
| GEOMETRYCOLLECTION(POINT(0 0), | [LINESTRING(1 2, 2 1)] |
| LINESTRING(1 2, 2 1)) | |
*-------------------------------------+------------------------------*/
ST_DWITHIN
ST_DWITHIN(geography_1, geography_2, distance[, use_spheroid])
Description
Returns TRUE
if the distance between at least one point in geography_1
and
one point in geography_2
is less than or equal to the distance given by the
distance
argument; otherwise, returns FALSE
. If either input
GEOGRAPHY
is empty, ST_DWITHIN
returns FALSE
. The
given distance
is in meters on the surface of the Earth.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere.
The use_spheroid
parameter currently only supports
the value FALSE
. The default value of use_spheroid
is FALSE
.
Return type
BOOL
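Example
The following query (an illustrative sketch, not part of the original reference) checks two thresholds around the same pair of points. The points are roughly 111.2 kilometers apart on the sphere, so the first threshold is met and the second is not.

```sql
SELECT
  ST_DWITHIN(ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 1), 112000) AS within_112km,
  ST_DWITHIN(ST_GEOGPOINT(0, 0), ST_GEOGPOINT(0, 1), 110000) AS within_110km;

/*--------------+--------------*
 | within_112km | within_110km |
 +--------------+--------------+
 | TRUE         | FALSE        |
 *--------------+--------------*/
```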
ST_ENDPOINT
ST_ENDPOINT(linestring_geography)
Description
Returns the last point of a linestring geography as a point geography. Returns
an error if the input is not a linestring or if the input is empty. Use the
SAFE
prefix to obtain NULL
for invalid input instead of an error.
Return Type
Point GEOGRAPHY
Example
SELECT ST_ENDPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)')) last
/*--------------*
| last |
+--------------+
| POINT(3 3) |
*--------------*/
ST_EQUALS
ST_EQUALS(geography_1, geography_2)
Description
Returns TRUE
if geography_1
and geography_2
represent the same
GEOGRAPHY
value. More precisely, this means that
one of the following conditions holds:
- ST_COVERS(geography_1, geography_2) = TRUE and ST_COVERS(geography_2, geography_1) = TRUE
- Both geography_1 and geography_2 are empty.
Therefore, two GEOGRAPHY
s may be equal even if the
ordering of points or vertices differ, as long as they still represent the same
geometric structure.
Constraints
ST_EQUALS
is not guaranteed to be a transitive function.
Return type
BOOL
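Example
The following query (an illustrative sketch, not part of the original reference) shows that vertex ordering doesn't affect equality: a linestring and its reversal cover each other, so they are equal.

```sql
SELECT
  ST_EQUALS(
    ST_GEOGFROMTEXT('LINESTRING(0 0, 2 2)'),
    ST_GEOGFROMTEXT('LINESTRING(2 2, 0 0)')) AS same_line;

/*-----------*
 | same_line |
 +-----------+
 | TRUE      |
 *-----------*/
```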
ST_EXTENT
ST_EXTENT(geography_expression)
Description
Returns a STRUCT
that represents the bounding box for the set of input
GEOGRAPHY
values. The bounding box is the minimal rectangle that encloses the
geography. The edges of the rectangle follow constant lines of longitude and
latitude.
Caveats:
- Returns
NULL
if all the inputs areNULL
or empty geographies. - The bounding box might cross the antimeridian if this allows for a smaller
rectangle. In this case, the bounding box has one of its longitudinal bounds
outside of the [-180, 180] range, so that
xmin
is smaller than the eastmost valuexmax
. - If the longitude span of the bounding box is larger than or equal to 180 degrees, the function returns the bounding box with the longitude range of [-180, 180].
Return type
STRUCT<xmin FLOAT64, ymin FLOAT64, xmax FLOAT64, ymax FLOAT64>
.
Bounding box parts:
xmin
: The westmost constant longitude line that bounds the rectangle.xmax
: The eastmost constant longitude line that bounds the rectangle.ymin
: The minimum constant latitude line that bounds the rectangle.ymax
: The maximum constant latitude line that bounds the rectangle.
Example
WITH data AS (
SELECT 1 id, ST_GEOGFROMTEXT('POLYGON((-125 48, -124 46, -117 46, -117 49, -125 48))') g
UNION ALL
SELECT 2 id, ST_GEOGFROMTEXT('POLYGON((172 53, -130 55, -141 70, 172 53))') g
UNION ALL
SELECT 3 id, ST_GEOGFROMTEXT('POINT EMPTY') g
)
SELECT ST_EXTENT(g) AS box
FROM data
/*----------------------------------------------*
| box |
+----------------------------------------------+
| {xmin:172, ymin:46, xmax:243, ymax:70} |
*----------------------------------------------*/
See ST_BOUNDINGBOX
for the non-aggregate version of ST_EXTENT
.
ST_EXTERIORRING
ST_EXTERIORRING(polygon_geography)
Description
Returns a linestring geography that corresponds to the outermost ring of a polygon geography.
- If the input geography is a polygon, gets the outermost ring of the polygon geography and returns the corresponding linestring.
- If the input is the full
GEOGRAPHY
, returns an empty geography. - Returns an error if the input is not a single polygon.
Use the SAFE
prefix to return NULL
for invalid input instead of an error.
Return type
- Linestring
GEOGRAPHY
- Empty
GEOGRAPHY
Examples
WITH geo as
(SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 4, 2 2, 0 0))') AS g UNION ALL
SELECT ST_GEOGFROMTEXT('''POLYGON((1 1, 1 10, 5 10, 5 1, 1 1),
(2 2, 3 4, 2 4, 2 2))''') as g)
SELECT ST_EXTERIORRING(g) AS ring FROM geo;
/*---------------------------------------*
| ring |
+---------------------------------------+
| LINESTRING(2 2, 1 4, 0 0, 2 2) |
| LINESTRING(5 1, 5 10, 1 10, 1 1, 5 1) |
*---------------------------------------*/
ST_GEOGFROM
ST_GEOGFROM(expression)
Description
Converts an expression for a STRING
or BYTES
value into a
GEOGRAPHY
value.
If expression
represents a STRING
value, it must be a valid
GEOGRAPHY
representation in one of the following formats:
- WKT format. To learn more about this format and the requirements to use it, see ST_GEOGFROMTEXT.
- WKB in hexadecimal text format. To learn more about this format and the requirements to use it, see ST_GEOGFROMWKB.
- GeoJSON format. To learn more about this format and the requirements to use it, see ST_GEOGFROMGEOJSON.
If expression
represents a BYTES
value, it must be a valid GEOGRAPHY
binary expression in WKB format. To learn more about this format and the
requirements to use it, see ST_GEOGFROMWKB.
If expression
is NULL
, the output is NULL
.
Return type
GEOGRAPHY
Examples
This takes a WKT-formatted string and returns a GEOGRAPHY
polygon:
SELECT ST_GEOGFROM('POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))') AS WKT_format;
/*------------------------------------*
| WKT_format |
+------------------------------------+
| POLYGON((2 0, 2 2, 0 2, 0 0, 2 0)) |
*------------------------------------*/
This takes a WKB-formatted hexadecimal-encoded string and returns a
GEOGRAPHY
point:
SELECT ST_GEOGFROM(FROM_HEX('010100000000000000000000400000000000001040')) AS WKB_format;
/*----------------*
| WKB_format |
+----------------+
| POINT(2 4) |
*----------------*/
This takes WKB-formatted bytes and returns a GEOGRAPHY
point:
SELECT ST_GEOGFROM('010100000000000000000000400000000000001040') AS WKB_format;
/*----------------*
| WKB_format |
+----------------+
| POINT(2 4) |
*----------------*/
This takes a GeoJSON-formatted string and returns a GEOGRAPHY
polygon:
SELECT ST_GEOGFROM(
'{ "type": "Polygon", "coordinates": [ [ [2, 0], [2, 2], [1, 2], [0, 2], [0, 0], [2, 0] ] ] }'
) AS GEOJSON_format;
/*-----------------------------------------*
| GEOJSON_format |
+-----------------------------------------+
| POLYGON((2 0, 2 2, 1 2, 0 2, 0 0, 2 0)) |
*-----------------------------------------*/
ST_GEOGFROMGEOJSON
ST_GEOGFROMGEOJSON(
geojson_string
[, make_valid => constant_expression ]
)
Description
Returns a GEOGRAPHY
value that corresponds to the
input GeoJSON representation.
ST_GEOGFROMGEOJSON
accepts input that is RFC 7946
compliant.
If the named argument make_valid
is set to TRUE
, the function attempts to
repair polygons that don't conform to Open Geospatial Consortium
semantics.
A GoogleSQL GEOGRAPHY
has spherical
geodesic edges, whereas a GeoJSON Geometry
object explicitly has planar edges.
To convert between these two types of edges, GoogleSQL adds additional
points to the line where necessary so that the resulting sequence of edges
remains within 10 meters of the original edge.
See ST_ASGEOJSON
to format a
GEOGRAPHY
as GeoJSON.
Constraints
The JSON input is subject to the following constraints:
ST_GEOGFROMGEOJSON
only accepts JSON geometry fragments and cannot be used to ingest a whole JSON document.- The input JSON fragment must consist of a GeoJSON geometry type, which
includes
Point
,MultiPoint
,LineString
,MultiLineString
,Polygon
,MultiPolygon
, andGeometryCollection
. Any other GeoJSON type such asFeature
orFeatureCollection
will result in an error. - A position in the
coordinates
member of a GeoJSON geometry type must consist of exactly two elements. The first is the longitude and the second is the latitude. Therefore,ST_GEOGFROMGEOJSON
does not support the optional third element for a position in thecoordinates
member.
Return type
GEOGRAPHY
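Example
The following query (an illustrative sketch, not part of the original reference) converts a GeoJSON Point fragment. Note the ordering in the coordinates member: longitude first, then latitude.

```sql
SELECT ST_GEOGFROMGEOJSON('{"type": "Point", "coordinates": [2, 4]}') AS geo;

/*------------*
 | geo        |
 +------------+
 | POINT(2 4) |
 *------------*/
```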
ST_GEOGFROMTEXT
ST_GEOGFROMTEXT(
wkt_string
[ , oriented => value ]
[ , planar => value ]
[ , make_valid => value ]
)
Description
Converts a STRING
WKT geometry value into a GEOGRAPHY
value.
To format GEOGRAPHY
value as WKT, use ST_ASTEXT
.
Definitions
wkt_string
: ASTRING
value that contains the WKT format.oriented
: A named argument with aBOOL
literal.If the value is
TRUE
, any polygons in the input are assumed to be oriented as follows: when traveling along the boundary of the polygon in the order of the input vertices, the interior of the polygon is on the left. This allows WKT to represent polygons larger than a hemisphere. See alsoST_MAKEPOLYGONORIENTED
, which is similar toST_GEOGFROMTEXT
withoriented=TRUE
.If the value is
FALSE
or omitted, this function returns the polygon with the smaller area.
planar
: A named argument with aBOOL
literal. If the value isTRUE
, the edges of the linestrings and polygons are assumed to use planar map semantics, rather than GoogleSQL default spherical geodesics semantics. For more information about the differences between spherical geodesics and planar lines, see Coordinate systems and edges.make_valid
: A named argument with aBOOL
literal. If the value isTRUE
, the function attempts to repair polygons that don't conform to Open Geospatial Consortium semantics.
Details
- The function does not support three-dimensional geometries that have a
Z
suffix, nor does it support linear referencing system geometries with anM
suffix. oriented
andplanar
can't beTRUE
at the same time.oriented
andmake_valid
can't beTRUE
at the same time.
Example
The following query reads the WKT string POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))
both as a non-oriented polygon and as an oriented polygon, and checks whether
each result contains the point (1, 1)
.
WITH polygon AS (SELECT 'POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))' AS p)
SELECT
ST_CONTAINS(ST_GEOGFROMTEXT(p), ST_GEOGPOINT(1, 1)) AS fromtext_default,
ST_CONTAINS(ST_GEOGFROMTEXT(p, oriented => FALSE), ST_GEOGPOINT(1, 1)) AS non_oriented,
ST_CONTAINS(ST_GEOGFROMTEXT(p, oriented => TRUE), ST_GEOGPOINT(1, 1)) AS oriented
FROM polygon;
/*-------------------+---------------+-----------*
| fromtext_default | non_oriented | oriented |
+-------------------+---------------+-----------+
| TRUE | TRUE | FALSE |
*-------------------+---------------+-----------*/
The following query converts a WKT string with an invalid polygon to
GEOGRAPHY
. The WKT string violates two properties
of a valid polygon - the loop describing the polygon is not closed, and it
contains self-intersection. With the make_valid
option, ST_GEOGFROMTEXT
successfully converts it to a multipolygon shape.
WITH data AS (
SELECT 'POLYGON((0 -1, 2 1, 2 -1, 0 1))' wkt)
SELECT
SAFE.ST_GEOGFROMTEXT(wkt) as geom,
SAFE.ST_GEOGFROMTEXT(wkt, make_valid => TRUE) as valid_geom
FROM data
/*------+-----------------------------------------------------------------*
| geom | valid_geom |
+------+-----------------------------------------------------------------+
| NULL | MULTIPOLYGON(((0 -1, 1 0, 0 1, 0 -1)), ((1 0, 2 -1, 2 1, 1 0))) |
*------+-----------------------------------------------------------------*/
ST_GEOGFROMWKB
ST_GEOGFROMWKB(
wkb_bytes_expression
[ , oriented => value ]
[ , planar => value ]
[ , make_valid => value ]
)
ST_GEOGFROMWKB(
wkb_hex_string_expression
[, oriented => value ]
[, planar => value ]
[, make_valid => value ]
)
Description
Converts an expression from a hexadecimal-text STRING
or BYTES
value into a GEOGRAPHY
value. The expression must be in
WKB format.
To format GEOGRAPHY
as WKB, use ST_ASBINARY
.
Definitions
wkb_bytes_expression
: ABYTES
value that contains the WKB format.wkb_hex_string_expression
: ASTRING
value that contains the hexadecimal-encoded WKB format.oriented
: A named argument with aBOOL
literal.If the value is
TRUE
, any polygons in the input are assumed to be oriented as follows: when traveling along the boundary of the polygon in the order of the input vertices, the interior of the polygon is on the left. This allows WKB to represent polygons larger than a hemisphere. See alsoST_MAKEPOLYGONORIENTED
, which is similar toST_GEOGFROMWKB
withoriented=TRUE
.If the value is
FALSE
or omitted, this function returns the polygon with the smaller area.
planar
: A named argument with aBOOL
literal. If the value isTRUE
, the edges of the linestrings and polygons are assumed to use planar map semantics, rather than GoogleSQL default spherical geodesics semantics. For more information about the differences between spherical geodesics and planar lines, see Coordinate systems and edges.make_valid
: A named argument with aBOOL
literal. If the value isTRUE
, the function attempts to repair polygons that don't conform to Open Geospatial Consortium semantics.
Details
- The function does not support three-dimensional geometries that have a
Z
suffix, nor does it support linear referencing system geometries with anM
suffix. oriented
andplanar
can't beTRUE
at the same time.oriented
andmake_valid
can't beTRUE
at the same time.
Return type
GEOGRAPHY
Example
The following query reads the hex-encoded WKB data containing
LINESTRING(1 1, 3 2)
and uses it with planar and geodesic semantics. When
planar is used, the function approximates the planar input line using a
line that contains a chain of geodesic segments.
WITH wkb_data AS (
SELECT '010200000002000000feffffffffffef3f000000000000f03f01000000000008400000000000000040' geo
)
SELECT
ST_GeogFromWkb(geo, planar=>TRUE) AS from_planar,
ST_GeogFromWkb(geo, planar=>FALSE) AS from_geodesic,
FROM wkb_data
/*---------------------------------------+----------------------*
| from_planar | from_geodesic |
+---------------------------------------+----------------------+
| LINESTRING(1 1, 2 1.5, 2.5 1.75, 3 2) | LINESTRING(1 1, 3 2) |
*---------------------------------------+----------------------*/
ST_GEOGPOINT
ST_GEOGPOINT(longitude, latitude)
Description
Creates a GEOGRAPHY
with a single point. ST_GEOGPOINT
creates a point from
the specified FLOAT64
longitude (in degrees,
negative west of the Prime Meridian, positive east) and latitude (in degrees,
positive north of the Equator, negative south) parameters and returns that point
in a GEOGRAPHY
value.
NOTE: Some systems present latitude first; take care with argument order.
Constraints
- Longitudes outside the range [-180, 180] are allowed; ST_GEOGPOINT uses the input longitude modulo 360 to obtain a longitude within [-180, 180].
- Latitudes must be in the range [-90, 90]. Latitudes outside this range result in an error.
Return type
Point GEOGRAPHY
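Example

A minimal illustration of the argument order described above: longitude comes first, then latitude. The following query constructs a point for the Seattle Center coordinates used elsewhere in this section.

```sql
SELECT ST_GEOGPOINT(-122.35, 47.62) AS point
/*----------------------*
 | point                |
 +----------------------+
 | POINT(-122.35 47.62) |
 *----------------------*/
```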
ST_GEOGPOINTFROMGEOHASH
ST_GEOGPOINTFROMGEOHASH(geohash)
Description
Returns a GEOGRAPHY
value that corresponds to a
point in the middle of a bounding box defined in the GeoHash.
Return type
Point GEOGRAPHY
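Example

As a sketch of the behavior, the following query decodes a GeoHash produced by ST_GEOHASH back into a point. The result is the center of the GeoHash cell, so it only approximates the originally encoded location; the exact coordinates are omitted here.

```sql
SELECT ST_GEOGPOINTFROMGEOHASH('c22yzugqw7') AS point
```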
ST_GEOHASH
ST_GEOHASH(geography_expression[, maxchars])
Description
Takes a single-point GEOGRAPHY
and returns a GeoHash
representation of that GEOGRAPHY
object.
geography_expression
: Represents aGEOGRAPHY
object. Only aGEOGRAPHY
object that represents a single point is supported. IfST_GEOHASH
is used over an emptyGEOGRAPHY
object, returnsNULL
.maxchars
: This optionalINT64
parameter specifies the maximum number of characters the hash will contain. Fewer characters correspond to lower precision (or, described differently, to a bigger bounding box). maxchars
defaults to 20 if not explicitly specified. A validmaxchars
value is 1 to 20. Any value below or above is considered unspecified and the default of 20 is used.
Return type
STRING
Example
Returns a GeoHash of the Seattle Center with 10 characters of precision.
SELECT ST_GEOHASH(ST_GEOGPOINT(-122.35, 47.62), 10) geohash
/*--------------*
| geohash |
+--------------+
| c22yzugqw7 |
*--------------*/
ST_GEOMETRYTYPE
ST_GEOMETRYTYPE(geography_expression)
Description
Returns the Open Geospatial Consortium (OGC) geometry type that
describes the input GEOGRAPHY
. The OGC geometry type matches the
types that are used in WKT and GeoJSON formats and
printed for ST_ASTEXT and ST_ASGEOJSON.
ST_GEOMETRYTYPE
returns the OGC geometry type with the "ST_" prefix.
ST_GEOMETRYTYPE
returns the following given the type on the input:
- Single point geography: Returns
ST_Point
. - Collection of only points: Returns
ST_MultiPoint
. - Single linestring geography: Returns
ST_LineString
. - Collection of only linestrings: Returns
ST_MultiLineString
. - Single polygon geography: Returns
ST_Polygon
. - Collection of only polygons: Returns
ST_MultiPolygon
. - Collection with elements of different dimensions, or the input is the empty
geography: Returns
ST_GeometryCollection
.
Return type
STRING
Example
The following example shows how ST_GEOMETRYTYPE
takes geographies and returns
the names of their OGC geometry types.
WITH example AS(
SELECT ST_GEOGFROMTEXT('POINT(0 1)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('MULTILINESTRING((2 2, 3 4), (5 6, 7 7))')
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(MULTIPOINT(-1 2, 0 12), LINESTRING(-2 4, 0 6))')
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY'))
SELECT
geography AS WKT,
ST_GEOMETRYTYPE(geography) AS geometry_type_name
FROM example;
/*-------------------------------------------------------------------+-----------------------*
| WKT | geometry_type_name |
+-------------------------------------------------------------------+-----------------------+
| POINT(0 1) | ST_Point |
| MULTILINESTRING((2 2, 3 4), (5 6, 7 7)) | ST_MultiLineString |
| GEOMETRYCOLLECTION(MULTIPOINT(-1 2, 0 12), LINESTRING(-2 4, 0 6)) | ST_GeometryCollection |
| GEOMETRYCOLLECTION EMPTY | ST_GeometryCollection |
*-------------------------------------------------------------------+-----------------------*/
ST_HAUSDORFFDISTANCE
ST_HAUSDORFFDISTANCE(
geography_1,
geography_2
[, directed => { TRUE | FALSE } ]
)
Description
Gets the discrete Hausdorff distance, which is the greatest of all the distances from a discrete point in one geography to the closest discrete point in another geography.
Definitions
geography_1
: AGEOGRAPHY
value that represents the first geography.geography_2
: AGEOGRAPHY
value that represents the second geography.directed
: A named argument with aBOOL
value. Represents the type of computation to use on the input geographies. If this argument isn't specified,directed => FALSE
is used by default.
- FALSE (default): The largest Hausdorff distance found in (geography_1, geography_2) and (geography_2, geography_1).
- TRUE: The Hausdorff distance for (geography_1, geography_2).
Details
If an input geography is NULL
, the function returns NULL
.
Return type
FLOAT64
Example
The following query gets the Hausdorff distance between geo1
and geo2
:
WITH data AS (
SELECT
ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1,
ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2
)
SELECT ST_HAUSDORFFDISTANCE(geo1, geo2, directed=>TRUE) AS distance
FROM data;
/*--------------------+
| distance |
+--------------------+
| 1688933.9832041925 |
+--------------------*/
The following query gets the Hausdorff distance between geo2
and geo1
:
WITH data AS (
SELECT
ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1,
ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2
)
SELECT ST_HAUSDORFFDISTANCE(geo2, geo1, directed=>TRUE) AS distance
FROM data;
/*--------------------+
| distance |
+--------------------+
| 5802892.745488612 |
+--------------------*/
The following query gets the largest Hausdorff distance between
(geo1
and geo2
) and (geo2
and geo1
):
WITH data AS (
SELECT
ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1,
ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2
)
SELECT ST_HAUSDORFFDISTANCE(geo1, geo2, directed=>FALSE) AS distance
FROM data;
/*--------------------+
| distance |
+--------------------+
| 5802892.745488612 |
+--------------------*/
The following query produces the same results as the previous query because
ST_HAUSDORFFDISTANCE
uses directed=>FALSE
by default.
WITH data AS (
SELECT
ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1,
ST_GEOGFROMTEXT('LINESTRING(20 90, 30 90, 60 10, 90 10)') AS geo2
)
SELECT ST_HAUSDORFFDISTANCE(geo1, geo2) AS distance
FROM data;
ST_INTERIORRINGS
ST_INTERIORRINGS(polygon_geography)
Description
Returns an array of linestring geographies that corresponds to the interior rings of a polygon geography. Each interior ring is the border of a hole within the input polygon.
- If the input geography is a polygon, excludes the outermost ring of the polygon geography and returns the linestrings corresponding to the interior rings.
- If the input is the full
GEOGRAPHY
, returns an empty array. - If the input polygon has no holes, returns an empty array.
- Returns an error if the input is not a single polygon.
Use the SAFE
prefix to return NULL
for invalid input instead of an error.
Return type
ARRAY<LineString GEOGRAPHY>
Examples
WITH geo AS (
SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 1, 1 2, 0 0))') AS g UNION ALL
SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), (2 2, 3 4, 2 4, 2 2))') UNION ALL
SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 1 10, 5 10, 5 1, 1 1), (2 2.5, 3.5 3, 2.5 2, 2 2.5), (3.5 7, 4 6, 3 3, 3.5 7))') UNION ALL
SELECT ST_GEOGFROMTEXT('fullglobe') UNION ALL
SELECT NULL)
SELECT ST_INTERIORRINGS(g) AS rings FROM geo;
/*----------------------------------------------------------------------------*
| rings |
+----------------------------------------------------------------------------+
| [] |
| [LINESTRING(2 2, 3 4, 2 4, 2 2)] |
| [LINESTRING(2.5 2, 3.5 3, 2 2.5, 2.5 2), LINESTRING(3 3, 4 6, 3.5 7, 3 3)] |
| [] |
| NULL |
*----------------------------------------------------------------------------*/
ST_INTERSECTION
ST_INTERSECTION(geography_1, geography_2)
Description
Returns a GEOGRAPHY
that represents the point set
intersection of the two input GEOGRAPHY
s. Thus,
every point in the intersection appears in both geography_1
and geography_2
.
If the two input GEOGRAPHY
s are disjoint, that is,
there are no points that appear in both input geography_1
and geography_2
,
then an empty GEOGRAPHY
is returned.
See ST_INTERSECTS and ST_DISJOINT for related predicate functions.
Return type
GEOGRAPHY
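Example

As a sketch of the behavior, the following query intersects two overlapping rectangles. Because GEOGRAPHY edges are geodesics rather than planar lines, the vertices of the result may differ slightly from the planar intersection, so the exact output is omitted here.

```sql
SELECT ST_INTERSECTION(
  ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
  ST_GEOGFROMTEXT('POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))')
) AS overlap
```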
ST_INTERSECTS
ST_INTERSECTS(geography_1, geography_2)
Description
Returns TRUE
if the point set intersection of geography_1
and geography_2
is non-empty. Thus, this function returns TRUE
if there is at least one point
that appears in both input GEOGRAPHY
s.
If ST_INTERSECTS
returns TRUE
, it implies that ST_DISJOINT
returns FALSE
.
Return type
BOOL
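Example

The following query checks two crossing linestrings; they share at least one point, so the function returns TRUE.

```sql
SELECT ST_INTERSECTS(
  ST_GEOGFROMTEXT('LINESTRING(0 0, 2 2)'),
  ST_GEOGFROMTEXT('LINESTRING(0 2, 2 0)')
) AS intersects
/*------------*
 | intersects |
 +------------+
 | TRUE       |
 *------------*/
```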
ST_INTERSECTSBOX
ST_INTERSECTSBOX(geography, lng1, lat1, lng2, lat2)
Description
Returns TRUE
if geography
intersects the rectangle between [lng1, lng2]
and [lat1, lat2]
. The edges of the rectangle follow constant lines of
longitude and latitude. lng1
and lng2
specify the westmost and eastmost
constant longitude lines that bound the rectangle, and lat1
and lat2
specify
the minimum and maximum constant latitude lines that bound the rectangle.
Specify all longitude and latitude arguments in degrees.
Constraints
The input arguments are subject to the following constraints:
- Latitudes should be in the
[-90, 90]
degree range. - Longitudes should follow either of the following rules:
- Both longitudes are in the
[-180, 180]
degree range. - One of the longitudes is in the
[-180, 180]
degree range, andlng2 - lng1
is in the[0, 360]
interval.
- Both longitudes are in the
Return type
BOOL
Example
SELECT p, ST_INTERSECTSBOX(p, -90, 0, 90, 20) AS box1,
ST_INTERSECTSBOX(p, 90, 0, -90, 20) AS box2
FROM UNNEST([ST_GEOGPOINT(10, 10), ST_GEOGPOINT(170, 10),
ST_GEOGPOINT(30, 30)]) p
/*----------------+--------------+--------------*
| p | box1 | box2 |
+----------------+--------------+--------------+
| POINT(10 10) | TRUE | FALSE |
| POINT(170 10) | FALSE | TRUE |
| POINT(30 30) | FALSE | FALSE |
*----------------+--------------+--------------*/
ST_ISCLOSED
ST_ISCLOSED(geography_expression)
Description
Returns TRUE
for a non-empty GEOGRAPHY, where each element in the GEOGRAPHY
has an empty boundary. The boundary for each element can be defined with
ST_BOUNDARY
.
- A point is closed.
- A linestring is closed if the start and end points of the linestring are the same.
- A polygon is closed only if it is a full polygon.
- A collection is closed if and only if every element in the collection is closed.
An empty GEOGRAPHY
is not closed.
Return type
BOOL
Example
WITH example AS(
SELECT ST_GEOGFROMTEXT('POINT(5 0)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('LINESTRING(0 1, 4 3, 2 6, 0 1)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('LINESTRING(2 6, 1 3, 3 9)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY'))
SELECT
geography,
ST_ISCLOSED(geography) AS is_closed,
FROM example;
/*------------------------------------------------------+-----------*
| geography | is_closed |
+------------------------------------------------------+-----------+
| POINT(5 0) | TRUE |
| LINESTRING(0 1, 4 3, 2 6, 0 1) | TRUE |
| LINESTRING(2 6, 1 3, 3 9) | FALSE |
| GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1)) | FALSE |
| GEOMETRYCOLLECTION EMPTY | FALSE |
*------------------------------------------------------+-----------*/
ST_ISCOLLECTION
ST_ISCOLLECTION(geography_expression)
Description
Returns TRUE
if the total number of points, linestrings, and polygons is
greater than one.
An empty GEOGRAPHY
is not a collection.
Return type
BOOL
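Example

For example, a single point is not a collection, while a multipoint containing several points is:

```sql
SELECT
  ST_ISCOLLECTION(ST_GEOGFROMTEXT('POINT(0 0)')) AS single_point,
  ST_ISCOLLECTION(ST_GEOGFROMTEXT('MULTIPOINT(0 1, 4 3, 2 6)')) AS multi_point
/*--------------+-------------*
 | single_point | multi_point |
 +--------------+-------------+
 | FALSE        | TRUE        |
 *--------------+-------------*/
```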
ST_ISEMPTY
ST_ISEMPTY(geography_expression)
Description
Returns TRUE
if the given GEOGRAPHY
is empty; that is, the GEOGRAPHY
does
not contain any points, lines, or polygons.
NOTE: An empty GEOGRAPHY
is not associated with a particular geometry shape.
For example, the results of expressions ST_GEOGFROMTEXT('POINT EMPTY')
and
ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')
are identical.
Return type
BOOL
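Example

For example, an empty geometry collection is empty, while a single point is not:

```sql
SELECT
  ST_ISEMPTY(ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY')) AS empty_geo,
  ST_ISEMPTY(ST_GEOGPOINT(0, 0)) AS point_geo
/*-----------+-----------*
 | empty_geo | point_geo |
 +-----------+-----------+
 | TRUE      | FALSE     |
 *-----------+-----------*/
```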
ST_ISRING
ST_ISRING(geography_expression)
Description
Returns TRUE
if the input GEOGRAPHY
is a linestring and if the
linestring is both ST_ISCLOSED
and
simple. A linestring is considered simple if it does not pass through the
same point twice (with the exception of the start and endpoint, which may
overlap to form a ring).
An empty GEOGRAPHY
is not a ring.
Return type
BOOL
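Example

The following query contrasts a closed, simple linestring (a ring) with an open linestring:

```sql
SELECT
  ST_ISRING(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 0, 1 1, 0 0)')) AS closed_ring,
  ST_ISRING(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1)')) AS open_line
/*-------------+-----------*
 | closed_ring | open_line |
 +-------------+-----------+
 | TRUE        | FALSE     |
 *-------------+-----------*/
```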
ST_LENGTH
ST_LENGTH(geography_expression[, use_spheroid])
Description
Returns the total length in meters of the lines in the input
GEOGRAPHY
.
If geography_expression
is a point or a polygon, returns zero. If
geography_expression
is a collection, returns the length of the lines in the
collection; if the collection does not contain lines, returns zero.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere.
The use_spheroid
parameter currently only supports
the value FALSE
. The default value of use_spheroid
is FALSE
.
Return type
FLOAT64
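Example

As an illustration, the following query measures a one-degree segment along the equator. The result is roughly 111 kilometers; the exact meter value depends on the sphere model, so it is omitted here.

```sql
SELECT ST_LENGTH(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 0)')) AS length_meters
```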
ST_LINEINTERPOLATEPOINT
ST_LINEINTERPOLATEPOINT(linestring_geography, fraction)
Description
Gets a point at a specific fraction in a linestring GEOGRAPHY
value.
Definitions
linestring_geography
: A linestringGEOGRAPHY
on which the target point is located.fraction
: AFLOAT64
value that represents a fraction along the linestringGEOGRAPHY
where the target point is located. This should be an inclusive value between0
(start of the linestring) and1
(end of the linestring).
Details
- Returns
NULL
if any input argument isNULL
. - Returns an empty geography if
linestring_geography
is an empty geography. - Returns an error if
linestring_geography
is not a linestring or an empty geography, or iffraction
is outside the[0, 1]
range.
Return Type
GEOGRAPHY
Example
The following query returns a few points on a linestring. Notice that the
midpoint of the linestring LINESTRING(1 1, 5 5)
is slightly different from
POINT(3 3)
because the GEOGRAPHY
type uses geodesic line segments.
WITH fractions AS (
SELECT 0 AS fraction UNION ALL
SELECT 0.5 UNION ALL
SELECT 1 UNION ALL
SELECT NULL
)
SELECT
fraction,
ST_LINEINTERPOLATEPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 5 5)'), fraction)
AS point
FROM fractions
/*-------------+-------------------------------------------*
| fraction | point |
+-------------+-------------------------------------------+
| 0 | POINT(1 1) |
| 0.5 | POINT(2.99633827268976 3.00182528336078) |
| 1 | POINT(5 5) |
| NULL | NULL |
*-------------+-------------------------------------------*/
ST_LINELOCATEPOINT
ST_LINELOCATEPOINT(linestring_geography, point_geography)
Description
Gets a section of a linestring between the start point and a selected point (a
point on the linestring closest to the point_geography
argument). Returns the
percentage that this section represents in the linestring.
Details:
- To select a point on the linestring
GEOGRAPHY
(linestring_geography
), this function takes a pointGEOGRAPHY
(point_geography
) and finds the closest point to it on the linestring. - If two points on
linestring_geography
are an equal distance away frompoint_geography
, it is not guaranteed which one will be selected. - The return value is an inclusive value between 0 and 1 (0-100%).
- If the selected point is the start point on the linestring, the function returns 0 (0%).
- If the selected point is the end point on the linestring, the function returns 1 (100%).
NULL
and error handling:
- Returns
NULL
if any input argument isNULL
. - Returns an error if
linestring_geography
is not a linestring or ifpoint_geography
is not a point. Use theSAFE
prefix to obtainNULL
for invalid input instead of an error.
Return Type
FLOAT64
Examples
WITH geos AS (
SELECT ST_GEOGPOINT(0, 0) AS point UNION ALL
SELECT ST_GEOGPOINT(1, 0) UNION ALL
SELECT ST_GEOGPOINT(1, 1) UNION ALL
SELECT ST_GEOGPOINT(2, 2) UNION ALL
SELECT ST_GEOGPOINT(3, 3) UNION ALL
SELECT ST_GEOGPOINT(4, 4) UNION ALL
SELECT ST_GEOGPOINT(5, 5) UNION ALL
SELECT ST_GEOGPOINT(6, 5) UNION ALL
SELECT NULL
)
SELECT
point AS input_point,
ST_LINELOCATEPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 5 5)'), point)
AS percentage_from_beginning
FROM geos
/*-------------+---------------------------*
| input_point | percentage_from_beginning |
+-------------+---------------------------+
| POINT(0 0) | 0 |
| POINT(1 0) | 0 |
| POINT(1 1) | 0 |
| POINT(2 2) | 0.25015214685147907 |
| POINT(3 3) | 0.5002284283637185 |
| POINT(4 4) | 0.7501905913884388 |
| POINT(5 5) | 1 |
| POINT(6 5) | 1 |
| NULL | NULL |
*-------------+---------------------------*/
ST_LINESUBSTRING
ST_LINESUBSTRING(linestring_geography, start_fraction, end_fraction);
Description
Gets a segment of a linestring at a specific starting and ending fraction.
Definitions
linestring_geography
: The LineStringGEOGRAPHY
value that represents the linestring from which to extract a segment.start_fraction
:FLOAT64
value that represents the starting fraction of the total length oflinestring_geography
. This must be an inclusive value between 0 and 1 (0-100%).end_fraction
:FLOAT64
value that represents the ending fraction of the total length oflinestring_geography
. This must be an inclusive value between 0 and 1 (0-100%).
Details
end_fraction
must be greater than or equal to start_fraction
.
If start_fraction
and end_fraction
are equal, a linestring with only
one point is produced.
Return type
- LineString
GEOGRAPHY
if the resulting geography has more than one point. - Point
GEOGRAPHY
if the resulting geography has only one point.
Example
The following query returns the second half of the linestring:
WITH data AS (
SELECT ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1
)
SELECT ST_LINESUBSTRING(geo1, 0.5, 1) AS segment
FROM data;
/*-------------------------------------------------------------+
| segment |
+-------------------------------------------------------------+
| LINESTRING(49.4760661523471 67.2419539103851, 10 70, 70 70) |
+-------------------------------------------------------------*/
The following query returns a linestring that only contains one point:
WITH data AS (
SELECT ST_GEOGFROMTEXT('LINESTRING(20 70, 70 60, 10 70, 70 70)') AS geo1
)
SELECT ST_LINESUBSTRING(geo1, 0.5, 0.5) AS segment
FROM data;
/*------------------------------------------+
| segment |
+------------------------------------------+
| POINT(49.4760661523471 67.2419539103851) |
+------------------------------------------*/
ST_MAKELINE
ST_MAKELINE(geography_1, geography_2)
ST_MAKELINE(array_of_geography)
Description
Creates a GEOGRAPHY
with a single linestring by
concatenating the point or line vertices of each of the input
GEOGRAPHY
s in the order they are given.
ST_MAKELINE
comes in two variants. For the first variant, input must be two
GEOGRAPHY
s. For the second, input must be an ARRAY
of type GEOGRAPHY
. In
either variant, each input GEOGRAPHY
must consist of one of the following
values:
- Exactly one point.
- Exactly one linestring.
For the first variant of ST_MAKELINE
, if either input GEOGRAPHY
is NULL
,
ST_MAKELINE
returns NULL
. For the second variant, if input ARRAY
or any
element in the input ARRAY
is NULL
, ST_MAKELINE
returns NULL
.
Constraints
Every edge must span strictly less than 180 degrees.
NOTE: The GoogleSQL snapping process may discard sufficiently short
edges and snap the two endpoints together. For instance, if two input
GEOGRAPHY
s each contain a point and the two points are separated by a distance
less than the snap radius, the points will be snapped together. In such a case
the result will be a GEOGRAPHY
with exactly one point.
Return type
LineString GEOGRAPHY
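Example

Both variants described above can be sketched as follows: the two-argument form concatenates two points, and the array form concatenates every element in order.

```sql
SELECT
  ST_MAKELINE(ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 1)) AS from_two_points,
  ST_MAKELINE([ST_GEOGPOINT(0, 0), ST_GEOGPOINT(1, 1), ST_GEOGPOINT(2, 0)]) AS from_array
/*----------------------+---------------------------*
 | from_two_points      | from_array                |
 +----------------------+---------------------------+
 | LINESTRING(0 0, 1 1) | LINESTRING(0 0, 1 1, 2 0) |
 *----------------------+---------------------------*/
```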
ST_MAKEPOLYGON
ST_MAKEPOLYGON(polygon_shell[, array_of_polygon_holes])
Description
Creates a GEOGRAPHY
containing a single polygon
from linestring inputs, where each input linestring is used to construct a
polygon ring.
ST_MAKEPOLYGON
comes in two variants. For the first variant, the input
linestring is provided by a single GEOGRAPHY
containing exactly one
linestring. For the second variant, the input consists of a single GEOGRAPHY
and an array of GEOGRAPHY
s, each containing exactly one linestring.
The first GEOGRAPHY
in either variant is used to construct the polygon shell.
Additional GEOGRAPHY
s provided in the input ARRAY
specify a polygon hole.
For every input GEOGRAPHY
containing exactly one linestring, the following
must be true:
- The linestring must consist of at least three distinct vertices.
- The linestring must be closed: that is, the first and last vertex have to be the same. If the first and last vertex differ, the function constructs a final edge from the first vertex to the last.
For the first variant of ST_MAKEPOLYGON
, if either input GEOGRAPHY
is
NULL
, ST_MAKEPOLYGON
returns NULL
. For the second variant, if
input ARRAY
or any element in the ARRAY
is NULL
, ST_MAKEPOLYGON
returns
NULL
.
NOTE: ST_MAKEPOLYGON
accepts an empty GEOGRAPHY
as input. ST_MAKEPOLYGON
interprets an empty GEOGRAPHY
as having an empty linestring, which will
create a full loop: that is, a polygon that covers the entire Earth.
Constraints
Together, the input rings must form a valid polygon:
- The polygon shell must cover each of the polygon holes.
- There can be only one polygon shell (which has to be the first input ring). This implies that polygon holes cannot be nested.
- Polygon rings may only intersect in a vertex on the boundary of both rings.
Every edge must span strictly less than 180 degrees.
Each polygon ring divides the sphere into two regions. The first input linestring
to ST_MAKEPOLYGON
forms the polygon shell, and the interior is chosen to be
the smaller of the two regions. Each subsequent input linestring specifies a
polygon hole, so the interior of the polygon is already well-defined. In order
to define a polygon shell such that the interior of the polygon is the larger of
the two regions, see ST_MAKEPOLYGONORIENTED
.
NOTE: The GoogleSQL snapping process may discard sufficiently
short edges and snap the two endpoints together. Hence, when vertices are
snapped together, it is possible that a polygon hole that is sufficiently small
may disappear, or the output GEOGRAPHY
may contain only a line or a
point.
Return type
GEOGRAPHY
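Example

The following sketch builds a polygon from an open linestring; per the rules above, the function constructs the closing edge from the last vertex back to the first. The printed vertex ordering of the result may vary, so the output is omitted here.

```sql
SELECT ST_MAKEPOLYGON(
  ST_GEOGFROMTEXT('LINESTRING(0 0, 1 0, 1 1, 0 1)')
) AS polygon
```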
ST_MAKEPOLYGONORIENTED
ST_MAKEPOLYGONORIENTED(array_of_geography)
Description
Like ST_MAKEPOLYGON
, but the vertex ordering of each input linestring
determines the orientation of each polygon ring. The orientation of a polygon
ring defines the interior of the polygon as follows: if someone walks along the
boundary of the polygon in the order of the input vertices, the interior of the
polygon is on the left. This applies for each polygon ring provided.
This variant of the polygon constructor is more flexible since
ST_MAKEPOLYGONORIENTED
can construct a polygon such that the interior is on
either side of the polygon ring. However, proper orientation of polygon rings is
critical in order to construct the desired polygon.
If the input ARRAY
or any element in the ARRAY
is NULL
,
ST_MAKEPOLYGONORIENTED
returns NULL
.
NOTE: The input argument for ST_MAKEPOLYGONORIENTED
may contain an empty
GEOGRAPHY
. ST_MAKEPOLYGONORIENTED
interprets an empty GEOGRAPHY
as having
an empty linestring, which will create a full loop: that is, a polygon that
covers the entire Earth.
Constraints
Together, the input rings must form a valid polygon:
- The polygon shell must cover each of the polygon holes.
- There must be only one polygon shell, which must be the first input ring. This implies that polygon holes cannot be nested.
- Polygon rings may only intersect in a vertex on the boundary of both rings.
Every edge must span strictly less than 180 degrees.
ST_MAKEPOLYGONORIENTED
relies on the ordering of the input vertices of each
linestring to determine the orientation of the polygon. This applies to the
polygon shell and any polygon holes. ST_MAKEPOLYGONORIENTED
expects all
polygon holes to have the opposite orientation of the shell. See
ST_MAKEPOLYGON
for an alternate polygon constructor, and
other constraints on building a valid polygon.
NOTE: Due to the GoogleSQL snapping process, edges with a sufficiently
short length will be discarded and the two endpoints will be snapped to a single
point. Therefore, it is possible that vertices in a linestring may be snapped
together such that one or more edges disappear. Hence, it is possible that a
polygon hole that is sufficiently small may disappear, or the resulting
GEOGRAPHY
may contain only a line or a point.
Return type
GEOGRAPHY
ST_MAXDISTANCE
ST_MAXDISTANCE(geography_1, geography_2[, use_spheroid])
Returns the longest distance in meters between two non-empty
GEOGRAPHY
s; that is, the distance between two
vertices where the first vertex is in the first
GEOGRAPHY
, and the second vertex is in the second
GEOGRAPHY
. If geography_1
and geography_2
are the
same GEOGRAPHY
, the function returns the distance
between the two most distant vertices in that
GEOGRAPHY
.
If either of the input GEOGRAPHY
s is empty,
ST_MAXDISTANCE
returns NULL
.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere.
The use_spheroid
parameter currently only supports
the value FALSE
. The default value of use_spheroid
is FALSE
.
Return type
FLOAT64
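Example

As a sketch, the following query finds the farthest pair of vertices between two linestrings. The exact meter value depends on the sphere model, so it is omitted here.

```sql
SELECT ST_MAXDISTANCE(
  ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1)'),
  ST_GEOGFROMTEXT('LINESTRING(2 2, 3 3)')
) AS max_distance_meters
```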
ST_NPOINTS
ST_NPOINTS(geography_expression)
Description
An alias of ST_NUMPOINTS.
ST_NUMGEOMETRIES
ST_NUMGEOMETRIES(geography_expression)
Description
Returns the number of geometries in the input GEOGRAPHY
. For a single point,
linestring, or polygon, ST_NUMGEOMETRIES
returns 1
. For any collection of
geometries, ST_NUMGEOMETRIES
returns the number of geometries making up the
collection. ST_NUMGEOMETRIES
returns 0
if the input is the empty
GEOGRAPHY
.
Return type
INT64
Example
The following example computes ST_NUMGEOMETRIES
for a single point geography,
two collections, and an empty geography.
WITH example AS(
SELECT ST_GEOGFROMTEXT('POINT(5 0)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('MULTIPOINT(0 1, 4 3, 2 6)') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1))') AS geography
UNION ALL
SELECT ST_GEOGFROMTEXT('GEOMETRYCOLLECTION EMPTY'))
SELECT
geography,
ST_NUMGEOMETRIES(geography) AS num_geometries,
FROM example;
/*------------------------------------------------------+----------------*
| geography | num_geometries |
+------------------------------------------------------+----------------+
| POINT(5 0) | 1 |
| MULTIPOINT(0 1, 4 3, 2 6) | 3 |
| GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 2, 2 1)) | 2 |
| GEOMETRYCOLLECTION EMPTY | 0 |
*------------------------------------------------------+----------------*/
ST_NUMPOINTS
ST_NUMPOINTS(geography_expression)
Description
Returns the number of vertices in the input
GEOGRAPHY
. This includes the number of points, the
number of linestring vertices, and the number of polygon vertices.
NOTE: The first and last vertex of a polygon ring are counted as distinct vertices.
Return type
INT64
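Example

For example, a triangular polygon ring has four vertices, because the closing vertex is counted separately from the first:

```sql
SELECT
  ST_NUMPOINTS(ST_GEOGFROMTEXT('POINT(5 0)')) AS point_vertices,
  ST_NUMPOINTS(ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 0))')) AS triangle_vertices
/*----------------+-------------------*
 | point_vertices | triangle_vertices |
 +----------------+-------------------+
 | 1              | 4                 |
 *----------------+-------------------*/
```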
ST_PERIMETER
ST_PERIMETER(geography_expression[, use_spheroid])
Description
Returns the length in meters of the boundary of the polygons in the input
GEOGRAPHY
.
If geography_expression
is a point or a line, returns zero. If
geography_expression
is a collection, returns the perimeter of the polygons
in the collection; if the collection does not contain polygons, returns zero.
The optional use_spheroid
parameter determines how this function measures
distance. If use_spheroid
is FALSE
, the function measures distance on the
surface of a perfect sphere.
The use_spheroid
parameter currently only supports
the value FALSE
. The default value of use_spheroid
is FALSE
.
Return type
FLOAT64
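Example

As a sketch, the following query measures the boundary of a one-degree square near the equator. The exact meter value depends on the sphere model, so it is omitted here; a point or line input would return zero instead.

```sql
SELECT ST_PERIMETER(
  ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
) AS perimeter_meters
```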
ST_POINTN
ST_POINTN(linestring_geography, index)
Description
Returns the Nth point of a linestring geography as a point geography, where N is
the index. The index is 1-based. Negative values are counted backwards from the
end of the linestring, so that -1 is the last point. Returns an error if the
input is not a linestring, if the input is empty, or if there is no vertex at
the given index. Use the SAFE
prefix to obtain NULL
for invalid input
instead of an error.
Return Type
Point GEOGRAPHY
Example
The following example uses ST_POINTN
, ST_STARTPOINT
and
ST_ENDPOINT
to extract points from a linestring.
WITH linestring AS (
SELECT ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)') g
)
SELECT ST_POINTN(g, 1) AS first, ST_POINTN(g, -1) AS last,
ST_POINTN(g, 2) AS second, ST_POINTN(g, -2) AS second_to_last
FROM linestring;
/*--------------+--------------+--------------+----------------*
| first | last | second | second_to_last |
+--------------+--------------+--------------+----------------+
| POINT(1 1) | POINT(3 3) | POINT(2 1) | POINT(3 2) |
*--------------+--------------+--------------+----------------*/
ST_SIMPLIFY
ST_SIMPLIFY(geography, tolerance_meters)
Description
Returns a simplified version of geography
, the given input
GEOGRAPHY
. The input GEOGRAPHY
is simplified by replacing nearly straight
chains of short edges with a single long edge. The input geography
will not
change by more than the tolerance specified by tolerance_meters
. Thus,
simplified edges are guaranteed to pass within tolerance_meters
of the
original positions of all vertices that were removed from that edge. The given
tolerance_meters
is in meters on the surface of the Earth.
Note that ST_SIMPLIFY
preserves topological relationships, which means that
no new crossing edges will be created and the output will be valid. For a large
enough tolerance, adjacent shapes may collapse into a single object, or a shape
could be simplified to a shape with a smaller dimension.
Constraints
For ST_SIMPLIFY
to have any effect, tolerance_meters
must be non-zero.
ST_SIMPLIFY
returns an error if the tolerance specified by tolerance_meters
is one of the following:
- A negative tolerance.
- Greater than ~7800 kilometers.
Return type
GEOGRAPHY
Examples
The following example shows how ST_SIMPLIFY
simplifies the input line
GEOGRAPHY
by removing intermediate vertices.
WITH example AS
(SELECT ST_GEOGFROMTEXT('LINESTRING(0 0, 0.05 0, 0.1 0, 0.15 0, 2 0)') AS line)
SELECT
line AS original_line,
ST_SIMPLIFY(line, 1) AS simplified_line
FROM example;
/*---------------------------------------------+----------------------*
| original_line | simplified_line |
+---------------------------------------------+----------------------+
| LINESTRING(0 0, 0.05 0, 0.1 0, 0.15 0, 2 0) | LINESTRING(0 0, 2 0) |
*---------------------------------------------+----------------------*/
The following example illustrates how the result of ST_SIMPLIFY
can have a
lower dimension than the original shape.
WITH example AS
(SELECT
ST_GEOGFROMTEXT('POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0))') AS polygon,
t AS tolerance
FROM UNNEST([1000, 10000, 100000]) AS t)
SELECT
polygon AS original_triangle,
tolerance AS tolerance_meters,
ST_SIMPLIFY(polygon, tolerance) AS simplified_result
FROM example
/*-------------------------------------+------------------+-------------------------------------*
| original_triangle | tolerance_meters | simplified_result |
+-------------------------------------+------------------+-------------------------------------+
| POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 1000 | POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) |
| POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 10000 | LINESTRING(0 0, 0.1 0.1) |
| POLYGON((0 0, 0.1 0, 0.1 0.1, 0 0)) | 100000 | POINT(0 0) |
*-------------------------------------+------------------+-------------------------------------*/
ST_SNAPTOGRID
ST_SNAPTOGRID(geography_expression, grid_size)
Description
Returns the input GEOGRAPHY
, where each vertex has
been snapped to a longitude/latitude grid. The grid size is determined by the
grid_size
parameter, which is given in degrees.
Constraints
Arbitrary grid sizes are not supported. The grid_size
parameter is rounded so
that it is of the form 10^n
, where -10 < n < 0
.
Return type
GEOGRAPHY
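Example
The following query is an illustrative sketch (not part of the original reference). It snaps a point to a 0.01-degree grid; a requested grid_size that is not a power of ten, such as 0.02, would first be rounded to one. Because the exact vertex rounding is implementation-defined, no result row is shown.

```sql
-- Hypothetical illustration: grid_size 0.01 snaps each vertex to the
-- 0.01-degree longitude/latitude grid.
SELECT ST_SNAPTOGRID(ST_GEOGFROMTEXT('POINT(1.234 2.567)'), 0.01) AS snapped
```

The returned point lies on the 0.01-degree grid.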
ST_STARTPOINT
ST_STARTPOINT(linestring_geography)
Description
Returns the first point of a linestring geography as a point geography. Returns
an error if the input is not a linestring or if the input is empty. Use the
SAFE
prefix to obtain NULL
for invalid input instead of an error.
Return Type
Point GEOGRAPHY
Example
SELECT ST_STARTPOINT(ST_GEOGFROMTEXT('LINESTRING(1 1, 2 1, 3 2, 3 3)')) first
/*--------------*
| first |
+--------------+
| POINT(1 1) |
*--------------*/
ST_TOUCHES
ST_TOUCHES(geography_1, geography_2)
Description
Returns TRUE
provided the following two conditions are satisfied:
- geography_1 intersects geography_2.
- The interior of geography_1 and the interior of geography_2 are disjoint.
Return type
BOOL
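Example
The following example is a sketch added here (it is not from the original reference). Two polygons that share only a boundary edge touch, because they intersect while their interiors are disjoint.

```sql
-- The polygons share the edge from (2, 0) to (2, 2) but no interior points.
SELECT ST_TOUCHES(
  ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
  ST_GEOGFROMTEXT('POLYGON((2 0, 4 0, 4 2, 2 2, 2 0))')) AS touches

/*---------*
| touches |
+---------+
| true    |
*---------*/
```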
ST_UNION
ST_UNION(geography_1, geography_2)
ST_UNION(array_of_geography)
Description
Returns a GEOGRAPHY
that represents the point set
union of all input GEOGRAPHY
s.
ST_UNION
comes in two variants. For the first variant, input must be two
GEOGRAPHY
s. For the second, the input is an
ARRAY
of type GEOGRAPHY
.
For the first variant of ST_UNION
, if an input
GEOGRAPHY
is NULL
, ST_UNION
returns NULL
.
For the second variant, if the input ARRAY
value
is NULL
, ST_UNION
returns NULL
.
For a non-NULL
input ARRAY
, the union is computed
and NULL
elements are ignored so that they do not affect the output.
See ST_UNION_AGG
for the aggregate version of ST_UNION
.
Return type
GEOGRAPHY
Example
SELECT ST_UNION(
ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)'),
ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -100.19 47.69)')
) AS results
/*---------------------------------------------------------*
| results |
+---------------------------------------------------------+
| LINESTRING(-100.19 47.69, -122.12 47.67, -122.19 47.69) |
*---------------------------------------------------------*/
ST_UNION_AGG
ST_UNION_AGG(geography)
Description
Returns a GEOGRAPHY
that represents the point set
union of all input GEOGRAPHY
s.
ST_UNION_AGG
ignores NULL
input GEOGRAPHY
values.
See ST_UNION
for the non-aggregate version of ST_UNION_AGG
.
Return type
GEOGRAPHY
Example
SELECT ST_UNION_AGG(items) AS results
FROM UNNEST([
ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)'),
ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -100.19 47.69)'),
ST_GEOGFROMTEXT('LINESTRING(-122.12 47.67, -122.19 47.69)')]) as items;
/*---------------------------------------------------------*
| results |
+---------------------------------------------------------+
| LINESTRING(-100.19 47.69, -122.12 47.67, -122.19 47.69) |
*---------------------------------------------------------*/
ST_WITHIN
ST_WITHIN(geography_1, geography_2)
Description
Returns TRUE
if no point of geography_1
is outside of geography_2
and
the interiors of geography_1
and geography_2
intersect.
Given two geographies a
and b
, ST_WITHIN(a, b)
returns the same result
as ST_CONTAINS
(b, a)
. Note the opposite order of arguments.
Return type
BOOL
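Example
The following example is a sketch added here (it is not from the original reference). A point in the interior of a polygon is within that polygon.

```sql
-- The point lies in the interior of the polygon, so ST_WITHIN returns TRUE.
-- The equivalent call with swapped arguments is ST_CONTAINS(polygon, point).
SELECT ST_WITHIN(
  ST_GEOGPOINT(1, 1),
  ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))')) AS is_within

/*-----------*
| is_within |
+-----------+
| true      |
*-----------*/
```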
ST_X
ST_X(point_geography_expression)
Description
Returns the longitude in degrees of the single-point input
GEOGRAPHY
.
For any input GEOGRAPHY
that is not a single point,
including an empty GEOGRAPHY
, ST_X
returns an
error. Use the SAFE.
prefix to obtain NULL
.
Return type
FLOAT64
Example
The following example uses ST_X
and ST_Y
to extract coordinates from
single-point geographies.
WITH points AS
(SELECT ST_GEOGPOINT(i, i + 1) AS p FROM UNNEST([0, 5, 12]) AS i)
SELECT
p,
ST_X(p) as longitude,
ST_Y(p) as latitude
FROM points;
/*--------------+-----------+----------*
| p | longitude | latitude |
+--------------+-----------+----------+
| POINT(0 1) | 0.0 | 1.0 |
| POINT(5 6) | 5.0 | 6.0 |
| POINT(12 13) | 12.0 | 13.0 |
*--------------+-----------+----------*/
ST_Y
ST_Y(point_geography_expression)
Description
Returns the latitude in degrees of the single-point input
GEOGRAPHY
.
For any input GEOGRAPHY
that is not a single point,
including an empty GEOGRAPHY
, ST_Y
returns an
error. Use the SAFE.
prefix to return NULL
instead.
Return type
FLOAT64
Example
See ST_X
for example usage.
Hash functions
GoogleSQL for BigQuery supports the following hash functions.
Function list
Name | Summary |
---|---|
FARM_FINGERPRINT | Computes the fingerprint of a STRING or BYTES value, using the FarmHash Fingerprint64 algorithm. |
MD5 | Computes the hash of a STRING or BYTES value, using the MD5 algorithm. |
SHA1 | Computes the hash of a STRING or BYTES value, using the SHA-1 algorithm. |
SHA256 | Computes the hash of a STRING or BYTES value, using the SHA-256 algorithm. |
SHA512 | Computes the hash of a STRING or BYTES value, using the SHA-512 algorithm. |
FARM_FINGERPRINT
FARM_FINGERPRINT(value)
Description
Computes the fingerprint of the STRING
or BYTES
input using the
Fingerprint64
function from the
open-source FarmHash library. The output
of this function for a particular input will never change.
Return type
INT64
Examples
WITH example AS (
SELECT 1 AS x, "foo" AS y, true AS z UNION ALL
SELECT 2 AS x, "apple" AS y, false AS z UNION ALL
SELECT 3 AS x, "" AS y, true AS z
)
SELECT
*,
FARM_FINGERPRINT(CONCAT(CAST(x AS STRING), y, CAST(z AS STRING)))
AS row_fingerprint
FROM example;
/*---+-------+-------+----------------------*
| x | y | z | row_fingerprint |
+---+-------+-------+----------------------+
| 1 | foo | true | -1541654101129638711 |
| 2 | apple | false | 2794438866806483259 |
| 3 | | true | -4880158226897771312 |
*---+-------+-------+----------------------*/
MD5
MD5(input)
Description
Computes the hash of the input using the
MD5 algorithm. The input can either be
STRING
or BYTES
. The string version treats the input as an array of bytes.
This function returns 16 bytes.
Return type
BYTES
Example
SELECT MD5("Hello World") as md5;
-- Note that the result of MD5 is of type BYTES, displayed as a base64-encoded string.
/*--------------------------*
| md5 |
+--------------------------+
| sQqNsWTgdUEFt6mb5y4/5Q== |
*--------------------------*/
SHA1
SHA1(input)
Description
Computes the hash of the input using the
SHA-1 algorithm. The input can either be
STRING
or BYTES
. The string version treats the input as an array of bytes.
This function returns 20 bytes.
Return type
BYTES
Example
SELECT SHA1("Hello World") as sha1;
-- Note that the result of SHA1 is of type BYTES, displayed as a base64-encoded string.
/*------------------------------*
| sha1 |
+------------------------------+
| Ck1VqNd45QIvq3AZd8XYQLvEhtA= |
*------------------------------*/
SHA256
SHA256(input)
Description
Computes the hash of the input using the
SHA-256 algorithm. The input can either be
STRING
or BYTES
. The string version treats the input as an array of bytes.
This function returns 32 bytes.
Return type
BYTES
Example
SELECT SHA256("Hello World") as sha256;
-- The result of SHA256 is of type BYTES, displayed as a base64-encoded string.
SHA512
SHA512(input)
Description
Computes the hash of the input using the
SHA-512 algorithm. The input can either be
STRING
or BYTES
. The string version treats the input as an array of bytes.
This function returns 64 bytes.
Return type
BYTES
Example
SELECT SHA512("Hello World") as sha512;
-- The result of SHA512 is of type BYTES, displayed as a base64-encoded string.
HyperLogLog++ functions
The HyperLogLog++ algorithm (HLL++) estimates cardinality from sketches.
HLL++ functions are approximate aggregate functions.
Approximate aggregation typically requires less
memory than exact aggregation functions,
like COUNT(DISTINCT)
, but also introduces statistical error.
This makes HLL++ functions appropriate for large data streams for
which linear memory usage is impractical, as well as for data that is
already approximate.
If you do not need materialized sketches, you can alternatively use an
approximate aggregate function with system-defined precision,
such as APPROX_COUNT_DISTINCT
. However,
APPROX_COUNT_DISTINCT
does not allow partial aggregations, re-aggregations,
or custom precision.
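As a sketch of this tradeoff (an example added here, not from the original reference), both queries below estimate the distinct count, but only the second produces an intermediate sketch that could be materialized, merged, or initialized with a custom precision.

```sql
-- One-shot estimate with system-defined precision:
SELECT APPROX_COUNT_DISTINCT(x) AS approx_distinct
FROM UNNEST([1, 2, 3, 2, 1]) AS x;

-- The same estimate via an explicit HLL++ sketch; the HLL_COUNT.INIT result
-- could instead be stored and merged later with HLL_COUNT.MERGE.
SELECT HLL_COUNT.EXTRACT(HLL_COUNT.INIT(x)) AS sketch_distinct
FROM UNNEST([1, 2, 3, 2, 1]) AS x;
```

Both queries return 3 for this small input, where the estimate is exact.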
GoogleSQL for BigQuery supports the following HLL++ functions:
Function list
Name | Summary |
---|---|
HLL_COUNT.EXTRACT | Extracts a cardinality estimate of an HLL++ sketch. |
HLL_COUNT.INIT | Aggregates values of the same underlying type into a new HLL++ sketch. |
HLL_COUNT.MERGE | Merges HLL++ sketches of the same underlying type into a new sketch, and then gets the cardinality of the new sketch. |
HLL_COUNT.MERGE_PARTIAL | Merges HLL++ sketches of the same underlying type into a new sketch. |
HLL_COUNT.EXTRACT
HLL_COUNT.EXTRACT(sketch)
Description
A scalar function that extracts a cardinality estimate of a single HLL++ sketch.
If sketch
is NULL
, this function returns a cardinality estimate of 0
.
Supported input types
BYTES
Return type
INT64
Example
The following query returns the number of distinct users for each country who have at least one invoice.
SELECT
country,
HLL_COUNT.EXTRACT(hll_sketch) AS distinct_customers_with_open_invoice
FROM
(
SELECT
country,
HLL_COUNT.INIT(customer_id) AS hll_sketch
FROM
UNNEST(
ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
('UA', 'customer_id_1', 'invoice_id_11'),
('BR', 'customer_id_3', 'invoice_id_31'),
('CZ', 'customer_id_2', 'invoice_id_22'),
('CZ', 'customer_id_2', 'invoice_id_23'),
('BR', 'customer_id_3', 'invoice_id_31'),
('UA', 'customer_id_2', 'invoice_id_24')])
GROUP BY country
);
/*---------+--------------------------------------*
| country | distinct_customers_with_open_invoice |
+---------+--------------------------------------+
| UA | 2 |
| BR | 1 |
| CZ | 1 |
*---------+--------------------------------------*/
HLL_COUNT.INIT
HLL_COUNT.INIT(input [, precision])
Description
An aggregate function that takes one or more input
values and aggregates them
into an HLL++ sketch. Each sketch
is represented using the BYTES
data type. You can then merge sketches using
HLL_COUNT.MERGE
or HLL_COUNT.MERGE_PARTIAL
. If no merging is needed,
you can extract the final count of distinct values from the sketch using
HLL_COUNT.EXTRACT
.
This function supports an optional parameter, precision
. This parameter
defines the accuracy of the estimate at the cost of additional memory required
to process the sketches or store them on disk. The range for this value is
10
to 24
. The default value is 15
. For more information about precision,
see Precision for sketches.
If the input is NULL
, this function returns NULL
.
For more information, see HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
Supported input types
INT64
NUMERIC
BIGNUMERIC
STRING
BYTES
Return type
BYTES
Example
The following query creates HLL++ sketches that count the number of distinct users with at least one invoice per country.
SELECT
country,
HLL_COUNT.INIT(customer_id, 10)
AS hll_sketch
FROM
UNNEST(
ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
('UA', 'customer_id_1', 'invoice_id_11'),
('CZ', 'customer_id_2', 'invoice_id_22'),
('CZ', 'customer_id_2', 'invoice_id_23'),
('BR', 'customer_id_3', 'invoice_id_31'),
('UA', 'customer_id_2', 'invoice_id_24')])
GROUP BY country;
/*---------+------------------------------------------------------------------------------------*
| country | hll_sketch |
+---------+------------------------------------------------------------------------------------+
| UA | "\010p\020\002\030\002 \013\202\007\r\020\002\030\n \0172\005\371\344\001\315\010" |
| CZ | "\010p\020\002\030\002 \013\202\007\013\020\001\030\n \0172\003\371\344\001" |
| BR | "\010p\020\001\030\002 \013\202\007\013\020\001\030\n \0172\003\202\341\001" |
*---------+------------------------------------------------------------------------------------*/
HLL_COUNT.MERGE
HLL_COUNT.MERGE(sketch)
Description
An aggregate function that returns the cardinality of several HLL++ sketches by computing their union.
Each sketch
must be initialized on the same type. Attempts to merge sketches
for different types results in an error. For example, you cannot merge a sketch
initialized from INT64
data with one initialized from STRING
data.
If the merged sketches were initialized with different precisions, the precision will be downgraded to the lowest precision involved in the merge.
This function ignores NULL
values when merging sketches. If the merge happens
over zero rows or only over NULL
values, the function returns 0
.
Supported input types
BYTES
Return type
INT64
Example
The following query counts the number of distinct users across all countries who have at least one invoice.
SELECT HLL_COUNT.MERGE(hll_sketch) AS distinct_customers_with_open_invoice
FROM
(
SELECT
country,
HLL_COUNT.INIT(customer_id) AS hll_sketch
FROM
UNNEST(
ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
('UA', 'customer_id_1', 'invoice_id_11'),
('BR', 'customer_id_3', 'invoice_id_31'),
('CZ', 'customer_id_2', 'invoice_id_22'),
('CZ', 'customer_id_2', 'invoice_id_23'),
('BR', 'customer_id_3', 'invoice_id_31'),
('UA', 'customer_id_2', 'invoice_id_24')])
GROUP BY country
);
/*--------------------------------------*
| distinct_customers_with_open_invoice |
+--------------------------------------+
| 3 |
*--------------------------------------*/
HLL_COUNT.MERGE_PARTIAL
HLL_COUNT.MERGE_PARTIAL(sketch)
Description
An aggregate function that takes one or more
HLL++ sketch
inputs and merges them into a new sketch.
Each sketch
must be initialized on the same type. Attempts to merge sketches
for different types results in an error. For example, you cannot merge a sketch
initialized from INT64
data with one initialized from STRING
data.
If the merged sketches were initialized with different precisions, the precision
will be downgraded to the lowest precision involved in the merge. For example,
if MERGE_PARTIAL
encounters sketches of precision 14 and 15, the returned new
sketch will have precision 14.
This function returns NULL
if there is no input or all inputs are NULL
.
Supported input types
BYTES
Return type
BYTES
Example
The following query returns an HLL++ sketch that counts the number of distinct users who have at least one invoice across all countries.
SELECT HLL_COUNT.MERGE_PARTIAL(hll_sketch) AS distinct_customers_with_open_invoice
FROM
(
SELECT
country,
HLL_COUNT.INIT(customer_id) AS hll_sketch
FROM
UNNEST(
ARRAY<STRUCT<country STRING, customer_id STRING, invoice_id STRING>>[
('UA', 'customer_id_1', 'invoice_id_11'),
('BR', 'customer_id_3', 'invoice_id_31'),
('CZ', 'customer_id_2', 'invoice_id_22'),
('CZ', 'customer_id_2', 'invoice_id_23'),
('BR', 'customer_id_3', 'invoice_id_31'),
('UA', 'customer_id_2', 'invoice_id_24')])
GROUP BY country
);
/*----------------------------------------------------------------------------------------------*
| distinct_customers_with_open_invoice |
+----------------------------------------------------------------------------------------------+
| "\010p\020\006\030\002 \013\202\007\020\020\003\030\017 \0242\010\320\2408\352}\244\223\002" |
*----------------------------------------------------------------------------------------------*/
Interval functions
GoogleSQL for BigQuery supports the following interval functions.
Function list
Name | Summary |
---|---|
EXTRACT | Extracts part of an INTERVAL value. |
JUSTIFY_DAYS | Normalizes the day part of an INTERVAL value. |
JUSTIFY_HOURS | Normalizes the time part of an INTERVAL value. |
JUSTIFY_INTERVAL | Normalizes the day and time parts of an INTERVAL value. |
MAKE_INTERVAL | Constructs an INTERVAL value. |
EXTRACT
EXTRACT(part FROM interval_expression)
Description
Returns the value corresponding to the specified date part. The part
must be
one of YEAR
, MONTH
, DAY
, HOUR
, MINUTE
, SECOND
, MILLISECOND, or MICROSECOND.
Return Data Type
INT64
Examples
In the following example, different parts of two intervals are extracted.
SELECT
EXTRACT(YEAR FROM i) AS year,
EXTRACT(MONTH FROM i) AS month,
EXTRACT(DAY FROM i) AS day,
EXTRACT(HOUR FROM i) AS hour,
EXTRACT(MINUTE FROM i) AS minute,
EXTRACT(SECOND FROM i) AS second,
EXTRACT(MILLISECOND FROM i) AS milli,
EXTRACT(MICROSECOND FROM i) AS micro
FROM
UNNEST([INTERVAL '1-2 3 4:5:6.789999' YEAR TO SECOND,
INTERVAL '0-13 370 48:61:61' YEAR TO SECOND]) AS i
/*------+-------+-----+------+--------+--------+-------+--------*
| year | month | day | hour | minute | second | milli | micro |
+------+-------+-----+------+--------+--------+-------+--------+
| 1 | 2 | 3 | 4 | 5 | 6 | 789 | 789999 |
| 1 | 1 | 370 | 49 | 2 | 1 | 0 | 0 |
*------+-------+-----+------+--------+--------+-------+--------*/
When a negative sign precedes the time part in an interval, the negative sign distributes over the hours, minutes, and seconds. For example:
SELECT
EXTRACT(HOUR FROM i) AS hour,
EXTRACT(MINUTE FROM i) AS minute
FROM
UNNEST([INTERVAL '10 -12:30' DAY TO MINUTE]) AS i
/*------+--------*
| hour | minute |
+------+--------+
| -12 | -30 |
*------+--------*/
When a negative sign precedes the year and month part in an interval, the negative sign distributes over the years and months. For example:
SELECT
EXTRACT(YEAR FROM i) AS year,
EXTRACT(MONTH FROM i) AS month
FROM
UNNEST([INTERVAL '-22-6 10 -12:30' YEAR TO MINUTE]) AS i
/*------+--------*
| year | month |
+------+--------+
| -22 | -6 |
*------+--------*/
JUSTIFY_DAYS
JUSTIFY_DAYS(interval_expression)
Description
Normalizes the day part of the interval to the range from -29 to 29 by incrementing/decrementing the month or year part of the interval.
Return Data Type
INTERVAL
Example
SELECT
JUSTIFY_DAYS(INTERVAL 29 DAY) AS i1,
JUSTIFY_DAYS(INTERVAL -30 DAY) AS i2,
JUSTIFY_DAYS(INTERVAL 31 DAY) AS i3,
JUSTIFY_DAYS(INTERVAL -65 DAY) AS i4,
JUSTIFY_DAYS(INTERVAL 370 DAY) AS i5
/*--------------+--------------+-------------+---------------+--------------*
| i1 | i2 | i3 | i4 | i5 |
+--------------+--------------+-------------+---------------+--------------+
| 0-0 29 0:0:0 | -0-1 0 0:0:0 | 0-1 1 0:0:0 | -0-2 -5 0:0:0 | 1-0 10 0:0:0 |
*--------------+--------------+-------------+---------------+--------------*/
JUSTIFY_HOURS
JUSTIFY_HOURS(interval_expression)
Description
Normalizes the time part of the interval to the range from -23:59:59.999999 to 23:59:59.999999 by incrementing/decrementing the day part of the interval.
Return Data Type
INTERVAL
Example
SELECT
JUSTIFY_HOURS(INTERVAL 23 HOUR) AS i1,
JUSTIFY_HOURS(INTERVAL -24 HOUR) AS i2,
JUSTIFY_HOURS(INTERVAL 47 HOUR) AS i3,
JUSTIFY_HOURS(INTERVAL -12345 MINUTE) AS i4
/*--------------+--------------+--------------+-----------------*
| i1 | i2 | i3 | i4 |
+--------------+--------------+--------------+-----------------+
| 0-0 0 23:0:0 | 0-0 -1 0:0:0 | 0-0 1 23:0:0 | 0-0 -8 -13:45:0 |
*--------------+--------------+--------------+-----------------*/
JUSTIFY_INTERVAL
JUSTIFY_INTERVAL(interval_expression)
Description
Normalizes the day and time parts of the interval.
Return Data Type
INTERVAL
Example
SELECT JUSTIFY_INTERVAL(INTERVAL '29 49:00:00' DAY TO SECOND) AS i
/*-------------*
| i |
+-------------+
| 0-1 1 1:0:0 |
*-------------*/
MAKE_INTERVAL
MAKE_INTERVAL(
[ [ year => ] value ]
[, [ month => ] value ]
[, [ day => ] value ]
[, [ hour => ] value ]
[, [ minute => ] value ]
[, [ second => ] value ]
)
Description
Constructs an INTERVAL
object using INT64
values
representing the year, month, day, hour, minute, and second. All arguments are
optional, 0
by default, and can be named arguments.
Return Data Type
INTERVAL
Example
SELECT
MAKE_INTERVAL(1, 6, 15) AS i1,
MAKE_INTERVAL(hour => 10, second => 20) AS i2,
MAKE_INTERVAL(1, minute => 5, day => 2) AS i3
/*--------------+---------------+-------------*
| i1 | i2 | i3 |
+--------------+---------------+-------------+
| 1-6 15 0:0:0 | 0-0 0 10:0:20 | 1-0 2 0:5:0 |
*--------------+---------------+-------------*/
JSON functions
GoogleSQL for BigQuery supports the following functions, which can retrieve and transform JSON data.
Categories
The JSON functions are grouped into the following categories based on their behavior:
Category | Functions | Description |
---|---|---|
Standard extractors | JSON_QUERY, JSON_VALUE, JSON_QUERY_ARRAY, JSON_VALUE_ARRAY | Functions that extract JSON data. |
Legacy extractors | JSON_EXTRACT, JSON_EXTRACT_SCALAR, JSON_EXTRACT_ARRAY, JSON_EXTRACT_STRING_ARRAY | Functions that extract JSON data. While these functions are supported by GoogleSQL, we recommend using the standard extractor functions. |
Lax converters | LAX_BOOL, LAX_FLOAT64, LAX_INT64, LAX_STRING | Functions that flexibly convert a JSON value to a SQL value without returning errors. |
Converters | BOOL, FLOAT64, INT64, STRING | Functions that convert a JSON value to a SQL value. |
Other converters | PARSE_JSON, TO_JSON, TO_JSON_STRING | Other conversion functions from or to JSON. |
Constructors | JSON_ARRAY, JSON_OBJECT | Functions that create JSON. |
Mutators | JSON_ARRAY_APPEND, JSON_ARRAY_INSERT, JSON_REMOVE, JSON_SET, JSON_STRIP_NULLS | Functions that mutate existing JSON. |
Accessors | JSON_KEYS, JSON_TYPE | Functions that provide access to JSON properties. |
Function list
Name | Summary |
---|---|
BOOL | Converts a JSON boolean to a SQL BOOL value. |
FLOAT64 | Converts a JSON number to a SQL FLOAT64 value. |
INT64 | Converts a JSON number to a SQL INT64 value. |
JSON_ARRAY | Creates a JSON array. |
JSON_ARRAY_APPEND | Appends JSON data to the end of a JSON array. |
JSON_ARRAY_INSERT | Inserts JSON data into a JSON array. |
JSON_EXTRACT | (Deprecated) Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value. |
JSON_EXTRACT_ARRAY | (Deprecated) Extracts a JSON array and converts it to a SQL ARRAY<JSON-formatted STRING> or ARRAY<JSON> value. |
JSON_EXTRACT_SCALAR | (Deprecated) Extracts a JSON scalar value and converts it to a SQL STRING value. |
JSON_EXTRACT_STRING_ARRAY | (Deprecated) Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING> value. |
JSON_KEYS | Extracts unique JSON keys from a JSON expression. |
JSON_OBJECT | Creates a JSON object. |
JSON_QUERY | Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value. |
JSON_QUERY_ARRAY | Extracts a JSON array and converts it to a SQL ARRAY<JSON-formatted STRING> or ARRAY<JSON> value. |
JSON_REMOVE | Produces JSON with the specified JSON data removed. |
JSON_SET | Inserts or replaces JSON data. |
JSON_STRIP_NULLS | Removes JSON nulls from JSON objects and JSON arrays. |
JSON_TYPE | Gets the JSON type of the outermost JSON value and converts the name of this type to a SQL STRING value. |
JSON_VALUE | Extracts a JSON scalar value and converts it to a SQL STRING value. |
JSON_VALUE_ARRAY | Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING> value. |
LAX_BOOL | Attempts to convert a JSON value to a SQL BOOL value. |
LAX_FLOAT64 | Attempts to convert a JSON value to a SQL FLOAT64 value. |
LAX_INT64 | Attempts to convert a JSON value to a SQL INT64 value. |
LAX_STRING | Attempts to convert a JSON value to a SQL STRING value. |
PARSE_JSON | Converts a JSON-formatted STRING value to a JSON value. |
STRING | Converts a JSON string to a SQL STRING value. |
TO_JSON | Converts a SQL value to a JSON value. |
TO_JSON_STRING | Converts a SQL value to a JSON-formatted STRING value. |
BOOL
BOOL(json_expr)
Description
Converts a JSON boolean to a SQL BOOL
value.
Arguments:
json_expr
: JSON. For example:JSON 'true'
If the JSON value is not a boolean, an error is produced. If the expression is SQL
NULL
, the function returns SQLNULL
.
Return type
BOOL
Examples
SELECT BOOL(JSON 'true') AS vacancy;
/*---------*
| vacancy |
+---------+
| true |
*---------*/
SELECT BOOL(JSON_QUERY(JSON '{"hotel class": "5-star", "vacancy": true}', "$.vacancy")) AS vacancy;
/*---------*
| vacancy |
+---------+
| true |
*---------*/
The following examples show how invalid requests are handled:
-- An error is thrown if JSON is not of type bool.
SELECT BOOL(JSON '123') AS result; -- Throws an error
SELECT BOOL(JSON 'null') AS result; -- Throws an error
SELECT SAFE.BOOL(JSON '123') AS result; -- Returns a SQL NULL
FLOAT64
FLOAT64(
json_expr
[, wide_number_mode => { 'exact' | 'round' } ]
)
Description
Converts a JSON number to a SQL FLOAT64
value.
Arguments:
json_expr
: JSON. For example:JSON '9.8'
If the JSON value is not a number, an error is produced. If the expression is a SQL
NULL
, the function returns SQLNULL
.wide_number_mode
: A named argument with aSTRING
value. Defines what happens with a number that can't be represented as aFLOAT64
without loss of precision. This argument accepts one of the two case-sensitive values:exact
: The function fails if the result cannot be represented as aFLOAT64
without loss of precision.round
(default): The numeric value stored in JSON will be rounded toFLOAT64
. If such rounding is not possible, the function fails.
Return type
FLOAT64
Examples
SELECT FLOAT64(JSON '9.8') AS velocity;
/*----------*
| velocity |
+----------+
| 9.8 |
*----------*/
SELECT FLOAT64(JSON_QUERY(JSON '{"vo2_max": 39.1, "age": 18}', "$.vo2_max")) AS vo2_max;
/*---------*
| vo2_max |
+---------+
| 39.1 |
*---------*/
SELECT FLOAT64(JSON '18446744073709551615', wide_number_mode=>'round') as result;
/*------------------------*
| result |
+------------------------+
| 1.8446744073709552e+19 |
*------------------------*/
SELECT FLOAT64(JSON '18446744073709551615') as result;
/*------------------------*
| result |
+------------------------+
| 1.8446744073709552e+19 |
*------------------------*/
The following examples show how invalid requests are handled:
-- An error is thrown if JSON is not of type FLOAT64.
SELECT FLOAT64(JSON '"strawberry"') AS result;
SELECT FLOAT64(JSON 'null') AS result;
-- An error is thrown because `wide_number_mode` is case-sensitive and not "exact" or "round".
SELECT FLOAT64(JSON '123.4', wide_number_mode=>'EXACT') as result;
SELECT FLOAT64(JSON '123.4', wide_number_mode=>'exac') as result;
-- An error is thrown because the number cannot be converted to FLOAT64 without loss of precision
SELECT FLOAT64(JSON '18446744073709551615', wide_number_mode=>'exact') as result;
-- Returns a SQL NULL
SELECT SAFE.FLOAT64(JSON '"strawberry"') AS result;
INT64
INT64(json_expr)
Description
Converts a JSON number to a SQL INT64
value.
Arguments:
json_expr
: JSON. For example:JSON '999'
If the JSON value is not a number, or the JSON number is not in the SQL
INT64
domain, an error is produced. If the expression is SQLNULL
, the function returns SQLNULL
.
Return type
INT64
Examples
SELECT INT64(JSON '2005') AS flight_number;
/*---------------*
| flight_number |
+---------------+
| 2005 |
*---------------*/
SELECT INT64(JSON_QUERY(JSON '{"gate": "A4", "flight_number": 2005}', "$.flight_number")) AS flight_number;
/*---------------*
| flight_number |
+---------------+
| 2005 |
*---------------*/
SELECT INT64(JSON '10.0') AS score;
/*-------*
| score |
+-------+
| 10 |
*-------*/
The following examples show how invalid requests are handled:
-- An error is thrown if JSON is not a number or cannot be converted to a 64-bit integer.
SELECT INT64(JSON '10.1') AS result; -- Throws an error
SELECT INT64(JSON '"strawberry"') AS result; -- Throws an error
SELECT INT64(JSON 'null') AS result; -- Throws an error
SELECT SAFE.INT64(JSON '"strawberry"') AS result; -- Returns a SQL NULL
JSON_ARRAY
JSON_ARRAY([value][, ...])
Description
Creates a JSON array from zero or more SQL values.
Arguments:
value
: A JSON encoding-supported value to add to a JSON array.
Return type
JSON
Examples
The following query creates a JSON array with one value in it:
SELECT JSON_ARRAY(10) AS json_data
/*-----------*
| json_data |
+-----------+
| [10] |
*-----------*/
You can create a JSON array with an empty JSON array in it. For example:
SELECT JSON_ARRAY([]) AS json_data
/*-----------*
| json_data |
+-----------+
| [[]] |
*-----------*/
SELECT JSON_ARRAY(10, 'foo', NULL) AS json_data
/*-----------------*
| json_data |
+-----------------+
| [10,"foo",null] |
*-----------------*/
SELECT JSON_ARRAY(STRUCT(10 AS a, 'foo' AS b)) AS json_data
/*----------------------*
| json_data |
+----------------------+
| [{"a":10,"b":"foo"}] |
*----------------------*/
SELECT JSON_ARRAY(10, ['foo', 'bar'], [20, 30]) AS json_data
/*----------------------------*
| json_data |
+----------------------------+
| [10,["foo","bar"],[20,30]] |
*----------------------------*/
SELECT JSON_ARRAY(10, [JSON '20', JSON '"foo"']) AS json_data
/*-----------------*
| json_data |
+-----------------+
| [10,[20,"foo"]] |
*-----------------*/
You can create an empty JSON array. For example:
SELECT JSON_ARRAY() AS json_data
/*-----------*
| json_data |
+-----------+
| [] |
*-----------*/
JSON_ARRAY_APPEND
JSON_ARRAY_APPEND(
json_expr,
json_path_value_pair[, ...]
[, append_each_element => { TRUE | FALSE } ]
)
json_path_value_pair:
json_path, value
Description
Appends JSON data to the end of a JSON array.
Arguments:
json_expr
: JSON. For example:JSON '["a", "b", "c"]'
json_path_value_pair
: A value and the JSONPath for that value. This includes:json_path
: Appendvalue
at this JSONPath injson_expr
.value
: A JSON encoding-supported value to append.
append_each_element
: A named argument with aBOOL
value.If
TRUE
(default), andvalue
is a SQL array, appends each element individually.If
FALSE,
andvalue
is a SQL array, appends the array as one element.
Details:
- Path value pairs are evaluated left to right. The JSON produced by evaluating one pair becomes the JSON against which the next pair is evaluated.
- The operation is ignored if the path points to a JSON non-array value that is not a JSON null.
- If
json_path
points to a JSON null, the JSON null is replaced by a JSON array that containsvalue
. - If the path exists but has an incompatible type at any given path token, the path value pair operation is ignored.
- The function applies all path value pair append operations even if an individual path value pair operation is invalid. For invalid operations, the operation is ignored and the function continues to process the rest of the path value pairs.
- If any
json_path
is an invalid JSONPath, an error is produced. - If
json_expr
is SQLNULL
, the function returns SQLNULL
. - If
append_each_element
is SQLNULL
, the function returnsjson_expr
. - If
json_path
is SQLNULL
, thejson_path_value_pair
operation is ignored.
Return type
JSON
Examples
In the following example, path $
is matched and appends 1
.
SELECT JSON_ARRAY_APPEND(JSON '["a", "b", "c"]', '$', 1) AS json_data
/*-----------------*
| json_data |
+-----------------+
| ["a","b","c",1] |
*-----------------*/
In the following example, append_each_element
defaults to TRUE
, so
[1, 2]
is appended as individual elements.
SELECT JSON_ARRAY_APPEND(JSON '["a", "b", "c"]', '$', [1, 2]) AS json_data
/*-------------------*
| json_data |
+-------------------+
| ["a","b","c",1,2] |
*-------------------*/
In the following example, append_each_element
is FALSE
, so
[1, 2]
is appended as one element.
SELECT JSON_ARRAY_APPEND(
JSON '["a", "b", "c"]',
'$', [1, 2],
append_each_element=>FALSE) AS json_data
/*---------------------*
| json_data |
+---------------------+
| ["a","b","c",[1,2]] |
*---------------------*/
In the following example, append_each_element
is FALSE
, so
[1, 2]
and [3, 4]
are each appended as one element.
SELECT JSON_ARRAY_APPEND(
JSON '["a", ["b"], "c"]',
'$[1]', [1, 2],
'$[1][1]', [3, 4],
append_each_element=>FALSE) AS json_data
/*-----------------------------*
| json_data |
+-----------------------------+
| ["a",["b",[1,2,[3,4]]],"c"] |
*-----------------------------*/
In the following example, the first path $[1]
appends [1, 2]
as individual
elements, and then the second path $[1][1]
is not a valid path to an array,
so the second operation is ignored.
SELECT JSON_ARRAY_APPEND(
JSON '["a", ["b"], "c"]',
'$[1]', [1, 2],
'$[1][1]', [3, 4]) AS json_data
/*---------------------*
| json_data |
+---------------------+
| ["a",["b",1,2],"c"] |
*---------------------*/
In the following example, path $.a
is matched and appends 2
.
SELECT JSON_ARRAY_APPEND(JSON '{"a": [1]}', '$.a', 2) AS json_data
/*-------------*
| json_data |
+-------------+
| {"a":[1,2]} |
*-------------*/
In the following example, a value is appended into a JSON null.
SELECT JSON_ARRAY_APPEND(JSON '{"a": null}', '$.a', 10)
/*------------*
| json_data |
+------------+
| {"a":[10]} |
*------------*/
In the following example, path $.a points to a value that is not an array, so the operation is ignored.
SELECT JSON_ARRAY_APPEND(JSON '{"a": 1}', '$.a', 2) AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":1} |
*-----------*/
In the following example, path $.b does not exist, so the operation is ignored.
SELECT JSON_ARRAY_APPEND(JSON '{"a": 1}', '$.b', 2) AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":1} |
*-----------*/
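The NULL-handling rules listed in the details above can also be exercised directly. This is a sketch whose expected behavior follows from those stated rules:

```sql
-- Per the details above: a SQL NULL json_expr returns SQL NULL.
SELECT JSON_ARRAY_APPEND(CAST(NULL AS JSON), '$', 1) AS json_data;

-- Per the details above: a SQL NULL append_each_element returns json_expr
-- unchanged.
SELECT JSON_ARRAY_APPEND(
  JSON '["a", "b"]',
  '$', [1, 2],
  append_each_element => CAST(NULL AS BOOL)) AS json_data;
```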
JSON_ARRAY_INSERT
JSON_ARRAY_INSERT(
json_expr,
json_path_value_pair[, ...]
[, insert_each_element => { TRUE | FALSE } ]
)
json_path_value_pair:
json_path, value
Produces a new JSON value that is created by inserting JSON data into a JSON array.
Arguments:
- json_expr: JSON. For example: JSON '["a", "b", "c"]'
- json_path_value_pair: A value and the JSONPath for that value. This includes:
  - json_path: Insert value at this JSONPath in json_expr.
  - value: A JSON encoding-supported value to insert.
- insert_each_element: A named argument with a BOOL value.
  - If TRUE (default), and value is a SQL array, inserts each element individually.
  - If FALSE, and value is a SQL array, inserts the array as one element.
Details:
- Path value pairs are evaluated left to right. The JSON produced by evaluating one pair becomes the JSON against which the next pair is evaluated.
- The operation is ignored if the path points to a JSON non-array value that is not a JSON null.
- If json_path points to a JSON null, the JSON null is replaced by a JSON array of the appropriate size and padded on the left with JSON nulls.
- If the path exists but has an incompatible type at any given path token, the path value pair operation is ignored.
- The function applies all path value pair insert operations even if an individual path value pair operation is invalid. For invalid operations, the operation is ignored and the function continues to process the rest of the path value pairs.
- If the array index in json_path is larger than the size of the array, the function extends the length of the array to the index, fills in the array with JSON nulls, then adds value at the index.
- If any json_path is an invalid JSONPath, an error is produced.
- If json_expr is SQL NULL, the function returns SQL NULL.
- If insert_each_element is SQL NULL, the function returns json_expr.
- If json_path is SQL NULL, the json_path_value_pair operation is ignored.
Return type
JSON
Examples
In the following example, path $[1] is matched and inserts 1.
SELECT JSON_ARRAY_INSERT(JSON '["a", ["b", "c"], "d"]', '$[1]', 1) AS json_data
/*-----------------------*
| json_data |
+-----------------------+
| ["a",1,["b","c"],"d"] |
*-----------------------*/
In the following example, path $[1][0] is matched and inserts 1.
SELECT JSON_ARRAY_INSERT(JSON '["a", ["b", "c"], "d"]', '$[1][0]', 1) AS json_data
/*-----------------------*
| json_data |
+-----------------------+
| ["a",[1,"b","c"],"d"] |
*-----------------------*/
In the following example, insert_each_element defaults to TRUE, so [1, 2] is inserted as individual elements.
SELECT JSON_ARRAY_INSERT(JSON '["a", "b", "c"]', '$[1]', [1, 2]) AS json_data
/*-------------------*
| json_data |
+-------------------+
| ["a",1,2,"b","c"] |
*-------------------*/
In the following example, insert_each_element is FALSE, so [1, 2] is inserted as one element.
SELECT JSON_ARRAY_INSERT(
JSON '["a", "b", "c"]',
'$[1]', [1, 2],
insert_each_element=>FALSE) AS json_data
/*---------------------*
| json_data |
+---------------------+
| ["a",[1,2],"b","c"] |
*---------------------*/
In the following example, path $[7] is larger than the length of the matched array, so the array is extended with JSON nulls and "e" is inserted at the end of the array.
SELECT JSON_ARRAY_INSERT(JSON '["a", "b", "c", "d"]', '$[7]', "e") AS json_data
/*--------------------------------------*
| json_data |
+--------------------------------------+
| ["a","b","c","d",null,null,null,"e"] |
*--------------------------------------*/
In the following example, path $.a is an object, so the operation is ignored.
SELECT JSON_ARRAY_INSERT(JSON '{"a": {}}', '$.a[0]', 2) AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":{}} |
*-----------*/
In the following example, path $ does not specify a valid array position, so the operation is ignored.
SELECT JSON_ARRAY_INSERT(JSON '[1, 2]', '$', 3) AS json_data
/*-----------*
| json_data |
+-----------+
| [1,2] |
*-----------*/
In the following example, a value is inserted into a JSON null.
SELECT JSON_ARRAY_INSERT(JSON '{"a": null}', '$.a[2]', 10) AS json_data
/*----------------------*
| json_data |
+----------------------+
| {"a":[null,null,10]} |
*----------------------*/
In the following example, the operation is ignored because you can't insert data into a JSON number.
SELECT JSON_ARRAY_INSERT(JSON '1', '$[0]', 'r1') AS json_data
/*-----------*
| json_data |
+-----------+
| 1 |
*-----------*/
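The NULL-handling rules from the details above apply here as well; this sketch exercises them, with the expected behavior following from those rules:

```sql
-- Per the details above: a SQL NULL json_expr returns SQL NULL.
SELECT JSON_ARRAY_INSERT(CAST(NULL AS JSON), '$[0]', 1) AS json_data;

-- Per the details above: a SQL NULL json_path causes that
-- json_path_value_pair operation to be ignored, leaving the input unchanged.
SELECT JSON_ARRAY_INSERT(
  JSON '["a", "b"]', CAST(NULL AS STRING), 1) AS json_data;
```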
JSON_EXTRACT
JSON_EXTRACT(json_string_expr, json_path)
JSON_EXTRACT(json_expr, json_path)
Description
Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value.
This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys. For example: ['a.b'].
Arguments:
- json_string_expr: A JSON-formatted string. For example: '{"class": {"students": [{"name": "Jane"}]}}'
  Extracts a SQL NULL when a JSON-formatted string null is encountered. For example:
  SELECT JSON_EXTRACT("null", "$") -- Returns a SQL NULL
- json_expr: JSON. For example: JSON '{"class": {"students": [{"name": "Jane"}]}}'
  Extracts a JSON null when a JSON null is encountered. For example:
  SELECT JSON_EXTRACT(JSON 'null', "$") -- Returns a JSON 'null'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Return type
- json_string_expr: A JSON-formatted STRING
- json_expr: JSON
Examples
In the following example, JSON data is extracted and returned as JSON.
SELECT
JSON_EXTRACT(JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}', '$.class')
AS json_data;
/*-----------------------------------*
| json_data |
+-----------------------------------+
| {"students":[{"id":5},{"id":12}]} |
*-----------------------------------*/
In the following examples, JSON data is extracted and returned as JSON-formatted strings.
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "Jane"}]}}',
'$') AS json_text_string;
/*-----------------------------------------------------------*
| json_text_string |
+-----------------------------------------------------------+
| {"class":{"students":[{"name":"Jane"}]}} |
*-----------------------------------------------------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": []}}',
'$') AS json_text_string;
/*-----------------------------------------------------------*
| json_text_string |
+-----------------------------------------------------------+
| {"class":{"students":[]}} |
*-----------------------------------------------------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
'$') AS json_text_string;
/*-----------------------------------------------------------*
| json_text_string |
+-----------------------------------------------------------+
| {"class":{"students":[{"name":"John"},{"name":"Jamie"}]}} |
*-----------------------------------------------------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "Jane"}]}}',
'$.class.students[0]') AS first_student;
/*-----------------*
| first_student |
+-----------------+
| {"name":"Jane"} |
*-----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": []}}',
'$.class.students[0]') AS first_student;
/*-----------------*
| first_student |
+-----------------+
| NULL |
*-----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
'$.class.students[0]') AS first_student;
/*-----------------*
| first_student |
+-----------------+
| {"name":"John"} |
*-----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "Jane"}]}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| NULL |
*----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": []}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| NULL |
*----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "John"}, {"name": null}]}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| NULL |
*----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| "Jamie" |
*----------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "Jane"}]}}',
"$.class['students']") AS student_names;
/*------------------------------------*
| student_names |
+------------------------------------+
| [{"name":"Jane"}] |
*------------------------------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": []}}',
"$.class['students']") AS student_names;
/*------------------------------------*
| student_names |
+------------------------------------+
| [] |
*------------------------------------*/
SELECT JSON_EXTRACT(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
"$.class['students']") AS student_names;
/*------------------------------------*
| student_names |
+------------------------------------+
| [{"name":"John"},{"name":"Jamie"}] |
*------------------------------------*/
SELECT JSON_EXTRACT('{"a": null}', "$.a"); -- Returns a SQL NULL
SELECT JSON_EXTRACT('{"a": null}', "$.b"); -- Returns a SQL NULL
SELECT JSON_EXTRACT(JSON '{"a": null}', "$.a"); -- Returns a JSON 'null'
SELECT JSON_EXTRACT(JSON '{"a": null}', "$.b"); -- Returns a SQL NULL
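The single-quote-and-bracket escaping described above can be used when a key contains characters that are invalid in a JSONPath, such as a period. This mirrors the escaping example shown for JSON_EXTRACT_SCALAR later in this topic; because JSON_EXTRACT returns a JSON-formatted string, the extracted value keeps its double quotes:

```sql
-- The key "a.b" contains an invalid JSONPath character, so it is escaped
-- with ['a.b'].
SELECT JSON_EXTRACT('{"a.b": {"c": "world"}}', "$['a.b'].c") AS hello;
-- Returns the JSON-formatted string "world" (double quotes preserved)
```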
JSON_EXTRACT_ARRAY
JSON_EXTRACT_ARRAY(json_string_expr[, json_path])
JSON_EXTRACT_ARRAY(json_expr[, json_path])
Description
Extracts a JSON array and converts it to a SQL ARRAY<JSON-formatted STRING> or ARRAY<JSON> value.
This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys. For example: ['a.b'].
Arguments:
- json_string_expr: A JSON-formatted string. For example: '["a", "b", {"key": "c"}]'
- json_expr: JSON. For example: JSON '["a", "b", {"key": "c"}]'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. If this optional parameter is not provided, then the JSONPath $ symbol is applied, which means that all of the data is analyzed.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Return type
- json_string_expr: ARRAY<JSON-formatted STRING>
- json_expr: ARRAY<JSON>
Examples
This extracts items in JSON to an array of JSON values:
SELECT JSON_EXTRACT_ARRAY(
JSON '{"fruits":["apples","oranges","grapes"]}','$.fruits'
) AS json_array;
/*---------------------------------*
| json_array |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
*---------------------------------*/
This extracts the items in a JSON-formatted string to a string array:
SELECT JSON_EXTRACT_ARRAY('[1,2,3]') AS string_array;
/*--------------*
| string_array |
+--------------+
| [1, 2, 3] |
*--------------*/
This extracts a string array and converts it to an integer array:
SELECT ARRAY(
SELECT CAST(integer_element AS INT64)
FROM UNNEST(
JSON_EXTRACT_ARRAY('[1,2,3]','$')
) AS integer_element
) AS integer_array;
/*---------------*
| integer_array |
+---------------+
| [1, 2, 3] |
*---------------*/
This extracts string values in a JSON-formatted string to an array:
-- Doesn't strip the double quotes
SELECT JSON_EXTRACT_ARRAY('["apples", "oranges", "grapes"]', '$') AS string_array;
/*---------------------------------*
| string_array |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
*---------------------------------*/
-- Strips the double quotes
SELECT ARRAY(
SELECT JSON_EXTRACT_SCALAR(string_element, '$')
FROM UNNEST(JSON_EXTRACT_ARRAY('["apples","oranges","grapes"]','$')) AS string_element
) AS string_array;
/*---------------------------*
| string_array |
+---------------------------+
| [apples, oranges, grapes] |
*---------------------------*/
This extracts only the items in the fruit property to an array:
SELECT JSON_EXTRACT_ARRAY(
'{"fruit": [{"apples": 5, "oranges": 10}, {"apples": 2, "oranges": 4}], "vegetables": [{"lettuce": 7, "kale": 8}]}',
'$.fruit'
) AS string_array;
/*-------------------------------------------------------*
| string_array |
+-------------------------------------------------------+
| [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] |
*-------------------------------------------------------*/
These are equivalent:
SELECT JSON_EXTRACT_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$[fruits]') AS string_array;
SELECT JSON_EXTRACT_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits') AS string_array;
-- The queries above produce the following result:
/*---------------------------------*
| string_array |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
*---------------------------------*/
In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using single quotes and brackets, [' ']. For example:
SELECT JSON_EXTRACT_ARRAY('{"a.b": {"c": ["world"]}}', "$['a.b'].c") AS hello;
/*-----------*
| hello |
+-----------+
| ["world"] |
*-----------*/
The following examples explore how invalid requests and empty arrays are handled:
- If a JSONPath is invalid, an error is thrown.
- If a JSON-formatted string is invalid, the output is NULL.
- It is okay to have empty arrays in the JSON-formatted string.
-- An error is thrown if you provide an invalid JSONPath.
SELECT JSON_EXTRACT_ARRAY('["foo", "bar", "baz"]', 'INVALID_JSONPath') AS result;
-- If the JSONPath does not refer to an array, then NULL is returned.
SELECT JSON_EXTRACT_ARRAY('{"a": "foo"}', '$.a') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a key that does not exist is specified, then the result is NULL.
SELECT JSON_EXTRACT_ARRAY('{"a": "foo"}', '$.b') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- Empty arrays in JSON-formatted strings are supported.
SELECT JSON_EXTRACT_ARRAY('{"a": "foo", "b": []}', '$.b') AS result;
/*--------*
| result |
+--------+
| [] |
*--------*/
JSON_EXTRACT_SCALAR
JSON_EXTRACT_SCALAR(json_string_expr[, json_path])
JSON_EXTRACT_SCALAR(json_expr[, json_path])
Description
Extracts a JSON scalar value and converts it to a SQL STRING value.
In addition, this function:
- Removes the outermost quotes and unescapes the return values.
- Returns a SQL NULL if a non-scalar value is selected.
- Uses single quotes and brackets to escape invalid JSONPath characters in JSON keys. For example: ['a.b'].
Arguments:
- json_string_expr: A JSON-formatted string. For example: '{"name": "Jane", "age": "6"}'
- json_expr: JSON. For example: JSON '{"name": "Jane", "age": "6"}'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. If this optional parameter is not provided, then the JSONPath $ symbol is applied, which means that all of the data is analyzed.
  If json_path returns a JSON null or a non-scalar value (in other words, if json_path refers to an object or an array), then a SQL NULL is returned.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Return type
STRING
Examples
In the following example, age is extracted.
SELECT JSON_EXTRACT_SCALAR(JSON '{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age;
/*------------*
| scalar_age |
+------------+
| 6 |
*------------*/
The following example compares how results are returned for the JSON_EXTRACT and JSON_EXTRACT_SCALAR functions.
SELECT JSON_EXTRACT('{"name": "Jakob", "age": "6" }', '$.name') AS json_name,
JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6" }', '$.name') AS scalar_name,
JSON_EXTRACT('{"name": "Jakob", "age": "6" }', '$.age') AS json_age,
JSON_EXTRACT_SCALAR('{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age;
/*-----------+-------------+----------+------------*
| json_name | scalar_name | json_age | scalar_age |
+-----------+-------------+----------+------------+
| "Jakob" | Jakob | "6" | 6 |
*-----------+-------------+----------+------------*/
SELECT JSON_EXTRACT('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract,
JSON_EXTRACT_SCALAR('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_extract_scalar;
/*--------------------+---------------------*
| json_extract | json_extract_scalar |
+--------------------+---------------------+
| ["apple","banana"] | NULL |
*--------------------+---------------------*/
In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using single quotes and brackets, [' ']. For example:
SELECT JSON_EXTRACT_SCALAR('{"a.b": {"c": "world"}}', "$['a.b'].c") AS hello;
/*-------*
| hello |
+-------+
| world |
*-------*/
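As noted in the arguments above, a JSON null or a non-scalar value produces a SQL NULL; a quick check of both cases:

```sql
-- A JSON null produces a SQL NULL.
SELECT JSON_EXTRACT_SCALAR('{"a": null}', '$.a') AS scalar_null;

-- A non-scalar value (here, an object) also produces a SQL NULL.
SELECT JSON_EXTRACT_SCALAR('{"a": {"b": 1}}', '$.a') AS scalar_object;
```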
JSON_EXTRACT_STRING_ARRAY
JSON_EXTRACT_STRING_ARRAY(json_string_expr[, json_path])
JSON_EXTRACT_STRING_ARRAY(json_expr[, json_path])
Description
Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING> value. In addition, this function:
- Removes the outermost quotes and unescapes the values.
- Returns a SQL NULL if the selected value is not an array, or is not an array containing only scalar values.
- Uses single quotes and brackets to escape invalid JSONPath characters in JSON keys. For example: ['a.b'].
Arguments:
- json_string_expr: A JSON-formatted string. For example: '["apples", "oranges", "grapes"]'
- json_expr: JSON. For example: JSON '["apples", "oranges", "grapes"]'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. If this optional parameter is not provided, then the JSONPath $ symbol is applied, which means that all of the data is analyzed.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Caveats:
- A JSON null in the input array produces a SQL NULL as the output for that JSON null. If the output contains a NULL array element, an error is produced because the final output cannot be an array with NULL values.
- If a JSONPath matches an array that contains scalar objects and a JSON null, then the output of the function must be transformed because the final output cannot be an array with NULL values.
Return type
ARRAY<STRING>
Examples
This extracts items in JSON to a string array:
SELECT JSON_EXTRACT_STRING_ARRAY(
JSON '{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits'
) AS string_array;
/*---------------------------*
| string_array |
+---------------------------+
| [apples, oranges, grapes] |
*---------------------------*/
The following example compares how results are returned for the JSON_EXTRACT_ARRAY and JSON_EXTRACT_STRING_ARRAY functions.
SELECT JSON_EXTRACT_ARRAY('["apples", "oranges"]') AS json_array,
JSON_EXTRACT_STRING_ARRAY('["apples", "oranges"]') AS string_array;
/*-----------------------+-------------------*
| json_array | string_array |
+-----------------------+-------------------+
| ["apples", "oranges"] | [apples, oranges] |
*-----------------------+-------------------*/
This extracts the items in a JSON-formatted string to a string array:
-- Strips the double quotes
SELECT JSON_EXTRACT_STRING_ARRAY('["foo", "bar", "baz"]', '$') AS string_array;
/*-----------------*
| string_array |
+-----------------+
| [foo, bar, baz] |
*-----------------*/
This extracts a string array and converts it to an integer array:
SELECT ARRAY(
SELECT CAST(integer_element AS INT64)
FROM UNNEST(
JSON_EXTRACT_STRING_ARRAY('[1, 2, 3]', '$')
) AS integer_element
) AS integer_array;
/*---------------*
| integer_array |
+---------------+
| [1, 2, 3] |
*---------------*/
These are equivalent:
SELECT JSON_EXTRACT_STRING_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$[fruits]') AS string_array;
SELECT JSON_EXTRACT_STRING_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits') AS string_array;
-- The queries above produce the following result:
/*---------------------------*
| string_array |
+---------------------------+
| [apples, oranges, grapes] |
*---------------------------*/
In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using single quotes and brackets: [' ']. For example:
SELECT JSON_EXTRACT_STRING_ARRAY('{"a.b": {"c": ["world"]}}', "$['a.b'].c") AS hello;
/*---------*
| hello |
+---------+
| [world] |
*---------*/
The following examples explore how invalid requests and empty arrays are handled:
-- An error is thrown if you provide an invalid JSONPath.
SELECT JSON_EXTRACT_STRING_ARRAY('["foo", "bar", "baz"]', 'INVALID_JSONPath') AS result;
-- If the JSON formatted string is invalid, then NULL is returned.
SELECT JSON_EXTRACT_STRING_ARRAY('}}', '$') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If the JSON document is NULL, then NULL is returned.
SELECT JSON_EXTRACT_STRING_ARRAY(NULL, '$') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath does not match anything, then the output is NULL.
SELECT JSON_EXTRACT_STRING_ARRAY('{"a": ["foo", "bar", "baz"]}', '$.b') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an object that is not an array, then the output is NULL.
SELECT JSON_EXTRACT_STRING_ARRAY('{"a": "foo"}', '$') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an array of non-scalar objects, then the output is NULL.
SELECT JSON_EXTRACT_STRING_ARRAY('{"a": [{"b": "foo", "c": 1}, {"b": "bar", "c":2}], "d": "baz"}', '$.a') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an array of mixed scalar and non-scalar objects, then the output is NULL.
SELECT JSON_EXTRACT_STRING_ARRAY('{"a": [10, {"b": 20}]', '$.a') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an empty JSON array, then the output is an empty array instead of NULL.
SELECT JSON_EXTRACT_STRING_ARRAY('{"a": "foo", "b": []}', '$.b') AS result;
/*--------*
| result |
+--------+
| [] |
*--------*/
-- The following query produces an error because the final output cannot be
-- an array with NULL values.
SELECT JSON_EXTRACT_STRING_ARRAY('["world", 1, null]') AS result;
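One way to perform the transformation that the caveats above require, sketched here as one possible approach rather than the only one, is to extract the elements with JSON_EXTRACT_ARRAY and substitute a placeholder for each JSON null:

```sql
-- Replace each SQL NULL (produced by a JSON null) with the placeholder
-- string 'null' so that the final array contains no NULL elements.
SELECT ARRAY(
  SELECT IFNULL(JSON_EXTRACT_SCALAR(element, '$'), 'null')
  FROM UNNEST(JSON_EXTRACT_ARRAY('["world", 1, null]', '$')) AS element
) AS result;
```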
JSON_KEYS
JSON_KEYS(
json_expr
[, max_depth ]
[, mode => { 'strict' | 'lax' | 'lax recursive' } ]
)
Description
Extracts unique JSON keys from a JSON expression.
Arguments:
- json_expr: JSON. For example: JSON '{"class": {"students": [{"name": "Jane"}]}}'
- max_depth: An INT64 value that represents the maximum depth of nested fields to search in json_expr.
- mode: A named argument with a STRING value that can be one of the following:
  - strict (default): Ignore any key that appears in an array.
  - lax: Also include keys contained in non-consecutively nested arrays.
  - lax recursive: Return all keys.
Details:
- Keys are de-duplicated and returned in alphabetical order.
- Keys do not include array indices.
- Keys containing special characters are escaped using double quotes.
- Keys are case sensitive and not normalized.
- If json_expr or mode is SQL NULL, the function returns SQL NULL.
- If max_depth is SQL NULL, the function ignores the argument.
- If max_depth is less than or equal to 0, an error is returned.
Return type
ARRAY<STRING>
Examples
In the following example, there are no arrays, so all keys are returned.
SELECT JSON_KEYS(JSON '{"a": {"b":1}}') AS json_keys
/*-----------*
| json_keys |
*-----------*
| [a, a.b] |
*-----------*/
In the following example, max_depth is set to 1, so "a.b" is not included.
SELECT JSON_KEYS(JSON '{"a": {"b":1}}', 1) AS json_keys
/*-----------*
| json_keys |
*-----------*
| [a] |
*-----------*/
In the following example, json_expr contains an array. Because the mode is strict, keys inside the array are excluded.
SELECT JSON_KEYS(JSON '{"a":[{"b":1}, {"c":2}], "d":3}') AS json_keys
/*-----------*
| json_keys |
*-----------*
| [a, d] |
*-----------*/
In the following example, json_expr contains an array. Because the mode is lax, keys inside the array are included.
SELECT JSON_KEYS(
JSON '{"a":[{"b":1}, {"c":2}], "d":3}',
mode => "lax") as json_keys
/*------------------*
| json_keys |
*------------------*
| [a, a.b, a.c, d] |
*------------------*/
In the following example, json_expr contains consecutively nested arrays. Because the mode is lax, keys inside the consecutively nested arrays are not included.
SELECT JSON_KEYS(JSON '{"a":[[{"b":1}]]}', mode => "lax") as json_keys
/*-----------*
| json_keys |
*-----------*
| [a] |
*-----------*/
In the following example, json_expr contains consecutively nested arrays. Because the mode is lax recursive, every key is returned.
SELECT JSON_KEYS(JSON '{"a":[[{"b":1}]]}', mode => "lax recursive") as json_keys
/*-----------*
| json_keys |
*-----------*
| [a, a.b] |
*-----------*/
In the following example, json_expr contains multiple arrays. Because the arrays are not consecutively nested and the mode is lax, keys inside the arrays are included.
SELECT JSON_KEYS(JSON '{"a":[{"b":[{"c":1}]}]}', mode => "lax") as json_keys
/*-----------------*
| json_keys |
*-----------------*
| [a, a.b, a.b.c] |
*-----------------*/
In the following example, json_expr contains both consecutively nested and single arrays. Because the mode is lax, keys inside the consecutively nested arrays are excluded.
SELECT JSON_KEYS(JSON '{"a":[{"b":[[{"c":1}]]}]}', mode => "lax") as json_keys
/*-----------*
| json_keys |
*-----------*
| [a, a.b] |
*-----------*/
In the following example, json_expr contains both consecutively nested and single arrays. Because the mode is lax recursive, all keys are included.
SELECT JSON_KEYS(
JSON '{"a":[{"b":[[{"c":1}]]}]}', mode => "lax recursive") as json_keys
/*-----------------*
| json_keys |
*-----------------*
| [a, a.b, a.b.c] |
*-----------------*/
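max_depth and mode can also be combined. Per the rules above, this sketch should limit keys to depth 2 while still looking inside the array; the expected result, derived from those rules rather than verified output, is [a, a.b]:

```sql
-- max_depth of 2 excludes "a.b.c"; lax mode includes keys inside the array.
SELECT JSON_KEYS(
  JSON '{"a": [{"b": {"c": 1}}]}', 2, mode => "lax") AS json_keys;
```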
JSON_OBJECT
- Signature 1: JSON_OBJECT([json_key, json_value][, ...])
- Signature 2: JSON_OBJECT(json_key_array, json_value_array)
Signature 1
JSON_OBJECT([json_key, json_value][, ...])
Description
Creates a JSON object, using key-value pairs.
Arguments:
- json_key: A STRING value that represents a key.
- json_value: A JSON encoding-supported value.
Details:
- If two keys are passed in with the same name, only the first key-value pair is preserved.
- The order of key-value pairs is not preserved.
- If json_key is NULL, an error is produced.
Return type
JSON
Examples
You can create an empty JSON object by passing in no JSON keys and values. For example:
SELECT JSON_OBJECT() AS json_data
/*-----------*
| json_data |
+-----------+
| {} |
*-----------*/
You can create a JSON object by passing in key-value pairs. For example:
SELECT JSON_OBJECT('foo', 10, 'bar', TRUE) AS json_data
/*-----------------------*
| json_data |
+-----------------------+
| {"bar":true,"foo":10} |
*-----------------------*/
SELECT JSON_OBJECT('foo', 10, 'bar', ['a', 'b']) AS json_data
/*----------------------------*
| json_data |
+----------------------------+
| {"bar":["a","b"],"foo":10} |
*----------------------------*/
SELECT JSON_OBJECT('a', NULL, 'b', JSON 'null') AS json_data
/*---------------------*
| json_data |
+---------------------+
| {"a":null,"b":null} |
*---------------------*/
SELECT JSON_OBJECT('a', 10, 'a', 'foo') AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":10} |
*-----------*/
WITH Items AS (SELECT 'hello' AS key, 'world' AS value)
SELECT JSON_OBJECT(key, value) AS json_data FROM Items
/*-------------------*
| json_data |
+-------------------+
| {"hello":"world"} |
*-------------------*/
An error is produced if a SQL NULL is passed in for a JSON key.
-- Error: A key cannot be NULL.
SELECT JSON_OBJECT(NULL, 1) AS json_data
An error is produced if the number of JSON keys and JSON values don't match:
-- Error: No matching signature for function JSON_OBJECT for argument types:
-- STRING, INT64, STRING
SELECT JSON_OBJECT('a', 1, 'b') AS json_data
Signature 2
JSON_OBJECT(json_key_array, json_value_array)
Creates a JSON object, using an array of keys and values.
Arguments:
- json_key_array: An array of zero or more STRING keys.
- json_value_array: An array of zero or more JSON encoding-supported values.
Details:
- If two keys are passed in with the same name, only the first key-value pair is preserved.
- The order of key-value pairs is not preserved.
- The number of keys must match the number of values, otherwise an error is produced.
- If any argument is NULL, an error is produced.
- If a key in json_key_array is NULL, an error is produced.
Return type
JSON
Examples
You can create an empty JSON object by passing in an empty array of keys and values. For example:
SELECT JSON_OBJECT(CAST([] AS ARRAY<STRING>), []) AS json_data
/*-----------*
| json_data |
+-----------+
| {} |
*-----------*/
You can create a JSON object by passing in an array of keys and an array of values. For example:
SELECT JSON_OBJECT(['a', 'b'], [10, NULL]) AS json_data
/*-------------------*
| json_data |
+-------------------+
| {"a":10,"b":null} |
*-------------------*/
SELECT JSON_OBJECT(['a', 'b'], [JSON '10', JSON '"foo"']) AS json_data
/*--------------------*
| json_data |
+--------------------+
| {"a":10,"b":"foo"} |
*--------------------*/
SELECT
JSON_OBJECT(
['a', 'b'],
[STRUCT(10 AS id, 'Red' AS color), STRUCT(20 AS id, 'Blue' AS color)])
AS json_data
/*------------------------------------------------------------*
| json_data |
+------------------------------------------------------------+
| {"a":{"color":"Red","id":10},"b":{"color":"Blue","id":20}} |
*------------------------------------------------------------*/
SELECT
JSON_OBJECT(
['a', 'b'],
[TO_JSON(10), TO_JSON(['foo', 'bar'])])
AS json_data
/*----------------------------*
| json_data |
+----------------------------+
| {"a":10,"b":["foo","bar"]} |
*----------------------------*/
The following query groups by id and then creates an array of keys and values from the rows with the same id:
WITH
Fruits AS (
SELECT 0 AS id, 'color' AS json_key, 'red' AS json_value UNION ALL
SELECT 0, 'fruit', 'apple' UNION ALL
SELECT 1, 'fruit', 'banana' UNION ALL
SELECT 1, 'ripe', 'true'
)
SELECT JSON_OBJECT(ARRAY_AGG(json_key), ARRAY_AGG(json_value)) AS json_data
FROM Fruits
GROUP BY id
/*----------------------------------*
| json_data |
+----------------------------------+
| {"color":"red","fruit":"apple"} |
| {"fruit":"banana","ripe":"true"} |
*----------------------------------*/
An error is produced if the size of the JSON keys and values arrays don't match:
-- Error: The number of keys and values must match.
SELECT JSON_OBJECT(['a', 'b'], [10]) AS json_data
An error is produced if the array of JSON keys or JSON values is a SQL NULL.
-- Error: The keys array cannot be NULL.
SELECT JSON_OBJECT(CAST(NULL AS ARRAY<STRING>), [10, 20]) AS json_data
-- Error: The values array cannot be NULL.
SELECT JSON_OBJECT(['a', 'b'], CAST(NULL AS ARRAY<INT64>)) AS json_data
JSON_QUERY
JSON_QUERY(json_string_expr, json_path)
JSON_QUERY(json_expr, json_path)
Description
Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value.
This function uses double quotes to escape invalid JSONPath characters in JSON keys. For example: "a.b".
Arguments:
- json_string_expr: A JSON-formatted string. For example: '{"class": {"students": [{"name": "Jane"}]}}'
  Extracts a SQL NULL when a JSON-formatted string null is encountered. For example:
  SELECT JSON_QUERY("null", "$") -- Returns a SQL NULL
- json_expr: JSON. For example: JSON '{"class": {"students": [{"name": "Jane"}]}}'
  Extracts a JSON null when a JSON null is encountered. For example:
  SELECT JSON_QUERY(JSON 'null', "$") -- Returns a JSON 'null'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. This function lets you specify a mode for the JSONPath.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Return type
- json_string_expr: A JSON-formatted STRING
- json_expr: JSON
Examples
In the following example, JSON data is extracted and returned as JSON.
SELECT
JSON_QUERY(
JSON '{"class": {"students": [{"id": 5}, {"id": 12}]}}',
'$.class') AS json_data;
/*-----------------------------------*
| json_data |
+-----------------------------------+
| {"students":[{"id":5},{"id":12}]} |
*-----------------------------------*/
In the following examples, JSON data is extracted and returned as JSON-formatted strings.
SELECT
JSON_QUERY('{"class": {"students": [{"name": "Jane"}]}}', '$') AS json_text_string;
/*-----------------------------------------------------------*
| json_text_string |
+-----------------------------------------------------------+
| {"class":{"students":[{"name":"Jane"}]}} |
*-----------------------------------------------------------*/
SELECT JSON_QUERY('{"class": {"students": []}}', '$') AS json_text_string;
/*-----------------------------------------------------------*
| json_text_string |
+-----------------------------------------------------------+
| {"class":{"students":[]}} |
*-----------------------------------------------------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "John"},{"name": "Jamie"}]}}',
'$') AS json_text_string;
/*-----------------------------------------------------------*
| json_text_string |
+-----------------------------------------------------------+
| {"class":{"students":[{"name":"John"},{"name":"Jamie"}]}} |
*-----------------------------------------------------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "Jane"}]}}',
'$.class.students[0]') AS first_student;
/*-----------------*
| first_student |
+-----------------+
| {"name":"Jane"} |
*-----------------*/
SELECT
JSON_QUERY('{"class": {"students": []}}', '$.class.students[0]') AS first_student;
/*-----------------*
| first_student |
+-----------------+
| NULL |
*-----------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
'$.class.students[0]') AS first_student;
/*-----------------*
| first_student |
+-----------------+
| {"name":"John"} |
*-----------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "Jane"}]}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| NULL |
*----------------*/
SELECT
JSON_QUERY(
'{"class": {"students": []}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| NULL |
*----------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "John"}, {"name": null}]}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| NULL |
*----------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
'$.class.students[1].name') AS second_student;
/*----------------*
| second_student |
+----------------+
| "Jamie" |
*----------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "Jane"}]}}',
'$.class."students"') AS student_names;
/*------------------------------------*
| student_names |
+------------------------------------+
| [{"name":"Jane"}] |
*------------------------------------*/
SELECT
JSON_QUERY(
'{"class": {"students": []}}',
'$.class."students"') AS student_names;
/*------------------------------------*
| student_names |
+------------------------------------+
| [] |
*------------------------------------*/
SELECT
JSON_QUERY(
'{"class": {"students": [{"name": "John"}, {"name": "Jamie"}]}}',
'$.class."students"') AS student_names;
/*------------------------------------*
| student_names |
+------------------------------------+
| [{"name":"John"},{"name":"Jamie"}] |
*------------------------------------*/
In the following examples, the JSON data is extracted in lax mode. Because the keyword lax is included in the JSONPath, JSON arrays are automatically unwrapped.
SELECT
JSON_QUERY(
JSON '{"class": {"students": [{"name": "Jane"}]}}',
'lax $.class.students.name') AS student_names_lax;
/*-------------------*
| student_names_lax |
*-------------------*
| ["Jane"] |
*-------------------*/
SELECT
JSON_QUERY(
JSON '[{"class": {"students": [{"name": "Joe"}, {"name": "Jamie"}]}}]',
'lax $.class.students.name') AS student_names_lax;
/*-------------------*
| student_names_lax |
*-------------------*
| ["Joe","Jamie"] |
*-------------------*/
SELECT
JSON_QUERY(
JSON '{"class": {"students": [[{"name": "John"}], {"name": "Jamie"}]}}',
'lax $.class.students.name') AS student_names_lax;
/*-------------------*
| student_names_lax |
*-------------------*
| ["Jamie"] |
*-------------------*/
In the following examples, the JSON data is extracted in lax recursive mode. Because the keyword lax recursive is included in the JSONPath, JSON arrays are unwrapped until a non-array type is found.
SELECT
JSON_QUERY(
JSON '{"class": {"students": [{"name": "Jane"}]}}',
'lax recursive $.class.students.name') AS student_names_lax_recursive;
/*-----------------------------*
| student_names_lax_recursive |
*-----------------------------*
| ["Jane"] |
*-----------------------------*/
SELECT
JSON_QUERY(
JSON '[[{"class": {"students": [{"name": "Joe"}, {"name": "Jamie"}]}}]]',
'lax recursive $.class.students.name') AS student_names_lax_recursive;
/*-----------------------------*
| student_names_lax_recursive |
*-----------------------------*
| ["Joe","Jamie"] |
*-----------------------------*/
SELECT
JSON_QUERY(
JSON '{"class": {"students": [[{"name": "John"}], {"name": "Jamie"}]}}',
'lax recursive $.class.students.name') AS student_names_lax_recursive;
/*-----------------------------*
| student_names_lax_recursive |
*-----------------------------*
| ["John","Jamie"] |
*-----------------------------*/
In the following examples, the keywords lax and lax recursive indicate that non-array types should be wrapped into arrays of size 1 before matching. The modes lax and lax recursive behave identically for wrapping arrays.
SELECT
JSON_QUERY(
JSON '{"class": {"students": {"name": "Jane"}}}',
'lax $.class[0].students[0].name') AS student_names_lax,
JSON_QUERY(
JSON '{"class": {"students": {"name": "Jane"}}}',
'lax recursive $.class[0].students[0].name') AS student_names_lax_recursive;
/*-------------------*-----------------------------*
| student_names_lax | student_names_lax_recursive |
*-------------------*-----------------------------*
| ["Jane"] | ["Jane"] |
*-------------------*-----------------------------*/
SELECT
JSON_QUERY(
JSON '[{"class": {"students": [{"name": "Joe"}, {"name": "Jamie"}]}}]',
'lax $.class[0].students[0].name') AS student_names_lax,
JSON_QUERY(
JSON '[{"class": {"students": [{"name": "Joe"}, {"name": "Jamie"}]}}]',
'lax recursive $.class[0].students[0].name') AS student_names_lax_recursive;
/*-------------------*-----------------------------*
| student_names_lax | student_names_lax_recursive |
*-------------------*-----------------------------*
| ["Joe"] | ["Joe"] |
*-------------------*-----------------------------*/
SELECT
JSON_QUERY(
JSON '{"class": {"students": [[{"name": "John"}], {"name": "Jamie"}]}}',
'lax $.class[0].students[0].name') AS student_names_lax,
JSON_QUERY(
JSON '{"class": {"students": [[{"name": "John"}], {"name": "Jamie"}]}}',
'lax recursive $.class[0].students[0].name') AS student_names_lax_recursive;
/*-------------------*-----------------------------*
| student_names_lax | student_names_lax_recursive |
*-------------------*-----------------------------*
| ["John"] | ["John"] |
*-------------------*-----------------------------*/
The following examples show how JSON null values and missing keys are handled for JSON-formatted string and JSON inputs:
SELECT JSON_QUERY('{"a": null}', "$.a"); -- Returns a SQL NULL
SELECT JSON_QUERY('{"a": null}', "$.b"); -- Returns a SQL NULL
SELECT JSON_QUERY(JSON '{"a": null}', "$.a"); -- Returns a JSON 'null'
SELECT JSON_QUERY(JSON '{"a": null}', "$.b"); -- Returns a SQL NULL
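The description above notes that this function uses double quotes to escape invalid JSONPath characters in JSON keys. As an illustrative sketch following the same pattern shown for JSON_VALUE later in this topic (note that the result keeps its double quotes because JSON_QUERY returns a JSON-formatted string):
SELECT JSON_QUERY('{"a.b": {"c": "world"}}', '$."a.b".c') AS hello;
/*---------*
| hello   |
+---------+
| "world" |
*---------*/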
JSON_QUERY_ARRAY
JSON_QUERY_ARRAY(json_string_expr[, json_path])
JSON_QUERY_ARRAY(json_expr[, json_path])
Description
Extracts a JSON array and converts it to a SQL ARRAY<JSON-formatted STRING> or ARRAY<JSON> value.
In addition, this function uses double quotes to escape invalid JSONPath characters in JSON keys. For example: "a.b".
Arguments:
- json_string_expr: A JSON-formatted string. For example: '["a", "b", {"key": "c"}]'
- json_expr: JSON. For example: JSON '["a", "b", {"key": "c"}]'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. If this optional parameter isn't provided, then the JSONPath $ symbol is applied, which means that all of the data is analyzed.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Return type
- json_string_expr: ARRAY<JSON-formatted STRING>
- json_expr: ARRAY<JSON>
Examples
This extracts items in JSON to an array of JSON values:
SELECT JSON_QUERY_ARRAY(
JSON '{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits'
) AS json_array;
/*---------------------------------*
| json_array |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
*---------------------------------*/
This extracts the items in a JSON-formatted string to a string array:
SELECT JSON_QUERY_ARRAY('[1, 2, 3]') AS string_array;
/*--------------*
| string_array |
+--------------+
| [1, 2, 3] |
*--------------*/
This extracts a string array and converts it to an integer array:
SELECT ARRAY(
SELECT CAST(integer_element AS INT64)
FROM UNNEST(
JSON_QUERY_ARRAY('[1, 2, 3]','$')
) AS integer_element
) AS integer_array;
/*---------------*
| integer_array |
+---------------+
| [1, 2, 3] |
*---------------*/
This extracts string values in a JSON-formatted string to an array:
-- Doesn't strip the double quotes
SELECT JSON_QUERY_ARRAY('["apples", "oranges", "grapes"]', '$') AS string_array;
/*---------------------------------*
| string_array |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
*---------------------------------*/
-- Strips the double quotes
SELECT ARRAY(
SELECT JSON_VALUE(string_element, '$')
FROM UNNEST(JSON_QUERY_ARRAY('["apples", "oranges", "grapes"]', '$')) AS string_element
) AS string_array;
/*---------------------------*
| string_array |
+---------------------------+
| [apples, oranges, grapes] |
*---------------------------*/
This extracts only the items in the fruit property to an array:
SELECT JSON_QUERY_ARRAY(
'{"fruit": [{"apples": 5, "oranges": 10}, {"apples": 2, "oranges": 4}], "vegetables": [{"lettuce": 7, "kale": 8}]}',
'$.fruit'
) AS string_array;
/*-------------------------------------------------------*
| string_array |
+-------------------------------------------------------+
| [{"apples":5,"oranges":10}, {"apples":2,"oranges":4}] |
*-------------------------------------------------------*/
These are equivalent:
SELECT JSON_QUERY_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits') AS string_array;
SELECT JSON_QUERY_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$."fruits"') AS string_array;
-- The queries above produce the following result:
/*---------------------------------*
| string_array |
+---------------------------------+
| ["apples", "oranges", "grapes"] |
*---------------------------------*/
In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using double quotes: " ". For example:
SELECT JSON_QUERY_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello;
/*-----------*
| hello |
+-----------+
| ["world"] |
*-----------*/
The following examples show how invalid requests and empty arrays are handled:
-- An error is returned if you provide an invalid JSONPath.
SELECT JSON_QUERY_ARRAY('["foo", "bar", "baz"]', 'INVALID_JSONPath') AS result;
-- If the JSONPath does not refer to an array, then NULL is returned.
SELECT JSON_QUERY_ARRAY('{"a": "foo"}', '$.a') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a key that does not exist is specified, then the result is NULL.
SELECT JSON_QUERY_ARRAY('{"a": "foo"}', '$.b') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- Empty arrays in JSON-formatted strings are supported.
SELECT JSON_QUERY_ARRAY('{"a": "foo", "b": []}', '$.b') AS result;
/*--------*
| result |
+--------+
| [] |
*--------*/
JSON_REMOVE
JSON_REMOVE(json_expr, json_path[, ...])
Description
Produces a new SQL JSON value with the specified JSON data removed.
Arguments:
- json_expr: JSON. For example: JSON '{"class": {"students": [{"name": "Jane"}]}}'
- json_path: Remove data at this JSONPath in json_expr.
Details:
- Paths are evaluated left to right. The JSON produced by evaluating the first path is the JSON for the next path.
- The operation ignores non-existent paths and continues processing the rest of the paths.
- For each path, the entire matched JSON subtree is deleted.
- If the path matches a JSON object key, this function deletes the key-value pair.
- If the path matches an array element, this function deletes the specific element from the matched array.
- If removing the path results in an empty JSON object or empty JSON array, the empty structure is preserved.
- If json_path is $ or an invalid JSONPath, an error is produced.
- If json_path is SQL NULL, the path operation is ignored.
Return type
JSON
Examples
In the following example, the path $[1] is matched and removes ["b", "c"].
SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]') AS json_data
/*-----------*
| json_data |
+-----------+
| ["a","d"] |
*-----------*/
You can use the field access operator to pass JSON data into this function. For example:
WITH T AS (SELECT JSON '{"a": {"b": 10, "c": 20}}' AS data)
SELECT JSON_REMOVE(data.a, '$.b') AS json_data FROM T
/*-----------*
| json_data |
+-----------+
| {"c":20} |
*-----------*/
In the following example, the first path $[1] is matched and removes ["b", "c"]. Then, the second path $[1] is matched and removes "d".
SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]', '$[1]') AS json_data
/*-----------*
| json_data |
+-----------+
| ["a"] |
*-----------*/
The structure of an empty array is preserved when all elements are deleted from it. For example:
SELECT JSON_REMOVE(JSON '["a", ["b", "c"], "d"]', '$[1]', '$[1]', '$[0]') AS json_data
/*-----------*
| json_data |
+-----------+
| [] |
*-----------*/
In the following example, the path $.a.b.c is matched and removes the "c":"d" key-value pair from the JSON object.
SELECT JSON_REMOVE(JSON '{"a": {"b": {"c": "d"}}}', '$.a.b.c') AS json_data
/*----------------*
| json_data |
+----------------+
| {"a":{"b":{}}} |
*----------------*/
In the following example, the path $.a.b is matched and removes the "b": {"c":"d"} key-value pair from the JSON object.
SELECT JSON_REMOVE(JSON '{"a": {"b": {"c": "d"}}}', '$.a.b') AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":{}} |
*-----------*/
In the following example, the path $.b doesn't exist, so the operation makes no changes.
SELECT JSON_REMOVE(JSON '{"a": 1}', '$.b') AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":1} |
*-----------*/
In the following example, the paths $.a.b and $.b don't exist, so those operations are ignored, but the others are processed.
SELECT JSON_REMOVE(JSON '{"a": [1, 2, 3]}', '$.a[0]', '$.a.b', '$.b', '$.a[0]') AS json_data
/*-----------*
| json_data |
+-----------+
| {"a":[3]} |
*-----------*/
If you pass in $ as the path, an error is produced. For example:
-- Error: The JSONPath cannot be '$'
SELECT JSON_REMOVE(JSON '{}', '$') AS json_data
In the following example, the operation is ignored because you can't remove data from a JSON null.
SELECT JSON_REMOVE(JSON 'null', '$.a.b') AS json_data
/*-----------*
| json_data |
+-----------+
| null |
*-----------*/
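Per the details above, a SQL NULL json_path is ignored while the other paths are still applied. A minimal sketch of that behavior, assuming the documented semantics:
SELECT JSON_REMOVE(JSON '{"a": 1, "b": 2}', CAST(NULL AS STRING), '$.a') AS json_data
/*-----------*
| json_data |
+-----------+
| {"b":2}   |
*-----------*/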
JSON_SET
JSON_SET(
json_expr,
json_path_value_pair[, ...]
[, create_if_missing => { TRUE | FALSE } ]
)
json_path_value_pair:
json_path, value
Description
Produces a new SQL JSON value with the specified JSON data inserted or replaced.
Arguments:
- json_expr: JSON. For example: JSON '{"class": {"students": [{"name": "Jane"}]}}'
- json_path_value_pair: A value and the JSONPath for that value. This includes:
  - json_path: Insert or replace value at this JSONPath in json_expr.
  - value: A JSON encoding-supported value to insert.
- create_if_missing: A named argument that takes a BOOL value.
  - If TRUE (default), replaces or inserts data if the path doesn't exist.
  - If FALSE, only existing JSONPath values are replaced. If the path doesn't exist, the set operation is ignored.
Details:
- Path value pairs are evaluated left to right. The JSON produced by evaluating one pair becomes the JSON against which the next pair is evaluated.
- If a matched path has an existing value, it overwrites the existing data with value.
- If create_if_missing is TRUE:
  - If a path doesn't exist, the remainder of the path is recursively created.
  - If the matched path prefix points to a JSON null, the remainder of the path is recursively created, and value is inserted.
  - If a path token points to a JSON array and the specified index is larger than the size of the array, the function pads the JSON array with JSON nulls, recursively creates the remainder of the path at the specified index, and inserts the path value pair.
- This function applies all path value pair set operations even if an individual path value pair operation is invalid. For invalid operations, the operation is ignored and the function continues to process the rest of the path value pairs.
- If the path exists but has an incompatible type at any given path token, no update happens for that specific path value pair.
- If any json_path is an invalid JSONPath, an error is produced.
- If json_expr is SQL NULL, the function returns SQL NULL.
- If json_path is SQL NULL, the json_path_value_pair operation is ignored.
- If create_if_missing is SQL NULL, the set operation is ignored.
Return type
JSON
Examples
In the following example, the path $ matches the entire JSON value and replaces it with {"b": 2, "c": 3}.
SELECT JSON_SET(JSON '{"a": 1}', '$', JSON '{"b": 2, "c": 3}') AS json_data
/*---------------*
| json_data |
+---------------+
| {"b":2,"c":3} |
*---------------*/
In the following example, create_if_missing is FALSE and the path $.b doesn't exist, so the set operation is ignored.
SELECT JSON_SET(
JSON '{"a": 1}',
"$.b", 999,
create_if_missing => false) AS json_data
/*------------*
| json_data |
+------------+
| '{"a": 1}' |
*------------*/
In the following example, create_if_missing is FALSE but the path $.a exists, so the value is replaced.
SELECT JSON_SET(
JSON '{"a": 1}',
"$.a", 999,
create_if_missing => false) AS json_data
/*--------------*
| json_data |
+--------------+
| '{"a": 999}' |
*--------------*/
In the following example, the path $.a is matched, but $.a.b doesn't exist, so the new path and the value are inserted.
SELECT JSON_SET(JSON '{"a": {}}', '$.a.b', 100) AS json_data
/*-----------------*
| json_data |
+-----------------+
| {"a":{"b":100}} |
*-----------------*/
In the following example, the path prefix $ points to a JSON null, so the remainder of the path is created for the value 100.
SELECT JSON_SET(JSON 'null', '$.a.b', 100) AS json_data
/*-----------------*
| json_data |
+-----------------+
| {"a":{"b":100}} |
*-----------------*/
In the following example, the path $.a.c implies that the value at $.a is a JSON object, but it's not. This part of the operation is ignored, but the other parts of the operation are completed successfully.
SELECT JSON_SET(
JSON '{"a": 1}',
'$.b', 2,
'$.a.c', 100,
'$.d', 3) AS json_data
/*---------------------*
| json_data |
+---------------------+
| {"a":1,"b":2,"d":3} |
*---------------------*/
In the following example, the path $.a[2] implies that the value for $.a is an array, but it's not, so the operation is ignored for that value.
SELECT JSON_SET(
JSON '{"a": 1}',
'$.a[2]', 100,
'$.b', 2) AS json_data
/*---------------*
| json_data |
+---------------+
| {"a":1,"b":2} |
*---------------*/
In the following example, the path $[1] is matched and replaces the array element value with foo.
SELECT JSON_SET(JSON '["a", ["b", "c"], "d"]', '$[1]', "foo") AS json_data
/*-----------------*
| json_data |
+-----------------+
| ["a","foo","d"] |
*-----------------*/
In the following example, the path $[1][0] is matched and replaces the array element value with foo.
SELECT JSON_SET(JSON '["a", ["b", "c"], "d"]', '$[1][0]', "foo") AS json_data
/*-----------------------*
| json_data |
+-----------------------+
| ["a",["foo","c"],"d"] |
*-----------------------*/
In the following example, the path prefix $ points to a JSON null, so the remainder of the path is created. The resulting array is padded with JSON nulls and appended with foo.
SELECT JSON_SET(JSON 'null', '$[0][3]', "foo") AS json_data
/*--------------------------*
| json_data |
+--------------------------+
| [[null,null,null,"foo"]] |
*--------------------------*/
In the following example, the path $[1] is matched, the matched array is extended since $[1][4] is larger than the existing array, and then foo is inserted into the array.
SELECT JSON_SET(JSON '["a", ["b", "c"], "d"]', '$[1][4]', "foo") AS json_data
/*-------------------------------------*
| json_data |
+-------------------------------------+
| ["a",["b","c",null,null,"foo"],"d"] |
*-------------------------------------*/
In the following example, the path $[1][0][0] implies that the value of $[1][0] is an array, but it's not, so the operation is ignored.
SELECT JSON_SET(JSON '["a", ["b", "c"], "d"]', '$[1][0][0]', "foo") AS json_data
/*---------------------*
| json_data |
+---------------------+
| ["a",["b","c"],"d"] |
*---------------------*/
In the following example, the path $[1][2] is larger than the length of the matched array. The array length is extended and the remainder of the path is recursively created. The operation continues to the path $[1][2][1] and inserts foo.
SELECT JSON_SET(JSON '["a", ["b", "c"], "d"]', '$[1][2][1]', "foo") AS json_data
/*----------------------------------*
| json_data |
+----------------------------------+
| ["a",["b","c",[null,"foo"]],"d"] |
*----------------------------------*/
In the following example, because the JSON object is empty, key b is inserted, and the remainder of the path is recursively created.
SELECT JSON_SET(JSON '{}', '$.b[2].d', 100) AS json_data
/*-----------------------------*
| json_data |
+-----------------------------+
| {"b":[null,null,{"d":100}]} |
*-----------------------------*/
In the following example, multiple values are set.
SELECT JSON_SET(
JSON '{"a": 1, "b": {"c":3}, "d": [4]}',
'$.a', 'v1',
'$.b.e', 'v2',
'$.d[2]', 'v3') AS json_data
/*---------------------------------------------------*
| json_data |
+---------------------------------------------------+
| {"a":"v1","b":{"c":3,"e":"v2"},"d":[4,null,"v3"]} |
*---------------------------------------------------*/
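Per the details above, a SQL NULL json_expr returns SQL NULL. A minimal sketch of that case, assuming the documented semantics:
SELECT JSON_SET(CAST(NULL AS JSON), '$.a', 10) AS json_data -- Returns SQL NULL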
JSON_STRIP_NULLS
JSON_STRIP_NULLS(
json_expr
[, json_path ]
[, include_arrays => { TRUE | FALSE } ]
[, remove_empty => { TRUE | FALSE } ]
)
Description
Recursively removes JSON nulls from JSON objects and JSON arrays.
Arguments:
- json_expr: JSON. For example: JSON '{"a": null, "b": "c"}'
- json_path: Remove JSON nulls at this JSONPath for json_expr.
- include_arrays: A named argument that's either TRUE (default) or FALSE. If TRUE or omitted, the function removes JSON nulls from JSON arrays. If FALSE, it doesn't.
- remove_empty: A named argument that's either TRUE or FALSE (default). If TRUE, the function removes empty JSON objects after JSON nulls are removed. If FALSE or omitted, it doesn't.
  If remove_empty is TRUE and include_arrays is TRUE or omitted, the function additionally removes empty JSON arrays.
Details:
- If a value is a JSON null, the associated key-value pair is removed.
- If remove_empty is set to TRUE, the function recursively removes empty containers after JSON nulls are removed.
- If the function generates JSON with nothing in it, the function returns a JSON null.
- If json_path is an invalid JSONPath, an error is produced.
- If json_expr is SQL NULL, the function returns SQL NULL.
- If json_path, include_arrays, or remove_empty is SQL NULL, the function returns json_expr.
Return type
JSON
Examples
In the following example, all JSON nulls are removed.
SELECT JSON_STRIP_NULLS(JSON '{"a": null, "b": "c"}') AS json_data
/*-----------*
| json_data |
+-----------+
| {"b":"c"} |
*-----------*/
In the following example, all JSON nulls are removed from a JSON array.
SELECT JSON_STRIP_NULLS(JSON '[1, null, 2, null]') AS json_data
/*-----------*
| json_data |
+-----------+
| [1,2] |
*-----------*/
In the following example, include_arrays is set to FALSE so that JSON nulls aren't removed from JSON arrays.
SELECT JSON_STRIP_NULLS(JSON '[1, null, 2, null]', include_arrays=>FALSE) AS json_data
/*-----------------*
| json_data |
+-----------------+
| [1,null,2,null] |
*-----------------*/
In the following example, remove_empty is omitted and defaults to FALSE, and the empty structures are retained.
SELECT JSON_STRIP_NULLS(JSON '[1, null, 2, null, [null]]') AS json_data
/*-----------*
| json_data |
+-----------+
| [1,2,[]] |
*-----------*/
In the following example, remove_empty is set to TRUE, and the empty structures are removed.
SELECT JSON_STRIP_NULLS(
JSON '[1, null, 2, null, [null]]',
remove_empty=>TRUE) AS json_data
/*-----------*
| json_data |
+-----------+
| [1,2] |
*-----------*/
In the following examples, remove_empty is set to TRUE, and the empty structures are removed. Because no JSON data is left, the function returns a JSON null.
SELECT JSON_STRIP_NULLS(JSON '{"a": null}', remove_empty=>TRUE) AS json_data
/*-----------*
| json_data |
+-----------+
| null |
*-----------*/
SELECT JSON_STRIP_NULLS(JSON '{"a": [null]}', remove_empty=>TRUE) AS json_data
/*-----------*
| json_data |
+-----------+
| null |
*-----------*/
In the following example, empty structures are removed for JSON objects, but not JSON arrays.
SELECT JSON_STRIP_NULLS(
JSON '{"a": {"b": {"c": null}}, "d": [null], "e": [], "f": 1}',
include_arrays=>FALSE,
remove_empty=>TRUE) AS json_data
/*---------------------------*
| json_data |
+---------------------------+
| {"d":[null],"e":[],"f":1} |
*---------------------------*/
In the following example, empty structures are removed for both JSON objects and JSON arrays.
SELECT JSON_STRIP_NULLS(
JSON '{"a": {"b": {"c": null}}, "d": [null], "e": [], "f": 1}',
remove_empty=>TRUE) AS json_data
/*-----------*
| json_data |
+-----------+
| {"f":1} |
*-----------*/
In the following example, because no JSON data is left, the function returns a JSON null.
SELECT JSON_STRIP_NULLS(JSON 'null') AS json_data
/*-----------*
| json_data |
+-----------+
| null |
*-----------*/
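The json_path argument isn't shown in the examples above. Based on the argument description, it limits null removal to the matched location, so a JSON null outside the path should be retained. An illustrative sketch (the expected result follows from the documented semantics, with remove_empty defaulting to FALSE):
SELECT JSON_STRIP_NULLS(JSON '{"a": {"b": null}, "c": null}', '$.a') AS json_data
/*-------------------*
| json_data         |
+-------------------+
| {"a":{},"c":null} |
*-------------------*/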
JSON_TYPE
JSON_TYPE(json_expr)
Description
Gets the JSON type of the outermost JSON value and converts the name of this type to a SQL STRING value. The following JSON type names can be returned: object, array, string, number, boolean, and null.
Arguments:
- json_expr: JSON. For example: JSON '{"name": "sky", "color": "blue"}'
  If this expression is SQL NULL, the function returns SQL NULL. If the extracted JSON value isn't a valid JSON type, an error is produced.
Return type
STRING
Examples
SELECT json_val, JSON_TYPE(json_val) AS type
FROM
UNNEST(
[
JSON '"apple"',
JSON '10',
JSON '3.14',
JSON 'null',
JSON '{"city": "New York", "State": "NY"}',
JSON '["apple", "banana"]',
JSON 'false'
]
) AS json_val;
/*----------------------------------+---------*
| json_val | type |
+----------------------------------+---------+
| "apple" | string |
| 10 | number |
| 3.14 | number |
| null | null |
| {"State":"NY","city":"New York"} | object |
| ["apple","banana"] | array |
| false | boolean |
*----------------------------------+---------*/
JSON_VALUE
JSON_VALUE(json_string_expr[, json_path])
JSON_VALUE(json_expr[, json_path])
Description
Extracts a JSON scalar value and converts it to a SQL STRING value.
In addition, this function:
- Removes the outermost quotes and unescapes the values.
- Returns a SQL NULL if a non-scalar value is selected.
- Uses double quotes to escape invalid JSONPath characters in JSON keys. For example: "a.b".
Arguments:
- json_string_expr: A JSON-formatted string. For example: '{"name": "Jakob", "age": "6"}'
- json_expr: JSON. For example: JSON '{"name": "Jane", "age": "6"}'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. If this optional parameter isn't provided, then the JSONPath $ symbol is applied, which means that all of the data is analyzed.
  If json_path returns a JSON null or a non-scalar value (in other words, if json_path refers to an object or an array), then a SQL NULL is returned.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Return type
STRING
Examples
In the following example, JSON data is extracted and returned as a scalar value.
SELECT JSON_VALUE(JSON '{"name": "Jakob", "age": "6" }', '$.age') AS scalar_age;
/*------------*
| scalar_age |
+------------+
| 6 |
*------------*/
The following example compares how results are returned for the JSON_QUERY and JSON_VALUE functions.
SELECT JSON_QUERY('{"name": "Jakob", "age": "6"}', '$.name') AS json_name,
JSON_VALUE('{"name": "Jakob", "age": "6"}', '$.name') AS scalar_name,
JSON_QUERY('{"name": "Jakob", "age": "6"}', '$.age') AS json_age,
JSON_VALUE('{"name": "Jakob", "age": "6"}', '$.age') AS scalar_age;
/*-----------+-------------+----------+------------*
| json_name | scalar_name | json_age | scalar_age |
+-----------+-------------+----------+------------+
| "Jakob" | Jakob | "6" | 6 |
*-----------+-------------+----------+------------*/
SELECT JSON_QUERY('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_query,
JSON_VALUE('{"fruits": ["apple", "banana"]}', '$.fruits') AS json_value;
/*--------------------+------------*
| json_query | json_value |
+--------------------+------------+
| ["apple","banana"] | NULL |
*--------------------+------------*/
In cases where a JSON key uses invalid JSONPath characters, you can escape those characters using double quotes. For example:
SELECT JSON_VALUE('{"a.b": {"c": "world"}}', '$."a.b".c') AS hello;
/*-------*
| hello |
+-------+
| world |
*-------*/
JSON_VALUE_ARRAY
JSON_VALUE_ARRAY(json_string_expr[, json_path])
JSON_VALUE_ARRAY(json_expr[, json_path])
Description
Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING> value.
In addition, this function:
- Removes the outermost quotes and unescapes the values.
- Returns a SQL NULL if the selected value isn't an array or isn't an array containing only scalar values.
- Uses double quotes to escape invalid JSONPath characters in JSON keys. For example: "a.b".
Arguments:
- json_string_expr: A JSON-formatted string. For example: '["apples", "oranges", "grapes"]'
- json_expr: JSON. For example: JSON '["apples", "oranges", "grapes"]'
- json_path: The JSONPath. This identifies the data that you want to obtain from the input. If this optional parameter isn't provided, then the JSONPath $ symbol is applied, which means that all of the data is analyzed.
There are differences between the JSON-formatted string and JSON input types. For details, see Differences between the JSON and JSON-formatted STRING types.
Caveats:
- A JSON null in the input array produces a SQL NULL as the output for that JSON null. If the output contains a NULL array element, an error is produced, because the final output can't be an array with NULL values.
- If a JSONPath matches an array that contains scalar objects and a JSON null, then the output of the function must be transformed, because the final output can't be an array with NULL values.
Return type
ARRAY<STRING>
Examples
This extracts items in JSON to a string array:
SELECT JSON_VALUE_ARRAY(
JSON '{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits'
) AS string_array;
/*---------------------------*
| string_array |
+---------------------------+
| [apples, oranges, grapes] |
*---------------------------*/
The following example compares how results are returned for the
JSON_QUERY_ARRAY
and JSON_VALUE_ARRAY
functions.
SELECT JSON_QUERY_ARRAY('["apples", "oranges"]') AS json_array,
JSON_VALUE_ARRAY('["apples", "oranges"]') AS string_array;
/*-----------------------+-------------------*
| json_array | string_array |
+-----------------------+-------------------+
| ["apples", "oranges"] | [apples, oranges] |
*-----------------------+-------------------*/
This extracts the items in a JSON-formatted string to a string array:
-- Strips the double quotes
SELECT JSON_VALUE_ARRAY('["foo", "bar", "baz"]', '$') AS string_array;
/*-----------------*
| string_array |
+-----------------+
| [foo, bar, baz] |
*-----------------*/
This extracts a string array and converts it to an integer array:
SELECT ARRAY(
SELECT CAST(integer_element AS INT64)
FROM UNNEST(
JSON_VALUE_ARRAY('[1, 2, 3]', '$')
) AS integer_element
) AS integer_array;
/*---------------*
| integer_array |
+---------------+
| [1, 2, 3] |
*---------------*/
These are equivalent:
SELECT JSON_VALUE_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$.fruits') AS string_array;
SELECT JSON_VALUE_ARRAY('{"fruits": ["apples", "oranges", "grapes"]}', '$."fruits"') AS string_array;
-- The queries above produce the following result:
/*---------------------------*
| string_array |
+---------------------------+
| [apples, oranges, grapes] |
*---------------------------*/
In cases where a JSON key uses invalid JSONPath characters, you can escape those
characters using double quotes: " "
. For example:
SELECT JSON_VALUE_ARRAY('{"a.b": {"c": ["world"]}}', '$."a.b".c') AS hello;
/*---------*
| hello |
+---------+
| [world] |
*---------*/
The following examples explore how invalid requests and empty arrays are handled:
-- An error is thrown if you provide an invalid JSONPath.
SELECT JSON_VALUE_ARRAY('["foo", "bar", "baz"]', 'INVALID_JSONPath') AS result;
-- If the JSON-formatted string is invalid, then NULL is returned.
SELECT JSON_VALUE_ARRAY('}}', '$') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If the JSON document is NULL, then NULL is returned.
SELECT JSON_VALUE_ARRAY(NULL, '$') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath does not match anything, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a": ["foo", "bar", "baz"]}', '$.b') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an object that is not an array, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a": "foo"}', '$') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an array of non-scalar objects, then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a": [{"b": "foo", "c": 1}, {"b": "bar", "c": 2}], "d": "baz"}', '$.a') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an array of mixed scalar and non-scalar objects,
-- then the output is NULL.
SELECT JSON_VALUE_ARRAY('{"a": [10, {"b": 20}]}', '$.a') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
-- If a JSONPath matches an empty JSON array, then the output is an empty array instead of NULL.
SELECT JSON_VALUE_ARRAY('{"a": "foo", "b": []}', '$.b') AS result;
/*--------*
| result |
+--------+
| [] |
*--------*/
-- The following query produces an error because the final output cannot be an
-- array with NULLs.
SELECT JSON_VALUE_ARRAY('["world", 1, null]') AS result;
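One possible transformation, assuming you want to drop the JSON null elements rather than raise an error, is to extract the elements with JSON_QUERY_ARRAY, filter on JSON_TYPE, and convert each surviving element with JSON_VALUE. This is a sketch, not the only valid approach:

```sql
-- Drop JSON nulls before converting, so the final array has no NULL elements.
SELECT ARRAY(
  SELECT JSON_VALUE(elem)
  FROM UNNEST(JSON_QUERY_ARRAY(JSON '["world", 1, null]')) AS elem
  WHERE JSON_TYPE(elem) != 'null'
) AS result;

/*------------*
 | result     |
 +------------+
 | [world, 1] |
 *------------*/
```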
LAX_BOOL
LAX_BOOL(json_expr)
Description
Attempts to convert a JSON value to a SQL BOOL
value.
Arguments:
- json_expr: JSON. For example: JSON 'true'
Details:
- If json_expr is SQL NULL, the function returns SQL NULL.
- See the conversion rules in the next section for additional NULL handling.
Conversion rules
From JSON type | To SQL BOOL
---|---
boolean | If the JSON boolean is true, returns TRUE. Otherwise, returns FALSE.
string | If the JSON string is 'true', returns TRUE. If the JSON string is 'false', returns FALSE. If the JSON string is any other value or has whitespace in it, returns NULL. This conversion is case-insensitive.
number | If the JSON number is a representation of 0, returns FALSE. Otherwise, returns TRUE.
other type or null | NULL
Return type
BOOL
Examples
Example with input that is a JSON boolean:
SELECT LAX_BOOL(JSON 'true') AS result;
/*--------*
| result |
+--------+
| true |
*--------*/
Examples with inputs that are JSON strings:
SELECT LAX_BOOL(JSON '"true"') AS result;
/*--------*
| result |
+--------+
| TRUE |
*--------*/
SELECT LAX_BOOL(JSON '"true "') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
SELECT LAX_BOOL(JSON '"foo"') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
Examples with inputs that are JSON numbers:
SELECT LAX_BOOL(JSON '10') AS result;
/*--------*
| result |
+--------+
| TRUE |
*--------*/
SELECT LAX_BOOL(JSON '0') AS result;
/*--------*
| result |
+--------+
| FALSE |
*--------*/
SELECT LAX_BOOL(JSON '0.0') AS result;
/*--------*
| result |
+--------+
| FALSE |
*--------*/
SELECT LAX_BOOL(JSON '-1.1') AS result;
/*--------*
| result |
+--------+
| TRUE |
*--------*/
LAX_FLOAT64
LAX_FLOAT64(json_expr)
Description
Attempts to convert a JSON value to a
SQL FLOAT64
value.
Arguments:
- json_expr: JSON. For example: JSON '9.8'
Details:
- If json_expr is SQL NULL, the function returns SQL NULL.
- See the conversion rules in the next section for additional NULL handling.
Conversion rules
From JSON type | To SQL FLOAT64
---|---
boolean | NULL
string | If the JSON string represents a JSON number, parses it as a BIGNUMERIC value, and then safe casts the result as a FLOAT64 value. If the JSON string can't be converted, returns NULL.
number | Casts the JSON number as a FLOAT64 value. Large JSON numbers are rounded.
other type or null | NULL
Return type
FLOAT64
Examples
Examples with inputs that are JSON numbers:
SELECT LAX_FLOAT64(JSON '9.8') AS result;
/*--------*
| result |
+--------+
| 9.8 |
*--------*/
SELECT LAX_FLOAT64(JSON '9') AS result;
/*--------*
| result |
+--------+
| 9.0 |
*--------*/
SELECT LAX_FLOAT64(JSON '9007199254740993') AS result;
/*--------------------*
| result |
+--------------------+
| 9007199254740992.0 |
*--------------------*/
SELECT LAX_FLOAT64(JSON '1e100') AS result;
/*--------*
| result |
+--------+
| 1e+100 |
*--------*/
Examples with inputs that are JSON booleans:
SELECT LAX_FLOAT64(JSON 'true') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
SELECT LAX_FLOAT64(JSON 'false') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
Examples with inputs that are JSON strings:
SELECT LAX_FLOAT64(JSON '"10"') AS result;
/*--------*
| result |
+--------+
| 10.0 |
*--------*/
SELECT LAX_FLOAT64(JSON '"1.1"') AS result;
/*--------*
| result |
+--------+
| 1.1 |
*--------*/
SELECT LAX_FLOAT64(JSON '"1.1e2"') AS result;
/*--------*
| result |
+--------+
| 110.0 |
*--------*/
SELECT LAX_FLOAT64(JSON '"9007199254740993"') AS result;
/*--------------------*
| result |
+--------------------+
| 9007199254740992.0 |
*--------------------*/
SELECT LAX_FLOAT64(JSON '"+1.5"') AS result;
/*--------*
| result |
+--------+
| 1.5 |
*--------*/
SELECT LAX_FLOAT64(JSON '"NaN"') AS result;
/*--------*
| result |
+--------+
| NaN |
*--------*/
SELECT LAX_FLOAT64(JSON '"Inf"') AS result;
/*----------*
| result |
+----------+
| Infinity |
*----------*/
SELECT LAX_FLOAT64(JSON '"-InfiNiTY"') AS result;
/*-----------*
| result |
+-----------+
| -Infinity |
*-----------*/
SELECT LAX_FLOAT64(JSON '"foo"') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
LAX_INT64
LAX_INT64(json_expr)
Description
Attempts to convert a JSON value to a SQL INT64
value.
Arguments:
- json_expr: JSON. For example: JSON '999'
Details:
- If json_expr is SQL NULL, the function returns SQL NULL.
- See the conversion rules in the next section for additional NULL handling.
Conversion rules
From JSON type | To SQL INT64
---|---
boolean | If the JSON boolean is true, returns 1. If false, returns 0.
string | If the JSON string represents a JSON number, parses it as a BIGNUMERIC value, and then safe casts the result as an INT64 value. If the JSON string can't be converted, returns NULL.
number | Casts the JSON number as an INT64 value. If the JSON number can't be converted, returns NULL.
other type or null | NULL
Return type
INT64
Examples
Examples with inputs that are JSON numbers:
SELECT LAX_INT64(JSON '10') AS result;
/*--------*
| result |
+--------+
| 10 |
*--------*/
SELECT LAX_INT64(JSON '10.0') AS result;
/*--------*
| result |
+--------+
| 10 |
*--------*/
SELECT LAX_INT64(JSON '1.1') AS result;
/*--------*
| result |
+--------+
| 1 |
*--------*/
SELECT LAX_INT64(JSON '3.5') AS result;
/*--------*
| result |
+--------+
| 4 |
*--------*/
SELECT LAX_INT64(JSON '1.1e2') AS result;
/*--------*
| result |
+--------+
| 110 |
*--------*/
SELECT LAX_INT64(JSON '1e100') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
Examples with inputs that are JSON booleans:
SELECT LAX_INT64(JSON 'true') AS result;
/*--------*
| result |
+--------+
| 1 |
*--------*/
SELECT LAX_INT64(JSON 'false') AS result;
/*--------*
| result |
+--------+
| 0 |
*--------*/
Examples with inputs that are JSON strings:
SELECT LAX_INT64(JSON '"10"') AS result;
/*--------*
| result |
+--------+
| 10 |
*--------*/
SELECT LAX_INT64(JSON '"1.1"') AS result;
/*--------*
| result |
+--------+
| 1 |
*--------*/
SELECT LAX_INT64(JSON '"1.1e2"') AS result;
/*--------*
| result |
+--------+
| 110 |
*--------*/
SELECT LAX_INT64(JSON '"+1.5"') AS result;
/*--------*
| result |
+--------+
| 2 |
*--------*/
SELECT LAX_INT64(JSON '"1e100"') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
SELECT LAX_INT64(JSON '"foo"') AS result;
/*--------*
| result |
+--------+
| NULL |
*--------*/
LAX_STRING
LAX_STRING(json_expr)
Description
Attempts to convert a JSON value to a SQL STRING
value.
Arguments:
- json_expr: JSON. For example: JSON '"name"'
Details:
- If json_expr is SQL NULL, the function returns SQL NULL.
- See the conversion rules in the next section for additional NULL handling.
Conversion rules
From JSON type | To SQL STRING
---|---
boolean | If the JSON boolean is true, returns 'true'. If false, returns 'false'.
string | Returns the JSON string as a STRING value.
number | Returns the JSON number as a STRING value.
other type or null | NULL
Return type
STRING
Examples
Examples with inputs that are JSON strings:
SELECT LAX_STRING(JSON '"purple"') AS result;
/*--------*
| result |
+--------+
| purple |
*--------*/
SELECT LAX_STRING(JSON '"10"') AS result;
/*--------*
| result |
+--------+
| 10 |
*--------*/
Examples with inputs that are JSON booleans:
SELECT LAX_STRING(JSON 'true') AS result;
/*--------*
| result |
+--------+
| true |
*--------*/
SELECT LAX_STRING(JSON 'false') AS result;
/*--------*
| result |
+--------+
| false |
*--------*/
Examples with inputs that are JSON numbers:
SELECT LAX_STRING(JSON '10.0') AS result;
/*--------*
| result |
+--------+
| 10 |
*--------*/
SELECT LAX_STRING(JSON '10') AS result;
/*--------*
| result |
+--------+
| 10 |
*--------*/
SELECT LAX_STRING(JSON '1e100') AS result;
/*--------*
| result |
+--------+
| 1e+100 |
*--------*/
PARSE_JSON
PARSE_JSON(
json_string_expr
[, wide_number_mode => { 'exact' | 'round' } ]
)
Description
Converts a JSON-formatted STRING
value to a JSON
value.
Arguments:
- json_string_expr: A JSON-formatted string. For example: '{"class": {"students": [{"name": "Jane"}]}}'
- wide_number_mode: A named argument with a STRING value. Determines how to handle numbers that can't be stored in a JSON value without the loss of precision. If used, wide_number_mode must include one of the following values:
  - exact (default): Only accept numbers that can be stored without loss of precision. If a number that cannot be stored without loss of precision is encountered, the function throws an error.
  - round: If a number that cannot be stored without loss of precision is encountered, attempt to round it to a number that can be stored without loss of precision. If the number cannot be rounded, the function throws an error.
If a number appears in a JSON object or array, the wide_number_mode argument is applied to the number in the object or array.
Numbers from the following domains can be stored in JSON without loss of precision:
- 64-bit signed/unsigned integers, such as INT64
- FLOAT64
Return type
JSON
Examples
In the following example, a JSON-formatted string is converted to JSON.
SELECT PARSE_JSON('{"coordinates": [10, 20], "id": 1}') AS json_data;
/*--------------------------------*
| json_data |
+--------------------------------+
| {"coordinates":[10,20],"id":1} |
*--------------------------------*/
The following queries fail because:
- The number that was passed in cannot be stored without loss of precision.
- wide_number_mode=>'exact' is used implicitly in the first query and explicitly in the second query.
SELECT PARSE_JSON('{"id": 922337203685477580701}') AS json_data; -- fails
SELECT PARSE_JSON('{"id": 922337203685477580701}', wide_number_mode=>'exact') AS json_data; -- fails
The following query rounds the number to a number that can be stored in JSON.
SELECT PARSE_JSON('{"id": 922337203685477580701}', wide_number_mode=>'round') AS json_data;
/*------------------------------*
| json_data |
+------------------------------+
| {"id":9.223372036854776e+20} |
*------------------------------*/
You can also use valid JSON-formatted strings that don't represent name/value pairs. For example:
SELECT PARSE_JSON('6') AS json_data;
/*------------------------------*
| json_data |
+------------------------------+
| 6 |
*------------------------------*/
SELECT PARSE_JSON('"red"') AS json_data;
/*------------------------------*
| json_data |
+------------------------------+
| "red" |
*------------------------------*/
STRING
STRING(json_expr)
Description
Converts a JSON string to a SQL STRING
value.
Arguments:
- json_expr: JSON. For example: JSON '"purple"'
If the JSON value is not a string, an error is produced. If the expression is SQL NULL, the function returns SQL NULL.
Return type
STRING
Examples
SELECT STRING(JSON '"purple"') AS color;
/*--------*
| color |
+--------+
| purple |
*--------*/
SELECT STRING(JSON_QUERY(JSON '{"name": "sky", "color": "blue"}', "$.color")) AS color;
/*-------*
| color |
+-------+
| blue |
*-------*/
The following examples show how invalid requests are handled:
-- An error is thrown if the JSON is not of type string.
SELECT STRING(JSON '123') AS result; -- Throws an error
SELECT STRING(JSON 'null') AS result; -- Throws an error
SELECT SAFE.STRING(JSON '123') AS result; -- Returns a SQL NULL
TO_JSON
TO_JSON(
sql_value
[, stringify_wide_numbers => { TRUE | FALSE } ]
)
Description
Converts a SQL value to a JSON value.
Arguments:
- sql_value: The SQL value to convert to a JSON value. You can review the GoogleSQL data types that this function supports and their JSON encodings here.
- stringify_wide_numbers: A named argument that's either TRUE or FALSE (default).
  - If TRUE, numeric values outside of the FLOAT64 type domain are encoded as strings.
  - If FALSE (default), numeric values outside of the FLOAT64 type domain are not encoded as strings, but are stored as JSON numbers. If a numerical value cannot be stored in JSON without loss of precision, an error is thrown.
The following numerical data types are affected by the stringify_wide_numbers argument: INT64, NUMERIC, and BIGNUMERIC.
If one of these numerical data types appears in a container data type such as an ARRAY or STRUCT, the stringify_wide_numbers argument is applied to the numerical data types in the container data type.
Return type
JSON
Examples
In the following example, the query converts rows in a table to JSON values.
WITH CoordinatesTable AS (
(SELECT 1 AS id, [10, 20] AS coordinates) UNION ALL
(SELECT 2 AS id, [30, 40] AS coordinates) UNION ALL
(SELECT 3 AS id, [50, 60] AS coordinates))
SELECT TO_JSON(t) AS json_objects
FROM CoordinatesTable AS t;
/*--------------------------------*
| json_objects |
+--------------------------------+
| {"coordinates":[10,20],"id":1} |
| {"coordinates":[30,40],"id":2} |
| {"coordinates":[50,60],"id":3} |
*--------------------------------*/
In the following example, the query returns a large numerical value as a JSON string.
SELECT TO_JSON(9007199254740993, stringify_wide_numbers=>TRUE) as stringify_on;
/*--------------------*
| stringify_on |
+--------------------+
| "9007199254740993" |
*--------------------*/
In the following example, both queries return a large numerical value as a JSON number.
SELECT TO_JSON(9007199254740993, stringify_wide_numbers=>FALSE) as stringify_off;
SELECT TO_JSON(9007199254740993) as stringify_off;
/*------------------*
| stringify_off |
+------------------+
| 9007199254740993 |
*------------------*/
In the following example, only large numeric values are converted to JSON strings.
WITH T1 AS (
(SELECT 9007199254740993 AS id) UNION ALL
(SELECT 2 AS id))
SELECT TO_JSON(t, stringify_wide_numbers=>TRUE) AS json_objects
FROM T1 AS t;
/*---------------------------*
| json_objects |
+---------------------------+
| {"id":"9007199254740993"} |
| {"id":2} |
*---------------------------*/
In this example, the values 9007199254740993 (INT64) and 2.1 (FLOAT64) are converted to the common supertype FLOAT64, which is not affected by the stringify_wide_numbers argument.
WITH T1 AS (
(SELECT 9007199254740993 AS id) UNION ALL
(SELECT 2.1 AS id))
SELECT TO_JSON(t, stringify_wide_numbers=>TRUE) AS json_objects
FROM T1 AS t;
/*------------------------------*
| json_objects |
+------------------------------+
| {"id":9.007199254740992e+15} |
| {"id":2.1} |
*------------------------------*/
TO_JSON_STRING
TO_JSON_STRING(value[, pretty_print])
Description
Converts a SQL value to a JSON-formatted STRING
value.
Arguments:
- value: A SQL value. You can review the GoogleSQL data types that this function supports and their JSON encodings here.
- pretty_print: Optional boolean parameter. If pretty_print is true, the returned value is formatted for easy readability.
Return type
A JSON-formatted STRING
Examples
The following query converts a STRUCT
value to a JSON-formatted string:
SELECT TO_JSON_STRING(STRUCT(1 AS id, [10,20] AS coordinates)) AS json_data
/*--------------------------------*
| json_data |
+--------------------------------+
| {"id":1,"coordinates":[10,20]} |
*--------------------------------*/
The following query converts a STRUCT
value to a JSON-formatted string that is
easy to read:
SELECT TO_JSON_STRING(STRUCT(1 AS id, [10,20] AS coordinates), true) AS json_data
/*--------------------*
| json_data |
+--------------------+
| { |
| "id": 1, |
| "coordinates": [ |
| 10, |
| 20 |
| ] |
| } |
*--------------------*/
Supplemental materials
Differences between the JSON and JSON-formatted STRING types
Many JSON functions accept two input types:
- JSON type
- STRING type
The STRING version of the extraction functions behaves differently than the JSON version, mainly because JSON type values are always validated whereas JSON-formatted STRING type values are not.
Non-validation of STRING inputs
The following STRING is invalid JSON because it is missing a trailing }:
{"hello": "world"
The JSON function reads the input from the beginning and stops as soon as the field to extract is found, without reading the remainder of the input. A parsing error is not produced.
With the JSON
type, however, JSON '{"hello": "world"'
returns a parsing
error.
For example:
SELECT JSON_VALUE('{"hello": "world"', "$.hello") AS hello;
/*-------*
| hello |
+-------+
| world |
*-------*/
SELECT JSON_VALUE(JSON '{"hello": "world"', "$.hello") AS hello;
-- An error is returned: Invalid JSON literal: syntax error while parsing
-- object - unexpected end of input; expected '}'
No strict validation of extracted values
In the following examples, duplicated keys are not removed when using a JSON-formatted string. Similarly, key order is preserved. For the JSON type, JSON '{"key": 1, "key": 2}' results in JSON '{"key":1}' during parsing.
SELECT JSON_QUERY('{"key": 1, "key": 2}', "$") AS string;
/*-------------------*
| string |
+-------------------+
| {"key":1,"key":2} |
*-------------------*/
SELECT JSON_QUERY(JSON '{"key": 1, "key": 2}', "$") AS json;
/*-----------*
| json |
+-----------+
| {"key":1} |
*-----------*/
JSON null
When using a JSON-formatted STRING type in a JSON function, a JSON null value is extracted as a SQL NULL value.
When using a JSON type in a JSON function, a JSON null value returns a JSON null value.
WITH t AS (
SELECT '{"name": null}' AS json_string, JSON '{"name": null}' AS json)
SELECT JSON_QUERY(json_string, "$.name") AS name_string,
JSON_QUERY(json_string, "$.name") IS NULL AS name_string_is_null,
JSON_QUERY(json, "$.name") AS name_json,
JSON_QUERY(json, "$.name") IS NULL AS name_json_is_null
FROM t;
/*-------------+---------------------+-----------+-------------------*
| name_string | name_string_is_null | name_json | name_json_is_null |
+-------------+---------------------+-----------+-------------------+
| NULL | true | null | false |
*-------------+---------------------+-----------+-------------------*/
JSON encodings
You can encode a SQL value as a JSON value with the following functions:
- TO_JSON_STRING
- TO_JSON
- JSON_SET (uses TO_JSON encoding)
- JSON_ARRAY (uses TO_JSON encoding)
- JSON_ARRAY_APPEND (uses TO_JSON encoding)
- JSON_ARRAY_INSERT (uses TO_JSON encoding)
- JSON_OBJECT (uses TO_JSON encoding)
The following SQL to JSON encodings are supported:
From SQL | To JSON | Examples |
---|---|---|
NULL |
null |
SQL input: NULL JSON output: null
|
BOOL | boolean |
SQL input: TRUE JSON output: true SQL input: FALSE JSON output: false |
INT64 |
(TO_JSON_STRING only) number or string Encoded as a number when the value is in the range of [-2^53, 2^53], which is the range of integers that can be represented losslessly as IEEE 754 double-precision floating point numbers. A value outside of this range is encoded as a string. |
SQL input: 9007199254740992 JSON output: 9007199254740992 SQL input: 9007199254740993 JSON output: "9007199254740993" |
INT64 |
(TO_JSON only) number or string
If the stringify_wide_numbers argument is TRUE and the value is outside of the FLOAT64 type domain, the value is encoded as a string.
If the stringify_wide_numbers argument is FALSE or omitted, a value outside of the FLOAT64 type domain is encoded as a number. |
SQL input: 9007199254740992 JSON output: 9007199254740992 SQL input: 9007199254740993 JSON output: 9007199254740993 SQL input with stringify_wide_numbers=>TRUE: 9007199254740992 JSON output: 9007199254740992 SQL input with stringify_wide_numbers=>TRUE: 9007199254740993 JSON output: "9007199254740993" |
NUMERIC
BIGNUMERIC |
(TO_JSON_STRING only) number or string Encoded as a number when the value is in the range of [-2^53, 2^53] and has no fractional part. A value outside of this range is encoded as a string. |
SQL input: -1 JSON output: -1 SQL input: 0 JSON output: 0 SQL input: 9007199254740993 JSON output: "9007199254740993" SQL input: 123.56 JSON output: "123.56" |
NUMERIC
BIGNUMERIC |
(TO_JSON only) number or string
If the stringify_wide_numbers argument is TRUE and the value is outside of the FLOAT64 type domain, the value is encoded as a string. Otherwise, the value is encoded as a number. |
SQL input: -1 JSON output: -1 SQL input: 0 JSON output: 0 SQL input: 9007199254740993 JSON output: 9007199254740993 SQL input: 123.56 JSON output: 123.56 SQL input with stringify_wide_numbers=>TRUE: 9007199254740993 JSON output: "9007199254740993" SQL input with stringify_wide_numbers=>TRUE: 123.56 JSON output: 123.56 |
FLOAT64 |
number or string
+inf, -inf, and NaN are encoded as the strings "Infinity", "-Infinity", and "NaN". Otherwise, the value is encoded as a number. |
SQL input: 1.0 JSON output: 1 SQL input: 9007199254740993 JSON output: 9007199254740993 SQL input: "+inf" JSON output: "Infinity" SQL input: "-inf" JSON output: "-Infinity" SQL input: "NaN" JSON output: "NaN" |
STRING |
string
Encoded as a string, escaped according to the JSON standard. Specifically,
", \, and the control characters from U+0000 to U+001F are escaped. |
SQL input: "abc" JSON output: "abc" SQL input: "\"abc\"" JSON output: "\"abc\"" |
BYTES |
string Uses RFC 4648 Base64 data encoding. |
SQL input: b"Google" JSON output: "R29vZ2xl" |
DATE | string |
SQL input: DATE '2017-03-06' JSON output: "2017-03-06" |
TIMESTAMP |
string Encoded as ISO 8601 date and time, where T separates the date and time and Z (Zulu/UTC) represents the time zone. |
SQL input: TIMESTAMP '2017-03-06 12:34:56.789012' JSON output: "2017-03-06T12:34:56.789012Z" |
DATETIME |
string Encoded as ISO 8601 date and time, where T separates the date and time. |
SQL input: DATETIME '2017-03-06 12:34:56.789012' JSON output: "2017-03-06T12:34:56.789012" |
TIME |
string Encoded as ISO 8601 time. |
SQL input: TIME '12:34:56.789012' JSON output: "12:34:56.789012" |
JSON |
data of the input JSON |
SQL input: JSON '{"item": "pen", "price": 10}' JSON output: {"item":"pen", "price":10} SQL input: [1, 2, 3] JSON output: [1, 2, 3] |
ARRAY |
array Can contain zero or more elements. |
SQL input: ["red", "blue", "green"] JSON output: ["red","blue","green"] SQL input: [1, 2, 3] JSON output: [1,2,3] |
STRUCT |
object The object can contain zero or more key-value pairs. Each value is formatted according to its type.
Anonymous fields are represented with "".
Invalid UTF-8 field names might result in unparseable JSON. String
values are escaped according to the JSON standard. Specifically,
", \, and the control characters from U+0000 to U+001F are escaped. |
SQL input: STRUCT(12 AS purchases, TRUE AS inStock) JSON output: {"inStock": true,"purchases":12} |
RANGE |
range
Encoded as an object with a start field and an end field. An unbounded endpoint is encoded as JSON null. |
SQL input: RANGE<DATE> '[2024-07-24, 2024-07-25)' JSON output: {"start":"2024-07-24","end":"2024-07-25"} SQL input: RANGE<DATETIME> '[2024-07-24 10:00:00, UNBOUNDED)' JSON output: {"start":"2024-07-24T10:00:00","end":null} |
JSONPath format
With the JSONPath format, you can identify the values you want to obtain from a JSON-formatted string.
If a key in a JSON function contains a JSON format operator, refer to each JSON function for how to escape it.
A JSON function returns NULL
if the JSONPath format does not match a value in
a JSON-formatted string. If the selected value for a scalar function is not
scalar, such as an object or an array, the function returns NULL
. If the
JSONPath format is invalid, an error is produced.
Operators for JSONPath
The JSONPath format supports these operators:
Operator | Description | Examples |
---|---|---|
$ |
Root object or element. The JSONPath format must start with this operator, which refers to the outermost level of the JSON-formatted string. |
JSON-formatted string:
JSON path:
JSON result: |
. |
Child operator. You can identify child values using dot-notation. |
JSON-formatted string:
JSON path:
JSON result: |
[] |
Subscript operator. If the object is a JSON array, you can use brackets to specify the array index. |
JSON-formatted string:
JSON path:
JSON result: |
[][] [][][]...
|
Child subscript operator. If the object is a JSON array within an array, you can use as many additional brackets as you need to specify the child array index. |
JSON-formatted string:
JSON path:
JSON result: |
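As an illustration of combining these operators, the following query (with example values of our own, not taken from the table above) uses the root, child, and subscript operators together:

```sql
SELECT JSON_QUERY(
  '{"class": {"students": [{"name": "Jane"}, {"name": "John"}]}}',
  '$.class.students[1].name') AS second_student;

/*----------------*
 | second_student |
 +----------------+
 | "John"         |
 *----------------*/
```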
Modes for JSONPath
Some JSON functions that take a JSONPath let you specify a mode that indicates
how the JSONPath matches the JSON data structure. For example, the JSONPath
could be lax $.class.students
. The following modes are supported:
Mode | Description | Example JSONPath
---|---
strict (default) | The JSONPath must structurally match the JSON data. | "$.class.students"
lax | Implicitly adapts the path to the structure of the JSON data if the JSONPath doesn't exactly match it. | "lax $.class.students"
lax recursive | In addition to lax behavior, JSONPath unwraps consecutive arrays until a non-array type is encountered. | "lax recursive $.class.students"
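For example, when an array subscript is applied to a value that is not an array, strict mode finds no match, while lax mode implicitly wraps the value in an array. A sketch of this behavior (the input values are ours):

```sql
SELECT
  JSON_QUERY(JSON '{"class": {"students": {"name": "Jane"}}}',
             '$.class.students[0]') AS strict_mode,
  JSON_QUERY(JSON '{"class": {"students": {"name": "Jane"}}}',
             'lax $.class.students[0]') AS lax_mode;

/*-------------+-----------------*
 | strict_mode | lax_mode        |
 +-------------+-----------------+
 | NULL        | {"name":"Jane"} |
 *-------------+-----------------*/
```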
Mathematical functions
GoogleSQL for BigQuery supports mathematical functions. All mathematical functions have the following behaviors:
- They return NULL if any of the input parameters is NULL.
- They return NaN if any of the arguments is NaN.
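For example:

```sql
SELECT
  SQRT(CAST(NULL AS FLOAT64)) AS null_input,
  SIN(CAST('NaN' AS FLOAT64)) AS nan_input;

/*------------+-----------*
 | null_input | nan_input |
 +------------+-----------+
 | NULL       | NaN       |
 *------------+-----------*/
```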
Categories
Category | Functions
---|---
Trigonometric | ACOS, ACOSH, ASIN, ASINH, ATAN, ATAN2, ATANH, COS, COSH, COT, COTH, CSC, CSCH, SEC, SECH, SIN, SINH, TAN, TANH
Exponential and logarithmic | EXP, LN, LOG, LOG10
Rounding and truncation | CEIL, CEILING, FLOOR, ROUND, TRUNC
Power and root | CBRT, POW, POWER, SQRT
Sign | ABS, SIGN
Distance | COSINE_DISTANCE, EUCLIDEAN_DISTANCE
Comparison | GREATEST, LEAST
Random number generator | RAND
Arithmetic and error handling | DIV, IEEE_DIVIDE, IS_INF, IS_NAN, MOD, SAFE_ADD, SAFE_DIVIDE, SAFE_MULTIPLY, SAFE_NEGATE, SAFE_SUBTRACT
Bucket | RANGE_BUCKET
Function list
Name | Summary
---|---
ABS | Computes the absolute value of X.
ACOS | Computes the inverse cosine of X.
ACOSH | Computes the inverse hyperbolic cosine of X.
ASIN | Computes the inverse sine of X.
ASINH | Computes the inverse hyperbolic sine of X.
ATAN | Computes the inverse tangent of X.
ATAN2 | Computes the inverse tangent of X/Y, using the signs of X and Y to determine the quadrant.
ATANH | Computes the inverse hyperbolic tangent of X.
CBRT | Computes the cube root of X.
CEIL | Gets the smallest integral value that is not less than X.
CEILING | Synonym of CEIL.
COS | Computes the cosine of X.
COSH | Computes the hyperbolic cosine of X.
COSINE_DISTANCE | Computes the cosine distance between two vectors.
COT | Computes the cotangent of X.
COTH | Computes the hyperbolic cotangent of X.
CSC | Computes the cosecant of X.
CSCH | Computes the hyperbolic cosecant of X.
DIV | Divides integer X by integer Y.
EXP | Computes e to the power of X.
EUCLIDEAN_DISTANCE | Computes the Euclidean distance between two vectors.
FLOOR | Gets the largest integral value that is not greater than X.
GREATEST | Gets the greatest value among X1,...,XN.
IEEE_DIVIDE | Divides X by Y, but does not generate errors for division by zero or overflow.
IS_INF | Checks if X is positive or negative infinity.
IS_NAN | Checks if X is a NaN value.
LEAST | Gets the least value among X1,...,XN.
LN | Computes the natural logarithm of X.
LOG | Computes the natural logarithm of X or the logarithm of X to base Y.
LOG10 | Computes the logarithm of X to base 10.
MOD | Gets the remainder of the division of X by Y.
POW | Produces the value of X raised to the power of Y.
POWER | Synonym of POW.
RAND | Generates a pseudo-random value of type FLOAT64 in the range of [0, 1).
RANGE_BUCKET | Scans through a sorted array and returns the 0-based position of a point's upper bound.
ROUND | Rounds X to the nearest integer or rounds X to N decimal places after the decimal point.
SAFE_ADD | Equivalent to the addition operator (X + Y), but returns NULL if overflow occurs.
SAFE_DIVIDE | Equivalent to the division operator (X / Y), but returns NULL if an error occurs.
SAFE_MULTIPLY | Equivalent to the multiplication operator (X * Y), but returns NULL if overflow occurs.
SAFE_NEGATE | Equivalent to the unary minus operator (-X), but returns NULL if overflow occurs.
SAFE_SUBTRACT | Equivalent to the subtraction operator (X - Y), but returns NULL if overflow occurs.
SEC | Computes the secant of X.
SECH | Computes the hyperbolic secant of X.
SIGN | Produces -1, 0, or +1 for negative, zero, and positive arguments respectively.
SIN | Computes the sine of X.
SINH | Computes the hyperbolic sine of X.
SQRT | Computes the square root of X.
TAN | Computes the tangent of X.
TANH | Computes the hyperbolic tangent of X.
TRUNC | Rounds a number like ROUND(X) or ROUND(X, N), but always rounds towards zero and never overflows.
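To illustrate the error-handling functions listed above, the following query (an example of ours) compares SAFE_DIVIDE and IEEE_DIVIDE on a division by zero:

```sql
SELECT
  SAFE_DIVIDE(10, 0) AS safe_result,  -- NULL instead of a division-by-zero error
  IEEE_DIVIDE(10, 0) AS ieee_result;  -- +inf instead of an error

/*-------------+-------------*
 | safe_result | ieee_result |
 +-------------+-------------+
 | NULL        | inf         |
 *-------------+-------------*/
```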
ABS
ABS(X)
Description
Computes absolute value. Returns an error if the argument is an integer and the output value cannot be represented as the same type; this happens only for the largest negative input value, which has no positive representation.
X | ABS(X) |
---|---|
25 | 25 |
-25 | 25 |
+inf | +inf |
-inf | +inf |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
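A minimal example, using values consistent with the table above (column aliases are illustrative):
SELECT ABS(-25) AS abs_int, ABS(-2.5) AS abs_float;
/*---------+-----------*
| abs_int | abs_float |
+---------+-----------+
| 25 | 2.5 |
*---------+-----------*/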
ACOS
ACOS(X)
Description
Computes the principal value of the inverse cosine of X. The return value is in the range [0,π]. Generates an error if X is a value outside of the range [-1, 1].
X | ACOS(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
X < -1 | Error |
X > 1 | Error |
ACOSH
ACOSH(X)
Description
Computes the inverse hyperbolic cosine of X. Generates an error if X is a value less than 1.
X | ACOSH(X) |
---|---|
+inf | +inf |
-inf | NaN |
NaN | NaN |
X < 1 | Error |
ASIN
ASIN(X)
Description
Computes the principal value of the inverse sine of X. The return value is in the range [-π/2,π/2]. Generates an error if X is outside of the range [-1, 1].
X | ASIN(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
X < -1 | Error |
X > 1 | Error |
ASINH
ASINH(X)
Description
Computes the inverse hyperbolic sine of X. Does not fail.
X | ASINH(X) |
---|---|
+inf | +inf |
-inf | -inf |
NaN | NaN |
ATAN
ATAN(X)
Description
Computes the principal value of the inverse tangent of X. The return value is in the range [-π/2,π/2]. Does not fail.
X | ATAN(X) |
---|---|
+inf | π/2 |
-inf | -π/2 |
NaN | NaN |
ATAN2
ATAN2(X, Y)
Description
Calculates the principal value of the inverse tangent of X/Y using the signs of the two arguments to determine the quadrant. The return value is in the range [-π,π].
X | Y | ATAN2(X, Y) |
---|---|---|
NaN | Any value | NaN |
Any value | NaN | NaN |
0.0 | 0.0 | 0.0 |
Positive Finite value | -inf | π |
Negative Finite value | -inf | -π |
Finite value | +inf | 0.0 |
+inf | Finite value | π/2 |
-inf | Finite value | -π/2 |
+inf | -inf | ¾π |
-inf | -inf | -¾π |
+inf | +inf | π/4 |
-inf | +inf | -π/4 |
ATANH
ATANH(X)
Description
Computes the inverse hyperbolic tangent of X. Generates an error if X is outside of the range (-1, 1).
X | ATANH(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
X < -1 | Error |
X > 1 | Error |
CBRT
CBRT(X)
Description
Computes the cube root of X. X can be any data type that coerces to FLOAT64.
Supports the SAFE. prefix.
X | CBRT(X) |
---|---|
+inf | inf |
-inf | -inf |
NaN | NaN |
0 | 0 |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT CBRT(27) AS cube_root;
/*--------------------*
| cube_root |
+--------------------+
| 3.0000000000000004 |
*--------------------*/
CEIL
CEIL(X)
Description
Returns the smallest integral value that is not less than X.
X | CEIL(X) |
---|---|
2.0 | 2.0 |
2.3 | 3.0 |
2.8 | 3.0 |
2.5 | 3.0 |
-2.3 | -2.0 |
-2.8 | -2.0 |
-2.5 | -2.0 |
0 | 0 |
+inf | +inf |
-inf | -inf |
NaN | NaN |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
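A minimal example, using values from the table above (column aliases are illustrative):
SELECT CEIL(2.3) AS a, CEIL(-2.3) AS b;
/*-----+------*
| a | b |
+-----+------+
| 3.0 | -2.0 |
*-----+------*/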
CEILING
CEILING(X)
Description
Synonym of CEIL(X)
COS
COS(X)
Description
Computes the cosine of X where X is specified in radians. Never fails.
X | COS(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
COSH
COSH(X)
Description
Computes the hyperbolic cosine of X where X is specified in radians. Generates an error if overflow occurs.
X | COSH(X) |
---|---|
+inf | +inf |
-inf | +inf |
NaN | NaN |
COSINE_DISTANCE
COSINE_DISTANCE(vector1, vector2)
Description
Computes the cosine distance between two vectors.
Definitions
- vector1: A vector that is represented by an ARRAY<T> value or a sparse vector that is represented by an ARRAY<STRUCT<dimension,magnitude>> value.
- vector2: A vector that is represented by an ARRAY<T> value or a sparse vector that is represented by an ARRAY<STRUCT<dimension,magnitude>> value.
Details
ARRAY<T> can be used to represent a vector. Each zero-based index in this array represents a dimension. The value for each element in this array represents a magnitude. T can represent the following and must be the same for both vectors: FLOAT64.
In the following example vector, there are four dimensions. The magnitude is 10.0 for dimension 0, 55.0 for dimension 1, 40.0 for dimension 2, and 34.0 for dimension 3:
[10.0, 55.0, 40.0, 34.0]
ARRAY<STRUCT<dimension,magnitude>> can be used to represent a sparse vector. With a sparse vector, you only need to include dimension-magnitude pairs for non-zero magnitudes. If a magnitude isn't present in the sparse vector, the magnitude is implicitly understood to be zero.
For example, if you have a vector with 10,000 dimensions, but only 10 dimensions have non-zero magnitudes, then the vector is a sparse vector. As a result, it's more efficient to describe a sparse vector by only mentioning its non-zero magnitudes.
In ARRAY<STRUCT<dimension,magnitude>>, STRUCT<dimension,magnitude> represents a dimension-magnitude pair for each non-zero magnitude in a sparse vector. These parts need to be included for each dimension-magnitude pair:
- dimension: A STRING or INT64 value that represents a dimension in a vector.
- magnitude: A FLOAT64 value that represents a non-zero magnitude for a specific dimension in a vector.
You don't need to include empty dimension-magnitude pairs in a sparse vector. For example, the following sparse vector and non-sparse vector are equivalent:
-- sparse vector ARRAY<STRUCT<INT64, FLOAT64>> [(1, 10.0), (2, 30.0), (5, 40.0)]
-- vector ARRAY<FLOAT64> [0.0, 10.0, 30.0, 0.0, 0.0, 40.0]
In a sparse vector, dimension-magnitude pairs don't need to be in any particular order. The following sparse vectors are equivalent:
[('a', 10.0), ('b', 30.0), ('d', 40.0)]
[('d', 40.0), ('a', 10.0), ('b', 30.0)]
Both non-sparse vectors in this function must share the same dimensions, and if they don't, an error is produced.
A vector can't be a zero vector. A vector is a zero vector if it has no dimensions or all dimensions have a magnitude of 0, such as [] or [0.0, 0.0]. If a zero vector is encountered, an error is produced.
An error is produced if a magnitude in a vector is NULL.
If a vector is NULL, NULL is returned.
Return type
FLOAT64
Examples
In the following example, non-sparse vectors are used to compute the cosine distance:
SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results;
/*----------*
| results |
+----------+
| 0.016130 |
*----------*/
In the following example, sparse vectors are used to compute the cosine distance:
SELECT COSINE_DISTANCE(
[(1, 1.0), (2, 2.0)],
[(2, 4.0), (1, 3.0)]) AS results;
/*----------*
| results |
+----------+
| 0.016130 |
*----------*/
The ordering of numeric values in a vector doesn't impact the results produced by this function. For example, these queries produce the same results even though the numeric values in each vector are in a different order:
SELECT COSINE_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results;
SELECT COSINE_DISTANCE([2.0, 1.0], [4.0, 3.0]) AS results;
SELECT COSINE_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results;
/*----------*
| results |
+----------+
| 0.016130 |
*----------*/
In the following example, the function can't compute cosine distance against the first vector, which is a zero vector:
-- ERROR
SELECT COSINE_DISTANCE([0.0, 0.0], [3.0, 4.0]) AS results;
-- ERROR
SELECT COSINE_DISTANCE([(1, 0.0), (2, 0.0)], [(1, 3.0), (2, 4.0)]) AS results;
Both non-sparse vectors must have the same dimensions. If not, an error is produced. In the following example, the first vector has two dimensions and the second vector has three:
-- ERROR
SELECT COSINE_DISTANCE([9.0, 7.0], [8.0, 4.0, 5.0]) AS results;
If you use sparse vectors and you repeat a dimension, an error is produced:
-- ERROR
SELECT COSINE_DISTANCE(
[(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results;
COT
COT(X)
Description
Computes the cotangent for the angle of X, where X is specified in radians. X can be any data type that coerces to FLOAT64.
Supports the SAFE. prefix.
X | COT(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
0 | Error |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT COT(1) AS a, SAFE.COT(0) AS b;
/*---------------------+------*
| a | b |
+---------------------+------+
| 0.64209261593433065 | NULL |
*---------------------+------*/
COTH
COTH(X)
Description
Computes the hyperbolic cotangent for the angle of X, where X is specified in radians. X can be any data type that coerces to FLOAT64.
Supports the SAFE. prefix.
X | COTH(X) |
---|---|
+inf | 1 |
-inf | -1 |
NaN | NaN |
0 | Error |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT COTH(1) AS a, SAFE.COTH(0) AS b;
/*----------------+------*
| a | b |
+----------------+------+
| 1.313035285499 | NULL |
*----------------+------*/
CSC
CSC(X)
Description
Computes the cosecant of the input angle, which is in radians. X can be any data type that coerces to FLOAT64.
Supports the SAFE. prefix.
X | CSC(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
0 | Error |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT CSC(100) AS a, CSC(-1) AS b, SAFE.CSC(0) AS c;
/*----------------+-----------------+------*
| a | b | c |
+----------------+-----------------+------+
| -1.97485753142 | -1.188395105778 | NULL |
*----------------+-----------------+------*/
CSCH
CSCH(X)
Description
Computes the hyperbolic cosecant of the input angle, which is in radians. X can be any data type that coerces to FLOAT64.
Supports the SAFE. prefix.
X | CSCH(X) |
---|---|
+inf | 0 |
-inf | 0 |
NaN | NaN |
0 | Error |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT CSCH(0.5) AS a, CSCH(-2) AS b, SAFE.CSCH(0) AS c;
/*----------------+----------------+------*
| a | b | c |
+----------------+----------------+------+
| 1.919034751334 | -0.27572056477 | NULL |
*----------------+----------------+------*/
DIV
DIV(X, Y)
Description
Returns the result of integer division of X by Y. Division by zero returns an error. Division by -1 may overflow.
X | Y | DIV(X, Y) |
---|---|---|
20 | 4 | 5 |
12 | -7 | -1 |
20 | 3 | 6 |
0 | 20 | 0 |
20 | 0 | Error |
Return Data Type
The return data type is determined by the argument types with the following table.
INPUT | INT64 | NUMERIC | BIGNUMERIC |
---|---|---|---|
INT64 | INT64 | NUMERIC | BIGNUMERIC |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC |
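A minimal example, using values from the table above (column aliases are illustrative):
SELECT DIV(20, 4) AS a, DIV(12, -7) AS b;
/*----+-----*
| a | b |
+----+-----+
| 5 | -1 |
*----+-----*/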
EXP
EXP(X)
Description
Computes e to the power of X, also called the natural exponential function. If the result underflows, this function returns a zero. Generates an error if the result overflows.
X | EXP(X) |
---|---|
0.0 | 1.0 |
+inf | +inf |
-inf | 0.0 |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
EUCLIDEAN_DISTANCE
EUCLIDEAN_DISTANCE(vector1, vector2)
Description
Computes the Euclidean distance between two vectors.
Definitions
- vector1: A vector that is represented by an ARRAY<T> value or a sparse vector that is represented by an ARRAY<STRUCT<dimension,magnitude>> value.
- vector2: A vector that is represented by an ARRAY<T> value or a sparse vector that is represented by an ARRAY<STRUCT<dimension,magnitude>> value.
Details
ARRAY<T> can be used to represent a vector. Each zero-based index in this array represents a dimension. The value for each element in this array represents a magnitude. T can represent the following and must be the same for both vectors: FLOAT64.
In the following example vector, there are four dimensions. The magnitude is 10.0 for dimension 0, 55.0 for dimension 1, 40.0 for dimension 2, and 34.0 for dimension 3:
[10.0, 55.0, 40.0, 34.0]
ARRAY<STRUCT<dimension,magnitude>> can be used to represent a sparse vector. With a sparse vector, you only need to include dimension-magnitude pairs for non-zero magnitudes. If a magnitude isn't present in the sparse vector, the magnitude is implicitly understood to be zero.
For example, if you have a vector with 10,000 dimensions, but only 10 dimensions have non-zero magnitudes, then the vector is a sparse vector. As a result, it's more efficient to describe a sparse vector by only mentioning its non-zero magnitudes.
In ARRAY<STRUCT<dimension,magnitude>>, STRUCT<dimension,magnitude> represents a dimension-magnitude pair for each non-zero magnitude in a sparse vector. These parts need to be included for each dimension-magnitude pair:
- dimension: A STRING or INT64 value that represents a dimension in a vector.
- magnitude: A FLOAT64 value that represents a non-zero magnitude for a specific dimension in a vector.
You don't need to include empty dimension-magnitude pairs in a sparse vector. For example, the following sparse vector and non-sparse vector are equivalent:
-- sparse vector ARRAY<STRUCT<INT64, FLOAT64>> [(1, 10.0), (2, 30.0), (5, 40.0)]
-- vector ARRAY<FLOAT64> [0.0, 10.0, 30.0, 0.0, 0.0, 40.0]
In a sparse vector, dimension-magnitude pairs don't need to be in any particular order. The following sparse vectors are equivalent:
[('a', 10.0), ('b', 30.0), ('d', 40.0)]
[('d', 40.0), ('a', 10.0), ('b', 30.0)]
Both non-sparse vectors in this function must share the same dimensions, and if they don't, an error is produced.
A vector can be a zero vector. A vector is a zero vector if it has no dimensions or all dimensions have a magnitude of 0, such as [] or [0.0, 0.0].
An error is produced if a magnitude in a vector is NULL.
If a vector is NULL, NULL is returned.
Return type
FLOAT64
Examples
In the following example, non-sparse vectors are used to compute the Euclidean distance:
SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]) AS results;
/*----------*
| results |
+----------+
| 2.828 |
*----------*/
In the following example, sparse vectors are used to compute the Euclidean distance:
SELECT EUCLIDEAN_DISTANCE(
[(1, 1.0), (2, 2.0)],
[(2, 4.0), (1, 3.0)]) AS results;
/*----------*
| results |
+----------+
| 2.828 |
*----------*/
The ordering of magnitudes in a vector doesn't impact the results produced by this function. For example, these queries produce the same results even though the magnitudes in each vector are in a different order:
SELECT EUCLIDEAN_DISTANCE([1.0, 2.0], [3.0, 4.0]);
SELECT EUCLIDEAN_DISTANCE([2.0, 1.0], [4.0, 3.0]);
SELECT EUCLIDEAN_DISTANCE([(1, 1.0), (2, 2.0)], [(1, 3.0), (2, 4.0)]) AS results;
/*----------*
| results |
+----------+
| 2.828 |
*----------*/
Both non-sparse vectors must have the same dimensions. If not, an error is produced. In the following example, the first vector has two dimensions and the second vector has three:
-- ERROR
SELECT EUCLIDEAN_DISTANCE([9.0, 7.0], [8.0, 4.0, 5.0]) AS results;
If you use sparse vectors and you repeat a dimension, an error is produced:
-- ERROR
SELECT EUCLIDEAN_DISTANCE(
[(1, 9.0), (2, 7.0), (2, 8.0)], [(1, 8.0), (2, 4.0), (3, 5.0)]) AS results;
FLOOR
FLOOR(X)
Description
Returns the largest integral value that is not greater than X.
X | FLOOR(X) |
---|---|
2.0 | 2.0 |
2.3 | 2.0 |
2.8 | 2.0 |
2.5 | 2.0 |
-2.3 | -3.0 |
-2.8 | -3.0 |
-2.5 | -3.0 |
0 | 0 |
+inf | +inf |
-inf | -inf |
NaN | NaN |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
GREATEST
GREATEST(X1,...,XN)
Description
Returns the greatest value among X1,...,XN. If any argument is NULL, returns NULL. Otherwise, in the case of floating-point arguments, if any argument is NaN, returns NaN. In all other cases, returns the value among X1,...,XN that has the greatest value according to the ordering used by the ORDER BY clause. The arguments X1, ..., XN must be coercible to a common supertype, and the supertype must support ordering.
X1,...,XN | GREATEST(X1,...,XN) |
---|---|
3,5,1 | 5 |
This function supports specifying collation.
Return Data Types
Data type of the input values.
IEEE_DIVIDE
IEEE_DIVIDE(X, Y)
Description
Divides X by Y; this function never fails. Returns FLOAT64. Unlike the division operator (/), this function does not generate errors for division by zero or overflow.
X | Y | IEEE_DIVIDE(X, Y) |
---|---|---|
20.0 | 4.0 | 5.0 |
0.0 | 25.0 | 0.0 |
25.0 | 0.0 | +inf |
-25.0 | 0.0 | -inf |
0.0 | 0.0 | NaN |
0.0 | NaN | NaN |
NaN | 0.0 | NaN |
+inf | +inf | NaN |
-inf | -inf | NaN |
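A minimal example (the IS_INF wrapper and column aliases are illustrative), showing that division by zero yields infinity instead of an error:
SELECT IEEE_DIVIDE(20.0, 4.0) AS a, IS_INF(IEEE_DIVIDE(25.0, 0.0)) AS b;
/*-----+------*
| a | b |
+-----+------+
| 5.0 | true |
*-----+------*/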
IS_INF
IS_INF(X)
Description
Returns TRUE if the value is positive or negative infinity.
X | IS_INF(X) |
---|---|
+inf | TRUE |
-inf | TRUE |
25 | FALSE |
IS_NAN
IS_NAN(X)
Description
Returns TRUE if the value is a NaN value.
X | IS_NAN(X) |
---|---|
NaN | TRUE |
25 | FALSE |
LEAST
LEAST(X1,...,XN)
Description
Returns the least value among X1,...,XN. If any argument is NULL, returns NULL. Otherwise, in the case of floating-point arguments, if any argument is NaN, returns NaN. In all other cases, returns the value among X1,...,XN that has the least value according to the ordering used by the ORDER BY clause. The arguments X1, ..., XN must be coercible to a common supertype, and the supertype must support ordering.
X1,...,XN | LEAST(X1,...,XN) |
---|---|
3,5,1 | 1 |
This function supports specifying collation.
Return Data Types
Data type of the input values.
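A minimal example pairing LEAST with its counterpart GREATEST, using the values from the tables above (column aliases are illustrative):
SELECT LEAST(3, 5, 1) AS least_value, GREATEST(3, 5, 1) AS greatest_value;
/*-------------+----------------*
| least_value | greatest_value |
+-------------+----------------+
| 1 | 5 |
*-------------+----------------*/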
LN
LN(X)
Description
Computes the natural logarithm of X. Generates an error if X is less than or equal to zero.
X | LN(X) |
---|---|
1.0 | 0.0 |
+inf | +inf |
X <= 0 | Error |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
LOG
LOG(X [, Y])
Description
If only X is present, LOG is a synonym of LN. If Y is also present, LOG computes the logarithm of X to base Y.
X | Y | LOG(X, Y) |
---|---|---|
100.0 | 10.0 | 2.0 |
-inf | Any value | NaN |
Any value | +inf | NaN |
+inf | 0.0 < Y < 1.0 | -inf |
+inf | Y > 1.0 | +inf |
X <= 0 | Any value | Error |
Any value | Y <= 0 | Error |
Any value | 1.0 | Error |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
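A minimal example of both forms; the chosen inputs produce exact results (column aliases are illustrative):
SELECT LOG(8.0, 2.0) AS log_base_2, LOG(100.0, 10.0) AS log_base_10;
/*------------+-------------*
| log_base_2 | log_base_10 |
+------------+-------------+
| 3.0 | 2.0 |
*------------+-------------*/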
LOG10
LOG10(X)
Description
Similar to LOG, but computes logarithm to base 10.
X | LOG10(X) |
---|---|
100.0 | 2.0 |
-inf | NaN |
+inf | +inf |
X <= 0 | Error |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
MOD
MOD(X, Y)
Description
Modulo function: returns the remainder of the division of X by Y. Returned value has the same sign as X. An error is generated if Y is 0.
X | Y | MOD(X, Y) |
---|---|---|
25 | 12 | 1 |
25 | 0 | Error |
Return Data Type
The return data type is determined by the argument types with the following table.
INPUT | INT64 | NUMERIC | BIGNUMERIC |
---|---|---|---|
INT64 | INT64 | NUMERIC | BIGNUMERIC |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC |
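A minimal example; note the result takes the sign of X, as described above (column aliases are illustrative):
SELECT MOD(25, 12) AS a, MOD(-25, 12) AS b;
/*----+-----*
| a | b |
+----+-----+
| 1 | -1 |
*----+-----*/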
POW
POW(X, Y)
Description
Returns the value of X raised to the power of Y. If the result underflows and is not representable, then the function returns a value of zero.
X | Y | POW(X, Y) |
---|---|---|
2.0 | 3.0 | 8.0 |
1.0 | Any value including NaN | 1.0 |
Any value including NaN | 0 | 1.0 |
-1.0 | +inf | 1.0 |
-1.0 | -inf | 1.0 |
ABS(X) < 1 | -inf | +inf |
ABS(X) > 1 | -inf | 0.0 |
ABS(X) < 1 | +inf | 0.0 |
ABS(X) > 1 | +inf | +inf |
-inf | Y < 0 | 0.0 |
-inf | Y > 0 | -inf if Y is an odd integer, +inf otherwise |
+inf | Y < 0 | 0 |
+inf | Y > 0 | +inf |
Finite value < 0 | Non-integer | Error |
0 | Finite value < 0 | Error |
Return Data Type
The return data type is determined by the argument types with the following table.
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
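A minimal example, including a fractional exponent (column aliases are illustrative):
SELECT POW(2.0, 3.0) AS a, POW(9.0, 0.5) AS b;
/*-----+-----*
| a | b |
+-----+-----+
| 8.0 | 3.0 |
*-----+-----*/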
POWER
POWER(X, Y)
Description
Synonym of POW(X, Y)
.
RAND
RAND()
Description
Generates a pseudo-random value of type FLOAT64 in the range of [0, 1), inclusive of 0 and exclusive of 1.
RANGE_BUCKET
RANGE_BUCKET(point, boundaries_array)
Description
RANGE_BUCKET
scans through a sorted array and returns the 0-based position
of the point's upper bound. This can be useful if you need to group your data to
build partitions, histograms, business-defined rules, and more.
RANGE_BUCKET
follows these rules:
If the point exists in the array, returns the index of the next larger value.
RANGE_BUCKET(20, [0, 10, 20, 30, 40]) -- 3 is return value
RANGE_BUCKET(20, [0, 10, 20, 20, 40, 40]) -- 4 is return value
If the point does not exist in the array, but it falls between two values, returns the index of the larger value.
RANGE_BUCKET(25, [0, 10, 20, 30, 40]) -- 3 is return value
If the point is smaller than the first value in the array, returns 0.
RANGE_BUCKET(-10, [5, 10, 20, 30, 40]) -- 0 is return value
If the point is greater than or equal to the last value in the array, returns the length of the array.
RANGE_BUCKET(80, [0, 10, 20, 30, 40]) -- 5 is return value
If the array is empty, returns 0.
RANGE_BUCKET(80, []) -- 0 is return value
If the point is NULL or NaN, returns NULL.
RANGE_BUCKET(NULL, [0, 10, 20, 30, 40]) -- NULL is return value
The data type for the point and array must be compatible.
RANGE_BUCKET('a', ['a', 'b', 'c', 'd']) -- 1 is return value
RANGE_BUCKET(1.2, [1, 1.2, 1.4, 1.6]) -- 2 is return value
RANGE_BUCKET(1.2, [1, 2, 4, 6]) -- execution failure
Execution failure occurs when:
The array has a NaN or NULL value in it.
RANGE_BUCKET(80, [NULL, 10, 20, 30, 40]) -- execution failure
The array is not sorted in ascending order.
RANGE_BUCKET(30, [10, 30, 20, 40, 50]) -- execution failure
Parameters
point
: A generic value.boundaries_array
: A generic array of values.
Return Value
INT64
Examples
In a table called students
, check to see how many records would
exist in each age_group
bucket, based on a student's age:
- age_group 0 (age < 10)
- age_group 1 (age >= 10, age < 20)
- age_group 2 (age >= 20, age < 30)
- age_group 3 (age >= 30)
WITH students AS
(
SELECT 9 AS age UNION ALL
SELECT 20 AS age UNION ALL
SELECT 25 AS age UNION ALL
SELECT 31 AS age UNION ALL
SELECT 32 AS age UNION ALL
SELECT 33 AS age
)
SELECT RANGE_BUCKET(age, [10, 20, 30]) AS age_group, COUNT(*) AS count
FROM students
GROUP BY 1
/*--------------+-------*
| age_group | count |
+--------------+-------+
| 0 | 1 |
| 2 | 2 |
| 3 | 3 |
*--------------+-------*/
ROUND
ROUND(X [, N [, rounding_mode]])
Description
If only X is present, rounds X to the nearest integer. If N is present, rounds X to N decimal places after the decimal point. If N is negative, rounds off digits to the left of the decimal point. Rounds halfway cases away from zero. Generates an error if overflow occurs.
If X is a NUMERIC
or BIGNUMERIC
type, then you can
explicitly set rounding_mode
to one of the following:
"ROUND_HALF_AWAY_FROM_ZERO"
: (Default) Rounds halfway cases away from zero."ROUND_HALF_EVEN"
: Rounds halfway cases towards the nearest even digit.
If you set the rounding_mode
and X is not a NUMERIC
or BIGNUMERIC
type,
then the function generates an error.
Expression | Return Value |
---|---|
ROUND(2.0) |
2.0 |
ROUND(2.3) |
2.0 |
ROUND(2.8) |
3.0 |
ROUND(2.5) |
3.0 |
ROUND(-2.3) |
-2.0 |
ROUND(-2.8) |
-3.0 |
ROUND(-2.5) |
-3.0 |
ROUND(0) |
0 |
ROUND(+inf) |
+inf |
ROUND(-inf) |
-inf |
ROUND(NaN) |
NaN |
ROUND(123.7, -1) |
120.0 |
ROUND(1.235, 2) |
1.24 |
ROUND(NUMERIC "2.25", 1, "ROUND_HALF_EVEN") |
2.2 |
ROUND(NUMERIC "2.35", 1, "ROUND_HALF_EVEN") |
2.4 |
ROUND(NUMERIC "2.251", 1, "ROUND_HALF_EVEN") |
2.3 |
ROUND(NUMERIC "-2.5", 0, "ROUND_HALF_EVEN") |
-2 |
ROUND(NUMERIC "2.5", 0, "ROUND_HALF_AWAY_FROM_ZERO") |
3 |
ROUND(NUMERIC "-2.5", 0, "ROUND_HALF_AWAY_FROM_ZERO") |
-3 |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
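A minimal example, using values from the expression table above (column aliases are illustrative):
SELECT ROUND(2.5) AS a, ROUND(-2.5) AS b, ROUND(1.235, 2) AS c;
/*-----+------+------*
| a | b | c |
+-----+------+------+
| 3.0 | -3.0 | 1.24 |
*-----+------+------*/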
SAFE_ADD
SAFE_ADD(X, Y)
Description
Equivalent to the addition operator (+), but returns NULL if overflow occurs.
X | Y | SAFE_ADD(X, Y) |
---|---|---|
5 | 4 | 9 |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
SAFE_DIVIDE
SAFE_DIVIDE(X, Y)
Description
Equivalent to the division operator (X / Y), but returns NULL if an error occurs, such as a division by zero error.
X | Y | SAFE_DIVIDE(X, Y) |
---|---|---|
20 | 4 | 5 |
0 | 20 | 0 |
20 | 0 | NULL |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
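A minimal example; note that integer inputs yield a FLOAT64 result, and division by zero yields NULL instead of an error (column aliases are illustrative):
SELECT SAFE_DIVIDE(20, 4) AS a, SAFE_DIVIDE(20, 0) AS b;
/*-----+------*
| a | b |
+-----+------+
| 5.0 | NULL |
*-----+------*/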
SAFE_MULTIPLY
SAFE_MULTIPLY(X, Y)
Description
Equivalent to the multiplication operator (*), but returns NULL if overflow occurs.
X | Y | SAFE_MULTIPLY(X, Y) |
---|---|---|
20 | 4 | 80 |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
SAFE_NEGATE
SAFE_NEGATE(X)
Description
Equivalent to the unary minus operator (-), but returns NULL if overflow occurs.
X | SAFE_NEGATE(X) |
---|---|
+1 | -1 |
-1 | +1 |
0 | 0 |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
SAFE_SUBTRACT
SAFE_SUBTRACT(X, Y)
Description
Returns the result of Y subtracted from X. Equivalent to the subtraction operator (-), but returns NULL if overflow occurs.
X | Y | SAFE_SUBTRACT(X, Y) |
---|---|---|
5 | 4 | 1 |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
INT64 | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
NUMERIC | NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
SEC
SEC(X)
Description
Computes the secant for the angle of X, where X is specified in radians. X can be any data type that coerces to FLOAT64.
X | SEC(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT SEC(100) AS a, SEC(-1) AS b;
/*----------------+---------------*
| a | b |
+----------------+---------------+
| 1.159663822905 | 1.85081571768 |
*----------------+---------------*/
SECH
SECH(X)
Description
Computes the hyperbolic secant for the angle of X, where X is specified in radians. X can be any data type that coerces to FLOAT64.
Never produces an error.
X | SECH(X) |
---|---|
+inf | 0 |
-inf | 0 |
NaN | NaN |
NULL | NULL |
Return Data Type
FLOAT64
Example
SELECT SECH(0.5) AS a, SECH(-2) AS b, SECH(100) AS c;
/*----------------+----------------+---------------------*
| a | b | c |
+----------------+----------------+---------------------+
| 0.88681888397 | 0.265802228834 | 7.4401519520417E-44 |
*----------------+----------------+---------------------*/
SIGN
SIGN(X)
Description
Returns -1, 0, or +1 for negative, zero, and positive arguments respectively. For floating point arguments, this function does not distinguish between positive and negative zero.
X | SIGN(X) |
---|---|
25 | +1 |
0 | 0 |
-25 | -1 |
NaN | NaN |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
SIN
SIN(X)
Description
Computes the sine of X where X is specified in radians. Never fails.
X | SIN(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
SINH
SINH(X)
Description
Computes the hyperbolic sine of X where X is specified in radians. Generates an error if overflow occurs.
X | SINH(X) |
---|---|
+inf | +inf |
-inf | -inf |
NaN | NaN |
SQRT
SQRT(X)
Description
Computes the square root of X. Generates an error if X is less than 0.
X | SQRT(X) |
---|---|
25.0 | 5.0 |
+inf | +inf |
X < 0 | Error |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
TAN
TAN(X)
Description
Computes the tangent of X where X is specified in radians. Generates an error if overflow occurs.
X | TAN(X) |
---|---|
+inf | NaN |
-inf | NaN |
NaN | NaN |
TANH
TANH(X)
Description
Computes the hyperbolic tangent of X where X is specified in radians. Does not fail.
X | TANH(X) |
---|---|
+inf | 1.0 |
-inf | -1.0 |
NaN | NaN |
TRUNC
TRUNC(X [, N])
Description
If only X is present, TRUNC rounds X to the nearest integer whose absolute value is not greater than the absolute value of X. If N is also present, TRUNC behaves like ROUND(X, N), but always rounds towards zero and never overflows.
X | TRUNC(X) |
---|---|
2.0 | 2.0 |
2.3 | 2.0 |
2.8 | 2.0 |
2.5 | 2.0 |
-2.3 | -2.0 |
-2.8 | -2.0 |
-2.5 | -2.0 |
0 | 0 |
+inf | +inf |
-inf | -inf |
NaN | NaN |
Return Data Type
INPUT | INT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|---|
OUTPUT | FLOAT64 | NUMERIC | BIGNUMERIC | FLOAT64 |
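A minimal example contrasting TRUNC with ROUND on the same inputs, using values from the table above (column aliases are illustrative):
SELECT TRUNC(2.8) AS a, ROUND(2.8) AS b, TRUNC(-2.8) AS c, ROUND(-2.8) AS d;
/*-----+-----+------+------*
| a | b | c | d |
+-----+-----+------+------+
| 2.0 | 3.0 | -2.0 | -3.0 |
*-----+-----+------+------*/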
Navigation functions
GoogleSQL for BigQuery supports navigation functions. Navigation functions are a subset of window functions. To create a window function call and learn about the syntax for window functions, see Window function calls.
Navigation functions generally compute some value_expression over a different row in the window frame from the current row. The OVER clause syntax varies across navigation functions.
For all navigation functions, the result data type is the same type as value_expression.
Function list
Name | Summary |
---|---|
FIRST_VALUE | Gets a value for the first row in the current window frame. |
LAG | Gets a value for a preceding row. |
LAST_VALUE | Gets a value for the last row in the current window frame. |
LEAD | Gets a value for a subsequent row. |
NTH_VALUE | Gets a value for the Nth row of the current window frame. |
PERCENTILE_CONT | Computes the specified percentile for a value, using linear interpolation. |
PERCENTILE_DISC | Computes the specified percentile for a discrete value. |
FIRST_VALUE
FIRST_VALUE (value_expression [{RESPECT | IGNORE} NULLS])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
[ window_frame_clause ]
Description
Returns the value of the value_expression
for the first row in the current
window frame.
This function includes NULL
values in the calculation unless IGNORE NULLS
is
present. If IGNORE NULLS
is present, the function excludes NULL
values from
the calculation.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
value_expression
can be any data type that an expression can return.
Return Data Type
Same type as value_expression
.
Examples
The following example computes the fastest time for each division.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
division,
FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
TIMESTAMP_DIFF(finish_time, fastest_time, SECOND) AS delta_in_seconds
FROM (
SELECT name,
finish_time,
division,
FIRST_VALUE(finish_time)
OVER (PARTITION BY division ORDER BY finish_time ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS fastest_time
FROM finishers);
/*-----------------+-------------+----------+--------------+------------------*
| name | finish_time | division | fastest_time | delta_in_seconds |
+-----------------+-------------+----------+--------------+------------------+
| Carly Forte | 03:08:58 | F25-29 | 03:08:58 | 0 |
| Sophia Liu | 02:51:45 | F30-34 | 02:51:45 | 0 |
| Nikki Leith | 02:59:01 | F30-34 | 02:51:45 | 436 |
| Jen Edwards | 03:06:36 | F30-34 | 02:51:45 | 891 |
| Meghan Lederer | 03:07:41 | F30-34 | 02:51:45 | 956 |
| Lauren Reasoner | 03:10:14 | F30-34 | 02:51:45 | 1109 |
| Lisa Stelzner | 02:54:11 | F35-39 | 02:54:11 | 0 |
| Lauren Matthews | 03:01:17 | F35-39 | 02:54:11 | 426 |
| Desiree Berry | 03:05:42 | F35-39 | 02:54:11 | 691 |
| Suzy Slane | 03:06:24 | F35-39 | 02:54:11 | 733 |
*-----------------+-------------+----------+--------------+------------------*/
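The frame semantics above can be sketched outside SQL. The following is a hypothetical Python helper (not BigQuery's implementation) that assumes the window frame has already been materialized as an ordered list, with `None` standing in for `NULL`:

```python
def first_value(frame_values, ignore_nulls=False):
    """Value of the first row in the window frame. With ignore_nulls=True,
    None rows are skipped, like FIRST_VALUE(... IGNORE NULLS)."""
    vals = [v for v in frame_values if v is not None] if ignore_nulls else frame_values
    return vals[0] if vals else None

# A frame containing a NULL first row:
print(first_value([None, '02:51:45', '02:59:01']))                     # None
print(first_value([None, '02:51:45', '02:59:01'], ignore_nulls=True))  # '02:51:45'
```

`LAST_VALUE` behaves symmetrically on the last row of the frame.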
LAG
LAG (value_expression[, offset [, default_expression]])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
Returns the value of the value_expression
on a preceding row. Changing the
offset
value changes which preceding row is returned; the default value is
1
, indicating the previous row in the window frame. An error occurs if
offset
is NULL or a negative value.
The optional default_expression
is used if there isn't a row in the window
frame at the specified offset. This expression must be a constant expression and
its type must be implicitly coercible to the type of value_expression
. If left
unspecified, default_expression
defaults to NULL.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
- value_expression can be any data type that can be returned from an expression.
- offset must be a non-negative integer literal or parameter.
- default_expression must be compatible with the value expression type.
Return Data Type
Same type as value_expression
.
Examples
The following example illustrates a basic use of the LAG
function.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
finish_time,
division,
LAG(name)
OVER (PARTITION BY division ORDER BY finish_time ASC) AS preceding_runner
FROM finishers;
/*-----------------+-------------+----------+------------------*
| name | finish_time | division | preceding_runner |
+-----------------+-------------+----------+------------------+
| Carly Forte | 03:08:58 | F25-29 | NULL |
| Sophia Liu | 02:51:45 | F30-34 | NULL |
| Nikki Leith | 02:59:01 | F30-34 | Sophia Liu |
| Jen Edwards | 03:06:36 | F30-34 | Nikki Leith |
| Meghan Lederer | 03:07:41 | F30-34 | Jen Edwards |
| Lauren Reasoner | 03:10:14 | F30-34 | Meghan Lederer |
| Lisa Stelzner | 02:54:11 | F35-39 | NULL |
| Lauren Matthews | 03:01:17 | F35-39 | Lisa Stelzner |
| Desiree Berry | 03:05:42 | F35-39 | Lauren Matthews |
| Suzy Slane | 03:06:24 | F35-39 | Desiree Berry |
*-----------------+-------------+----------+------------------*/
This next example uses the optional offset
parameter.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
finish_time,
division,
LAG(name, 2)
OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_ahead
FROM finishers;
/*-----------------+-------------+----------+-------------------*
| name | finish_time | division | two_runners_ahead |
+-----------------+-------------+----------+-------------------+
| Carly Forte | 03:08:58 | F25-29 | NULL |
| Sophia Liu | 02:51:45 | F30-34 | NULL |
| Nikki Leith | 02:59:01 | F30-34 | NULL |
| Jen Edwards | 03:06:36 | F30-34 | Sophia Liu |
| Meghan Lederer | 03:07:41 | F30-34 | Nikki Leith |
| Lauren Reasoner | 03:10:14 | F30-34 | Jen Edwards |
| Lisa Stelzner | 02:54:11 | F35-39 | NULL |
| Lauren Matthews | 03:01:17 | F35-39 | NULL |
| Desiree Berry | 03:05:42 | F35-39 | Lisa Stelzner |
| Suzy Slane | 03:06:24 | F35-39 | Lauren Matthews |
*-----------------+-------------+----------+-------------------*/
The following example supplies a default value for rows that have no preceding row at the specified offset.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
finish_time,
division,
LAG(name, 2, 'Nobody')
OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_ahead
FROM finishers;
/*-----------------+-------------+----------+-------------------*
| name | finish_time | division | two_runners_ahead |
+-----------------+-------------+----------+-------------------+
| Carly Forte | 03:08:58 | F25-29 | Nobody |
| Sophia Liu | 02:51:45 | F30-34 | Nobody |
| Nikki Leith | 02:59:01 | F30-34 | Nobody |
| Jen Edwards | 03:06:36 | F30-34 | Sophia Liu |
| Meghan Lederer | 03:07:41 | F30-34 | Nikki Leith |
| Lauren Reasoner | 03:10:14 | F30-34 | Jen Edwards |
| Lisa Stelzner | 02:54:11 | F35-39 | Nobody |
| Lauren Matthews | 03:01:17 | F35-39 | Nobody |
| Desiree Berry | 03:05:42 | F35-39 | Lisa Stelzner |
| Suzy Slane | 03:06:24 | F35-39 | Lauren Matthews |
*-----------------+-------------+----------+-------------------*/
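The offset and default behavior shown above can be sketched as a hypothetical Python helper (not BigQuery's implementation), emulating `LAG` over a single ordered partition:

```python
def lag(ordered_values, offset=1, default=None):
    """For each row, return the value `offset` rows earlier in the ordered
    partition, or `default` if no such row exists."""
    if offset is None or offset < 0:
        raise ValueError("offset must be a non-negative integer")
    return [ordered_values[i - offset] if i - offset >= 0 else default
            for i in range(len(ordered_values))]

# F30-34 division from the examples, ordered by finish_time:
runners = ['Sophia Liu', 'Nikki Leith', 'Jen Edwards']
print(lag(runners))               # [None, 'Sophia Liu', 'Nikki Leith']
print(lag(runners, 2, 'Nobody'))  # ['Nobody', 'Nobody', 'Sophia Liu']
```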
LAST_VALUE
LAST_VALUE (value_expression [{RESPECT | IGNORE} NULLS])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
[ window_frame_clause ]
Description
Returns the value of the value_expression
for the last row in the current
window frame.
This function includes NULL
values in the calculation unless IGNORE NULLS
is
present. If IGNORE NULLS
is present, the function excludes NULL
values from
the calculation.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
value_expression
can be any data type that an expression can return.
Return Data Type
Same type as value_expression
.
Examples
The following example computes the slowest time for each division.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
division,
FORMAT_TIMESTAMP('%X', slowest_time) AS slowest_time,
TIMESTAMP_DIFF(slowest_time, finish_time, SECOND) AS delta_in_seconds
FROM (
SELECT name,
finish_time,
division,
LAST_VALUE(finish_time)
OVER (PARTITION BY division ORDER BY finish_time ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS slowest_time
FROM finishers);
/*-----------------+-------------+----------+--------------+------------------*
| name | finish_time | division | slowest_time | delta_in_seconds |
+-----------------+-------------+----------+--------------+------------------+
| Carly Forte | 03:08:58 | F25-29 | 03:08:58 | 0 |
| Sophia Liu | 02:51:45 | F30-34 | 03:10:14 | 1109 |
| Nikki Leith | 02:59:01 | F30-34 | 03:10:14 | 673 |
| Jen Edwards | 03:06:36 | F30-34 | 03:10:14 | 218 |
| Meghan Lederer | 03:07:41 | F30-34 | 03:10:14 | 153 |
| Lauren Reasoner | 03:10:14 | F30-34 | 03:10:14 | 0 |
| Lisa Stelzner | 02:54:11 | F35-39 | 03:06:24 | 733 |
| Lauren Matthews | 03:01:17 | F35-39 | 03:06:24 | 307 |
| Desiree Berry | 03:05:42 | F35-39 | 03:06:24 | 42 |
| Suzy Slane | 03:06:24 | F35-39 | 03:06:24 | 0 |
*-----------------+-------------+----------+--------------+------------------*/
LEAD
LEAD (value_expression[, offset [, default_expression]])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
Returns the value of the value_expression
on a subsequent row. Changing the
offset
value changes which subsequent row is returned; the default value is
1
, indicating the next row in the window frame. An error occurs if offset
is
NULL or a negative value.
The optional default_expression
is used if there isn't a row in the window
frame at the specified offset. This expression must be a constant expression and
its type must be implicitly coercible to the type of value_expression
. If left
unspecified, default_expression
defaults to NULL.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
- value_expression can be any data type that can be returned from an expression.
- offset must be a non-negative integer literal or parameter.
- default_expression must be compatible with the value expression type.
Return Data Type
Same type as value_expression
.
Examples
The following example illustrates a basic use of the LEAD
function.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
finish_time,
division,
LEAD(name)
OVER (PARTITION BY division ORDER BY finish_time ASC) AS followed_by
FROM finishers;
/*-----------------+-------------+----------+-----------------*
| name | finish_time | division | followed_by |
+-----------------+-------------+----------+-----------------+
| Carly Forte | 03:08:58 | F25-29 | NULL |
| Sophia Liu | 02:51:45 | F30-34 | Nikki Leith |
| Nikki Leith | 02:59:01 | F30-34 | Jen Edwards |
| Jen Edwards | 03:06:36 | F30-34 | Meghan Lederer |
| Meghan Lederer | 03:07:41 | F30-34 | Lauren Reasoner |
| Lauren Reasoner | 03:10:14 | F30-34 | NULL |
| Lisa Stelzner | 02:54:11 | F35-39 | Lauren Matthews |
| Lauren Matthews | 03:01:17 | F35-39 | Desiree Berry |
| Desiree Berry | 03:05:42 | F35-39 | Suzy Slane |
| Suzy Slane | 03:06:24 | F35-39 | NULL |
*-----------------+-------------+----------+-----------------*/
This next example uses the optional offset
parameter.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
finish_time,
division,
LEAD(name, 2)
OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back
FROM finishers;
/*-----------------+-------------+----------+------------------*
| name | finish_time | division | two_runners_back |
+-----------------+-------------+----------+------------------+
| Carly Forte | 03:08:58 | F25-29 | NULL |
| Sophia Liu | 02:51:45 | F30-34 | Jen Edwards |
| Nikki Leith | 02:59:01 | F30-34 | Meghan Lederer |
| Jen Edwards | 03:06:36 | F30-34 | Lauren Reasoner |
| Meghan Lederer | 03:07:41 | F30-34 | NULL |
| Lauren Reasoner | 03:10:14 | F30-34 | NULL |
| Lisa Stelzner | 02:54:11 | F35-39 | Desiree Berry |
| Lauren Matthews | 03:01:17 | F35-39 | Suzy Slane |
| Desiree Berry | 03:05:42 | F35-39 | NULL |
| Suzy Slane | 03:06:24 | F35-39 | NULL |
*-----------------+-------------+----------+------------------*/
The following example supplies a default value for rows that have no subsequent row at the specified offset.
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
finish_time,
division,
LEAD(name, 2, 'Nobody')
OVER (PARTITION BY division ORDER BY finish_time ASC) AS two_runners_back
FROM finishers;
/*-----------------+-------------+----------+------------------*
| name | finish_time | division | two_runners_back |
+-----------------+-------------+----------+------------------+
| Carly Forte | 03:08:58 | F25-29 | Nobody |
| Sophia Liu | 02:51:45 | F30-34 | Jen Edwards |
| Nikki Leith | 02:59:01 | F30-34 | Meghan Lederer |
| Jen Edwards | 03:06:36 | F30-34 | Lauren Reasoner |
| Meghan Lederer | 03:07:41 | F30-34 | Nobody |
| Lauren Reasoner | 03:10:14 | F30-34 | Nobody |
| Lisa Stelzner | 02:54:11 | F35-39 | Desiree Berry |
| Lauren Matthews | 03:01:17 | F35-39 | Suzy Slane |
| Desiree Berry | 03:05:42 | F35-39 | Nobody |
| Suzy Slane | 03:06:24 | F35-39 | Nobody |
*-----------------+-------------+----------+------------------*/
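As with `LAG`, the behavior above can be sketched as a hypothetical Python helper (not BigQuery's implementation), emulating `LEAD` over a single ordered partition:

```python
def lead(ordered_values, offset=1, default=None):
    """For each row, return the value `offset` rows later in the ordered
    partition, or `default` if no such row exists."""
    if offset is None or offset < 0:
        raise ValueError("offset must be a non-negative integer")
    n = len(ordered_values)
    return [ordered_values[i + offset] if i + offset < n else default
            for i in range(n)]

# F35-39 division from the examples, ordered by finish_time:
runners = ['Lisa Stelzner', 'Lauren Matthews', 'Desiree Berry', 'Suzy Slane']
print(lead(runners, 2, 'Nobody'))
# ['Desiree Berry', 'Suzy Slane', 'Nobody', 'Nobody']
```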
NTH_VALUE
NTH_VALUE (value_expression, constant_integer_expression [{RESPECT | IGNORE} NULLS])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
[ window_frame_clause ]
Description
Returns the value of value_expression
at the Nth row of the current window
frame, where Nth is defined by constant_integer_expression
. Returns NULL if
there is no such row.
This function includes NULL
values in the calculation unless IGNORE NULLS
is
present. If IGNORE NULLS
is present, the function excludes NULL
values from
the calculation.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
- value_expression can be any data type that can be returned from an expression.
- constant_integer_expression can be any constant expression that returns an integer.
Return Data Type
Same type as value_expression
.
Examples
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 3:07:41', 'F30-34'
UNION ALL SELECT 'Carly Forte', TIMESTAMP '2016-10-18 3:08:58', 'F25-29'
UNION ALL SELECT 'Lauren Reasoner', TIMESTAMP '2016-10-18 3:10:14', 'F30-34')
SELECT name,
FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
division,
FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
FORMAT_TIMESTAMP('%X', second_fastest) AS second_fastest
FROM (
SELECT name,
finish_time,
division,
FIRST_VALUE(finish_time)
OVER w1 AS fastest_time,
NTH_VALUE(finish_time, 2)
OVER w1 AS second_fastest
FROM finishers
WINDOW w1 AS (
PARTITION BY division ORDER BY finish_time ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING));
/*-----------------+-------------+----------+--------------+----------------*
| name | finish_time | division | fastest_time | second_fastest |
+-----------------+-------------+----------+--------------+----------------+
| Carly Forte | 03:08:58 | F25-29 | 03:08:58 | NULL |
| Sophia Liu | 02:51:45 | F30-34 | 02:51:45 | 02:59:01 |
| Nikki Leith | 02:59:01 | F30-34 | 02:51:45 | 02:59:01 |
| Jen Edwards | 03:06:36 | F30-34 | 02:51:45 | 02:59:01 |
| Meghan Lederer | 03:07:41 | F30-34 | 02:51:45 | 02:59:01 |
| Lauren Reasoner | 03:10:14 | F30-34 | 02:51:45 | 02:59:01 |
| Lisa Stelzner | 02:54:11 | F35-39 | 02:54:11 | 03:01:17 |
| Lauren Matthews | 03:01:17 | F35-39 | 02:54:11 | 03:01:17 |
| Desiree Berry | 03:05:42 | F35-39 | 02:54:11 | 03:01:17 |
| Suzy Slane | 03:06:24 | F35-39 | 02:54:11 | 03:01:17 |
*-----------------+-------------+----------+--------------+----------------*/
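The "Nth row of the frame, NULL if no such row" rule can be sketched as a hypothetical Python helper (not BigQuery's implementation), assuming the frame is materialized as an ordered list:

```python
def nth_value(frame_values, n):
    """Value at the Nth row (1-indexed) of the window frame,
    or None if the frame has fewer than N rows."""
    return frame_values[n - 1] if 1 <= n <= len(frame_values) else None

# F30-34 frame (UNBOUNDED PRECEDING .. UNBOUNDED FOLLOWING), ordered:
times = ['02:51:45', '02:59:01', '03:06:36', '03:07:41', '03:10:14']
print(nth_value(times, 1))         # '02:51:45'  (same row as FIRST_VALUE)
print(nth_value(times, 2))         # '02:59:01'
print(nth_value(['03:08:58'], 2))  # None (one-row frame, as for F25-29)
```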
PERCENTILE_CONT
PERCENTILE_CONT (value_expression, percentile [{RESPECT | IGNORE} NULLS])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
Description
Computes the specified percentile value for the value_expression, with linear interpolation.
This function ignores NULL
values if
RESPECT NULLS
is absent. If RESPECT NULLS
is present:
- Interpolation between two
NULL
values returnsNULL
. - Interpolation between a
NULL
value and a non-NULL
value returns the non-NULL
value.
To learn more about the OVER
clause and how to use it, see
Window function calls.
PERCENTILE_CONT
can be used with differential privacy. To learn more, see
Differentially private aggregate functions.
Supported Argument Types
- value_expression and percentile must have one of the following types:
  - NUMERIC
  - BIGNUMERIC
  - FLOAT64
- percentile must be a literal in the range [0, 1].
Return Data Type
The return data type is determined by the argument types, as shown in the following table.
INPUT | NUMERIC | BIGNUMERIC | FLOAT64 |
---|---|---|---|
NUMERIC | NUMERIC | BIGNUMERIC | FLOAT64 |
BIGNUMERIC | BIGNUMERIC | BIGNUMERIC | FLOAT64 |
FLOAT64 | FLOAT64 | FLOAT64 | FLOAT64 |
Examples
The following example computes the value for some percentiles from a column of values while ignoring nulls.
SELECT
PERCENTILE_CONT(x, 0) OVER() AS min,
PERCENTILE_CONT(x, 0.01) OVER() AS percentile1,
PERCENTILE_CONT(x, 0.5) OVER() AS median,
PERCENTILE_CONT(x, 0.9) OVER() AS percentile90,
PERCENTILE_CONT(x, 1) OVER() AS max
FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1;
/*-----+-------------+--------+--------------+-----*
| min | percentile1 | median | percentile90 | max |
+-----+-------------+--------+--------------+-----+
| 0 | 0.03 | 1.5 | 2.7 | 3 |
*-----+-------------+--------+--------------+-----*/
The following example computes the value for some percentiles from a column of values while respecting nulls.
SELECT
PERCENTILE_CONT(x, 0 RESPECT NULLS) OVER() AS min,
PERCENTILE_CONT(x, 0.01 RESPECT NULLS) OVER() AS percentile1,
PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER() AS median,
PERCENTILE_CONT(x, 0.9 RESPECT NULLS) OVER() AS percentile90,
PERCENTILE_CONT(x, 1 RESPECT NULLS) OVER() AS max
FROM UNNEST([0, 3, NULL, 1, 2]) AS x LIMIT 1;
/*------+-------------+--------+--------------+-----*
| min | percentile1 | median | percentile90 | max |
+------+-------------+--------+--------------+-----+
| NULL | 0 | 1 | 2.6 | 3 |
*------+-------------+--------+--------------+-----*/
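The linear-interpolation rule in the ignore-NULLS case can be sketched as a hypothetical Python helper (not BigQuery's implementation): sort the non-NULL values, locate the fractional position `percentile * (n - 1)`, and interpolate between the two neighboring values:

```python
def percentile_cont(values, p):
    """Linear-interpolation percentile over the non-None values,
    mimicking PERCENTILE_CONT's default IGNORE NULLS behavior."""
    xs = sorted(v for v in values if v is not None)
    pos = p * (len(xs) - 1)   # fractional index into the sorted values
    lo = int(pos)
    frac = pos - lo
    if frac == 0:
        return float(xs[lo])
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])

# Same data as the first example above: [0, 3, NULL, 1, 2]
print(percentile_cont([0, 3, None, 1, 2], 0.5))  # 1.5 (the median)
```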
PERCENTILE_DISC
PERCENTILE_DISC (value_expression, percentile [{RESPECT | IGNORE} NULLS])
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
Description
Computes the specified percentile value for a discrete value_expression
. The
returned value is the first sorted value of value_expression
with cumulative
distribution greater than or equal to the given percentile
value.
This function ignores NULL
values unless
RESPECT NULLS
is present.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Supported Argument Types
- value_expression can be any orderable type.
- percentile must be a literal in the range [0, 1], with one of the following types:
  - NUMERIC
  - BIGNUMERIC
  - FLOAT64
Return Data Type
Same type as value_expression
.
Examples
The following example computes the value for some percentiles from a column of values while ignoring nulls.
SELECT
x,
PERCENTILE_DISC(x, 0) OVER() AS min,
PERCENTILE_DISC(x, 0.5) OVER() AS median,
PERCENTILE_DISC(x, 1) OVER() AS max
FROM UNNEST(['c', NULL, 'b', 'a']) AS x;
/*------+-----+--------+-----*
| x | min | median | max |
+------+-----+--------+-----+
| c | a | b | c |
| NULL | a | b | c |
| b | a | b | c |
| a | a | b | c |
*------+-----+--------+-----*/
The following example computes the value for some percentiles from a column of values while respecting nulls.
SELECT
x,
PERCENTILE_DISC(x, 0 RESPECT NULLS) OVER() AS min,
PERCENTILE_DISC(x, 0.5 RESPECT NULLS) OVER() AS median,
PERCENTILE_DISC(x, 1 RESPECT NULLS) OVER() AS max
FROM UNNEST(['c', NULL, 'b', 'a']) AS x;
/*------+------+--------+-----*
| x | min | median | max |
+------+------+--------+-----+
| c | NULL | a | c |
| NULL | NULL | a | c |
| b | NULL | a | c |
| a | NULL | a | c |
*------+------+--------+-----*/
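The cumulative-distribution rule from the description can be sketched as a hypothetical Python helper (not BigQuery's implementation): return the first sorted value whose cumulative distribution is greater than or equal to the given percentile:

```python
def percentile_disc(values, p):
    """First sorted non-None value with cumulative distribution >= p,
    mimicking PERCENTILE_DISC's default IGNORE NULLS behavior."""
    xs = sorted(v for v in values if v is not None)
    n = len(xs)
    for k, v in enumerate(xs, start=1):
        if k / n >= p:   # cumulative distribution of the k-th value is k/n
            return v

# Same data as the first example above: ['c', NULL, 'b', 'a']
print(percentile_disc(['c', None, 'b', 'a'], 0.5))  # 'b'
```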
Net functions
GoogleSQL for BigQuery supports the following Net functions.
Function list
Name | Summary |
---|---|
NET.HOST
|
Gets the hostname from a URL. |
NET.IP_FROM_STRING
|
Converts an IPv4 or IPv6 address from a STRING value to
a BYTES value in network byte order.
|
NET.IP_NET_MASK
|
Gets a network mask. |
NET.IP_TO_STRING
|
Converts an IPv4 or IPv6 address from a BYTES value in
network byte order to a STRING value.
|
NET.IP_TRUNC
|
Converts a BYTES IPv4 or IPv6 address in
network byte order to a BYTES subnet address.
|
NET.IPV4_FROM_INT64
|
Converts an IPv4 address from an INT64 value to a
BYTES value in network byte order.
|
NET.IPV4_TO_INT64
|
Converts an IPv4 address from a BYTES value in network
byte order to an INT64 value.
|
NET.PUBLIC_SUFFIX
|
Gets the public suffix from a URL. |
NET.REG_DOMAIN
|
Gets the registered or registrable domain from a URL. |
NET.SAFE_IP_FROM_STRING
|
Similar to NET.IP_FROM_STRING, but returns
NULL instead of producing an error if the input is invalid.
|
NET.HOST
NET.HOST(url)
Description
Takes a URL as a STRING
value and returns the host. For best results, URL
values should comply with the format as defined by
RFC 3986. If the URL value does not comply
with RFC 3986 formatting, this function makes a best effort to parse the input
and return a relevant result. If the function cannot parse the input, it
returns NULL
.
Return Data Type
STRING
Example
SELECT
FORMAT("%T", input) AS input,
description,
FORMAT("%T", NET.HOST(input)) AS host,
FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix,
FORMAT("%T", NET.REG_DOMAIN(input)) AS domain
FROM (
SELECT "" AS input, "invalid input" AS description
UNION ALL SELECT "http://abc.xyz", "standard URL"
UNION ALL SELECT "//user:password@a.b:80/path?query",
"standard URL with relative scheme, port, path and query, but no public suffix"
UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host"
UNION ALL SELECT "http://例子.卷筒纸.中国", "standard URL with internationalized domain name"
UNION ALL SELECT " www.Example.Co.UK ",
"non-standard URL with spaces, upper case letters, and without scheme"
UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported"
);
input | description | host | suffix | domain |
---|---|---|---|---|
"" | invalid input | NULL | NULL | NULL |
"http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" |
"//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL |
"https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL |
"http://例子.卷筒纸.中国" | standard URL with internationalized domain name | "例子.卷筒纸.中国" | "中国" | "卷筒纸.中国" |
" www.Example.Co.UK " | non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK" | "Co.UK" | "Example.Co.UK" |
"mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL |
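For standard URLs, Python's `urllib.parse` extracts a roughly comparable host. This is only an approximation of `NET.HOST`: `urlparse` lowercases the hostname, strips the brackets from IPv6 hosts, and does not attempt the same best-effort recovery on non-standard input:

```python
from urllib.parse import urlparse

def host(url):
    """Rough analogue of NET.HOST for RFC 3986-style URLs.
    Behavior diverges from NET.HOST on non-standard inputs."""
    return urlparse(url).hostname

print(host("http://abc.xyz"))                        # abc.xyz
print(host("//user:password@a.b:80/path?query"))     # a.b
```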
NET.IP_FROM_STRING
NET.IP_FROM_STRING(addr_str)
Description
Converts an IPv4 or IPv6 address from text (STRING) format to binary (BYTES) format in network byte order.
This function supports the following formats for addr_str
:
- IPv4: Dotted-quad format. For example,
10.1.2.3
. - IPv6: Colon-separated format. For example,
1234:5678:90ab:cdef:1234:5678:90ab:cdef
. For more examples, see the IP Version 6 Addressing Architecture.
This function does not support CIDR notation, such as 10.1.2.3/32
.
If this function receives a NULL
input, it returns NULL
. If the input is
considered invalid, an OUT_OF_RANGE
error occurs.
Return Data Type
BYTES
Example
SELECT
addr_str, FORMAT("%T", NET.IP_FROM_STRING(addr_str)) AS ip_from_string
FROM UNNEST([
'48.49.50.51',
'::1',
'3031:3233:3435:3637:3839:4041:4243:4445',
'::ffff:192.0.2.128'
]) AS addr_str;
/*---------------------------------------------------------------------------------------------------------------*
| addr_str | ip_from_string |
+---------------------------------------------------------------------------------------------------------------+
| 48.49.50.51 | b"0123" |
| ::1 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" |
| 3031:3233:3435:3637:3839:4041:4243:4445 | b"0123456789@ABCDE" |
| ::ffff:192.0.2.128 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" |
*---------------------------------------------------------------------------------------------------------------*/
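The same text-to-network-byte-order conversion is available in Python's standard `ipaddress` module, which can help when checking values outside BigQuery (a sketch, not BigQuery's implementation; Python raises `ValueError` where this function raises an `OUT_OF_RANGE` error):

```python
import ipaddress

def ip_from_string(addr_str):
    """Network-byte-order bytes for an IPv4 or IPv6 string,
    analogous to NET.IP_FROM_STRING. Rejects CIDR notation."""
    return ipaddress.ip_address(addr_str).packed

print(ip_from_string('48.49.50.51'))  # b'0123'
```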
NET.IP_NET_MASK
NET.IP_NET_MASK(num_output_bytes, prefix_length)
Description
Returns a network mask: a byte sequence with length equal to num_output_bytes
,
where the first prefix_length
bits are set to 1 and the other bits are set to
0. num_output_bytes
and prefix_length
are INT64.
This function throws an error if num_output_bytes
is not 4 (for IPv4) or 16
(for IPv6). It also throws an error if prefix_length
is negative or greater
than 8 * num_output_bytes
.
Return Data Type
BYTES
Example
SELECT x, y, FORMAT("%T", NET.IP_NET_MASK(x, y)) AS ip_net_mask
FROM UNNEST([
STRUCT(4 as x, 0 as y),
(4, 20),
(4, 32),
(16, 0),
(16, 1),
(16, 128)
]);
/*--------------------------------------------------------------------------------*
| x | y | ip_net_mask |
+--------------------------------------------------------------------------------+
| 4 | 0 | b"\x00\x00\x00\x00" |
| 4 | 20 | b"\xff\xff\xf0\x00" |
| 4 | 32 | b"\xff\xff\xff\xff" |
| 16 | 0 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" |
| 16 | 1 | b"\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" |
| 16 | 128 | b"\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff" |
*--------------------------------------------------------------------------------*/
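The mask construction described above can be sketched in Python (a hypothetical helper, not BigQuery's implementation): set the first `prefix_length` bits to 1 and the remaining bits to 0, then emit the result in big-endian (network) byte order:

```python
def ip_net_mask(num_output_bytes, prefix_length):
    """Byte sequence with the first prefix_length bits set to 1,
    analogous to NET.IP_NET_MASK, including its range checks."""
    if num_output_bytes not in (4, 16):
        raise ValueError("num_output_bytes must be 4 or 16")
    total_bits = num_output_bytes * 8
    if not 0 <= prefix_length <= total_bits:
        raise ValueError("prefix_length out of range")
    mask = ((1 << prefix_length) - 1) << (total_bits - prefix_length)
    return mask.to_bytes(num_output_bytes, "big")

print(ip_net_mask(4, 20))  # b'\xff\xff\xf0\x00'
```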
NET.IP_TO_STRING
NET.IP_TO_STRING(addr_bin)
Description
Converts an IPv4 or IPv6 address from binary (BYTES) format in network byte order to text (STRING) format.
If the input is 4 bytes, this function returns an IPv4 address as a STRING. If the input is 16 bytes, it returns an IPv6 address as a STRING.
If this function receives a NULL
input, it returns NULL
. If the input has
a length different from 4 or 16, an OUT_OF_RANGE
error occurs.
Return Data Type
STRING
Example
SELECT FORMAT("%T", x) AS addr_bin, NET.IP_TO_STRING(x) AS ip_to_string
FROM UNNEST([
b"0123",
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01",
b"0123456789@ABCDE",
b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80"
]) AS x;
/*---------------------------------------------------------------------------------------------------------------*
| addr_bin | ip_to_string |
+---------------------------------------------------------------------------------------------------------------+
| b"0123" | 48.49.50.51 |
| b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" | ::1 |
| b"0123456789@ABCDE" | 3031:3233:3435:3637:3839:4041:4243:4445 |
| b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" | ::ffff:192.0.2.128 |
*---------------------------------------------------------------------------------------------------------------*/
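The reverse conversion is also covered by Python's `ipaddress` module, which infers IPv4 or IPv6 from the 4- or 16-byte length just as this function does (a sketch, not BigQuery's implementation; other lengths raise `ValueError` where this function raises `OUT_OF_RANGE`):

```python
import ipaddress

def ip_to_string(addr_bin):
    """Text form of a 4-byte IPv4 or 16-byte IPv6 address in network
    byte order, analogous to NET.IP_TO_STRING."""
    return str(ipaddress.ip_address(addr_bin))

print(ip_to_string(b'0123'))  # 48.49.50.51
```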
NET.IP_TRUNC
NET.IP_TRUNC(addr_bin, prefix_length)
Description
Takes addr_bin
, an IPv4 or IPv6 address in binary (BYTES) format in network
byte order, and returns a subnet address in the same format. The result has the
same length as addr_bin
, where the first prefix_length
bits are equal to
those in addr_bin
and the remaining bits are 0.
This function throws an error if LENGTH(addr_bin) is not 4 or 16, or if
prefix_length is negative or greater than LENGTH(addr_bin) * 8.
Return Data Type
BYTES
Example
SELECT
FORMAT("%T", x) as addr_bin, prefix_length,
FORMAT("%T", NET.IP_TRUNC(x, prefix_length)) AS ip_trunc
FROM UNNEST([
STRUCT(b"\xAA\xBB\xCC\xDD" as x, 0 as prefix_length),
(b"\xAA\xBB\xCC\xDD", 11), (b"\xAA\xBB\xCC\xDD", 12),
(b"\xAA\xBB\xCC\xDD", 24), (b"\xAA\xBB\xCC\xDD", 32),
(b'0123456789@ABCDE', 80)
]);
/*-----------------------------------------------------------------------------*
| addr_bin | prefix_length | ip_trunc |
+-----------------------------------------------------------------------------+
| b"\xaa\xbb\xcc\xdd" | 0 | b"\x00\x00\x00\x00" |
| b"\xaa\xbb\xcc\xdd" | 11 | b"\xaa\xa0\x00\x00" |
| b"\xaa\xbb\xcc\xdd" | 12 | b"\xaa\xb0\x00\x00" |
| b"\xaa\xbb\xcc\xdd" | 24 | b"\xaa\xbb\xcc\x00" |
| b"\xaa\xbb\xcc\xdd" | 32 | b"\xaa\xbb\xcc\xdd" |
| b"0123456789@ABCDE" | 80 | b"0123456789\x00\x00\x00\x00\x00\x00" |
*-----------------------------------------------------------------------------*/
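The truncation behavior shown in the table above can be sketched in Python. This is an illustrative re-implementation of the documented semantics (mask off everything past the first `prefix_length` bits of a big-endian address), not BigQuery's actual code:

```python
def ip_trunc(addr_bin: bytes, prefix_length: int) -> bytes:
    """Keep the first prefix_length bits of addr_bin; zero the rest."""
    if len(addr_bin) not in (4, 16):
        raise ValueError("address must be 4 or 16 bytes")
    if not 0 <= prefix_length <= len(addr_bin) * 8:
        raise ValueError("prefix_length out of range")
    bits = len(addr_bin) * 8
    n = int.from_bytes(addr_bin, "big")  # network byte order = big-endian
    mask = ((1 << prefix_length) - 1) << (bits - prefix_length)
    return (n & mask).to_bytes(len(addr_bin), "big")
```

For example, `ip_trunc(b"\xAA\xBB\xCC\xDD", 12)` keeps the top 12 bits and yields `b"\xaa\xb0\x00\x00"`, matching the table row for prefix length 12.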
NET.IPV4_FROM_INT64
NET.IPV4_FROM_INT64(integer_value)
Description
Converts an IPv4 address from integer format to binary (BYTES) format in network
byte order. In the integer input, the least significant bit of the IP address is
stored in the least significant bit of the integer, regardless of host or client
architecture. For example, 1 means 0.0.0.1, and 0x1FF means 0.0.1.255.
This function checks that either all the most significant 32 bits are 0, or all the most significant 33 bits are 1 (sign-extended from a 32-bit integer). In other words, the input should be in the range [-0x80000000, 0xFFFFFFFF]; otherwise, this function throws an error.
This function does not support IPv6.
Return Data Type
BYTES
Example
SELECT x, x_hex, FORMAT("%T", NET.IPV4_FROM_INT64(x)) AS ipv4_from_int64
FROM (
SELECT CAST(x_hex AS INT64) x, x_hex
FROM UNNEST(["0x0", "0xABCDEF", "0xFFFFFFFF", "-0x1", "-0x2"]) AS x_hex
);
/*-----------------------------------------------*
| x | x_hex | ipv4_from_int64 |
+-----------------------------------------------+
| 0 | 0x0 | b"\x00\x00\x00\x00" |
| 11259375 | 0xABCDEF | b"\x00\xab\xcd\xef" |
| 4294967295 | 0xFFFFFFFF | b"\xff\xff\xff\xff" |
| -1 | -0x1 | b"\xff\xff\xff\xff" |
| -2 | -0x2 | b"\xff\xff\xff\xfe" |
*-----------------------------------------------*/
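The range check and conversion described above can be sketched in Python. This is an illustration of the documented behavior (accept sign-extended 32-bit values, then emit the low 32 bits in network byte order), not BigQuery's implementation:

```python
def ipv4_from_int64(x: int) -> bytes:
    """Convert an integer IPv4 value to 4 bytes in network byte order."""
    # Accept either a non-negative 32-bit value or a sign-extended one.
    if not -0x80000000 <= x <= 0xFFFFFFFF:
        raise ValueError("value out of IPv4 range")
    return (x & 0xFFFFFFFF).to_bytes(4, "big")
```

Note how -1 and 0xFFFFFFFF map to the same address bytes, as in the example table.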
NET.IPV4_TO_INT64
NET.IPV4_TO_INT64(addr_bin)
Description
Converts an IPv4 address from binary (BYTES) format in network byte order to
integer format. In the integer output, the least significant bit of the IP
address is stored in the least significant bit of the integer, regardless of
host or client architecture. For example, 1 means 0.0.0.1, and 0x1FF means 0.0.1.255. The output is in the range [0, 0xFFFFFFFF].
If the input length is not 4, this function throws an error.
This function does not support IPv6.
Return Data Type
INT64
Example
SELECT
FORMAT("%T", x) AS addr_bin,
FORMAT("0x%X", NET.IPV4_TO_INT64(x)) AS ipv4_to_int64
FROM
UNNEST([b"\x00\x00\x00\x00", b"\x00\xab\xcd\xef", b"\xff\xff\xff\xff"]) AS x;
/*-------------------------------------*
| addr_bin | ipv4_to_int64 |
+-------------------------------------+
| b"\x00\x00\x00\x00" | 0x0 |
| b"\x00\xab\xcd\xef" | 0xABCDEF |
| b"\xff\xff\xff\xff" | 0xFFFFFFFF |
*-------------------------------------*/
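The inverse conversion can be sketched the same way; again this is an illustrative model of the documented semantics, not BigQuery's code:

```python
def ipv4_to_int64(addr_bin: bytes) -> int:
    """Convert 4 bytes in network byte order to a non-negative integer."""
    if len(addr_bin) != 4:
        raise ValueError("input must be exactly 4 bytes")
    return int.from_bytes(addr_bin, "big")  # always in [0, 0xFFFFFFFF]
```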
NET.PUBLIC_SUFFIX
NET.PUBLIC_SUFFIX(url)
Description
Takes a URL as a STRING value and returns the public suffix (such as com, org, or net). A public suffix is an ICANN domain registered at publicsuffix.org. For best results, URL values should comply with the format as defined by RFC 3986. If the URL value does not comply with RFC 3986 formatting, this function makes a best effort to parse the input and return a relevant result.
This function returns NULL if any of the following is true:
- It cannot parse the host from the input;
- The parsed host contains adjacent dots in the middle (not leading or trailing);
- The parsed host does not contain any public suffix.
Before looking up the public suffix, this function temporarily normalizes the host by converting uppercase English letters to lowercase and encoding all non-ASCII characters with Punycode. The function then returns the public suffix as part of the original host instead of the normalized host.
Return Data Type
STRING
Example
SELECT
FORMAT("%T", input) AS input,
description,
FORMAT("%T", NET.HOST(input)) AS host,
FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix,
FORMAT("%T", NET.REG_DOMAIN(input)) AS domain
FROM (
SELECT "" AS input, "invalid input" AS description
UNION ALL SELECT "http://abc.xyz", "standard URL"
UNION ALL SELECT "//user:password@a.b:80/path?query",
"standard URL with relative scheme, port, path and query, but no public suffix"
UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host"
UNION ALL SELECT "http://例子.卷筒纸.中国", "standard URL with internationalized domain name"
UNION ALL SELECT " www.Example.Co.UK ",
"non-standard URL with spaces, upper case letters, and without scheme"
UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported"
);
input | description | host | suffix | domain |
---|---|---|---|---|
"" | invalid input | NULL | NULL | NULL |
"http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" |
"//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL |
"https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL |
"http://例子.卷筒纸.中国" | standard URL with internationalized domain name | "例子.卷筒纸.中国" | "中国" | "卷筒纸.中国" |
" www.Example.Co.UK " | non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK" | "Co.UK" | "Example.Co.UK" |
"mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL |
NET.REG_DOMAIN
NET.REG_DOMAIN(url)
Description
Takes a URL as a string and returns the registered or registrable domain (the public suffix plus one preceding label), as a string. For best results, URL values should comply with the format as defined by RFC 3986. If the URL value does not comply with RFC 3986 formatting, this function makes a best effort to parse the input and return a relevant result.
This function returns NULL if any of the following is true:
- It cannot parse the host from the input;
- The parsed host contains adjacent dots in the middle (not leading or trailing);
- The parsed host does not contain any public suffix;
- The parsed host contains only a public suffix without any preceding label.
Before looking up the public suffix, this function temporarily normalizes the host by converting uppercase English letters to lowercase and encoding all non-ASCII characters with Punycode. The function then returns the registered or registrable domain as part of the original host instead of the normalized host.
Return Data Type
STRING
Example
SELECT
FORMAT("%T", input) AS input,
description,
FORMAT("%T", NET.HOST(input)) AS host,
FORMAT("%T", NET.PUBLIC_SUFFIX(input)) AS suffix,
FORMAT("%T", NET.REG_DOMAIN(input)) AS domain
FROM (
SELECT "" AS input, "invalid input" AS description
UNION ALL SELECT "http://abc.xyz", "standard URL"
UNION ALL SELECT "//user:password@a.b:80/path?query",
"standard URL with relative scheme, port, path and query, but no public suffix"
UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host"
UNION ALL SELECT "http://例子.卷筒纸.中国", "standard URL with internationalized domain name"
UNION ALL SELECT " www.Example.Co.UK ",
"non-standard URL with spaces, upper case letters, and without scheme"
UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported"
);
input | description | host | suffix | domain |
---|---|---|---|---|
"" | invalid input | NULL | NULL | NULL |
"http://abc.xyz" | standard URL | "abc.xyz" | "xyz" | "abc.xyz" |
"//user:password@a.b:80/path?query" | standard URL with relative scheme, port, path and query, but no public suffix | "a.b" | NULL | NULL |
"https://[::1]:80" | standard URL with IPv6 host | "[::1]" | NULL | NULL |
"http://例子.卷筒纸.中国" | standard URL with internationalized domain name | "例子.卷筒纸.中国" | "中国" | "卷筒纸.中国" |
" www.Example.Co.UK " | non-standard URL with spaces, upper case letters, and without scheme | "www.Example.Co.UK" | "Co.UK" | "Example.Co.UK" |
"mailto:?to=&subject=&body=" | URI rather than URL--unsupported | "mailto" | NULL | NULL |
NET.SAFE_IP_FROM_STRING
NET.SAFE_IP_FROM_STRING(addr_str)
Description
Similar to NET.IP_FROM_STRING, but returns NULL instead of throwing an error if the input is invalid.
Return Data Type
BYTES
Example
SELECT
addr_str,
FORMAT("%T", NET.SAFE_IP_FROM_STRING(addr_str)) AS safe_ip_from_string
FROM UNNEST([
'48.49.50.51',
'::1',
'3031:3233:3435:3637:3839:4041:4243:4445',
'::ffff:192.0.2.128',
'48.49.50.51/32',
'48.49.50',
'::wxyz'
]) AS addr_str;
/*---------------------------------------------------------------------------------------------------------------*
| addr_str | safe_ip_from_string |
+---------------------------------------------------------------------------------------------------------------+
| 48.49.50.51 | b"0123" |
| ::1 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01" |
| 3031:3233:3435:3637:3839:4041:4243:4445 | b"0123456789@ABCDE" |
| ::ffff:192.0.2.128 | b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc0\x00\x02\x80" |
| 48.49.50.51/32 | NULL |
| 48.49.50 | NULL |
| ::wxyz | NULL |
*---------------------------------------------------------------------------------------------------------------*/
Numbering functions
GoogleSQL for BigQuery supports numbering functions. Numbering functions are a subset of window functions. To create a window function call and learn about the syntax for window functions, see Window function calls.
Numbering functions assign integer values to each row based on their position within the specified window. The OVER clause syntax varies across numbering functions.
Function list
Name | Summary |
---|---|
CUME_DIST | Gets the cumulative distribution (relative position (0,1]) of each row within a window. |
DENSE_RANK | Gets the dense rank (1-based, no gaps) of each row within a window. |
NTILE | Gets the quantile bucket number (1-based) of each row within a window. |
PERCENT_RANK | Gets the percentile rank (from 0 to 1) of each row within a window. |
RANK | Gets the rank (1-based) of each row within a window. |
ROW_NUMBER | Gets the sequential row number (1-based) of each row within a window. |
CUME_DIST
CUME_DIST()
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
Returns the relative rank of a row, defined as NP/NR. NP is the number of rows that either precede or are peers with the current row. NR is the number of rows in the partition.
To learn more about the OVER clause and how to use it, see Window function calls.
Return Type
FLOAT64
Example
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
finish_time,
division,
CUME_DIST() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;
/*-----------------+------------------------+----------+-------------*
| name | finish_time | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 0.25 |
| Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 0.75 |
| Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 0.75 |
| Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 1 |
| Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 0.25 |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 0.5 |
| Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 0.75 |
| Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 1 |
*-----------------+------------------------+----------+-------------*/
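The NP/NR definition above can be checked with a small Python sketch over a sorted partition. This is an illustration of the formula only, and assumes the rows are ordered by a comparable sort key:

```python
def cume_dist(keys):
    """CUME_DIST per partition: NP/NR, where NP counts rows that precede
    or are peers with (i.e. sort <=) the current row, and NR is the
    partition size."""
    ordered = sorted(keys)
    nr = len(ordered)
    return [sum(1 for k in ordered if k <= key) / nr for key in ordered]
```

For the F30-34 division above, the four finish times (with a tie in the middle) produce 0.25, 0.75, 0.75, 1.0, matching the query output.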
DENSE_RANK
DENSE_RANK()
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
Returns the ordinal (1-based) rank of each row within the window partition. All peer rows receive the same rank value, and the subsequent rank value is incremented by one.
To learn more about the OVER clause and how to use it, see Window function calls.
Return Type
INT64
Examples
WITH Numbers AS
(SELECT 1 as x
UNION ALL SELECT 2
UNION ALL SELECT 2
UNION ALL SELECT 5
UNION ALL SELECT 8
UNION ALL SELECT 10
UNION ALL SELECT 10
)
SELECT x,
DENSE_RANK() OVER (ORDER BY x ASC) AS dense_rank
FROM Numbers
/*-------------------------*
| x | dense_rank |
+-------------------------+
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 5 | 3 |
| 8 | 4 |
| 10 | 5 |
| 10 | 5 |
*-------------------------*/
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
finish_time,
division,
DENSE_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;
/*-----------------+------------------------+----------+-------------*
| name | finish_time | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 |
| Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 |
| Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 |
| Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 3 |
| Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 |
| Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 |
| Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 |
*-----------------+------------------------+----------+-------------*/
NTILE
NTILE(constant_integer_expression)
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
This function divides the rows into constant_integer_expression buckets based on row ordering and returns the 1-based bucket number that is assigned to each row. The number of rows in the buckets can differ by at most 1. The remainder values (the remainder of the number of rows divided by the number of buckets) are distributed one per bucket, starting with bucket 1. If constant_integer_expression evaluates to NULL, 0, or a negative value, an error is produced.
To learn more about the OVER clause and how to use it, see Window function calls.
Return Type
INT64
Example
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
finish_time,
division,
NTILE(3) OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;
/*-----------------+------------------------+----------+-------------*
| name | finish_time | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 |
| Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 1 |
| Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 |
| Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 3 |
| Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 1 |
| Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 2 |
| Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 3 |
*-----------------+------------------------+----------+-------------*/
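The bucket-size rule described above (sizes differ by at most 1, with the remainder going to the lowest-numbered buckets first) can be sketched in Python. This illustrates the documented distribution, not BigQuery's internals:

```python
def ntile(num_rows: int, buckets: int):
    """Return the 1-based bucket number for each of num_rows ordered rows."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    base, extra = divmod(num_rows, buckets)
    # The first `extra` buckets each get one additional row.
    sizes = [base + (1 if i < extra else 0) for i in range(buckets)]
    out = []
    for bucket, size in enumerate(sizes, start=1):
        out.extend([bucket] * size)
    return out
```

With 4 rows and 3 buckets (the F30-34 division above), the sizes are 2, 1, 1, so the bucket numbers are 1, 1, 2, 3, as in the query output.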
PERCENT_RANK
PERCENT_RANK()
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
Returns the percentile rank of a row, defined as (RK-1)/(NR-1), where RK is the RANK of the row and NR is the number of rows in the partition. Returns 0 if NR=1.
To learn more about the OVER clause and how to use it, see Window function calls.
Return Type
FLOAT64
Example
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
finish_time,
division,
PERCENT_RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;
/*-----------------+------------------------+----------+---------------------*
| name | finish_time | division | finish_rank |
+-----------------+------------------------+----------+---------------------+
| Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 0 |
| Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 0.33333333333333331 |
| Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 0.33333333333333331 |
| Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 1 |
| Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 0 |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 0.33333333333333331 |
| Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 0.66666666666666663 |
| Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 1 |
*-----------------+------------------------+----------+---------------------*/
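The (RK-1)/(NR-1) formula can be verified with a short Python sketch. This is an illustration of the formula over one ordered partition, not BigQuery's implementation:

```python
def percent_rank(keys):
    """PERCENT_RANK per partition: (RK - 1) / (NR - 1), 0 when NR = 1.
    RK is the RANK of the row: 1 + the number of strictly smaller keys."""
    ordered = sorted(keys)
    nr = len(ordered)
    if nr == 1:
        return [0.0]
    ranks = [1 + sum(1 for other in ordered if other < key) for key in ordered]
    return [(rk - 1) / (nr - 1) for rk in ranks]
```

For four rows with a tie in positions 2 and 3 (like the F30-34 division above), the ranks are 1, 2, 2, 4, which gives 0, 1/3, 1/3, 1.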
RANK
RANK()
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
ORDER BY expression [ { ASC | DESC } ] [, ...]
Description
Returns the ordinal (1-based) rank of each row within the ordered partition. All peer rows receive the same rank value. The next row or set of peer rows receives a rank value that increments by the number of peers with the previous rank value; this is unlike DENSE_RANK, which always increments by 1.
To learn more about the OVER clause and how to use it, see Window function calls.
Return Type
INT64
Examples
WITH Numbers AS
(SELECT 1 as x
UNION ALL SELECT 2
UNION ALL SELECT 2
UNION ALL SELECT 5
UNION ALL SELECT 8
UNION ALL SELECT 10
UNION ALL SELECT 10
)
SELECT x,
RANK() OVER (ORDER BY x ASC) AS rank
FROM Numbers
/*-------------------------*
| x | rank |
+-------------------------+
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 5 | 4 |
| 8 | 5 |
| 10 | 6 |
| 10 | 6 |
*-------------------------*/
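The gap behavior shown in the table above can be contrasted with DENSE_RANK in a small Python sketch. This illustrates the two documented counting rules, nothing more:

```python
def rank_and_dense_rank(keys):
    """RANK: 1 + count of strictly smaller keys (so ties leave gaps).
    DENSE_RANK: 1 + count of distinct smaller keys (no gaps)."""
    ordered = sorted(keys)
    rank = [1 + sum(1 for k in ordered if k < key) for key in ordered]
    dense = [1 + len({k for k in ordered if k < key}) for key in ordered]
    return rank, dense
```

For the Numbers example, RANK yields 1, 2, 2, 4, 5, 6, 6 while DENSE_RANK yields 1, 2, 2, 3, 4, 5, 5.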
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
finish_time,
division,
RANK() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;
/*-----------------+------------------------+----------+-------------*
| name | finish_time | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 |
| Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 |
| Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 2 |
| Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 4 |
| Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 |
| Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 |
| Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 |
*-----------------+------------------------+----------+-------------*/
ROW_NUMBER
ROW_NUMBER()
OVER over_clause
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
Description
Does not require the ORDER BY clause. Returns the sequential row ordinal (1-based) of each row for each ordered partition. If the ORDER BY clause is unspecified, the result is non-deterministic.
To learn more about the OVER clause and how to use it, see Window function calls.
Return Type
INT64
Examples
WITH Numbers AS
(SELECT 1 as x
UNION ALL SELECT 2
UNION ALL SELECT 2
UNION ALL SELECT 5
UNION ALL SELECT 8
UNION ALL SELECT 10
UNION ALL SELECT 10
)
SELECT x,
ROW_NUMBER() OVER (ORDER BY x) AS row_num
FROM Numbers
/*-------------------------*
| x | row_num |
+-------------------------+
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 5 | 4 |
| 8 | 5 |
| 10 | 6 |
| 10 | 7 |
*-------------------------*/
WITH finishers AS
(SELECT 'Sophia Liu' as name,
TIMESTAMP '2016-10-18 2:51:45' as finish_time,
'F30-34' as division
UNION ALL SELECT 'Lisa Stelzner', TIMESTAMP '2016-10-18 2:54:11', 'F35-39'
UNION ALL SELECT 'Nikki Leith', TIMESTAMP '2016-10-18 2:59:01', 'F30-34'
UNION ALL SELECT 'Lauren Matthews', TIMESTAMP '2016-10-18 3:01:17', 'F35-39'
UNION ALL SELECT 'Desiree Berry', TIMESTAMP '2016-10-18 3:05:42', 'F35-39'
UNION ALL SELECT 'Suzy Slane', TIMESTAMP '2016-10-18 3:06:24', 'F35-39'
UNION ALL SELECT 'Jen Edwards', TIMESTAMP '2016-10-18 3:06:36', 'F30-34'
UNION ALL SELECT 'Meghan Lederer', TIMESTAMP '2016-10-18 2:59:01', 'F30-34')
SELECT name,
finish_time,
division,
ROW_NUMBER() OVER (PARTITION BY division ORDER BY finish_time ASC) AS finish_rank
FROM finishers;
/*-----------------+------------------------+----------+-------------*
| name | finish_time | division | finish_rank |
+-----------------+------------------------+----------+-------------+
| Sophia Liu | 2016-10-18 09:51:45+00 | F30-34 | 1 |
| Meghan Lederer | 2016-10-18 09:59:01+00 | F30-34 | 2 |
| Nikki Leith | 2016-10-18 09:59:01+00 | F30-34 | 3 |
| Jen Edwards | 2016-10-18 10:06:36+00 | F30-34 | 4 |
| Lisa Stelzner | 2016-10-18 09:54:11+00 | F35-39 | 1 |
| Lauren Matthews | 2016-10-18 10:01:17+00 | F35-39 | 2 |
| Desiree Berry | 2016-10-18 10:05:42+00 | F35-39 | 3 |
| Suzy Slane | 2016-10-18 10:06:24+00 | F35-39 | 4 |
*-----------------+------------------------+----------+-------------*/
Range functions
GoogleSQL for BigQuery supports the following range functions.
Function list
Name | Summary |
---|---|
GENERATE_RANGE_ARRAY | Splits a range into an array of subranges. |
RANGE | Constructs a range of DATE, DATETIME, or TIMESTAMP values. |
RANGE_CONTAINS | Signature 1: Checks if one range is in another range. Signature 2: Checks if a value is in a range. |
RANGE_END | Gets the upper bound of a range. |
RANGE_INTERSECT | Gets a segment of two ranges that intersect. |
RANGE_OVERLAPS | Checks if two ranges overlap. |
RANGE_SESSIONIZE | Produces a table of sessionized ranges. |
RANGE_START | Gets the lower bound of a range. |
GENERATE_RANGE_ARRAY
GENERATE_RANGE_ARRAY(range_to_split, step_interval)
GENERATE_RANGE_ARRAY(range_to_split, step_interval, include_last_partial_range)
Description
Splits a range into an array of subranges.
Definitions
- range_to_split: The RANGE&lt;T&gt; value to split.
- step_interval: The INTERVAL value, which determines the maximum size of each subrange in the resulting array. An interval with a single date and time part is supported, but an interval with a range of date and time parts is not.
  - If range_to_split is RANGE&lt;DATE&gt;, these interval date parts are supported: YEAR to DAY.
  - If range_to_split is RANGE&lt;DATETIME&gt;, these interval date and time parts are supported: YEAR to SECOND.
  - If range_to_split is RANGE&lt;TIMESTAMP&gt;, these interval date and time parts are supported: DAY to SECOND.
- include_last_partial_range: A BOOL value, which determines whether or not to include the last subrange if it's a partial subrange. If this argument is not specified, the default value is TRUE.
  - TRUE (default): The last subrange is included, even if it's smaller than step_interval.
  - FALSE: Exclude the last subrange if it's smaller than step_interval.
Details
Returns NULL if any input is NULL.
Return type
ARRAY<RANGE<T>>
Examples
In the following example, a date range between 2020-01-01 and 2020-01-06 is split into an array of subranges that are one day long. There are no partial ranges.
SELECT GENERATE_RANGE_ARRAY(
RANGE(DATE '2020-01-01', DATE '2020-01-06'),
INTERVAL 1 DAY) AS results;
/*----------------------------+
| results |
+----------------------------+
| [ |
| [2020-01-01, 2020-01-02), |
| [2020-01-02, 2020-01-03), |
| [2020-01-03, 2020-01-04), |
| [2020-01-04, 2020-01-05), |
| [2020-01-05, 2020-01-06), |
| ] |
+----------------------------*/
In the following examples, a date range between 2020-01-01 and 2020-01-06 is split into an array of subranges that are two days long. The final subrange is smaller than two days:
SELECT GENERATE_RANGE_ARRAY(
RANGE(DATE '2020-01-01', DATE '2020-01-06'),
INTERVAL 2 DAY) AS results;
/*----------------------------+
| results |
+----------------------------+
| [ |
| [2020-01-01, 2020-01-03), |
| [2020-01-03, 2020-01-05), |
| [2020-01-05, 2020-01-06) |
| ] |
+----------------------------*/
SELECT GENERATE_RANGE_ARRAY(
RANGE(DATE '2020-01-01', DATE '2020-01-06'),
INTERVAL 2 DAY,
TRUE) AS results;
/*----------------------------+
| results |
+----------------------------+
| [ |
| [2020-01-01, 2020-01-03), |
| [2020-01-03, 2020-01-05), |
| [2020-01-05, 2020-01-06) |
| ] |
+----------------------------*/
In the following example, a date range between 2020-01-01 and 2020-01-06 is split into an array of subranges that are two days long, but the final subrange is excluded because it's smaller than two days:
SELECT GENERATE_RANGE_ARRAY(
RANGE(DATE '2020-01-01', DATE '2020-01-06'),
INTERVAL 2 DAY,
FALSE) AS results;
/*----------------------------+
| results |
+----------------------------+
| [ |
| [2020-01-01, 2020-01-03), |
| [2020-01-03, 2020-01-05) |
| ] |
+----------------------------*/
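The splitting behavior shown in the three examples above can be sketched in Python over half-open [start, end) date ranges. This models the documented semantics for a day-granularity step only, not BigQuery's implementation:

```python
from datetime import date, timedelta

def generate_range_array(start, end, step, include_last_partial_range=True):
    """Split the half-open range [start, end) into subranges of at most
    `step`; optionally drop a trailing partial subrange."""
    subranges = []
    lower = start
    while lower < end:
        upper = min(lower + step, end)
        if upper - lower < step and not include_last_partial_range:
            break  # trailing partial subrange is excluded
        subranges.append((lower, upper))
        lower = upper
    return subranges
```

Splitting [2020-01-01, 2020-01-06) by two days yields three subranges with the partial one included, and two without it, matching the example outputs.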
RANGE
RANGE(lower_bound, upper_bound)
Description
Constructs a range of DATE, DATETIME, or TIMESTAMP values.
Definitions
- lower_bound: The range starts from this value. This can be a DATE, DATETIME, or TIMESTAMP value. If this value is NULL, the range doesn't include a lower bound.
- upper_bound: The range ends before this value. This can be a DATE, DATETIME, or TIMESTAMP value. If this value is NULL, the range doesn't include an upper bound.
Details
lower_bound and upper_bound must be of the same data type.
Produces an error if lower_bound is greater than or equal to upper_bound. To return NULL instead, add the SAFE. prefix to the function name.
Return type
RANGE&lt;T&gt;, where T is the same data type as the input.
Examples
The following query constructs a date range:
SELECT RANGE(DATE '2022-12-01', DATE '2022-12-31') AS results;
/*--------------------------+
| results |
+--------------------------+
| [2022-12-01, 2022-12-31) |
+--------------------------*/
The following query constructs a datetime range:
SELECT RANGE(DATETIME '2022-10-01 14:53:27',
DATETIME '2022-10-01 16:00:00') AS results;
/*---------------------------------------------+
| results |
+---------------------------------------------+
| [2022-10-01T14:53:27, 2022-10-01T16:00:00) |
+---------------------------------------------*/
The following query constructs a timestamp range:
SELECT RANGE(TIMESTAMP '2022-10-01 14:53:27 America/Los_Angeles',
TIMESTAMP '2022-10-01 16:00:00 America/Los_Angeles') AS results;
-- Results depend upon where this query was executed.
/*------------------------------------------------------------------+
| results |
+------------------------------------------------------------------+
| [2022-10-01 21:53:27.000000 UTC, 2022-10-01 23:00:00.000000 UTC) |
+------------------------------------------------------------------*/
The following query constructs a date range with no lower bound:
SELECT RANGE(NULL, DATE '2022-12-31') AS results;
/*-------------------------+
| results |
+-------------------------+
| [UNBOUNDED, 2022-12-31) |
+-------------------------*/
The following query constructs a date range with no upper bound:
SELECT RANGE(DATE '2022-10-01', NULL) AS results;
/*--------------------------+
| results |
+--------------------------+
| [2022-10-01, UNBOUNDED) |
+--------------------------*/
RANGE_CONTAINS
- Signature 1: Checks if every value in one range is in another range.
- Signature 2: Checks if a value is in a range.
Signature 1
RANGE_CONTAINS(outer_range, inner_range)
Description
Checks if the inner range is in the outer range.
Definitions
- outer_range: The RANGE&lt;T&gt; value to search within.
- inner_range: The RANGE&lt;T&gt; value to search for in outer_range.
Details
Returns TRUE if inner_range exists in outer_range. Otherwise, returns FALSE.
T must be of the same type for all inputs.
Return type
BOOL
Examples
In the following query, the inner range is in the outer range:
SELECT RANGE_CONTAINS(
RANGE<DATE> '[2022-01-01, 2023-01-01)',
RANGE<DATE> '[2022-04-01, 2022-07-01)') AS results;
/*---------+
| results |
+---------+
| TRUE |
+---------*/
In the following query, the inner range is not in the outer range:
SELECT RANGE_CONTAINS(
RANGE<DATE> '[2022-01-01, 2023-01-01)',
RANGE<DATE> '[2023-01-01, 2023-04-01)') AS results;
/*---------+
| results |
+---------+
| FALSE |
+---------*/
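The containment check over half-open [lower, upper) ranges can be sketched in Python, with None standing in for an UNBOUNDED endpoint. This is an illustration of the documented semantics, not BigQuery's code; it uses ISO date strings, which compare correctly as text:

```python
def range_contains(outer, inner):
    """True if the half-open range `inner` lies entirely within `outer`.
    Each range is a (lower, upper) pair; None means UNBOUNDED."""
    o_lo, o_hi = outer
    i_lo, i_hi = inner
    # An unbounded outer endpoint contains any inner endpoint; an
    # unbounded inner endpoint can only fit inside an unbounded outer one.
    lower_ok = o_lo is None or (i_lo is not None and i_lo >= o_lo)
    upper_ok = o_hi is None or (i_hi is not None and i_hi <= o_hi)
    return lower_ok and upper_ok
```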
Signature 2
RANGE_CONTAINS(range_to_search, value_to_find)
Description
Checks if a value is in a range.
Definitions
- range_to_search: The RANGE&lt;T&gt; value to search within.
- value_to_find: The value to search for in range_to_search.
Details
Returns TRUE if value_to_find exists in range_to_search. Otherwise, returns FALSE.
The data type for value_to_find must be the same data type as T in range_to_search.
Return type
BOOL
Examples
In the following query, the value 2022-04-01 is found in the range [2022-01-01, 2023-01-01):
SELECT RANGE_CONTAINS(
RANGE<DATE> '[2022-01-01, 2023-01-01)',
DATE '2022-04-01') AS results;
/*---------+
| results |
+---------+
| TRUE |
+---------*/
In the following query, the value 2023-04-01 is not found in the range [2022-01-01, 2023-01-01):
SELECT RANGE_CONTAINS(
RANGE<DATE> '[2022-01-01, 2023-01-01)',
DATE '2023-04-01') AS results;
/*---------+
| results |
+---------+
| FALSE |
+---------*/
RANGE_END
RANGE_END(range_to_check)
Description
Gets the upper bound of a range.
Definitions
- range_to_check: The RANGE&lt;T&gt; value.
Details
Returns NULL if the upper bound in range_to_check is UNBOUNDED.
Returns NULL if range_to_check is NULL.
Return type
T
in range_value
Examples
In the following query, the upper bound of the range is retrieved:
SELECT RANGE_END(RANGE<DATE> '[2022-12-01, 2022-12-31)') AS results;
/*------------+
| results |
+------------+
| 2022-12-31 |
+------------*/
In the following query, the upper bound of the range is unbounded, so NULL is returned:
SELECT RANGE_END(RANGE<DATE> '[2022-12-01, UNBOUNDED)') AS results;
/*------------+
| results |
+------------+
| NULL |
+------------*/
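Per the details above, a NULL input also returns NULL. A minimal sketch, assuming a NULL range produced with CAST:

```sql
-- A NULL range input returns NULL.
SELECT RANGE_END(CAST(NULL AS RANGE<DATE>)) AS results;

/*---------+
| results |
+---------+
| NULL    |
+---------*/
```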
RANGE_INTERSECT
RANGE_INTERSECT(range_a, range_b)
Description
Gets a segment of two ranges that intersect.
Definitions
range_a
: The first RANGE<T> value.
range_b
: The second RANGE<T> value.
Details
Returns NULL if any input is NULL.
Produces an error if range_a and range_b don't overlap. To return NULL instead, add the SAFE. prefix to the function name.
T must be of the same type for all inputs.
Return type
RANGE<T>
Examples
SELECT RANGE_INTERSECT(
RANGE<DATE> '[2022-02-01, 2022-09-01)',
RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;
/*--------------------------+
| results |
+--------------------------+
| [2022-02-01, 2022-04-15) |
+--------------------------*/
SELECT RANGE_INTERSECT(
RANGE<DATE> '[2022-02-01, UNBOUNDED)',
RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;
/*--------------------------+
| results |
+--------------------------+
| [2022-02-01, 2022-04-15) |
+--------------------------*/
SELECT RANGE_INTERSECT(
RANGE<DATE> '[2022-02-01, UNBOUNDED)',
RANGE<DATE> '[2021-06-15, UNBOUNDED)') AS results;
/*-------------------------+
| results |
+-------------------------+
| [2022-02-01, UNBOUNDED) |
+-------------------------*/
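As noted in the details, RANGE_INTERSECT produces an error when the inputs don't overlap, and the SAFE. prefix returns NULL instead. For example:

```sql
-- These ranges don't overlap, so SAFE.RANGE_INTERSECT returns NULL
-- instead of an error.
SELECT SAFE.RANGE_INTERSECT(
  RANGE<DATE> '[2020-01-01, 2020-06-01)',
  RANGE<DATE> '[2021-01-01, 2021-06-01)') AS results;

/*---------+
| results |
+---------+
| NULL    |
+---------*/
```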
RANGE_OVERLAPS
RANGE_OVERLAPS(range_a, range_b)
Description
Checks if two ranges overlap.
Definitions
range_a
: The first RANGE<T> value.
range_b
: The second RANGE<T> value.
Details
Returns TRUE if any part of range_a intersects with range_b; otherwise, returns FALSE.
T must be of the same type for all inputs.
To get the part of the range that overlaps, use the RANGE_INTERSECT function.
Return type
BOOL
Examples
In the following query, the first and second ranges overlap between 2022-02-01 and 2022-04-15:
SELECT RANGE_OVERLAPS(
RANGE<DATE> '[2022-02-01, 2022-09-01)',
RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;
/*---------+
| results |
+---------+
| TRUE |
+---------*/
In the following query, the first and second ranges don't overlap:
SELECT RANGE_OVERLAPS(
RANGE<DATE> '[2020-02-01, 2020-09-01)',
RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;
/*---------+
| results |
+---------+
| FALSE |
+---------*/
In the following query, the first and second ranges overlap between 2022-02-01 and UNBOUNDED:
SELECT RANGE_OVERLAPS(
RANGE<DATE> '[2022-02-01, UNBOUNDED)',
RANGE<DATE> '[2021-06-15, UNBOUNDED)') AS results;
/*---------+
| results |
+---------+
| TRUE |
+---------*/
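Because RANGE_INTERSECT errors on non-overlapping inputs, RANGE_OVERLAPS can serve as a guard. A sketch that returns the overlapping segment when one exists and NULL otherwise (GoogleSQL conditional expressions evaluate only the branch that is taken):

```sql
-- Returns the intersection when the ranges overlap; otherwise NULL.
SELECT
  IF(
    RANGE_OVERLAPS(a, b),
    RANGE_INTERSECT(a, b),
    NULL) AS overlap
FROM (
  SELECT
    RANGE<DATE> '[2022-02-01, 2022-09-01)' AS a,
    RANGE<DATE> '[2021-06-15, 2022-04-15)' AS b
);

/*--------------------------+
| overlap                  |
+--------------------------+
| [2022-02-01, 2022-04-15) |
+--------------------------*/
```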
RANGE_SESSIONIZE
RANGE_SESSIONIZE(
TABLE table_name,
range_column,
partitioning_columns
)
RANGE_SESSIONIZE(
TABLE table_name,
range_column,
partitioning_columns,
sessionize_option
)
Description
Produces a table of sessionized ranges.
Definitions
table_name
: A table expression that represents the name of the table to construct. This can represent any relation with range_column.
range_column
: A STRING literal that indicates which RANGE column in the table contains the data to sessionize.
partitioning_columns
: An ARRAY<STRING> literal that indicates which columns should partition the data before the data is sessionized.
sessionize_option
: A STRING value that describes how order-adjacent ranges are sessionized. Your choices are as follows:
  - MEETS (default): Ranges that meet or overlap are sessionized.
  - OVERLAPS: Only a range that is overlapped by another range is sessionized.

  If this argument isn't provided, MEETS is used by default.
Details
This function produces a table that includes all columns in the input table and an additional RANGE column called session_range, which indicates the start and end of a session. The start and end of each session are determined by the sessionize_option argument.
Return type
TABLE
Examples
The examples in this section reference the following table called my_sessionized_range_table in a dataset called mydataset:
INSERT mydataset.my_sessionized_range_table (emp_id, dept_id, duration)
VALUES(10, 1000, RANGE<DATE> '[2010-01-10, 2010-03-10)'),
(10, 2000, RANGE<DATE> '[2010-03-10, 2010-07-15)'),
(10, 2000, RANGE<DATE> '[2010-06-15, 2010-08-18)'),
(20, 2000, RANGE<DATE> '[2010-03-10, 2010-07-20)'),
(20, 1000, RANGE<DATE> '[2020-05-10, 2020-09-20)');
SELECT * FROM mydataset.my_sessionized_range_table ORDER BY emp_id;
/*--------+---------+--------------------------+
| emp_id | dept_id | duration |
+--------+---------+--------------------------+
| 10 | 1000 | [2010-01-10, 2010-03-10) |
| 10 | 2000 | [2010-03-10, 2010-07-15) |
| 10 | 2000 | [2010-06-15, 2010-08-18) |
| 20 | 2000 | [2010-03-10, 2010-07-20) |
| 20 | 1000 | [2020-05-10, 2020-09-20) |
+--------+---------+--------------------------*/
In the following query, a table of sessionized data is produced for my_sessionized_range_table, and only ranges that meet or overlap are sessionized:
SELECT
emp_id, duration, session_range
FROM
RANGE_SESSIONIZE(
TABLE mydataset.my_sessionized_range_table,
'duration',
['emp_id'])
ORDER BY emp_id;
/*--------+--------------------------+--------------------------+
| emp_id | duration | session_range |
+--------+--------------------------+--------------------------+
| 10 | [2010-01-10, 2010-03-10) | [2010-01-10, 2010-08-18) |
| 10 | [2010-03-10, 2010-07-15) | [2010-01-10, 2010-08-18) |
| 10 | [2010-06-15, 2010-08-18) | [2010-01-10, 2010-08-18) |
| 20 | [2010-03-10, 2010-07-20) | [2010-03-10, 2010-07-20) |
| 20 | [2020-05-10, 2020-09-20) | [2020-05-10, 2020-09-20) |
+--------+--------------------------+--------------------------*/
In the following query, a table of sessionized data is produced for my_sessionized_range_table, and only a range that is overlapped by another range is sessionized:
SELECT
emp_id, duration, session_range
FROM
RANGE_SESSIONIZE(
TABLE mydataset.my_sessionized_range_table,
'duration',
['emp_id'],
'OVERLAPS')
ORDER BY emp_id;
/*--------+--------------------------+--------------------------+
| emp_id | duration | session_range |
+--------+--------------------------+--------------------------+
| 10 | [2010-03-10, 2010-07-15) | [2010-03-10, 2010-08-18) |
| 10 | [2010-06-15, 2010-08-18) | [2010-03-10, 2010-08-18) |
| 10 | [2010-01-10, 2010-03-10) | [2010-01-10, 2010-03-10) |
| 20 | [2020-05-10, 2020-09-20) | [2020-05-10, 2020-09-20) |
| 20 | [2010-03-10, 2010-07-20) | [2010-03-10, 2010-07-20) |
+--------+--------------------------+--------------------------*/
If you need to normalize sessionized data, you can use a query similar to the following:
SELECT emp_id, session_range AS normalized FROM (
SELECT emp_id, session_range
FROM RANGE_SESSIONIZE(
TABLE mydataset.my_sessionized_range_table,
'duration',
['emp_id'],
'MEETS')
)
GROUP BY emp_id, normalized;
/*--------+--------------------------+
| emp_id | normalized |
+--------+--------------------------+
| 20 | [2010-03-10, 2010-07-20) |
| 10 | [2010-01-10, 2010-08-18) |
| 20 | [2020-05-10, 2020-09-20) |
+--------+--------------------------*/
RANGE_START
RANGE_START(range_to_check)
Description
Gets the lower bound of a range.
Definitions
range_to_check
: The RANGE<T> value.
Details
Returns NULL if the lower bound of range_to_check is UNBOUNDED.
Returns NULL if range_to_check is NULL.
Return type
T in range_to_check
Examples
In the following query, the lower bound of the range is retrieved:
SELECT RANGE_START(RANGE<DATE> '[2022-12-01, 2022-12-31)') AS results;
/*------------+
| results |
+------------+
| 2022-12-01 |
+------------*/
In the following query, the lower bound of the range is unbounded, so NULL is returned:
SELECT RANGE_START(RANGE<DATE> '[UNBOUNDED, 2022-12-31)') AS results;
/*------------+
| results |
+------------+
| NULL |
+------------*/
Search functions
GoogleSQL for BigQuery supports the following search functions.
Function list
Name | Summary |
---|---|
SEARCH | Checks to see whether a table or other search data contains a set of search terms. |
VECTOR_SEARCH | Performs a vector search on embeddings to find semantically similar entities. |
SEARCH
SEARCH(
data_to_search, search_query
[, json_scope => { 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' } ]
[, analyzer => { 'LOG_ANALYZER' | 'NO_OP_ANALYZER' | 'PATTERN_ANALYZER'} ]
[, analyzer_options => analyzer_options_values ]
)
Description
The SEARCH function checks to see whether a BigQuery table or other search data contains a set of search terms (tokens). It returns TRUE if all search terms appear in the data, based on the rules for search_query and the text analysis described in the text analyzer. Otherwise, this function returns FALSE.
Definitions
data_to_search
: The data to search over. The value can be:
  - Any GoogleSQL data type literal
  - A list of columns
  - A table reference
  - A column of any type

  A table reference is evaluated as a STRUCT whose fields are the columns of the table. data_to_search can be any type, but SEARCH returns FALSE for all types except those listed here:
  - ARRAY<STRING>
  - ARRAY<STRUCT>
  - JSON
  - STRING
  - STRUCT

  You can search for string literals in columns of the preceding types. For additional rules, see Search data rules.
search_query
: A STRING literal, or a STRING constant expression that represents the terms of the search query. If search_query is NULL, an error is returned. If search_query produces no search tokens and the text analyzer is LOG_ANALYZER or PATTERN_ANALYZER, an error is produced.
json_scope
: A named argument with a STRING value. Takes one of the following values to indicate the scope of JSON data to be searched. It has no effect if data_to_search isn't a JSON value or doesn't contain a JSON field.
  - 'JSON_VALUES' (default): Only the JSON values are searched. If json_scope isn't provided, this is used by default.
  - 'JSON_KEYS': Only the JSON keys are searched.
  - 'JSON_KEYS_AND_VALUES': The JSON keys and values are searched.
analyzer
: A named argument with a STRING value. Takes one of the following values to indicate the text analyzer to use:
  - 'LOG_ANALYZER' (default): Breaks the input into tokens when delimiters are encountered and then normalizes the tokens. For more information, see LOG_ANALYZER.
  - 'NO_OP_ANALYZER': Extracts the text as a single token, but doesn't apply normalization. For more information about this analyzer, see NO_OP_ANALYZER.
  - 'PATTERN_ANALYZER': Breaks the input into tokens that match a regular expression. For more information, see the PATTERN_ANALYZER text analyzer.
analyzer_options
: A named argument with a JSON-formatted STRING value. Takes a list of text analysis rules. For more information, see Text analyzer options.
Details
The SEARCH function is designed to work with search indexes to optimize point lookups. Although the SEARCH function works for tables that aren't indexed, its performance is greatly improved with a search index. The search index is used if both the analyzer and the analyzer options match those used to create the index.
Rules for search_query
A search query is initially broken down into terms (subqueries) using the white spaces in the search query. Each term is then further broken down into zero or more searchable tokens based on the text analyzer. This section contains the rules for how different types of terms are analyzed and evaluated.
Rules for backticks in search_query:
- If the LOG_ANALYZER text analyzer is used, text enclosed in backticks forces an exact match. For example, `Hello World` happy days becomes Hello World, happy, and days.
- Search terms enclosed in backticks must match exactly in data_to_search, subject to the following conditions:
  - The term appears at the start of data_to_search or is immediately preceded by a delimiter.
  - The term appears at the end of data_to_search or is immediately followed by a delimiter.

  For example, SEARCH('foo.bar', '`foo.`') returns FALSE because the backticked text foo. is immediately followed by the character b in the search data foo.bar, rather than by a delimiter or the end of the string. However, SEARCH('foo..bar', '`foo.`') returns TRUE because foo. is immediately followed by the delimiter . in the search data.
- The backtick itself can be escaped using a backslash, as in \`foobar\`.
- The following are reserved words and must be enclosed in backticks: AND, NOT, OR, IN, and NEAR.
Rules for reserved characters in search_query:
- Text not enclosed in backticks requires the following reserved characters to be escaped by a double backslash \\:
  [ ] < > ( ) { } | ! ' " * & ? + / : = - \ ~ ^
- If the quoted string is preceded by the character r or R, such as r"my\+string", then it's treated as a raw string and only a single backslash is required to escape the reserved characters. For more information about raw strings and escape sequences, see String and byte literals.
Rules for phrases in search_query:
- A phrase is a type of term. If text is enclosed in double quotes and the analyzer is LOG_ANALYZER, PATTERN_ANALYZER, or not set (LOG_ANALYZER by default), the term represents a phrase.
- When a phrase is analyzed, a subset of tokens is created for that phrase. For example, from the phrase "foo baz.bar", the LOG_ANALYZER analyzer generates the phrase-specific tokens foo, baz, and bar.
- The order of terms in a phrase matters. A match is only returned if the tokens that were produced for the phrase are next to each other and in the same order as the tokens for data_to_search. For example:

  -- FALSE because 'foo' and 'bar' are not next to each other in
  -- 'foo baz.bar'.
  SEARCH('foo baz.bar', '"foo bar"')

  -- TRUE because 'foo' and 'baz' are next to each other in
  -- 'foo baz.bar'.
  SEARCH('foo baz.bar', '"foo baz"')
- A single quote inside of a phrase is analyzed as a special character.
- An escaped double quote (a double quote after a backslash) is analyzed as a double quote character.
How search_query
is broken into searchable tokens
The following table shows how data_to_search is broken into searchable tokens by the LOG_ANALYZER text analyzer. All entries are strings.
data_to_search | searchable tokens |
---|---|
127.0.0.1 | 127 0 1 127.0.0.1 . 127.0.0 127.0 0.0 0.0.1 0.1 |
foobar@example.com | foobar example com foobar@example example.com foobar@example.com |
The fox. | the fox The The fox The fox. fox fox. |
The following table shows how search_query
is broken into searchable tokens
by the LOG_ANALYZER
text analyzer. All entries are strings.
search_query | searchable tokens |
---|---|
127.0.0.1 | 127 0 1 |
`127.0.0.1` | 127.0.0.1 |
foobar@example.com | foobar example com |
`foobar@example.com` | foobar@example.com |
Rules for data_to_search
General rules for data_to_search:
- data_to_search must contain all tokens produced for search_query for the function to return TRUE.
- To perform a cross-field search, data_to_search must be a STRUCT, ARRAY, or JSON data type.
- Each STRING field in a compound data type is individually searched for terms.
- If at least one field in data_to_search includes all search terms produced by search_query, SEARCH returns TRUE. Otherwise, it has the following behavior:
  - If at least one STRING field is NULL, SEARCH returns NULL.
  - Otherwise, SEARCH returns FALSE.
Return type
BOOL
Examples
The following queries show how tokens in search_query are analyzed by a SEARCH function call using the default analyzer, LOG_ANALYZER:
SELECT
-- ERROR: `search_query` is NULL.
SEARCH('foobarexample', NULL) AS a,
-- ERROR: `search_query` contains no tokens.
SEARCH('foobarexample', '') AS b;
SELECT
-- TRUE: '-' and ' ' are delimiters.
SEARCH('foobar-example', 'foobar example') AS a,
-- TRUE: The search query is a constant expression evaluated to 'foobar'.
SEARCH('foobar-example', CONCAT('foo', 'bar')) AS b,
-- FALSE: The search_query is not split.
SEARCH('foobar-example', 'foobarexample') AS c,
-- TRUE: The double backslash escapes the ampersand which is a delimiter.
SEARCH('foobar-example', 'foobar\\&example') AS d,
-- TRUE: The single backslash escapes the ampersand in a raw string.
SEARCH('foobar-example', R'foobar\&example') AS e,
-- FALSE: The backticks indicate that there must be an exact match for
-- foobar&example.
SEARCH('foobar-example', '`foobar&example`') AS f,
-- TRUE: An exact match is found.
SEARCH('foobar&example', '`foobar&example`') AS g
/*-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g |
+-------+-------+-------+-------+-------+-------+-------+
| true | true | false | true | true | false | true |
*-------+-------+-------+-------+-------+-------+-------*/
SELECT
-- TRUE: The order of terms doesn't matter.
SEARCH('foobar-example', 'example foobar') AS a,
-- TRUE: Tokens are made lower-case.
SEARCH('foobar-example', 'Foobar Example') AS b,
-- TRUE: An exact match is found.
SEARCH('foobar-example', '`foobar-example`') AS c,
-- FALSE: Backticks preserve capitalization.
SEARCH('foobar-example', '`Foobar`') AS d,
-- FALSE: Backticks don't have special meaning for search_data and are
-- not delimiters in the default LOG_ANALYZER.
SEARCH('`foobar-example`', '`foobar-example`') AS e,
-- TRUE: An exact match is found after the delimiter in search_data.
SEARCH('foobar@example.com', '`example.com`') AS f,
-- TRUE: An exact match is found between the space delimiters.
SEARCH('a foobar-example b', '`foobar-example`') AS g;
/*-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g |
+-------+-------+-------+-------+-------+-------+-------+
| true | true | true | false | false | true | true |
*-------+-------+-------+-------+-------+-------+-------*/
SELECT
-- FALSE: No single array entry matches all search terms.
SEARCH(['foobar', 'example'], 'foobar example') AS a,
-- FALSE: The search query is equivalent to foobar\\=.
SEARCH('foobar=', '`foobar\\=`') AS b,
-- FALSE: This is equivalent to the previous example.
SEARCH('foobar=', R'`\foobar=`') AS c,
-- TRUE: The equals sign is a delimiter in the data and query.
SEARCH('foobar=', 'foobar\\=') AS d,
-- TRUE: This is equivalent to the previous example.
SEARCH('foobar=', R'foobar\=') AS e,
-- TRUE: An exact match is found.
SEARCH('foobar.example', '`foobar`') AS f,
-- FALSE: `foobar.\` is not analyzed because of backticks; it is not
-- followed by a delimiter in search_data 'foobar.example'.
SEARCH('foobar.example', '`foobar.\`') AS g,
-- TRUE: `foobar.` is not analyzed because of backticks; it is
-- followed by the delimiter '.' in search_data 'foobar..example'.
SEARCH('foobar..example', '`foobar.`') AS h;
/*-------+-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g | h |
+-------+-------+-------+-------+-------+-------+-------+-------+
| false | false | false | true | true | true | false | true |
*-------+-------+-------+-------+-------+-------+-------+-------*/
The following queries show how phrases in search_query
are analyzed
by a SEARCH
function call:
SELECT
-- TRUE: The phrase `foo bar` is in `foo bar baz`.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar`
-- and because they appear in that exact order in `data_to_search`,
-- the function returns TRUE.
SEARCH(R'foo bar baz', R'"foo bar"') AS a,
-- TRUE: Case is ignored.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar`
-- and because they appear in that exact order in `data_to_search`,
-- the function returns TRUE.
SEARCH(R'Foo bar baz', R'"foo Bar"') AS b,
-- TRUE: Both `-` and `&` are delimiters used during tokenization.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar`
-- and because they appear in that exact order in `data_to_search`,
-- the function returns TRUE.
SEARCH(R'foo-bar baz', R'"foo&bar"') AS c,
-- FALSE: Backticks in a phrase are treated as normal characters.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are:
-- `foo
-- bar`
-- Because these searchable tokens don't appear in `data_to_search`,
-- the function returns FALSE.
SEARCH(R'foo bar baz', R'"`foo bar`"') AS d,
-- FALSE: `foo bar` is not in `foo else bar`.
-- The tokens in `data_to_search` are `foo`, `else`, and `bar`.
-- The searchable tokens in `query_string` are `foo` and `bar`.
-- They appear in `data_to_search`, but because they do not
-- appear in that exact order (`foo` before `bar`),
-- the function returns FALSE.
SEARCH(R'foo else bar', R'"foo bar"') AS e,
-- FALSE: `foo baz` is not in `foo bar baz`.
-- The `search_query` produces two terms. The first term is `bar`, which
-- matches with the similar token in `data_to_search`. However, the second
-- term is the phrase "foo&baz" with two tokens, `foo` and `baz`. Because
-- `foo` and `baz` do not appear next to each other in `data_to_search`
-- (`bar` is in between), the function returns FALSE.
SEARCH(R'foo-bar-baz', R'bar "foo&baz"') AS f;
/*-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f |
+-------+-------+-------+-------+-------+-------+
| true | true | false | false | false | false |
*-------+-------+-------+-------+-------+-------*/
SELECT
-- FALSE: Only double quotes need to be escaped in a phrase.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo\` and `bar` and they
-- must appear in that exact order in `data_to_search`, but don't.
SEARCH(
R'foo bar baz',
R'"foo\ bar"',
analyzer_options=>'{"delimiters": [" "]}') AS a,
-- TRUE: `foo bar` is in `foo bar baz` after tokenization with the given
-- delimiters.
-- The tokens in `data_to_search` are `foo`, `bar`, and `baz`.
-- The searchable tokens in `query_string` are `foo` and `bar` and they
-- must appear in that exact order in `data_to_search`.
SEARCH(
R'foo bar baz',
R'"foo? bar"',
analyzer_options=>'{"delimiters": [" ", "?"]}') AS b,
-- TRUE: `read book` is in `read book now` after `the` is ignored.
-- The tokens in `data_to_search` are `read`, `book`, and `now`.
-- The searchable tokens in `query_string` are `read` and `book` and they
-- must appear in that exact order in `data_to_search`.
SEARCH(
'read the book now',
R'"read the book"',
analyzer_options => '{ "token_filters": [{"stop_words": ["the"]}] }') AS c,
-- FALSE: `c d` is not in `a`, `b`, `cd`, `e` or `f` after tokenization with
-- the given pattern.
-- The tokens in `data_to_search` are `a`, `b`, `cd`, `e` and `f`.
-- The searchable tokens in `query_string` are `c` and `d` and they
-- must appear in that exact order in `data_to_search`. `data_to_search`
-- contains a `cd` token, but not a `c` or `d` token.
SEARCH(
R'abcdef',
R'"c d"',
analyzer=>'PATTERN_ANALYZER',
analyzer_options=>'{"patterns": ["(?:cd)|[a-z]"]}') AS d,
-- TRUE: `ant apple` is in `ant apple avocado` after tokenization with
-- the given pattern.
-- The tokens in `data_to_search` are `ant`, `apple`, and `avocado`.
-- The searchable tokens in `query_string` are `ant` and `apple` and they
-- must appear in that exact order in `data_to_search`.
SEARCH(
R'ant orange apple avocado',
R'"ant apple"',
analyzer=>'PATTERN_ANALYZER',
analyzer_options=>'{"patterns": ["a[a-z]"]}') AS e;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| false | true | true | false | true |
*-------+-------+-------+-------+-------*/
The following query shows examples of calls to the SEARCH
function using the
NO_OP_ANALYZER
text analyzer and reasons for various return values:
SELECT
-- TRUE: exact match
SEARCH('foobar', 'foobar', analyzer=>'NO_OP_ANALYZER') AS a,
-- FALSE: Backticks are not special characters for `NO_OP_ANALYZER`.
SEARCH('foobar', '\`foobar\`', analyzer=>'NO_OP_ANALYZER') AS b,
-- FALSE: The capitalization does not match.
SEARCH('foobar', 'Foobar', analyzer=>'NO_OP_ANALYZER') AS c,
-- FALSE: There are no delimiters for `NO_OP_ANALYZER`.
SEARCH('foobar example', 'foobar', analyzer=>'NO_OP_ANALYZER') AS d,
-- TRUE: An exact match is found.
SEARCH('', '', analyzer=>'NO_OP_ANALYZER') AS e,
-- FALSE: 'foo bar' and "foo bar" are not considered an exact match.
SEARCH( R'foo bar baz', R'"foo bar"', analyzer=>'NO_OP_ANALYZER') AS f,
-- TRUE: "foo bar" and "foo bar" are considered an exact match; the double
-- quotes are not special characters for `NO_OP_ANALYZER`.
SEARCH( R'"foo bar"', R'"foo bar"', analyzer=>'NO_OP_ANALYZER') AS g;
/*-------+-------+-------+-------+-------+-------+-------*
| a | b | c | d | e | f | g |
+-------+-------+-------+-------+-------+-------+-------+
| true | false | false | false | true | false | true |
*-------+-------+-------+-------+-------+-------+-------*/
Consider the following table called meals
with columns breakfast
, lunch
,
and dinner
:
/*-------------------+-------------------------+------------------*
| breakfast | lunch | dinner |
+-------------------+-------------------------+------------------+
| Potato pancakes | Toasted cheese sandwich | Beef soup |
| Avocado toast | Tomato soup | Chicken soup |
*-------------------+-------------------------+------------------*/
The following query shows how to search single columns, multiple columns, and
whole tables, using the default LOG_ANALYZER
text analyzer
with the default analyzer options:
WITH
meals AS (
SELECT
'Potato pancakes' AS breakfast,
'Toasted cheese sandwich' AS lunch,
'Beef soup' AS dinner
UNION ALL
SELECT
'Avocado toast' AS breakfast,
'Tomato soup' AS lunch,
'Chicken soup' AS dinner
)
SELECT
SEARCH(lunch, 'soup') AS lunch_soup,
SEARCH((breakfast, dinner), 'soup') AS breakfast_or_dinner_soup,
SEARCH(meals, 'soup') AS anytime_soup
FROM meals;
/*------------+--------------------------+--------------*
| lunch_soup | breakfast_or_dinner_soup | anytime_soup |
+------------+--------------------------+--------------+
| false | true | true |
| true | true | true |
*------------+--------------------------+--------------*/
The following query shows additional ways to search, using the
default LOG_ANALYZER
text analyzer with
default analyzer options:
WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
SEARCH(email, 'exam') AS a,
SEARCH(email, 'foobar') AS b,
SEARCH(email, 'example.com') AS c,
SEARCH(email, R'"please use"') AS d,
SEARCH(email, R'"as email"') AS e
FROM data;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| false | true | true | true | false |
*-------+-------+-------+-------+-------*/
The following query shows additional ways to search, using the
default LOG_ANALYZER
text analyzer with custom
analyzer options. Terms are only split when a space or @
symbol is
encountered.
WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
SEARCH(email, 'foobar', analyzer_options=>'{"delimiters": [" ", "@"]}') AS a,
SEARCH(email, 'example', analyzer_options=>'{"delimiters": [" ", "@"]}') AS b,
SEARCH(email, 'example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS c,
SEARCH(email, 'foobar@example.com', analyzer_options=>'{"delimiters": [" ", "@"]}') AS d,
SEARCH(email, R'use "foobar example.com" "as your"', analyzer_options=>'{"delimiters": [" ", "@"]}') AS e
FROM data;
/*-------+-------+-------+-------+-------*
| a | b | c | d | e |
+-------+-------+-------+-------+-------+
| true | false | true | true | true |
*-------+-------+-------+-------+-------*/
The following query shows how to search, using the
NO_OP_ANALYZER
text analyzer:
WITH meals AS ( SELECT 'Tomato soup' AS lunch )
SELECT
SEARCH(lunch, 'Tomato soup', analyzer=>'NO_OP_ANALYZER') AS a,
SEARCH(lunch, 'soup', analyzer=>'NO_OP_ANALYZER') AS b,
SEARCH(lunch, 'tomato soup', analyzer=>'NO_OP_ANALYZER') AS c,
SEARCH(lunch, R'"Tomato soup"', analyzer=>'NO_OP_ANALYZER') AS d
FROM meals;
/*-------+-------+-------+-------*
| a | b | c | d |
+-------+-------+-------+-------+
| true | false | false | false |
*-------+-------+-------+-------*/
The following query shows how to use the PATTERN_ANALYZER
text analyzer with default analyzer options:
WITH data AS ( SELECT 'Please use foobar@example.com as your email.' AS email )
SELECT
SEARCH(email, 'exam', analyzer=>'PATTERN_ANALYZER') AS a,
SEARCH(email, 'foobar', analyzer=>'PATTERN_ANALYZER') AS b,
SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER') AS c,
SEARCH(email, R'foobar "EXAMPLE.com as" email', analyzer=>'PATTERN_ANALYZER') AS d
FROM data;
/*-------+-------+-------+-------*
| a | b | c | d |
+-------+-------+-------+-------+
| false | true | true | true |
*-------+-------+-------+-------*/
The following query shows additional ways to search, using the
PATTERN_ANALYZER
text analyzer with
custom analyzer options:
WITH data AS ( SELECT 'Please use foobar@EXAMPLE.com as your email.' AS email )
SELECT
SEARCH(email, 'EXAMPLE', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[A-Z]*"]}') AS a,
SEARCH(email, 'example', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS b,
SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-z]*"]}') AS c,
SEARCH(email, 'example.com', analyzer=>'PATTERN_ANALYZER', analyzer_options=>'{"patterns": ["[a-zA-Z.]*"]}') AS d
FROM data;
/*-------+-------+-------+-------*
| a | b | c | d |
+-------+-------+-------+-------+
| true | false | false | true |
*-------+-------+-------+-------*/
For additional examples that include analyzer options, see the Text analysis reference guide.
For helpful analyzer recipes that you can use to enhance analyzer-supported queries, see the Search with text analyzers user guide.
VECTOR_SEARCH
VECTOR_SEARCH(
{TABLE base_table | base_table_query_statement},
column_to_search,
TABLE query_table
[, query_column_to_search => query_column_to_search_value]
[, top_k => top_k_value ]
[, distance_type => distance_type_value ]
[, options => options_value ]
)
VECTOR_SEARCH(
{ TABLE base_table | base_table_query_statement },
column_to_search,
(query_statement)
[, [ query_column_to_search => ] query_column_to_search_value]
[, top_k => top_k_value ]
[, distance_type => distance_type_value ]
[, options => options_value ]
)
Description
The VECTOR_SEARCH
function lets you search embeddings to find semantically
similar entities.
Embeddings are high-dimensional numerical vectors that represent a given entity, like a piece of text or an audio file. Machine learning (ML) models use embeddings to encode semantics about such entities to make it easier to reason about and compare them. For example, a common operation in clustering, classification, and recommendation models is to measure the distance between vectors in an embedding space to find items that are most semantically similar.
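A minimal call shape might look like the following sketch. The table and column names (mydataset.base_embeddings, mydataset.query_embeddings, embedding, id) are hypothetical, and the sketch assumes the function's standard output columns (query, base, and distance):

```sql
-- Find the 5 nearest base embeddings for each query embedding,
-- using cosine distance.
SELECT query.id AS query_id, base.id AS base_id, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.base_embeddings,    -- hypothetical base table
  'embedding',                        -- hypothetical ARRAY<FLOAT64> column
  TABLE mydataset.query_embeddings,   -- hypothetical query table
  top_k => 5,
  distance_type => 'COSINE');
```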
Definitions
base_table
: The table to search for nearest neighbor embeddings.

base_table_query_statement
: A query that you can use to pre-filter the base table. Only SELECT, FROM, and WHERE clauses are allowed in this query. Don't apply any filters to the embedding column. You can't use logical views in this query. Using a subquery might interfere with index usage or cause your query to fail. If the base table is indexed and the WHERE clause contains columns that aren't stored in the index, then VECTOR_SEARCH post-filters on those columns instead. To learn more, see Store columns and pre-filter.

column_to_search
: The name of the base table column to search for nearest neighbor embeddings. The column must have a type of ARRAY<FLOAT64>. All elements in the array must be non-NULL, and all values in the column must have the same array dimensions. If the column has a vector index, BigQuery attempts to use it. To determine whether an index was used in the vector search, see Vector index usage.

query_table
: The table that provides the embeddings for which to find nearest neighbors. All columns are passed through as output columns.

query_statement
: A query that provides the embeddings for which to find nearest neighbors. All columns are passed through as output columns.

query_column_to_search
: A named argument with a STRING value. query_column_to_search_value specifies the name of the column in the query table or statement that contains the embeddings for which to find nearest neighbors. The column must have a type of ARRAY<FLOAT64>. All elements in the array must be non-NULL, and all values in the column must have the same array dimensions as the values in the column_to_search column. If you don't specify query_column_to_search_value, the function uses the column_to_search value or picks the most appropriate column.

top_k
: A named argument with an INT64 value. top_k_value specifies the number of nearest neighbors to return. The default is 10. A negative value is treated as infinity, meaning that all values are counted as neighbors and returned.

distance_type
: A named argument with a STRING value. distance_type_value specifies the type of metric to use to compute the distance between two vectors. Supported distance types are EUCLIDEAN, COSINE, and DOT_PRODUCT. The default is EUCLIDEAN.

  If you don't specify distance_type_value and the column_to_search column has a vector index that is used, VECTOR_SEARCH uses the distance type specified in the distance_type option of the CREATE VECTOR INDEX statement.

options
: A named argument with a JSON-formatted STRING value. options_value is a literal that specifies the following vector search options:

  - fraction_lists_to_search: A JSON number that specifies the percentage of lists to search. For example, options => '{"fraction_lists_to_search":0.15}'. The fraction_lists_to_search value must be in the range 0.0 to 1.0, exclusive. Specifying a higher percentage leads to higher recall and slower performance; specifying a lower percentage has the opposite effect. fraction_lists_to_search is only used when a vector index is also used. If you don't specify a fraction_lists_to_search value but an index is matched, an appropriate value is picked. The number of available lists to search is determined by the num_lists option in the ivf_options option, or is derived from the leaf_node_embedding_count option in the tree_ah_options option of the CREATE VECTOR INDEX statement if specified. Otherwise, BigQuery calculates an appropriate number. You can't specify fraction_lists_to_search when use_brute_force is set to true.
  - use_brute_force: A JSON boolean that determines whether to use brute force search by skipping the vector index if one is available. For example, options => '{"use_brute_force":true}'. The default is false. If you specify use_brute_force=false and no usable vector index is available, brute force is used anyway.

  options defaults to '{}' to denote that all underlying options use their corresponding default values.
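The EUCLIDEAN and COSINE distance types can be cross-checked against the result tables in the examples that follow. Here is a minimal Python sketch of the two metric definitions; `euclidean` and `cosine` are hypothetical helper names for illustration, not BigQuery functions:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two embedding vectors.
    return math.dist(a, b)

def cosine(a, b):
    # Cosine distance: 1 minus the cosine similarity of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))
```

For example, `euclidean([1.0, 2.0], [1.0, 3.2])` gives roughly `1.2`, matching the `dog`/`id 4` row in the first example, and `cosine([3.0, 5.2], [2.0, 4.0])` matches the `cat`/`id 2` row in the COSINE example.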
Details
You can optionally use VECTOR_SEARCH
with a vector index. When
a vector index is used, VECTOR_SEARCH uses the Approximate Nearest
Neighbor search technique to improve vector search performance, at the cost
of reduced recall, which makes the results more approximate. Brute force is
used to return exact results when a vector index isn't available, and you can
choose to use brute force to get exact results even when a vector index is
available.
Output
For each row in the query data, the output contains multiple rows from the
base table that satisfy the search criteria. The number of result rows per
query row is the top_k value if it is specified, and 10 otherwise. The
order of the output isn't guaranteed.
The output includes the following columns:
query
: A STRUCT value that contains all selected columns from the query data.

base
: A STRUCT value that contains all columns from base_table, or the subset of base_table columns that you selected in the base_table_query_statement query.

distance
: A FLOAT64 value that represents the distance between the base data and the query data.
Limitations
BigQuery data security and governance rules apply to the use of
VECTOR_SEARCH
, which results in the following behavior:
- If the base table has row-level security policies, VECTOR_SEARCH applies the row-level access policies to the query results.
- If the indexed column from the base table has data masking policies, VECTOR_SEARCH succeeds only if the user running the query has the Fine-Grained Reader role on the policy tags that are used. Otherwise, VECTOR_SEARCH fails with an invalid query error.
- If any base table column or any column in the query table or statement has column-level security policies and you don't have appropriate permissions to access the column, VECTOR_SEARCH fails with a permission denied error.
- The project that runs the query containing VECTOR_SEARCH must match the project that contains the base table.
Examples
The following queries create test tables table1
and table2
to use in
subsequent query examples:
CREATE OR REPLACE TABLE mydataset.table1
(
id INT64,
my_embedding ARRAY<FLOAT64>
);
INSERT mydataset.table1 (id, my_embedding)
VALUES(1, [1.0, 2.0]),
(2, [2.0, 4.0]),
(3, [1.5, 7.0]),
(4, [1.0, 3.2]),
(5, [5.0, 5.4]),
(6, [3.7, 1.8]),
(7, [4.4, 2.9]);
CREATE OR REPLACE TABLE mydataset.table2
(
query_id STRING,
embedding ARRAY<FLOAT64>
);
INSERT mydataset.table2 (query_id, embedding)
VALUES('dog', [1.0, 2.0]),
('cat', [3.0, 5.2]);
The following example searches the my_embedding
column of table1
for the top
two embeddings that match each row of data in the embedding
column of
table2
:
SELECT *
FROM
VECTOR_SEARCH(
TABLE mydataset.table1,
'my_embedding',
(SELECT query_id, embedding FROM mydataset.table2),
'embedding',
top_k => 2);
/*----------------+-----------------+---------+-------------------+--------------------*
| query.query_id | query.embedding | base.id | base.my_embedding | distance |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 1 | 1.0 | 0 |
| | 2.0 | | 2.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 4 | 1.0 | 1.2000000000000002 |
| | 2.0 | | 3.2 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 2 | 2.0 | 1.5620499351813311 |
| | 5.2 | | 4.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 5 | 5.0 | 2.0099751242241779 |
| | 5.2 | | 5.4 | |
*----------------+-----------------+---------+-------------------+--------------------*/
The following example pre-filters table1
to rows where id
is not equal to
4 and then searches the my_embedding
column of table1
for the top
two embeddings that match each row of data in the embedding
column of
table2
.
SELECT *
FROM
VECTOR_SEARCH(
(SELECT * FROM mydataset.table1 WHERE id != 4),
'my_embedding',
(SELECT query_id, embedding FROM mydataset.table2),
'embedding',
top_k => 2,
options => '{"use_brute_force":true}');
/*----------------+-----------------+---------+-------------------+--------------------*
| query.query_id | query.embedding | base.id | base.my_embedding | distance |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 1 | 1.0 | 0 |
| | 2.0 | | 2.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| dog | 1.0 | 2 | 2.0 | 2.23606797749979 |
| | 2.0 | | 4.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 2 | 2.0 | 1.5620499351813311 |
| | 5.2 | | 4.0 | |
+----------------+-----------------+---------+-------------------+--------------------+
| cat | 3.0 | 5 | 5.0 | 2.0099751242241779 |
| | 5.2 | | 5.4 | |
*----------------+-----------------+---------+-------------------+--------------------*/
The following example searches the my_embedding
column of table1
for the top
two embeddings that match each row of data in the embedding
column of
table2
, and uses the COSINE
distance type to measure the distance between
the embeddings:
SELECT *
FROM
VECTOR_SEARCH(
TABLE mydataset.table1,
'my_embedding',
TABLE mydataset.table2,
'embedding',
top_k => 2,
distance_type => 'COSINE');
/*----------------+-----------------+---------+-------------------+-----------------------*
| query.query_id | query.embedding | base.id | base.my_embedding | distance |
+----------------+-----------------+---------+-------------------+-----------------------+
| dog | 1.0 | 2 | 2.0 | 0 |
| | 2.0 | | 4.0 | |
+----------------+-----------------+---------+-------------------+-----------------------+
| dog | 1.0 | 1 | 1.0 | 0 |
| | 2.0 | | 2.0 | |
+----------------+-----------------+---------+-------------------+-----------------------+
| cat | 3.0 | 2 | 2.0 | 0.0017773842088002478 |
| | 5.2 | | 4.0 | |
+----------------+-----------------+---------+-------------------+-----------------------+
| cat | 3.0 | 1 | 1.0 | 0.0017773842088002478 |
| | 5.2 | | 2.0 | |
*----------------+-----------------+---------+-------------------+-----------------------*/
Security functions
GoogleSQL for BigQuery supports the following security functions.
Function list
Name | Summary |
---|---|
SESSION_USER
|
Gets the email address or principal identifier of the user that is running the query. |
SESSION_USER
SESSION_USER()
Description
For first-party users, returns the email address of the user that is running the query. For third-party users, returns the principal identifier of the user that is running the query. For more information about identities, see Principals.
Return Data Type
STRING
Example
SELECT SESSION_USER() as user;
/*----------------------*
| user |
+----------------------+
| jdoe@example.com |
*----------------------*/
Statistical aggregate functions
GoogleSQL for BigQuery supports statistical aggregate functions. To learn about the syntax for aggregate function calls, see Aggregate function calls.
Function list
Name | Summary |
---|---|
CORR
|
Computes the Pearson coefficient of correlation of a set of number pairs. |
COVAR_POP
|
Computes the population covariance of a set of number pairs. |
COVAR_SAMP
|
Computes the sample covariance of a set of number pairs. |
STDDEV
|
An alias of the STDDEV_SAMP function.
|
STDDEV_POP
|
Computes the population (biased) standard deviation of the values. |
STDDEV_SAMP
|
Computes the sample (unbiased) standard deviation of the values. |
VAR_POP
|
Computes the population (biased) variance of the values. |
VAR_SAMP
|
Computes the sample (unbiased) variance of the values. |
VARIANCE
|
An alias of VAR_SAMP .
|
CORR
CORR(
X1, X2
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the Pearson coefficient
of correlation of a set of number pairs. For each number pair, the first number
is the dependent variable and the second number is the independent variable.
The return result is between -1
and 1
. A result of 0
indicates no
correlation.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any input pairs that contain one or more NULL
values. If
there are fewer than two input pairs without NULL
values, this function
returns NULL
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
- The variance of
X1
orX2
is0
. - The covariance of
X1
andX2
is0
.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT CORR(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 5.0 AS x),
(3.0, 9.0),
(4.0, 7.0)]);
/*--------------------*
| results |
+--------------------+
| 0.6546536707079772 |
*--------------------*/
SELECT CORR(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 5.0 AS x),
(3.0, 9.0),
(4.0, NULL)]);
/*---------*
| results |
+---------+
| 1 |
*---------*/
SELECT CORR(y, x) AS results
FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)])
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT CORR(y, x) AS results
FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)])
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT CORR(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 5.0 AS x),
(3.0, 9.0),
(4.0, 7.0),
(5.0, 1.0),
(7.0, CAST('Infinity' as FLOAT64))])
/*---------*
| results |
+---------+
| NaN |
*---------*/
SELECT CORR(x, y) AS results
FROM
(
SELECT 0 AS x, 0 AS y
UNION ALL
SELECT 0 AS x, 0 AS y
)
/*---------*
| results |
+---------+
| NaN |
*---------*/
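The CORR examples above can be reproduced with a small Python model of the documented behavior: NULL pairs are skipped, fewer than two usable pairs yields NULL, and zero variance yields NaN. This is an illustrative sketch (`corr` is a hypothetical helper, with None standing in for NULL), not the BigQuery implementation:

```python
import math

def corr(pairs):
    # Pearson correlation over (y, x) pairs, skipping pairs that contain None (NULL).
    pairs = [(y, x) for y, x in pairs if y is not None and x is not None]
    if len(pairs) < 2:
        return None  # fewer than two usable pairs -> NULL
    my = sum(y for y, _ in pairs) / len(pairs)
    mx = sum(x for _, x in pairs) / len(pairs)
    cov = sum((y - my) * (x - mx) for y, x in pairs)
    sy = math.sqrt(sum((y - my) ** 2 for y, _ in pairs))
    sx = math.sqrt(sum((x - mx) ** 2 for _, x in pairs))
    if sy == 0.0 or sx == 0.0:
        return float('nan')  # zero variance -> NaN
    return cov / (sy * sx)
```

For example, `corr([(1.0, 5.0), (3.0, 9.0), (4.0, 7.0)])` reproduces the first example's result of about 0.6546536707079772.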
COVAR_POP
COVAR_POP(
X1, X2
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the population covariance of
a set of number pairs. The first number is the dependent variable; the second
number is the independent variable. The return result is between -Inf
and
+Inf
.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any input pairs that contain one or more NULL
values. If
there is no input pair without NULL
values, this function returns NULL
.
If there is exactly one input pair without NULL
values, this function returns
0
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT COVAR_POP(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 1.0 AS x),
(2.0, 6.0),
(9.0, 3.0),
(2.0, 6.0),
(9.0, 3.0)])
/*---------------------*
| results |
+---------------------+
| -1.6800000000000002 |
*---------------------*/
SELECT COVAR_POP(y, x) AS results
FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)])
/*---------*
| results |
+---------+
| 0 |
*---------*/
SELECT COVAR_POP(y, x) AS results
FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)])
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT COVAR_POP(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 1.0 AS x),
(2.0, 6.0),
(9.0, 3.0),
(2.0, 6.0),
(NULL, 3.0)])
/*---------*
| results |
+---------+
| -1 |
*---------*/
SELECT COVAR_POP(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 1.0 AS x),
(2.0, 6.0),
(9.0, 3.0),
(2.0, 6.0),
(CAST('Infinity' as FLOAT64), 3.0)])
/*---------*
| results |
+---------+
| NaN |
*---------*/
COVAR_SAMP
COVAR_SAMP(
X1, X2
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the sample covariance of a
set of number pairs. The first number is the dependent variable; the second
number is the independent variable. The return result is between -Inf
and
+Inf
.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any input pairs that contain one or more NULL
values. If
there are fewer than two input pairs without NULL
values, this function
returns NULL
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT COVAR_SAMP(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 1.0 AS x),
(2.0, 6.0),
(9.0, 3.0),
(2.0, 6.0),
(9.0, 3.0)])
/*---------*
| results |
+---------+
| -2.1 |
*---------*/
SELECT COVAR_SAMP(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 1.0 AS x),
(2.0, 6.0),
(9.0, 3.0),
(2.0, 6.0),
(NULL, 3.0)])
/*---------------------*
| results |
+---------------------+
| -1.3333333333333333 |
*---------------------*/
SELECT COVAR_SAMP(y, x) AS results
FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, 3.0)])
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT COVAR_SAMP(y, x) AS results
FROM UNNEST([STRUCT(1.0 AS y, NULL AS x),(9.0, NULL)])
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT COVAR_SAMP(y, x) AS results
FROM
UNNEST(
[
STRUCT(1.0 AS y, 1.0 AS x),
(2.0, 6.0),
(9.0, 3.0),
(2.0, 6.0),
(CAST('Infinity' as FLOAT64), 3.0)])
/*---------*
| results |
+---------+
| NaN |
*---------*/
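The COVAR_POP and COVAR_SAMP examples differ only in the divisor (n versus n - 1) and in how few usable pairs they tolerate. The following illustrative Python sketch models both (a hypothetical `covar` helper, with None standing in for NULL; not the BigQuery implementation):

```python
def covar(pairs, sample=False):
    # Covariance of (y, x) pairs, skipping pairs that contain None (NULL).
    # sample=False models COVAR_POP; sample=True models COVAR_SAMP.
    pairs = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(pairs)
    if n == 0 or (sample and n < 2):
        return None  # too few usable pairs -> NULL
    my = sum(y for y, _ in pairs) / n
    mx = sum(x for _, x in pairs) / n
    s = sum((y - my) * (x - mx) for y, x in pairs)
    return s / (n - 1 if sample else n)
```

With the five pairs from the examples above, `covar(...)` gives about -1.68 and `covar(..., sample=True)` gives about -2.1, matching the documented results.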
STDDEV
STDDEV(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
An alias of STDDEV_SAMP.
STDDEV_POP
STDDEV_POP(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the population (biased) standard deviation of the values. The return
result is between 0
and +Inf
.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any NULL
inputs. If all inputs are ignored, this
function returns NULL
. If this function receives a single non-NULL
input,
it returns 0
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
If this function is used with the OVER
clause, it's part of a
window function call. In a window function call,
aggregate function clauses can't be used.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, 18]) AS x
/*-------------------*
| results |
+-------------------+
| 3.265986323710904 |
*-------------------*/
SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, NULL]) AS x
/*---------*
| results |
+---------+
| 2 |
*---------*/
SELECT STDDEV_POP(x) AS results FROM UNNEST([10, NULL]) AS x
/*---------*
| results |
+---------+
| 0 |
*---------*/
SELECT STDDEV_POP(x) AS results FROM UNNEST([NULL]) AS x
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT STDDEV_POP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as FLOAT64)]) AS x
/*---------*
| results |
+---------+
| NaN |
*---------*/
STDDEV_SAMP
STDDEV_SAMP(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the sample (unbiased) standard deviation of the values. The return
result is between 0
and +Inf
.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any NULL
inputs. If there are fewer than two non-NULL
inputs, this function returns NULL
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
If this function is used with the OVER
clause, it's part of a
window function call. In a window function call,
aggregate function clauses can't be used.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, 18]) AS x
/*---------*
| results |
+---------+
| 4 |
*---------*/
SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, NULL]) AS x
/*--------------------*
| results |
+--------------------+
| 2.8284271247461903 |
*--------------------*/
SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, NULL]) AS x
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT STDDEV_SAMP(x) AS results FROM UNNEST([NULL]) AS x
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT STDDEV_SAMP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as FLOAT64)]) AS x
/*---------*
| results |
+---------+
| NaN |
*---------*/
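The STDDEV_POP and STDDEV_SAMP examples can be checked with a small Python model of the documented rules (an illustrative `stddev` helper with None standing in for NULL; not the BigQuery implementation):

```python
import math

def stddev(values, sample=True):
    # Standard deviation, ignoring None (NULL) inputs.
    # sample=True models STDDEV_SAMP; sample=False models STDDEV_POP.
    vals = [v for v in values if v is not None]
    n = len(vals)
    if n == 0 or (sample and n < 2):
        return None  # too few usable inputs -> NULL
    m = sum(vals) / n
    ss = sum((v - m) ** 2 for v in vals)
    return math.sqrt(ss / (n - 1 if sample else n))
```

For example, `stddev([10, 14, 18])` gives 4 and `stddev([10, 14, 18], sample=False)` gives about 3.265986323710904, matching the examples above.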
VAR_POP
VAR_POP(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the population (biased) variance of the values. The return result is
between 0
and +Inf
.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any NULL
inputs. If all inputs are ignored, this
function returns NULL
. If this function receives a single non-NULL
input,
it returns 0
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
If this function is used with the OVER
clause, it's part of a
window function call. In a window function call,
aggregate function clauses can't be used.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, 18]) AS x
/*--------------------*
| results |
+--------------------+
| 10.666666666666666 |
*--------------------*/
SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, NULL]) AS x
/*---------*
| results |
+---------+
| 4 |
*---------*/
SELECT VAR_POP(x) AS results FROM UNNEST([10, NULL]) AS x
/*---------*
| results |
+---------+
| 0 |
*---------*/
SELECT VAR_POP(x) AS results FROM UNNEST([NULL]) AS x
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT VAR_POP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as FLOAT64)]) AS x
/*---------*
| results |
+---------+
| NaN |
*---------*/
VAR_SAMP
VAR_SAMP(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
Returns the sample (unbiased) variance of the values. The return result is
between 0
and +Inf
.
All numeric types are supported. If the
input is NUMERIC
or BIGNUMERIC
then the internal aggregation is
stable with the final output converted to a FLOAT64
.
Otherwise the input is converted to a FLOAT64
before aggregation, resulting in a potentially unstable result.
This function ignores any NULL
inputs. If there are fewer than two non-NULL
inputs, this function returns NULL
.
NaN
is produced if:
- Any input value is
NaN
- Any input value is positive infinity or negative infinity.
To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.
This function can be used with the
AGGREGATION_THRESHOLD
clause.
If this function is used with the OVER
clause, it's part of a
window function call. In a window function call,
aggregate function clauses can't be used.
To learn more about the OVER
clause and how to use it, see
Window function calls.
Return Data Type
FLOAT64
Examples
SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, 18]) AS x
/*---------*
| results |
+---------+
| 16 |
*---------*/
SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, NULL]) AS x
/*---------*
| results |
+---------+
| 8 |
*---------*/
SELECT VAR_SAMP(x) AS results FROM UNNEST([10, NULL]) AS x
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT VAR_SAMP(x) AS results FROM UNNEST([NULL]) AS x
/*---------*
| results |
+---------+
| NULL |
*---------*/
SELECT VAR_SAMP(x) AS results FROM UNNEST([10, 14, CAST('Infinity' as FLOAT64)]) AS x
/*---------*
| results |
+---------+
| NaN |
*---------*/
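The VAR_POP and VAR_SAMP examples follow the same pattern as the standard deviation functions, without the final square root. A minimal illustrative sketch (a hypothetical `variance` helper with None standing in for NULL; not the BigQuery implementation):

```python
def variance(values, sample=True):
    # Variance, ignoring None (NULL) inputs.
    # sample=True models VAR_SAMP; sample=False models VAR_POP.
    vals = [v for v in values if v is not None]
    n = len(vals)
    if n == 0 or (sample and n < 2):
        return None  # too few usable inputs -> NULL
    m = sum(vals) / n
    ss = sum((v - m) ** 2 for v in vals)
    return ss / (n - 1 if sample else n)
```

Note that `variance([10, 14, 18])` gives 16, the square of the STDDEV_SAMP result of 4 for the same inputs.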
VARIANCE
VARIANCE(
[ DISTINCT ]
expression
)
[ OVER over_clause ]
over_clause:
{ named_window | ( [ window_specification ] ) }
window_specification:
[ named_window ]
[ PARTITION BY partition_expression [, ...] ]
[ ORDER BY expression [ { ASC | DESC } ] [, ...] ]
[ window_frame_clause ]
Description
An alias of VAR_SAMP.
String functions
GoogleSQL for BigQuery supports string functions.
These string functions work on two data types: STRING
and BYTES
. STRING
values must be well-formed UTF-8.
Functions that return position values, such as STRPOS,
encode those positions as INT64
. The value 1
refers to the first character (or byte), 2
refers to the second, and so on.
The value 0
indicates an invalid position. When working on STRING
types, the
returned positions refer to character positions.
All string comparisons are done byte-by-byte, without regard to Unicode canonical equivalence.
Function list
Name | Summary |
---|---|
ASCII
|
Gets the ASCII code for the first character or byte in a STRING
or BYTES value.
|
BYTE_LENGTH
|
Gets the number of BYTES in a STRING or
BYTES value.
|
CHAR_LENGTH
|
Gets the number of characters in a STRING value.
|
CHARACTER_LENGTH
|
Synonym for CHAR_LENGTH .
|
CHR
|
Converts a Unicode code point to a character. |
CODE_POINTS_TO_BYTES
|
Converts an array of extended ASCII code points to a
BYTES value.
|
CODE_POINTS_TO_STRING
|
Converts an array of extended ASCII code points to a
STRING value.
|
COLLATE
|
Combines a STRING value and a collation specification into a
collation specification-supported STRING value.
|
CONCAT
|
Concatenates one or more STRING or BYTES
values into a single result.
|
CONTAINS_SUBSTR
|
Performs a normalized, case-insensitive search to see if a value exists as a substring in an expression. |
EDIT_DISTANCE
|
Computes the Levenshtein distance between two STRING
or BYTES values.
|
ENDS_WITH
|
Checks if a STRING or BYTES value is the suffix
of another value.
|
FORMAT
|
Formats data and produces the results as a STRING value.
|
FROM_BASE32
|
Converts a base32-encoded STRING value into a
BYTES value.
|
FROM_BASE64
|
Converts a base64-encoded STRING value into a
BYTES value.
|
FROM_HEX
|
Converts a hexadecimal-encoded STRING value into a
BYTES value.
|
INITCAP
|
Formats a STRING as proper case, which means that the first
character in each word is uppercase and all other characters are lowercase.
|
INSTR
|
Finds the position of a subvalue inside another value, optionally starting the search at a given offset or occurrence. |
LEFT
|
Gets the specified leftmost portion from a STRING or
BYTES value.
|
LENGTH
|
Gets the length of a STRING or BYTES value.
|
LOWER
|
Formats alphabetic characters in a STRING value as
lowercase.
Formats ASCII characters in a BYTES value as
lowercase.
|
LPAD
|
Prepends a STRING or BYTES value with a pattern.
|
LTRIM
|
Identical to the TRIM function, but only removes leading
characters.
|
NORMALIZE
|
Case-sensitively normalizes the characters in a STRING value.
|
NORMALIZE_AND_CASEFOLD
|
Case-insensitively normalizes the characters in a STRING value.
|
OCTET_LENGTH
|
Alias for BYTE_LENGTH .
|
REGEXP_CONTAINS
|
Checks if a value is a partial match for a regular expression. |
REGEXP_EXTRACT
|
Produces a substring that matches a regular expression. |
REGEXP_EXTRACT_ALL
|
Produces an array of all substrings that match a regular expression. |
REGEXP_INSTR
|
Finds the position of a regular expression match in a value, optionally starting the search at a given offset or occurrence. |
REGEXP_REPLACE
|
Produces a STRING value where all substrings that match a
regular expression are replaced with a specified value.
|
REGEXP_SUBSTR
|
Synonym for REGEXP_EXTRACT .
|
REPEAT
|
Produces a STRING or BYTES value that consists of
an original value, repeated.
|
REPLACE
|
Replaces all occurrences of a pattern with another pattern in a
STRING or BYTES value.
|
REVERSE
|
Reverses a STRING or BYTES value.
|
RIGHT
|
Gets the specified rightmost portion from a STRING or
BYTES value.
|
RPAD
|
Appends a STRING or BYTES value with a pattern.
|
RTRIM
|
Identical to the TRIM function, but only removes trailing
characters.
|
SAFE_CONVERT_BYTES_TO_STRING
|
Converts a BYTES value to a STRING value and
replaces any invalid UTF-8 characters with the Unicode replacement character,
U+FFFD .
|
SOUNDEX
|
Gets the Soundex codes for words in a STRING value.
|
SPLIT
|
Splits a STRING or BYTES value, using a delimiter.
|
STARTS_WITH
|
Checks if a STRING or BYTES value is a
prefix of another value.
|
STRPOS
|
Finds the position of the first occurrence of a subvalue inside another value. |
SUBSTR
|
Gets a portion of a STRING or BYTES value.
|
SUBSTRING
|
Alias for SUBSTR .
|
TO_BASE32
|
Converts a BYTES value to a
base32-encoded STRING value.
|
TO_BASE64
|
Converts a BYTES value to a
base64-encoded STRING value.
|
TO_CODE_POINTS
|
Converts a STRING or BYTES value into an array of
extended ASCII code points.
|
TO_HEX
|
Converts a BYTES value to a
hexadecimal STRING value.
|
TRANSLATE
|
Within a value, replaces each source character with the corresponding target character. |
TRIM
|
Removes the specified leading and trailing Unicode code points or bytes
from a STRING or BYTES value.
|
UNICODE
|
Gets the Unicode code point for the first character in a value. |
UPPER
|
Formats alphabetic characters in a STRING value as
uppercase.
Formats ASCII characters in a BYTES value as
uppercase.
|
ASCII
ASCII(value)
Description
Returns the ASCII code for the first character or byte in value
. Returns
0
if value
is empty or the ASCII code is 0
for the first character
or byte.
Return type
INT64
Examples
SELECT ASCII('abcd') as A, ASCII('a') as B, ASCII('') as C, ASCII(NULL) as D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| 97 | 97 | 0 | NULL |
*-------+-------+-------+-------*/
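The ASCII behavior above can be modeled in Python, where `ord` returns the code of a character and indexing a bytes value returns the code of a byte. This is an illustrative sketch (`ascii_code` is a hypothetical helper, with None standing in for NULL):

```python
def ascii_code(value):
    # ASCII code of the first character or byte; 0 for empty input, None for NULL.
    if value is None:
        return None
    if len(value) == 0:
        return 0
    if isinstance(value, (bytes, bytearray)):
        return value[0]       # indexing bytes yields an int
    return ord(value[0])      # first character of a string
```

For example, `ascii_code('abcd')` and `ascii_code('a')` both give 97, matching the example above.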
BYTE_LENGTH
BYTE_LENGTH(value)
Description
Gets the number of BYTES
in a STRING
or BYTES
value,
regardless of whether the value is a STRING
or BYTES
type.
Return type
INT64
Examples
SELECT BYTE_LENGTH('абвгд') AS string_example;
/*----------------*
| string_example |
+----------------+
| 10 |
*----------------*/
SELECT BYTE_LENGTH(b'абвгд') AS bytes_example;
/*----------------*
| bytes_example |
+----------------+
| 10 |
*----------------*/
CHAR_LENGTH
CHAR_LENGTH(value)
Description
Gets the number of characters in a STRING
value.
Return type
INT64
Examples
SELECT CHAR_LENGTH('абвгд') AS char_length;
/*-------------*
| char_length |
+-------------+
| 5 |
*-------------*/
CHARACTER_LENGTH
CHARACTER_LENGTH(value)
Description
Synonym for CHAR_LENGTH.
Return type
INT64
Examples
SELECT
'абвгд' AS characters,
CHARACTER_LENGTH('абвгд') AS char_length_example
/*------------+---------------------*
| characters | char_length_example |
+------------+---------------------+
| абвгд | 5 |
*------------+---------------------*/
CHR
CHR(value)
Description
Takes a Unicode code point and returns
the character that matches the code point. Each valid code point must fall
within the ranges [0, 0xD7FF] and [0xE000, 0x10FFFF]. Returns an empty string
if the code point is 0
. If an invalid Unicode code point is specified, an
error is returned.
To work with an array of Unicode code points, see
CODE_POINTS_TO_STRING.
Return type
STRING
Examples
SELECT CHR(65) AS A, CHR(255) AS B, CHR(513) AS C, CHR(1024) AS D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| A | ÿ | ȁ | Ѐ |
*-------+-------+-------+-------*/
SELECT CHR(97) AS A, CHR(0xF9B5) AS B, CHR(0) AS C, CHR(NULL) AS D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| a | 例 | | NULL |
*-------+-------+-------+-------*/
CODE_POINTS_TO_BYTES
CODE_POINTS_TO_BYTES(ascii_code_points)
Description
Takes an array of extended ASCII
code points
as ARRAY<INT64>
and returns BYTES
.
To convert from BYTES
to an array of code points, see
TO_CODE_POINTS.
Return type
BYTES
Examples
The following is a basic example using CODE_POINTS_TO_BYTES
.
SELECT CODE_POINTS_TO_BYTES([65, 98, 67, 100]) AS bytes;
-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'AbCd' is the result.
/*----------*
| bytes |
+----------+
| QWJDZA== |
*----------*/
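For intuition, the same conversion can be sketched in Python, where bytes() builds a byte string from integer code points. This is an illustration of the documented behavior, not the BigQuery implementation:

```python
import base64

# Extended ASCII code points 65, 98, 67, 100 map to the bytes b'AbCd'.
b = bytes([65, 98, 67, 100])
print(b)                               # b'AbCd'
# BigQuery displays BYTES results base64-encoded, which is why the docs show QWJDZA==.
print(base64.b64encode(b).decode())    # QWJDZA==
```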
The following example uses a rotate-by-13 places (ROT13) algorithm to encode a string.
SELECT CODE_POINTS_TO_BYTES(ARRAY_AGG(
(SELECT
CASE
WHEN chr BETWEEN b'a' and b'z'
THEN TO_CODE_POINTS(b'a')[offset(0)] +
MOD(code+13-TO_CODE_POINTS(b'a')[offset(0)],26)
WHEN chr BETWEEN b'A' and b'Z'
THEN TO_CODE_POINTS(b'A')[offset(0)] +
MOD(code+13-TO_CODE_POINTS(b'A')[offset(0)],26)
ELSE code
END
FROM
(SELECT code, CODE_POINTS_TO_BYTES([code]) chr)
) ORDER BY OFFSET)) AS encoded_string
FROM UNNEST(TO_CODE_POINTS(b'Test String!')) code WITH OFFSET;
-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'Grfg Fgevat!' is the result.
/*------------------*
| encoded_string |
+------------------+
| R3JmZyBGZ2V2YXQh |
*------------------*/
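As a sanity check on the expected output, ROT13 is also available in Python's standard codecs module; this short sketch reproduces the example's result:

```python
import codecs

# ROT13 rotates letters by 13 places and leaves other characters unchanged.
encoded = codecs.encode('Test String!', 'rot_13')
print(encoded)  # Grfg Fgevat!
```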
CODE_POINTS_TO_STRING
CODE_POINTS_TO_STRING(unicode_code_points)
Description
Takes an array of Unicode code points
as ARRAY<INT64>
and returns a STRING
.
To convert from a string to an array of code points, see TO_CODE_POINTS.
Return type
STRING
Examples
The following are basic examples using CODE_POINTS_TO_STRING
.
SELECT CODE_POINTS_TO_STRING([65, 255, 513, 1024]) AS string;
/*--------*
| string |
+--------+
| AÿȁЀ |
*--------*/
SELECT CODE_POINTS_TO_STRING([97, 0, 0xF9B5]) AS string;
/*--------*
| string |
+--------+
| a例 |
*--------*/
SELECT CODE_POINTS_TO_STRING([65, 255, NULL, 1024]) AS string;
/*--------*
| string |
+--------+
| NULL |
*--------*/
The following example computes the frequency of letters in a set of words.
WITH Words AS (
SELECT word
FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word
)
SELECT
CODE_POINTS_TO_STRING([code_point]) AS letter,
COUNT(*) AS letter_count
FROM Words,
UNNEST(TO_CODE_POINTS(word)) AS code_point
GROUP BY 1
ORDER BY 2 DESC;
/*--------+--------------*
| letter | letter_count |
+--------+--------------+
| a | 5 |
| f | 3 |
| r | 2 |
| b | 2 |
| l | 2 |
| o | 2 |
| g | 1 |
| z | 1 |
| e | 1 |
| m | 1 |
| i | 1 |
*--------+--------------*/
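Outside SQL, the round trip between code points and characters can be sketched with Python's chr and ord, which mirror the documented behavior of CODE_POINTS_TO_STRING and TO_CODE_POINTS:

```python
# CODE_POINTS_TO_STRING([65, 255, 513, 1024]) corresponds to chr() over each point.
code_points = [65, 255, 513, 1024]
s = ''.join(chr(cp) for cp in code_points)
print(s)  # AÿȁЀ

# TO_CODE_POINTS reverses the conversion, like ord() over each character.
print([ord(ch) for ch in s])  # [65, 255, 513, 1024]
```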
COLLATE
COLLATE(value, collate_specification)
Description
Takes a STRING
and a collation specification. Returns
a STRING
with a collation specification. If collate_specification
is empty,
returns a value with collation removed from the STRING
.
The collation specification defines how the resulting STRING
can be compared
and sorted. To learn more, see
Working with collation.
- collate_specification must be a string literal, otherwise an error is thrown.
- Returns NULL if value is NULL.
Return type
STRING
Examples
In this example, the weight of a
is less than the weight of Z
. This
is because the collate specification, und:ci,
assigns more weight to Z
.
WITH Words AS (
SELECT
COLLATE('a', 'und:ci') AS char1,
COLLATE('Z', 'und:ci') AS char2
)
SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z
FROM Words;
/*----------------*
| a_less_than_Z |
+----------------+
| TRUE |
*----------------*/
In this example, the weight of a
is greater than the weight of Z
. This
is because the default collate specification assigns more weight to a
.
WITH Words AS (
SELECT
'a' AS char1,
'Z' AS char2
)
SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z
FROM Words;
/*----------------*
| a_less_than_Z |
+----------------+
| FALSE |
*----------------*/
CONCAT
CONCAT(value1[, ...])
Description
Concatenates one or more values into a single result. All values must be
BYTES
or data types that can be cast to STRING
.
The function returns NULL
if any input argument is NULL
.
Return type
STRING
or BYTES
Examples
SELECT CONCAT('T.P.', ' ', 'Bar') as author;
/*---------------------*
| author |
+---------------------+
| T.P. Bar |
*---------------------*/
SELECT CONCAT('Summer', ' ', 1923) as release_date;
/*---------------------*
| release_date |
+---------------------+
| Summer 1923 |
*---------------------*/
With Employees AS
(SELECT
'John' AS first_name,
'Doe' AS last_name
UNION ALL
SELECT
'Jane' AS first_name,
'Smith' AS last_name
UNION ALL
SELECT
'Joe' AS first_name,
'Jackson' AS last_name)
SELECT
CONCAT(first_name, ' ', last_name)
AS full_name
FROM Employees;
/*---------------------*
| full_name |
+---------------------+
| John Doe |
| Jane Smith |
| Joe Jackson |
*---------------------*/
CONTAINS_SUBSTR
CONTAINS_SUBSTR(
expression,
search_value_literal
[, json_scope => json_scope_value ]
)
Description
Performs a normalized, case-insensitive search to see if a value exists as a
substring in an expression. Returns TRUE
if the value exists, otherwise
returns FALSE
.
Before values are compared, they are normalized and case folded with NFKC
normalization. Wildcard searches are not
supported.
Arguments
- search_value_literal: The value to search for. It must be a STRING literal or a STRING constant expression.
- expression: The data to search over. The expression can be a column or table reference. A table reference is evaluated as a STRUCT whose fields are the columns of the table. A column reference is evaluated as one of the following data types: STRING, INT64, BOOL, NUMERIC, BIGNUMERIC, TIMESTAMP, TIME, DATE, DATETIME, ARRAY, STRUCT.
  When the expression is evaluated, the result is cast to a STRING, and then the function looks for the search value in the result.
  You can perform a cross-field search on an expression that evaluates to a STRUCT or ARRAY. If the expression evaluates to a STRUCT, the cross-field search is recursive and includes all subfields inside the STRUCT.
  In a cross-field search, each field and subfield is individually converted to a string and searched for the value. The function returns TRUE if at least one field includes the search value; otherwise, if at least one field is NULL, it returns NULL; otherwise, if the search value is not found and all fields are non-NULL, it returns FALSE.
  If the expression is NULL, the return value is NULL.
- json_scope: A named argument with a STRING value. Takes one of the following values to indicate the scope of JSON data to be searched. It has no effect if expression is not JSON or does not contain a JSON field.
  - 'JSON_VALUES': Only the JSON values are searched. If json_scope is not provided, this is used by default.
  - 'JSON_KEYS': Only the JSON keys are searched.
  - 'JSON_KEYS_AND_VALUES': The JSON keys and values are searched.
Return type
BOOL
Examples
The following query returns TRUE
because this case-insensitive match
was found: blue house
and Blue house
.
SELECT CONTAINS_SUBSTR('the blue house', 'Blue house') AS result;
/*--------*
| result |
+--------+
| true |
*--------*/
The following query returns TRUE, similar to the previous example, but in this
case the search value is a constant expression built with the CONCAT function.
SELECT CONTAINS_SUBSTR('the blue house', CONCAT('Blue ', 'house')) AS result;
/*--------*
| result |
+--------+
| true |
*--------*/
The following query returns FALSE
because blue
was not found
in the red house
.
SELECT CONTAINS_SUBSTR('the red house', 'blue') AS result;
/*--------*
| result |
+--------+
| false |
*--------*/
The following query returns TRUE
because Ⅸ
and IX
represent the same
normalized value.
SELECT '\u2168 day' AS a, 'IX' AS b, CONTAINS_SUBSTR('\u2168', 'IX') AS result;
/*----------------------*
| a | b | result |
+----------------------+
| Ⅸ day | IX | true |
*----------------------*/
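The normalization and case folding described above can be reproduced in Python with unicodedata.normalize and str.casefold. This is a sketch of the comparison semantics, not BigQuery's implementation:

```python
import unicodedata

# NFKC normalization maps the Roman numeral Ⅸ (U+2168) to the two letters 'IX'.
assert unicodedata.normalize('NFKC', '\u2168') == 'IX'

# Case folding makes the substring check case-insensitive, as in the first example.
haystack = unicodedata.normalize('NFKC', 'the blue house').casefold()
needle = unicodedata.normalize('NFKC', 'Blue house').casefold()
print(needle in haystack)  # True
```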
The following query returns TRUE
because 35
was found inside a
STRUCT
field.
SELECT CONTAINS_SUBSTR((23, 35, 41), '35') AS result;
/*--------*
| result |
+--------+
| true |
*--------*/
The following query returns TRUE
because jk
was found during a
recursive search inside a STRUCT
.
SELECT CONTAINS_SUBSTR(('abc', ['def', 'ghi', 'jkl'], 'mno'), 'jk');
/*--------*
| result |
+--------+
| true |
*--------*/
The following query returns TRUE because NULLs are ignored when
a match is found inside a STRUCT or ARRAY.
SELECT CONTAINS_SUBSTR((23, NULL, 41), '41') AS result;
/*--------*
| result |
+--------+
| true |
*--------*/
The following query returns NULL
because a NULL
existed in a STRUCT
that
did not result in a match.
SELECT CONTAINS_SUBSTR((23, NULL, 41), '35') AS result;
/*--------*
| result |
+--------+
| null |
*--------*/
In the following query, an error is thrown because the search value cannot be
a literal NULL
.
SELECT CONTAINS_SUBSTR('hello', NULL) AS result;
-- Throws an error
The following examples reference a table called Recipes
that you can emulate
with a WITH
clause like this:
WITH Recipes AS
(SELECT 'Blueberry pancakes' as Breakfast, 'Egg salad sandwich' as Lunch, 'Potato dumplings' as Dinner UNION ALL
SELECT 'Potato pancakes', 'Toasted cheese sandwich', 'Beef stroganoff' UNION ALL
SELECT 'Ham scramble', 'Steak avocado salad', 'Tomato pasta' UNION ALL
SELECT 'Avocado toast', 'Tomato soup', 'Blueberry salmon' UNION ALL
SELECT 'Corned beef hash', 'Lentil potato soup', 'Glazed ham')
SELECT * FROM Recipes;
/*-------------------+-------------------------+------------------*
| Breakfast | Lunch | Dinner |
+-------------------+-------------------------+------------------+
| Blueberry pancakes | Egg salad sandwich | Potato dumplings |
| Potato pancakes | Toasted cheese sandwich | Beef stroganoff |
| Ham scramble | Steak avocado salad | Tomato pasta |
| Avocado toast | Tomato soup | Blueberry salmon |
| Corned beef hash | Lentil potato soup | Glazed ham |
*-------------------+-------------------------+------------------*/
The following query searches across all columns of the Recipes
table for the
value toast
and returns the rows that contain this value.
SELECT * FROM Recipes WHERE CONTAINS_SUBSTR(Recipes, 'toast');
/*-------------------+-------------------------+------------------*
| Breakfast | Lunch | Dinner |
+-------------------+-------------------------+------------------+
| Potato pancakes | Toasted cheese sandwich | Beef stroganoff |
| Avocado toast | Tomato soup | Blueberry salmon |
*-------------------+-------------------------+------------------*/
The following query searches the Lunch
and Dinner
columns of the
Recipe
table for the value potato
and returns the row if either column
contains this value.
SELECT * FROM Recipes WHERE CONTAINS_SUBSTR((Lunch, Dinner), 'potato');
/*-------------------+-------------------------+------------------*
| Breakfast | Lunch | Dinner |
+-------------------+-------------------------+------------------+
| Blueberry pancakes | Egg salad sandwich | Potato dumplings |
| Corned beef hash | Lentil potato soup | Glazed ham |
*-------------------+-------------------------+------------------*/
The following query searches across all columns of the Recipes
table
except for the Lunch
and Dinner
columns. It returns the rows of any
columns other than Lunch
or Dinner
that contain the value potato
.
SELECT *
FROM Recipes
WHERE CONTAINS_SUBSTR(
(SELECT AS STRUCT Recipes.* EXCEPT (Lunch, Dinner)),
'potato'
);
/*-------------------+-------------------------+------------------*
| Breakfast | Lunch | Dinner |
+-------------------+-------------------------+------------------+
| Potato pancakes | Toasted cheese sandwich | Beef stroganoff |
*-------------------+-------------------------+------------------*/
The following query searches for the value lunch
in the JSON
{"lunch":"soup"}
and returns FALSE
because the default json_scope
is
"JSON_VALUES"
, and lunch
is a JSON
key, not a JSON
value.
SELECT CONTAINS_SUBSTR(JSON '{"lunch":"soup"}',"lunch") AS result;
/*--------*
| result |
+--------+
| FALSE |
*--------*/
The following query searches for the value lunch
in the values of the JSON
{"lunch":"soup"}
and returns FALSE
because lunch
is a JSON
key, not a
JSON
value.
SELECT CONTAINS_SUBSTR(JSON '{"lunch":"soup"}',
"lunch",
json_scope=>"JSON_VALUES") AS result;
/*--------*
| result |
+--------+
| FALSE |
*--------*/
The following query searches for the value lunch
in the keys and values of the
JSON {"lunch":"soup"}
and returns TRUE
because lunch
is a JSON
key.
SELECT CONTAINS_SUBSTR(JSON '{"lunch":"soup"}',
"lunch",
json_scope=>"JSON_KEYS_AND_VALUES") AS result;
/*--------*
| result |
+--------+
| TRUE |
*--------*/
The following query searches for the value lunch
in the keys of the JSON
{"lunch":"soup"}
and returns TRUE
because lunch
is a JSON
key.
SELECT CONTAINS_SUBSTR(JSON '{"lunch":"soup"}',
"lunch",
json_scope=>"JSON_KEYS") AS result;
/*--------*
| result |
+--------+
| TRUE |
*--------*/
EDIT_DISTANCE
EDIT_DISTANCE(
value1,
value2,
[ max_distance => max_distance_value ]
)
Description
Computes the Levenshtein distance between two STRING
or
BYTES
values.
Definitions
- value1: The first STRING or BYTES value to compare.
- value2: The second STRING or BYTES value to compare.
- max_distance: A named argument with an INT64 value that is greater than or equal to zero. Represents the maximum distance between the two values to compute. If this distance is exceeded, the function returns this value. The default value for this argument is the maximum length of value1 and value2.
Details
If value1
or value2
is NULL
, NULL
is returned.
You can only compare values of the same type. Otherwise, an error is produced.
Return type
INT64
Examples
In the following example, the first character in both strings is different:
SELECT EDIT_DISTANCE('a', 'b') AS results;
/*---------*
| results |
+---------+
| 1 |
*---------*/
In the following example, the first and second characters in both strings are different:
SELECT EDIT_DISTANCE('aa', 'b') AS results;
/*---------*
| results |
+---------+
| 2 |
*---------*/
In the following example, only the first character in both strings is different:
SELECT EDIT_DISTANCE('aa', 'ba') AS results;
/*---------*
| results |
+---------+
| 1 |
*---------*/
In the following example, the last six characters are different, but because
the maximum distance is 2
, this function exits early and returns 2
, the
maximum distance:
SELECT EDIT_DISTANCE('abcdefg', 'a', max_distance => 2) AS results;
/*---------*
| results |
+---------+
| 2 |
*---------*/
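For reference, the Levenshtein distance with a max_distance cap can be sketched in Python. The real function exits early once the cap is exceeded; this version computes the full distance and then applies the cap, which produces the same results:

```python
def edit_distance(a, b, max_distance=None):
    """Levenshtein distance between a and b, optionally capped at max_distance."""
    # Classic dynamic-programming formulation, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    d = prev[-1]
    return d if max_distance is None else min(d, max_distance)

print(edit_distance('a', 'b'))                        # 1
print(edit_distance('aa', 'ba'))                      # 1
print(edit_distance('abcdefg', 'a', max_distance=2))  # 2
```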
ENDS_WITH
ENDS_WITH(value, suffix)
Description
Takes two STRING
or BYTES
values. Returns TRUE
if suffix
is a suffix of value
.
This function supports specifying collation.
Return type
BOOL
Examples
SELECT ENDS_WITH('apple', 'e') as example
/*---------*
| example |
+---------+
| True |
*---------*/
FORMAT
FORMAT(format_string_expression, data_type_expression[, ...])
Description
FORMAT
formats a data type expression as a string.
- format_string_expression: Can contain zero or more format specifiers. Each format specifier is introduced by the % symbol and must map to one or more of the remaining arguments. In general, this is a one-to-one mapping, except when the * specifier is present. For example, %.*i maps to two arguments: a length argument and a signed integer argument. If the number of arguments provided does not match the number required by the format specifiers, an error occurs.
- data_type_expression: The value to format as a string. This can be any GoogleSQL data type.
Return type
STRING
Examples
Description | Statement | Result |
---|---|---|
Simple integer | FORMAT('%d', 10) | 10 |
Integer with left blank padding | FORMAT('|%10d|', 11) | | 11| |
Integer with left zero padding | FORMAT('+%010d+', 12) | +0000000012+ |
Integer with commas | FORMAT("%'d", 123456789) | 123,456,789 |
STRING | FORMAT('-%s-', 'abcd efg') | -abcd efg- |
FLOAT64 | FORMAT('%f %E', 1.1, 2.2) | 1.100000 2.200000E+00 |
DATE | FORMAT('%t', date '2015-09-01') | 2015-09-01 |
TIMESTAMP | FORMAT('%t', timestamp '2015-09-01 12:34:56 America/Los_Angeles') | 2015‑09‑01 19:34:56+00 |
The FORMAT()
function does not provide fully customizable formatting for all
types and values, nor formatting that is sensitive to locale.
If custom formatting is necessary for a type, you must first format it using
type-specific format functions, such as FORMAT_DATE()
or FORMAT_TIMESTAMP()
.
For example:
SELECT FORMAT('date: %s!', FORMAT_DATE('%B %d, %Y', date '2015-01-02'));
Returns
date: January 02, 2015!
Supported format specifiers
%[flags][width][.precision]specifier
A format specifier adds formatting when casting a value to a string. It can optionally contain these sub-specifiers:
Additional information about format specifiers:
- %g and %G behavior
- %p and %P behavior
- %t and %T behavior
- Error conditions
- NULL argument handling
- Additional semantic rules
Format specifiers
Specifier | Description | Examples | Types |
d or i |
Decimal integer | 392 |
INT64 |
o |
Octal
Note: If an INT64 value is negative, an error is produced.
|
610 |
INT64 |
x |
Hexadecimal integer
Note: If an INT64 value is negative, an error is produced.
|
7fa |
INT64 |
X |
Hexadecimal integer (uppercase)
Note: If an INT64 value is negative, an error is produced.
|
7FA |
INT64 |
f |
Decimal notation, in [-](integer part).(fractional part) for finite values, and in lowercase for non-finite values | 392.650000 inf nan |
NUMERIC BIGNUMERIC FLOAT64 |
F |
Decimal notation, in [-](integer part).(fractional part) for finite values, and in uppercase for non-finite values | 392.650000 INF NAN |
NUMERIC BIGNUMERIC FLOAT64 |
e |
Scientific notation (mantissa/exponent), lowercase | 3.926500e+02 inf nan |
NUMERIC BIGNUMERIC FLOAT64 |
E |
Scientific notation (mantissa/exponent), uppercase | 3.926500E+02 INF NAN |
NUMERIC BIGNUMERIC FLOAT64 |
g |
Either decimal notation or scientific notation, depending on the input value's exponent and the specified precision. Lowercase. See %g and %G behavior for details. | 392.65 3.9265e+07 inf nan |
NUMERIC BIGNUMERIC FLOAT64 |
G |
Either decimal notation or scientific notation, depending on the input value's exponent and the specified precision. Uppercase. See %g and %G behavior for details. |
392.65 3.9265E+07 INF NAN |
NUMERIC BIGNUMERIC FLOAT64 |
p |
Produces a one-line printable string representing JSON. See %p and %P behavior. |
{"month":10,"year":2019} |
JSON |
P |
Produces a multi-line printable string representing JSON. See %p and %P behavior. |
{ "month": 10, "year": 2019 } |
JSON |
s |
String of characters | sample |
STRING |
t |
Returns a printable string representing the value. Often looks
similar to casting the argument to STRING .
See %t and %T behavior.
|
sample 2014‑01‑01 |
Any type |
T |
Produces a string that is a valid GoogleSQL constant with a similar type to the value's type (maybe wider, or maybe string). See %t and %T behavior. |
'sample' b'bytes sample' 1234 2.3 date '2014‑01‑01' |
Any type |
% |
'%%' produces a single '%' | % | n/a |
The format specifier can optionally contain the sub-specifiers identified above in the specifier prototype.
These sub-specifiers must comply with the following specifications.
Flags
Flags | Description |
- |
Left-justify within the given field width; Right justification is the default (see width sub-specifier) |
+ |
Forces to precede the result with a plus or minus sign (+
or - ) even for positive numbers. By default, only negative numbers
are preceded with a - sign |
<space> | If no sign is going to be written, a blank space is inserted before the value |
# |
For %o, %x, and %X, this flag precedes the value with 0, 0x, or 0X, respectively, for values other than zero; for %f, %F, %e, and %E, it always shows the decimal point; for %g and %G, it keeps the trailing zeros that would otherwise be removed |
0 |
Left-pads the number with zeroes (0) instead of spaces when padding is specified (see width sub-specifier) |
' |
Formats integers using the appropriate grouping character. For example:
This flag is only relevant for decimal, hex, and octal values. |
Flags may be specified in any order. Duplicate flags are not an error. When flags are not relevant for some element type, they are ignored.
Width
Width | Description |
<number> | Minimum number of characters to be printed. If the value to be printed is shorter than this number, the result is padded with blank spaces. The value is not truncated even if the result is larger |
* |
The width is not specified in the format string, but as an additional integer value argument preceding the argument that has to be formatted |
Precision
Precision | Description |
. <number> |
For integer specifiers (d, i, o, x, X): the minimum number of digits to be written. Shorter values are padded with leading zeros; longer values are not truncated. For %f, %F, %e, and %E: the number of digits printed after the decimal point (default 6). For %g and %G: the maximum number of significant digits. For %s: the maximum number of characters to print, after which the value is truncated |
.* |
The precision is not specified in the format string, but as an additional integer value argument preceding the argument that has to be formatted |
%g and %G behavior
The %g
and %G
format specifiers choose either the decimal notation (like
the %f
and %F
specifiers) or the scientific notation (like the %e
and %E
specifiers), depending on the input value's exponent and the specified
precision.
Let p stand for the specified precision (defaults to 6; 1 if the specified precision is less than 1). The input value is first converted to scientific notation with precision = (p - 1). If the resulting exponent part x is less than -4 or no less than p, the scientific notation with precision = (p - 1) is used; otherwise the decimal notation with precision = (p - 1 - x) is used.
Unless #
flag is present, the trailing zeros after the decimal point
are removed, and the decimal point is also removed if there is no digit after
it.
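Python's printf-style %g follows the same C-derived rules, so it can be used to sanity-check the algorithm described above. Assuming the usual C semantics, BigQuery's output should match for these cases:

```python
# With the default precision p = 6, 392.65 has exponent 2 (< 6), so decimal
# notation is used and trailing zeros are removed.
print('%g' % 392.65)       # 392.65
# 39265000.0 has exponent 7 (>= 6), so scientific notation is used.
print('%g' % 39265000.0)   # 3.9265e+07
# The # flag keeps trailing zeros (6 significant digits here).
print('%#g' % 392.65)      # 392.650
```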
%p and %P behavior
The %p
format specifier produces a one-line printable string. The %P
format specifier produces a multi-line printable string. You can use these
format specifiers with the following data types:
Type | %p | %P |
JSON |
JSON input: JSON ' { "month": 10, "year": 2019 } ' Produces a one-line printable string representing JSON: {"month":10,"year":2019} |
JSON input: JSON ' { "month": 10, "year": 2019 } ' Produces a multi-line printable string representing JSON: { "month": 10, "year": 2019 } |
%t and %T behavior
The %t
and %T
format specifiers are defined for all types. The
width, precision, and flags act as they do
for %s
: the width is the minimum width and the STRING
will be
padded to that size, and precision is the maximum width
of content to show and the STRING
will be truncated to that size, prior to
padding to width.
The %t
specifier is always meant to be a readable form of the value.
The %T
specifier is always a valid SQL literal of a similar type, such as a
wider numeric type.
The literal will not include casts or a type name, except for the special case
of non-finite floating point values.
The STRING
is formatted as follows:
Type | %t | %T |
NULL of any type |
NULL | NULL |
INT64 |
123 | 123 |
NUMERIC | 123.0 (always with .0) | NUMERIC "123.0" |
FLOAT64 |
123.0 (always with .0) 123e+10 inf -inf NaN
|
123.0 (always with .0) 123e+10 CAST("inf" AS <type>) CAST("-inf" AS <type>) CAST("nan" AS <type>) |
STRING | unquoted string value | quoted string literal |
BYTES |
unquoted escaped bytes e.g., abc\x01\x02 |
quoted bytes literal e.g., b"abc\x01\x02" |
BOOL | boolean value | boolean value |
DATE | 2011-02-03 | DATE "2011-02-03" |
TIMESTAMP | 2011-02-03 04:05:06+00 | TIMESTAMP "2011-02-03 04:05:06+00" |
INTERVAL | 1-2 3 4:5:6.789 | INTERVAL "1-2 3 4:5:6.789" YEAR TO SECOND |
ARRAY | [value, value, ...] where values are formatted with %t |
[value, value, ...] where values are formatted with %T |
STRUCT | (value, value, ...) where fields are formatted with %t |
(value, value, ...) where fields are formatted with %T Special cases: Zero fields: STRUCT() One field: STRUCT(value) |
JSON |
one-line printable string representing JSON.{"name":"apple","stock":3} |
one-line printable string representing a JSON literal.JSON '{"name":"apple","stock":3}' |
Error conditions
If a format specifier is invalid, or is not compatible with the related
argument type, or the wrong number of arguments is provided, then an error is
produced. For example, the following <format_string>
expressions are invalid:
FORMAT('%s', 1)
FORMAT('%')
NULL argument handling
A NULL
format string results in a NULL
output STRING
. Any other arguments
are ignored in this case.
The function generally produces a NULL
value if a NULL
argument is present.
For example, FORMAT('%i', NULL_expression)
produces a NULL STRING
as
output.
However, there are some exceptions: if the format specifier is %t or %T
(both of which produce STRING
s that effectively match CAST and literal value
semantics), a NULL
value produces 'NULL' (without the quotes) in the result
STRING
. For example, the function:
FORMAT('00-%t-00', NULL_expression);
Returns
00-NULL-00
Additional semantic rules
FLOAT64
values can be +/-inf
or NaN
.
When an argument has one of those values, the result of the format specifiers
%f
, %F
, %e
, %E
, %g
, %G
, and %t
are inf
, -inf
, or nan
(or the same in uppercase) as appropriate. This is consistent with how
GoogleSQL casts these values to STRING
. For %T
,
GoogleSQL returns quoted strings for
FLOAT64
values that don't have non-string literal
representations.
FROM_BASE32
FROM_BASE32(string_expr)
Description
Converts the base32-encoded input string_expr
into BYTES
format. To convert
BYTES
to a base32-encoded STRING
, use TO_BASE32.
Return type
BYTES
Example
SELECT FROM_BASE32('MFRGGZDF74======') AS byte_data;
-- Note that the result of FROM_BASE32 is of type BYTES, displayed as a base64-encoded string.
/*-----------*
| byte_data |
+-----------+
| YWJjZGX/ |
*-----------*/
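The same decoding is available in Python's base64 module, which helps make sense of the example output: the decoded BYTES value is b'abcde\xff', which the docs display base64-encoded as YWJjZGX/. A quick cross-check:

```python
import base64

# Decode the base32 string from the example; the trailing = padding is required.
decoded = base64.b32decode('MFRGGZDF74======')
print(decoded)                               # b'abcde\xff'
# BigQuery displays BYTES results as base64, hence YWJjZGX/ in the docs.
print(base64.b64encode(decoded).decode())    # YWJjZGX/
```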
FROM_BASE64
FROM_BASE64(string_expr)
Description
Converts the base64-encoded input string_expr
into
BYTES
format. To convert
BYTES
to a base64-encoded STRING
,
use TO_BASE64.
There are several base64 encodings in common use that vary in exactly which
alphabet of 65 ASCII characters is used to encode the 64 digits and padding.
See RFC 4648 for details. This
function expects the alphabet [A-Za-z0-9+/=]
.
Return type
BYTES
Example
SELECT FROM_BASE64('/+A=') AS byte_data;
-- Note that the result of FROM_BASE64 is of type BYTES, displayed as a base64-encoded string.
/*-----------*
| byte_data |
+-----------+
| /+A= |
*-----------*/
To work with an encoding using a different base64 alphabet, you might need to
compose FROM_BASE64
with the REPLACE
function. For instance, the
base64url
url-safe and filename-safe encoding commonly used in web programming
uses -_=
as the last characters rather than +/=
. To decode a
base64url
-encoded string, replace -
and _
with +
and /
respectively.
SELECT FROM_BASE64(REPLACE(REPLACE('_-A=', '-', '+'), '_', '/')) AS binary;
-- Note that the result of FROM_BASE64 is of type BYTES, displayed as a base64-encoded string.
/*--------*
| binary |
+--------+
| /+A= |
*--------*/
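Python's base64 module handles the base64url alphabet directly via urlsafe_b64decode, which is equivalent to the REPLACE composition shown above. A cross-check, not part of GoogleSQL:

```python
import base64

# The url-safe alphabet uses - and _ where standard base64 uses + and /.
urlsafe = '_-A='
standard = urlsafe.replace('-', '+').replace('_', '/')  # '/+A='
assert base64.urlsafe_b64decode(urlsafe) == base64.b64decode(standard)
print(base64.b64decode(standard))  # b'\xff\xe0'
```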
FROM_HEX
FROM_HEX(string)
Description
Converts a hexadecimal-encoded STRING
into BYTES
format. Returns an error
if the input STRING
contains characters outside the range
(0..9, A..F, a..f)
. The lettercase of the characters does not matter. If the
input STRING
has an odd number of characters, the function acts as if the
input has an additional leading 0
. To convert BYTES
to a hexadecimal-encoded
STRING
, use TO_HEX.
Return type
BYTES
Example
WITH Input AS (
SELECT '00010203aaeeefff' AS hex_str UNION ALL
SELECT '0AF' UNION ALL
SELECT '666f6f626172'
)
SELECT hex_str, FROM_HEX(hex_str) AS bytes_str
FROM Input;
-- Note that the result of FROM_HEX is of type BYTES, displayed as a base64-encoded string.
/*------------------+--------------*
| hex_str | bytes_str |
+------------------+--------------+
| 0AF              | AK8=         |
| 00010203aaeeefff | AAECA6ru7/8= |
| 666f6f626172 | Zm9vYmFy |
*------------------+--------------*/
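Python's bytes.fromhex requires an even number of digits, so the implied leading zero must be added explicitly; this hedged sketch mirrors the documented odd-length behavior:

```python
import base64

def from_hex(hex_str):
    # FROM_HEX treats an odd-length input as if it had a leading 0.
    if len(hex_str) % 2:
        hex_str = '0' + hex_str
    return bytes.fromhex(hex_str)  # case-insensitive, like FROM_HEX

print(from_hex('0AF'))                             # b'\x00\xaf'
print(base64.b64encode(from_hex('0AF')).decode())  # AK8=
print(from_hex('666f6f626172'))                    # b'foobar'
```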
INITCAP
INITCAP(value[, delimiters])
Description
Takes a STRING
and returns it with the first character in each word in
uppercase and all other characters in lowercase. Non-alphabetic characters
remain the same.
delimiters
is an optional string argument that is used to override the default
set of characters used to separate words. If delimiters
is not specified, it
defaults to the following characters:
<whitespace> [ ] ( ) { } / | \ < > ! ? @ " ^ # $ & ~ _ , . : ; * % + -
If value
or delimiters
is NULL
, the function returns NULL
.
Return type
STRING
Examples
SELECT
'Hello World-everyone!' AS value,
INITCAP('Hello World-everyone!') AS initcap_value
/*-------------------------------+-------------------------------*
| value | initcap_value |
+-------------------------------+-------------------------------+
| Hello World-everyone! | Hello World-Everyone! |
*-------------------------------+-------------------------------*/
SELECT
'Apples1oranges2pears' as value,
'12' AS delimiters,
INITCAP('Apples1oranges2pears' , '12') AS initcap_value
/*----------------------+------------+----------------------*
| value | delimiters | initcap_value |
+----------------------+------------+----------------------+
| Apples1oranges2pears | 12 | Apples1Oranges2Pears |
*----------------------+------------+----------------------*/
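The word-boundary rule can be sketched in Python: a character is uppercased exactly when the preceding character was a delimiter. This is an illustration of the documented semantics; the default delimiter string below approximates the real default set (which includes all whitespace):

```python
def initcap(value, delimiters=' \t\n[](){}/|\\<>!?@"^#$&~_,.:;*%+-'):
    out = []
    capitalize = True  # the first character starts a word
    for ch in value:
        out.append(ch.upper() if capitalize else ch.lower())
        capitalize = ch in delimiters  # a delimiter starts a new word
    return ''.join(out)

print(initcap('Hello World-everyone!'))       # Hello World-Everyone!
print(initcap('Apples1oranges2pears', '12'))  # Apples1Oranges2Pears
```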
INSTR
INSTR(value, subvalue[, position[, occurrence]])
Description
Returns the lowest 1-based position of subvalue
in value
.
value
and subvalue
must be the same type, either
STRING
or BYTES
.
If position
is specified, the search starts at this position in
value
, otherwise it starts at 1
, which is the beginning of
value
. If position
is negative, the function searches backwards
from the end of value
, with -1
indicating the last character.
position
is of type INT64
and cannot be 0
.
If occurrence
is specified, the search returns the position of a specific
instance of subvalue
in value
. If not specified, occurrence
defaults to 1
and returns the position of the first occurrence.
For occurrence
> 1
, the function includes overlapping occurrences.
occurrence
is of type INT64
and must be positive.
This function supports specifying collation.
Returns 0
if:
- No match is found.
- If
occurrence
is greater than the number of matches found. - If
position
is greater than the length ofvalue
.
Returns NULL
if:
- Any input argument is
NULL
.
Returns an error if:
position
is0
.occurrence
is0
or negative.
Return type
INT64
Examples
SELECT
'banana' AS value, 'an' AS subvalue, 1 AS position, 1 AS occurrence,
INSTR('banana', 'an', 1, 1) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | an | 1 | 1 | 2 |
*--------------+--------------+----------+------------+-------*/
SELECT
'banana' AS value, 'an' AS subvalue, 1 AS position, 2 AS occurrence,
INSTR('banana', 'an', 1, 2) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | an | 1 | 2 | 4 |
*--------------+--------------+----------+------------+-------*/
SELECT
'banana' AS value, 'an' AS subvalue, 1 AS position, 3 AS occurrence,
INSTR('banana', 'an', 1, 3) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | an | 1 | 3 | 0 |
*--------------+--------------+----------+------------+-------*/
SELECT
'banana' AS value, 'an' AS subvalue, 3 AS position, 1 AS occurrence,
INSTR('banana', 'an', 3, 1) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | an | 3 | 1 | 4 |
*--------------+--------------+----------+------------+-------*/
SELECT
'banana' AS value, 'an' AS subvalue, -1 AS position, 1 AS occurrence,
INSTR('banana', 'an', -1, 1) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | an | -1 | 1 | 4 |
*--------------+--------------+----------+------------+-------*/
SELECT
'banana' AS value, 'an' AS subvalue, -3 AS position, 1 AS occurrence,
INSTR('banana', 'an', -3, 1) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | an | -3 | 1 | 4 |
*--------------+--------------+----------+------------+-------*/
SELECT
'banana' AS value, 'ann' AS subvalue, 1 AS position, 1 AS occurrence,
INSTR('banana', 'ann', 1, 1) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| banana | ann | 1 | 1 | 0 |
*--------------+--------------+----------+------------+-------*/
SELECT
'helloooo' AS value, 'oo' AS subvalue, 1 AS position, 1 AS occurrence,
INSTR('helloooo', 'oo', 1, 1) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| helloooo | oo | 1 | 1 | 5 |
*--------------+--------------+----------+------------+-------*/
SELECT
'helloooo' AS value, 'oo' AS subvalue, 1 AS position, 2 AS occurrence,
INSTR('helloooo', 'oo', 1, 2) AS instr;
/*--------------+--------------+----------+------------+-------*
| value | subvalue | position | occurrence | instr |
+--------------+--------------+----------+------------+-------+
| helloooo | oo | 1 | 2 | 6 |
*--------------+--------------+----------+------------+-------*/
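The search rules above, including overlapping occurrences and negative positions, can be sketched in Python (an illustrative model of the documented behavior, not BigQuery's implementation):

```python
def instr(value, subvalue, position=1, occurrence=1):
    """Sketch of INSTR's search rules (illustrative, not BigQuery's code)."""
    if position == 0:
        raise ValueError('position cannot be 0')
    if occurrence < 1:
        raise ValueError('occurrence must be positive')
    if position < 0:
        # Backward search: -1 is the last character; collect every match
        # (overlapping included) that starts at or before that point.
        limit = len(value) + position
        starts = [i for i in range(limit + 1) if value.startswith(subvalue, i)]
        return starts[-occurrence] + 1 if len(starts) >= occurrence else 0
    start = position - 1
    for _ in range(occurrence):
        idx = value.find(subvalue, start)
        if idx == -1:
            return 0             # no match, or occurrence > matches found
        start = idx + 1          # advance one character: overlapping search
    return idx + 1
```

Running it against the example rows above reproduces the results, for instance `instr('helloooo', 'oo', 1, 2)` returns `6` because the second occurrence overlaps the first.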
LEFT
LEFT(value, length)
Description
Returns a `STRING` or `BYTES` value that consists of the specified number of leftmost characters or bytes from `value`. The `length` is an `INT64` that specifies the length of the returned value. If `value` is of type `BYTES`, `length` is the number of leftmost bytes to return. If `value` is `STRING`, `length` is the number of leftmost characters to return.

If `length` is 0, an empty `STRING` or `BYTES` value will be returned. If `length` is negative, an error will be returned. If `length` exceeds the number of characters or bytes in `value`, the original `value` will be returned.
Return type
`STRING` or `BYTES`
Examples
SELECT LEFT('banana', 3) AS results
/*---------*
| results |
+---------+
| ban |
*---------*/
SELECT LEFT(b'\xab\xcd\xef\xaa\xbb', 3) AS results
-- Note that the result of LEFT is of type BYTES, displayed as a base64-encoded string.
/*---------*
| results |
+---------+
| q83v |
*---------*/
LENGTH
LENGTH(value)
Description
Returns the length of the `STRING` or `BYTES` value. The returned value is in characters for `STRING` arguments and in bytes for `BYTES` arguments.
Return type
INT64
Examples
SELECT
LENGTH('абвгд') AS string_example,
LENGTH(CAST('абвгд' AS BYTES)) AS bytes_example;
/*----------------+---------------*
| string_example | bytes_example |
+----------------+---------------+
| 5 | 10 |
*----------------+---------------*/
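The character-versus-byte distinction maps directly onto Python's `str` and `bytes` lengths, which can serve as a mental model for the example above:

```python
s = 'абвгд'
# LENGTH on a STRING counts characters:
assert len(s) == 5
# LENGTH on BYTES counts bytes; each Cyrillic letter is 2 bytes in UTF-8:
assert len(s.encode('utf-8')) == 10
```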
LOWER
LOWER(value)
Description
For `STRING` arguments, returns the original string with all alphabetic characters in lowercase. Mapping between lowercase and uppercase is done according to the Unicode Character Database without taking into account language-specific mappings.

For `BYTES` arguments, the argument is treated as ASCII text, with all bytes greater than 127 left intact.

Return type
`STRING` or `BYTES`
Examples
SELECT
  LOWER('FOO BAR BAZ') AS example;
/*-------------*
| example |
+-------------+
| foo bar baz |
*-------------*/
LPAD
LPAD(original_value, return_length[, pattern])
Description
Returns a `STRING` or `BYTES` value that consists of `original_value` prepended with `pattern`. The `return_length` is an `INT64` that specifies the length of the returned value. If `original_value` is of type `BYTES`, `return_length` is the number of bytes. If `original_value` is of type `STRING`, `return_length` is the number of characters.

The default value of `pattern` is a blank space.

Both `original_value` and `pattern` must be the same data type.

If `return_length` is less than or equal to the `original_value` length, this function returns the `original_value` value, truncated to the value of `return_length`. For example, `LPAD('hello world', 7);` returns `'hello w'`.

If `original_value`, `return_length`, or `pattern` is `NULL`, this function returns `NULL`.

This function returns an error if:
- `return_length` is negative
- `pattern` is empty
Return type
`STRING` or `BYTES`
Examples
SELECT FORMAT('%T', LPAD('c', 5)) AS results
/*---------*
| results |
+---------+
| " c" |
*---------*/
SELECT LPAD('b', 5, 'a') AS results
/*---------*
| results |
+---------+
| aaaab |
*---------*/
SELECT LPAD('abc', 10, 'ghd') AS results
/*------------*
| results |
+------------+
| ghdghdgabc |
*------------*/
SELECT LPAD('abc', 2, 'd') AS results
/*---------*
| results |
+---------+
| ab |
*---------*/
SELECT FORMAT('%T', LPAD(b'abc', 10, b'ghd')) AS results
/*---------------*
| results |
+---------------+
| b"ghdghdgabc" |
*---------------*/
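The padding and truncation rules can be modeled in Python (a sketch of the documented behavior, not BigQuery's implementation):

```python
def lpad(original_value, return_length, pattern=' '):
    """Sketch of LPAD (illustrative, not BigQuery's code)."""
    if return_length < 0:
        raise ValueError('return_length cannot be negative')
    if pattern == '':
        raise ValueError('pattern cannot be empty')
    if return_length <= len(original_value):
        return original_value[:return_length]   # truncate, keeping the left side
    pad_len = return_length - len(original_value)
    # Repeat the pattern until it covers pad_len, then cut it to size.
    padding = (pattern * (pad_len // len(pattern) + 1))[:pad_len]
    return padding + original_value
```

For example, `lpad('abc', 10, 'ghd')` repeats the pattern to seven characters (`'ghdghdg'`) before prepending it, matching the example output above.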
LTRIM
LTRIM(value1[, value2])
Description
Identical to TRIM, but only removes leading characters.
Return type
`STRING` or `BYTES`
Examples
SELECT CONCAT('#', LTRIM(' apple '), '#') AS example
/*-------------*
| example |
+-------------+
| #apple # |
*-------------*/
SELECT LTRIM('***apple***', '*') AS example
/*-----------*
| example |
+-----------+
| apple*** |
*-----------*/
SELECT LTRIM('xxxapplexxx', 'xyz') AS example
/*-----------*
| example |
+-----------+
| applexxx |
*-----------*/
NORMALIZE
NORMALIZE(value[, normalization_mode])
Description
Takes a string value and returns it as a normalized string. If you do not provide a normalization mode, `NFC` is used.
Normalization is used to ensure that two strings are equivalent. Normalization is often used in situations in which two strings render the same on the screen but have different Unicode code points.
`NORMALIZE` supports four optional normalization modes:
| Value | Name | Description |
|---|---|---|
| `NFC` | Normalization Form Canonical Composition | Decomposes and recomposes characters by canonical equivalence. |
| `NFKC` | Normalization Form Compatibility Composition | Decomposes characters by compatibility, then recomposes them by canonical equivalence. |
| `NFD` | Normalization Form Canonical Decomposition | Decomposes characters by canonical equivalence, and multiple combining characters are arranged in a specific order. |
| `NFKD` | Normalization Form Compatibility Decomposition | Decomposes characters by compatibility, and multiple combining characters are arranged in a specific order. |
Return type
STRING
Examples
The following example normalizes different language characters:
SELECT
NORMALIZE('\u00ea') as a,
NORMALIZE('\u0065\u0302') as b,
NORMALIZE('\u00ea') = NORMALIZE('\u0065\u0302') as normalized;
/*---+---+------------*
| a | b | normalized |
+---+---+------------+
| ê | ê | TRUE |
*---+---+------------*/
The following examples normalize different space characters:
SELECT NORMALIZE('Raha\u2004Mahan', NFKC) AS normalized_name
/*-----------------*
| normalized_name |
+-----------------+
| Raha Mahan |
*-----------------*/
SELECT NORMALIZE('Raha\u2005Mahan', NFKC) AS normalized_name
/*-----------------*
| normalized_name |
+-----------------+
| Raha Mahan |
*-----------------*/
SELECT NORMALIZE('Raha\u2006Mahan', NFKC) AS normalized_name
/*-----------------*
| normalized_name |
+-----------------+
| Raha Mahan |
*-----------------*/
SELECT NORMALIZE('Raha Mahan', NFKC) AS normalized_name
/*-----------------*
| normalized_name |
+-----------------+
| Raha Mahan |
*-----------------*/
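Python's standard `unicodedata` module implements the same Unicode normalization forms, which makes it a convenient way to experiment with these modes outside of a query:

```python
import unicodedata

# U+00EA (precomposed ê) and U+0065 U+0302 (e + combining circumflex)
# compare equal after NFC normalization, as in the first example above:
assert unicodedata.normalize('NFC', '\u00ea') == \
       unicodedata.normalize('NFC', '\u0065\u0302')

# NFKC additionally folds compatibility characters, such as the
# typographic space U+2004, into their ordinary counterparts:
assert unicodedata.normalize('NFKC', 'Raha\u2004Mahan') == 'Raha Mahan'
```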
NORMALIZE_AND_CASEFOLD
NORMALIZE_AND_CASEFOLD(value[, normalization_mode])
Description
Takes a string value and returns it as a normalized string. If you do not provide a normalization mode, `NFC` is used.
Normalization is used to ensure that two strings are equivalent. Normalization is often used in situations in which two strings render the same on the screen but have different Unicode code points.
Case folding is used for the caseless comparison of strings. If you need to compare strings and case should not be considered, use `NORMALIZE_AND_CASEFOLD`, otherwise use `NORMALIZE`.
`NORMALIZE_AND_CASEFOLD` supports four optional normalization modes:
| Value | Name | Description |
|---|---|---|
| `NFC` | Normalization Form Canonical Composition | Decomposes and recomposes characters by canonical equivalence. |
| `NFKC` | Normalization Form Compatibility Composition | Decomposes characters by compatibility, then recomposes them by canonical equivalence. |
| `NFD` | Normalization Form Canonical Decomposition | Decomposes characters by canonical equivalence, and multiple combining characters are arranged in a specific order. |
| `NFKD` | Normalization Form Compatibility Decomposition | Decomposes characters by compatibility, and multiple combining characters are arranged in a specific order. |
Return type
STRING
Examples
SELECT
NORMALIZE('The red barn') = NORMALIZE('The Red Barn') AS normalized,
NORMALIZE_AND_CASEFOLD('The red barn')
= NORMALIZE_AND_CASEFOLD('The Red Barn') AS normalized_with_case_folding;
/*------------+------------------------------*
| normalized | normalized_with_case_folding |
+------------+------------------------------+
| FALSE | TRUE |
*------------+------------------------------*/
SELECT
'\u2168' AS a,
'IX' AS b,
NORMALIZE_AND_CASEFOLD('\u2168', NFD)=NORMALIZE_AND_CASEFOLD('IX', NFD) AS nfd,
NORMALIZE_AND_CASEFOLD('\u2168', NFC)=NORMALIZE_AND_CASEFOLD('IX', NFC) AS nfc,
NORMALIZE_AND_CASEFOLD('\u2168', NFKD)=NORMALIZE_AND_CASEFOLD('IX', NFKD) AS nkfd,
NORMALIZE_AND_CASEFOLD('\u2168', NFKC)=NORMALIZE_AND_CASEFOLD('IX', NFKC) AS nkfc;
/*---+----+-------+-------+------+------*
| a | b | nfd | nfc | nkfd | nkfc |
+---+----+-------+-------+------+------+
| Ⅸ | IX | false | false | true | true |
*---+----+-------+-------+------+------*/
SELECT
'\u0041\u030A' AS a,
'\u00C5' AS b,
NORMALIZE_AND_CASEFOLD('\u0041\u030A', NFD)=NORMALIZE_AND_CASEFOLD('\u00C5', NFD) AS nfd,
NORMALIZE_AND_CASEFOLD('\u0041\u030A', NFC)=NORMALIZE_AND_CASEFOLD('\u00C5', NFC) AS nfc,
NORMALIZE_AND_CASEFOLD('\u0041\u030A', NFKD)=NORMALIZE_AND_CASEFOLD('\u00C5', NFKD) AS nkfd,
NORMALIZE_AND_CASEFOLD('\u0041\u030A', NFKC)=NORMALIZE_AND_CASEFOLD('\u00C5', NFKC) AS nkfc;
/*---+----+-------+-------+------+------*
| a | b | nfd | nfc | nkfd | nkfc |
+---+----+-------+-------+------+------+
| Å | Å | true | true | true | true |
*---+----+-------+-------+------+------*/
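A rough Python equivalent combines `unicodedata.normalize` with `str.casefold`. The exact composition order inside BigQuery is an assumption here, but this model reproduces the examples above:

```python
import unicodedata

def normalize_and_casefold(value, mode='NFC'):
    # Normalize first, then case-fold (assumed ordering; illustrative only).
    return unicodedata.normalize(mode, value).casefold()

# Case differences disappear after case folding:
assert normalize_and_casefold('The red barn') == \
       normalize_and_casefold('The Red Barn')

# U+2168 (Ⅸ) only equals 'IX' under the compatibility modes:
assert normalize_and_casefold('\u2168', 'NFKC') == \
       normalize_and_casefold('IX', 'NFKC')
assert normalize_and_casefold('\u2168', 'NFC') != \
       normalize_and_casefold('IX', 'NFC')
```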
OCTET_LENGTH
OCTET_LENGTH(value)
Alias for `BYTE_LENGTH`.
REGEXP_CONTAINS
REGEXP_CONTAINS(value, regexp)
Description
Returns `TRUE` if `value` is a partial match for the regular expression, `regexp`.

If the `regexp` argument is invalid, the function returns an error.

You can search for a full match by using `^` (beginning of text) and `$` (end of text). Due to regular expression operator precedence, it is good practice to use parentheses around everything between `^` and `$`.
Return type
BOOL
Examples
The following queries check to see if an email is valid:
SELECT
'foo@example.com' AS email,
REGEXP_CONTAINS('foo@example.com', r'@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') AS is_valid
/*-----------------+----------*
| email | is_valid |
+-----------------+----------+
| foo@example.com | TRUE |
*-----------------+----------*/
SELECT
'www.example.net' AS email,
REGEXP_CONTAINS('www.example.net', r'@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') AS is_valid
/*-----------------+----------*
| email | is_valid |
+-----------------+----------+
| www.example.net | FALSE |
*-----------------+----------*/
The following queries check to see if an email is valid. They
perform a full match, using `^` and `$`. Due to regular expression operator
precedence, it is good practice to use parentheses around everything between `^`
and `$`.
SELECT
'a@foo.com' AS email,
REGEXP_CONTAINS('a@foo.com', r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') AS valid_email_address,
REGEXP_CONTAINS('a@foo.com', r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') AS without_parentheses;
/*----------------+---------------------+---------------------*
| email | valid_email_address | without_parentheses |
+----------------+---------------------+---------------------+
| a@foo.com | true | true |
*----------------+---------------------+---------------------*/
SELECT
'a@foo.computer' AS email,
REGEXP_CONTAINS('a@foo.computer', r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') AS valid_email_address,
REGEXP_CONTAINS('a@foo.computer', r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') AS without_parentheses;
/*----------------+---------------------+---------------------*
| email | valid_email_address | without_parentheses |
+----------------+---------------------+---------------------+
| a@foo.computer | false | true |
*----------------+---------------------+---------------------*/
SELECT
'b@bar.org' AS email,
REGEXP_CONTAINS('b@bar.org', r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') AS valid_email_address,
REGEXP_CONTAINS('b@bar.org', r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') AS without_parentheses;
/*----------------+---------------------+---------------------*
| email | valid_email_address | without_parentheses |
+----------------+---------------------+---------------------+
| b@bar.org | true | true |
*----------------+---------------------+---------------------*/
SELECT
'!b@bar.org' AS email,
REGEXP_CONTAINS('!b@bar.org', r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') AS valid_email_address,
REGEXP_CONTAINS('!b@bar.org', r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') AS without_parentheses;
/*----------------+---------------------+---------------------*
| email | valid_email_address | without_parentheses |
+----------------+---------------------+---------------------+
| !b@bar.org | false | true |
*----------------+---------------------+---------------------*/
SELECT
'c@buz.net' AS email,
REGEXP_CONTAINS('c@buz.net', r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$') AS valid_email_address,
REGEXP_CONTAINS('c@buz.net', r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$') AS without_parentheses;
/*----------------+---------------------+---------------------*
| email | valid_email_address | without_parentheses |
+----------------+---------------------+---------------------+
| c@buz.net | false | false |
*----------------+---------------------+---------------------*/
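The precedence pitfall demonstrated above can be reproduced with any regex engine; here it is in Python's `re` module (not re2, but the alternation precedence is the same for these patterns):

```python
import re

grouped = r'^([\w.+-]+@foo\.com|[\w.+-]+@bar\.org)$'
bare    = r'^[\w.+-]+@foo\.com|[\w.+-]+@bar\.org$'

# With parentheses, ^ and $ anchor the whole alternation:
assert re.search(grouped, 'a@foo.computer') is None
# Without them, ^ binds only to the first alternative and $ only to the
# last, so 'a@foo.computer' still matches via its 'a@foo.com' prefix:
assert re.search(bare, 'a@foo.computer') is not None
```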
REGEXP_EXTRACT
REGEXP_EXTRACT(value, regexp[, position[, occurrence]])
Description
Returns the substring in `value` that matches the re2 regular expression, `regexp`. Returns `NULL` if there is no match.

If the regular expression contains a capturing group (`(...)`), and there is a match for that capturing group, that match is returned. If there are multiple matches for a capturing group, the first match is returned.

If `position` is specified, the search starts at this position in `value`, otherwise it starts at the beginning of `value`. The `position` must be a positive integer and cannot be 0. If `position` is greater than the length of `value`, `NULL` is returned.

If `occurrence` is specified, the search returns a specific occurrence of the `regexp` in `value`, otherwise it returns the first match. If `occurrence` is greater than the number of matches found, `NULL` is returned. For `occurrence` > 1, the function searches for additional occurrences beginning with the character following the previous occurrence.
Returns an error if:
- The regular expression is invalid
- The regular expression has more than one capturing group
- `position` is not a positive integer
- `occurrence` is not a positive integer
Return type
`STRING` or `BYTES`
Examples
SELECT REGEXP_EXTRACT('foo@example.com', r'^[a-zA-Z0-9_.+-]+') AS user_name
/*-----------*
| user_name |
+-----------+
| foo |
*-----------*/
SELECT REGEXP_EXTRACT('foo@example.com', r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.([a-zA-Z0-9-.]+$)') AS top_level_domain
/*------------------*
| top_level_domain |
+------------------+
| com |
*------------------*/
SELECT
REGEXP_EXTRACT('ab', '.b') AS result_a,
REGEXP_EXTRACT('ab', '(.)b') AS result_b,
REGEXP_EXTRACT('xyztb', '(.)+b') AS result_c,
REGEXP_EXTRACT('ab', '(z)?b') AS result_d
/*-------------------------------------------*
| result_a | result_b | result_c | result_d |
+-------------------------------------------+
| ab | a | t | NULL |
*-------------------------------------------*/
WITH example AS
(SELECT 'Hello Helloo and Hellooo' AS value, 'H?ello+' AS regex, 1 as position,
1 AS occurrence UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 1, 2 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 1, 3 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 1, 4 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 2, 1 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 3, 1 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 3, 2 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 3, 3 UNION ALL
SELECT 'Hello Helloo and Hellooo', 'H?ello+', 20, 1 UNION ALL
SELECT 'cats&dogs&rabbits' ,'\\w+&', 1, 2 UNION ALL
SELECT 'cats&dogs&rabbits', '\\w+&', 2, 3
)
SELECT value, regex, position, occurrence, REGEXP_EXTRACT(value, regex,
position, occurrence) AS regexp_value FROM example;
/*--------------------------+---------+----------+------------+--------------*
| value | regex | position | occurrence | regexp_value |
+--------------------------+---------+----------+------------+--------------+
| Hello Helloo and Hellooo | H?ello+ | 1 | 1 | Hello |
| Hello Helloo and Hellooo | H?ello+ | 1 | 2 | Helloo |
| Hello Helloo and Hellooo | H?ello+ | 1 | 3 | Hellooo |
| Hello Helloo and Hellooo | H?ello+ | 1 | 4 | NULL |
| Hello Helloo and Hellooo | H?ello+ | 2 | 1 | ello |
| Hello Helloo and Hellooo | H?ello+ | 3 | 1 | Helloo |
| Hello Helloo and Hellooo | H?ello+ | 3 | 2 | Hellooo |
| Hello Helloo and Hellooo | H?ello+ | 3 | 3 | NULL |
| Hello Helloo and Hellooo | H?ello+ | 20 | 1 | NULL |
| cats&dogs&rabbits | \w+& | 1 | 2 | dogs& |
| cats&dogs&rabbits | \w+& | 2 | 3 | NULL |
*--------------------------+---------+----------+------------+--------------*/
REGEXP_EXTRACT_ALL
REGEXP_EXTRACT_ALL(value, regexp)
Description
Returns an array of all substrings of `value` that match the re2 regular expression, `regexp`. Returns an empty array if there is no match.

If the regular expression contains a capturing group (`(...)`), and there is a match for that capturing group, that match is added to the results.

The `REGEXP_EXTRACT_ALL` function only returns non-overlapping matches. For example, using this function to extract `ana` from `banana` returns only one substring, not two.
Returns an error if:
- The regular expression is invalid
- The regular expression has more than one capturing group
Return type
`ARRAY<STRING>` or `ARRAY<BYTES>`
Examples
SELECT REGEXP_EXTRACT_ALL('Try `func(x)` or `func(y)`', '`(.+?)`') AS example
/*--------------------*
| example |
+--------------------+
| [func(x), func(y)] |
*--------------------*/
REGEXP_INSTR
REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]])
Description
Returns the lowest 1-based position of a regular expression, `regexp`, in `source_value`. `source_value` and `regexp` must be the same type, either `STRING` or `BYTES`.

If `position` is specified, the search starts at this position in `source_value`, otherwise it starts at `1`, which is the beginning of `source_value`. `position` is of type `INT64` and must be positive.

If `occurrence` is specified, the search returns the position of a specific instance of `regexp` in `source_value`. If not specified, `occurrence` defaults to `1` and returns the position of the first occurrence. For `occurrence` > 1, the function searches for the next, non-overlapping occurrence. `occurrence` is of type `INT64` and must be positive.

You can optionally use `occurrence_position` to specify where a position in relation to an `occurrence` starts. Your choices are:
- `0`: Returns the start position of `occurrence`.
- `1`: Returns the end position of `occurrence` + `1`. If the end of the occurrence is at the end of `source_value`, `LENGTH(source_value) + 1` is returned.
Returns `0` if:
- No match is found.
- `occurrence` is greater than the number of matches found.
- `position` is greater than the length of `source_value`.
- The regular expression is empty.

Returns `NULL` if:
- `position` is `NULL`.
- `occurrence` is `NULL`.

Returns an error if:
- `position` is `0` or negative.
- `occurrence` is `0` or negative.
- `occurrence_position` is neither `0` nor `1`.
- The regular expression is invalid.
- The regular expression has more than one capturing group.
Return type
INT64
Examples
SELECT
REGEXP_INSTR('ab@cd-ef', '@[^-]*') AS instr_a,
REGEXP_INSTR('ab@d-ef', '@[^-]*') AS instr_b,
REGEXP_INSTR('abc@cd-ef', '@[^-]*') AS instr_c,
REGEXP_INSTR('abc-ef', '@[^-]*') AS instr_d,
/*---------------------------------------*
| instr_a | instr_b | instr_c | instr_d |
+---------------------------------------+
| 3 | 3 | 4 | 0 |
*---------------------------------------*/
SELECT
REGEXP_INSTR('a@cd-ef b@cd-ef', '@[^-]*', 1) AS instr_a,
REGEXP_INSTR('a@cd-ef b@cd-ef', '@[^-]*', 2) AS instr_b,
REGEXP_INSTR('a@cd-ef b@cd-ef', '@[^-]*', 3) AS instr_c,
REGEXP_INSTR('a@cd-ef b@cd-ef', '@[^-]*', 4) AS instr_d,
/*---------------------------------------*
| instr_a | instr_b | instr_c | instr_d |
+---------------------------------------+
| 2 | 2 | 10 | 10 |
*---------------------------------------*/
SELECT
REGEXP_INSTR('a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 1) AS instr_a,
REGEXP_INSTR('a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 2) AS instr_b,
REGEXP_INSTR('a@cd-ef b@cd-ef c@cd-ef', '@[^-]*', 1, 3) AS instr_c
/*-----------------------------*
| instr_a | instr_b | instr_c |
+-----------------------------+
| 2 | 10 | 18 |
*-----------------------------*/
SELECT
REGEXP_INSTR('a@cd-ef', '@[^-]*', 1, 1, 0) AS instr_a,
REGEXP_INSTR('a@cd-ef', '@[^-]*', 1, 1, 1) AS instr_b
/*-------------------*
| instr_a | instr_b |
+-------------------+
| 2 | 5 |
*-------------------*/
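The `position`, `occurrence`, and `occurrence_position` rules can be modeled with Python's `re` module (a sketch; `re` is not re2, but behaves the same for these patterns):

```python
import re

def regexp_instr(source_value, regexp, position=1, occurrence=1,
                 occurrence_position=0):
    """Sketch of REGEXP_INSTR (illustrative, not BigQuery's code)."""
    # Non-overlapping matches, starting the search at `position` (1-based).
    matches = list(re.finditer(regexp, source_value[position - 1:]))
    if len(matches) < occurrence:
        return 0
    m = matches[occurrence - 1]
    base = position - 1
    # occurrence_position 0 -> start of the match; 1 -> end of the match + 1.
    return base + (m.start() + 1 if occurrence_position == 0 else m.end() + 1)
```

Applied to the examples above, `regexp_instr('a@cd-ef', '@[^-]*', 1, 1, 1)` returns `5`, the position just past the end of the match `@cd`.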
REGEXP_REPLACE
REGEXP_REPLACE(value, regexp, replacement)
Description
Returns a `STRING` where all substrings of `value` that match regular expression `regexp` are replaced with `replacement`.

You can use backslash-escaped digits (`\1` to `\9`) within the `replacement` argument to insert text matching the corresponding parenthesized group in the `regexp` pattern. Use `\0` to refer to the entire matching text.

To add a backslash in your regular expression, you must first escape it. For example, `SELECT REGEXP_REPLACE('abc', 'b(.)', 'X\\1');` returns `aXc`. You can also use raw strings to remove one layer of escaping, for example `SELECT REGEXP_REPLACE('abc', 'b(.)', r'X\1');`.

The `REGEXP_REPLACE` function only replaces non-overlapping matches. For example, replacing `ana` within `banana` results in only one replacement, not two.

If the `regexp` argument is not a valid regular expression, this function returns an error.
Return type
`STRING` or `BYTES`
Examples
SELECT REGEXP_REPLACE('# Heading', r'^# ([a-zA-Z0-9\s]+$)', '<h1>\\1</h1>') AS html
/*--------------------------*
| html |
+--------------------------+
| <h1>Heading</h1> |
*--------------------------*/
REGEXP_SUBSTR
REGEXP_SUBSTR(value, regexp[, position[, occurrence]])
Description
Synonym for REGEXP_EXTRACT.
Return type
`STRING` or `BYTES`
Examples
WITH example AS
(SELECT 'Hello World Helloo' AS value, 'H?ello+' AS regex, 1 AS position, 1 AS
occurrence
)
SELECT value, regex, position, occurrence, REGEXP_SUBSTR(value, regex,
position, occurrence) AS regexp_value FROM example;
/*--------------------+---------+----------+------------+--------------*
| value | regex | position | occurrence | regexp_value |
+--------------------+---------+----------+------------+--------------+
| Hello World Helloo | H?ello+ | 1 | 1 | Hello |
*--------------------+---------+----------+------------+--------------*/
REPEAT
REPEAT(original_value, repetitions)
Description
Returns a `STRING` or `BYTES` value that consists of `original_value`, repeated. The `repetitions` parameter specifies the number of times to repeat `original_value`. Returns `NULL` if either `original_value` or `repetitions` is `NULL`.

This function returns an error if the `repetitions` value is negative.
Return type
`STRING` or `BYTES`
Examples
SELECT REPEAT('abc', 3) AS results
/*-----------*
| results |
|-----------|
| abcabcabc |
*-----------*/
SELECT REPEAT('abc', NULL) AS results
/*---------*
| results |
|---------|
| NULL |
*---------*/
SELECT REPEAT(NULL, 3) AS results
/*---------*
| results |
|---------|
| NULL |
*---------*/
REPLACE
REPLACE(original_value, from_pattern, to_pattern)
Description
Replaces all occurrences of `from_pattern` with `to_pattern` in `original_value`. If `from_pattern` is empty, no replacement is made.

This function supports specifying collation.
Return type
`STRING` or `BYTES`
Examples
WITH desserts AS
(SELECT 'apple pie' as dessert
UNION ALL
SELECT 'blackberry pie' as dessert
UNION ALL
SELECT 'cherry pie' as dessert)
SELECT
REPLACE (dessert, 'pie', 'cobbler') as example
FROM desserts;
/*--------------------*
| example |
+--------------------+
| apple cobbler |
| blackberry cobbler |
| cherry cobbler |
*--------------------*/
REVERSE
REVERSE(value)
Description
Returns the reverse of the input `STRING` or `BYTES` value.
Return type
`STRING` or `BYTES`
Examples
SELECT REVERSE('abc') AS results
/*---------*
| results |
+---------+
| cba |
*---------*/
SELECT FORMAT('%T', REVERSE(b'1a3')) AS results
/*---------*
| results |
+---------+
| b"3a1" |
*---------*/
RIGHT
RIGHT(value, length)
Description
Returns a `STRING` or `BYTES` value that consists of the specified number of rightmost characters or bytes from `value`. The `length` is an `INT64` that specifies the length of the returned value. If `value` is `BYTES`, `length` is the number of rightmost bytes to return. If `value` is `STRING`, `length` is the number of rightmost characters to return.

If `length` is 0, an empty `STRING` or `BYTES` value will be returned. If `length` is negative, an error will be returned. If `length` exceeds the number of characters or bytes in `value`, the original `value` will be returned.
Return type
`STRING` or `BYTES`
Examples
SELECT 'apple' AS example, RIGHT('apple', 3) AS right_example
/*---------+---------------*
| example | right_example |
+---------+---------------+
| apple | ple |
*---------+---------------*/
SELECT b'apple' AS example, RIGHT(b'apple', 3) AS right_example
-- Note that the result of RIGHT is of type BYTES, displayed as a base64-encoded string.
/*----------+---------------*
| example | right_example |
+----------+---------------+
| YXBwbGU= | cGxl |
*----------+---------------*/
RPAD
RPAD(original_value, return_length[, pattern])
Description
Returns a `STRING` or `BYTES` value that consists of `original_value` appended with `pattern`. The `return_length` parameter is an `INT64` that specifies the length of the returned value. If `original_value` is `BYTES`, `return_length` is the number of bytes. If `original_value` is `STRING`, `return_length` is the number of characters.

The default value of `pattern` is a blank space.

Both `original_value` and `pattern` must be the same data type.

If `return_length` is less than or equal to the `original_value` length, this function returns the `original_value` value, truncated to the value of `return_length`. For example, `RPAD('hello world', 7);` returns `'hello w'`.

If `original_value`, `return_length`, or `pattern` is `NULL`, this function returns `NULL`.

This function returns an error if:
- `return_length` is negative
- `pattern` is empty
Return type
`STRING` or `BYTES`
Examples
SELECT FORMAT('%T', RPAD('c', 5)) AS results
/*---------*
| results |
+---------+
| "c " |
*---------*/
SELECT RPAD('b', 5, 'a') AS results
/*---------*
| results |
+---------+
| baaaa |
*---------*/
SELECT RPAD('abc', 10, 'ghd') AS results
/*------------*
| results |
+------------+
| abcghdghdg |
*------------*/
SELECT RPAD('abc', 2, 'd') AS results
/*---------*
| results |
+---------+
| ab |
*---------*/
SELECT FORMAT('%T', RPAD(b'abc', 10, b'ghd')) AS results
/*---------------*
| results |
+---------------+
| b"abcghdghdg" |
*---------------*/
RTRIM
RTRIM(value1[, value2])
Description
Identical to TRIM, but only removes trailing characters.
Return type
`STRING` or `BYTES`
Examples
SELECT RTRIM('***apple***', '*') AS example
/*-----------*
| example |
+-----------+
| ***apple |
*-----------*/
SELECT RTRIM('applexxz', 'xyz') AS example
/*---------*
| example |
+---------+
| apple |
*---------*/
SAFE_CONVERT_BYTES_TO_STRING
SAFE_CONVERT_BYTES_TO_STRING(value)
Description
Converts a sequence of `BYTES` to a `STRING`. Any invalid UTF-8 characters are replaced with the Unicode replacement character, `U+FFFD`.
Return type
STRING
Examples
The following statement returns the Unicode replacement character, �.
SELECT SAFE_CONVERT_BYTES_TO_STRING(b'\xc2') as safe_convert;
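Python's `bytes.decode` with `errors='replace'` performs the same substitution, which is a handy way to preview the result:

```python
# b'\xc2' is a truncated UTF-8 lead byte; decoding with errors='replace'
# substitutes U+FFFD, mirroring SAFE_CONVERT_BYTES_TO_STRING:
assert b'\xc2'.decode('utf-8', errors='replace') == '\ufffd'
# Valid bytes around the invalid one are preserved:
assert b'ab \xc2 cd'.decode('utf-8', errors='replace') == 'ab \ufffd cd'
```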
SOUNDEX
SOUNDEX(value)
Description
Returns a `STRING` that represents the Soundex code for `value`.

SOUNDEX produces a phonetic representation of a string. It indexes words by sound, as pronounced in English. It is typically used to help determine whether two strings, such as the family names Levine and Lavine, or the words to and too, have similar English-language pronunciation.

The result of SOUNDEX consists of a letter followed by 3 digits. Non-Latin characters are ignored. If the remaining string is empty after removing non-Latin characters, an empty `STRING` is returned.
Return type
STRING
Examples
SELECT 'Ashcraft' AS value, SOUNDEX('Ashcraft') AS soundex
/*----------------------+---------*
| value | soundex |
+----------------------+---------+
| Ashcraft | A261 |
*----------------------+---------*/
SPLIT
SPLIT(value[, delimiter])
Description
Splits a `STRING` or `BYTES` value, using a delimiter. The delimiter argument must be a literal character or sequence of characters. You can't split with a regular expression.

For `STRING`, the default delimiter is the comma `,`.

For `BYTES`, you must specify a delimiter.

Splitting on an empty delimiter produces an array of UTF-8 characters for `STRING` values, and an array of `BYTES` for `BYTES` values.

Splitting an empty `STRING` returns an `ARRAY` with a single empty `STRING`.
This function supports specifying collation.
Return type
`ARRAY<STRING>` or `ARRAY<BYTES>`
Examples
WITH letters AS
(SELECT '' as letter_group
UNION ALL
SELECT 'a' as letter_group
UNION ALL
SELECT 'b c d' as letter_group)
SELECT SPLIT(letter_group, ' ') as example
FROM letters;
/*----------------------*
| example |
+----------------------+
| [] |
| [a] |
| [b, c, d] |
*----------------------*/
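The empty-string and empty-delimiter edge cases can be modeled in Python (a sketch of the documented behavior; the function name is made up for illustration):

```python
def googlesql_split(value, delimiter=','):
    """Sketch of SPLIT for STRING inputs (illustrative only)."""
    if delimiter == '':
        return list(value)          # empty delimiter: one element per character
    # Note that Python's str.split already returns [''] for an empty
    # input, matching SPLIT's single-empty-string result above.
    return value.split(delimiter)
```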
STARTS_WITH
STARTS_WITH(value, prefix)
Description
Takes two `STRING` or `BYTES` values. Returns `TRUE` if `prefix` is a prefix of `value`.
This function supports specifying collation.
Return type
BOOL
Examples
SELECT STARTS_WITH('bar', 'b') AS example
/*---------*
| example |
+---------+
| True |
*---------*/
STRPOS
STRPOS(value, subvalue)
Description
Takes two `STRING` or `BYTES` values. Returns the 1-based position of the first occurrence of `subvalue` inside `value`. Returns `0` if `subvalue` is not found.
This function supports specifying collation.
Return type
INT64
Examples
SELECT STRPOS('foo@example.com', '@') AS example
/*---------*
| example |
+---------+
| 4 |
*---------*/
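As noted above, `STRPOS` returns `0` when `subvalue` isn't found:

SELECT STRPOS('foo@example.com', '#') AS example
/*---------*
| example |
+---------+
| 0 |
*---------*/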
SUBSTR
SUBSTR(value, position[, length])
Description
Gets a portion (substring) of the supplied STRING
or BYTES
value.
The position
argument is an integer specifying the starting position of the
substring.
- If `position` is `1`, the substring starts from the first character or byte.
- If `position` is `0` or less than `-LENGTH(value)`, `position` is set to `1`, and the substring starts from the first character or byte.
- If `position` is greater than the length of `value`, the function produces an empty substring.
- If `position` is negative, the function counts from the end of `value`, with `-1` indicating the last character or byte.
The `length` argument specifies the maximum number of characters or bytes to
return.
- If `length` is not specified, the function produces a substring that starts at the specified position and ends at the last character or byte of `value`.
- If `length` is `0`, the function produces an empty substring.
- If `length` is negative, the function produces an error.
- The returned substring may be shorter than `length`, for example, when `length` exceeds the length of `value`, or when the starting position of the substring plus `length` is greater than the length of `value`.
Return type
STRING
or BYTES
Examples
SELECT SUBSTR('apple', 2) AS example
/*---------*
| example |
+---------+
| pple |
*---------*/
SELECT SUBSTR('apple', 2, 2) AS example
/*---------*
| example |
+---------+
| pp |
*---------*/
SELECT SUBSTR('apple', -2) AS example
/*---------*
| example |
+---------+
| le |
*---------*/
SELECT SUBSTR('apple', 1, 123) AS example
/*---------*
| example |
+---------+
| apple |
*---------*/
SELECT SUBSTR('apple', 123) AS example
/*---------*
| example |
+---------+
| |
*---------*/
SELECT SUBSTR('apple', 123, 5) AS example
/*---------*
| example |
+---------+
| |
*---------*/
SUBSTRING
SUBSTRING(value, position[, length])
Alias for SUBSTR
.
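Because `SUBSTRING` is an alias of `SUBSTR`, the two are interchangeable:

SELECT SUBSTRING('apple', 2, 2) AS example
/*---------*
| example |
+---------+
| pp |
*---------*/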
TO_BASE32
TO_BASE32(bytes_expr)
Description
Converts a sequence of BYTES
into a base32-encoded STRING
. To convert a
base32-encoded STRING
into BYTES
, use FROM_BASE32.
Return type
STRING
Example
SELECT TO_BASE32(b'abcde\xFF') AS base32_string;
/*------------------*
| base32_string |
+------------------+
| MFRGGZDF74====== |
*------------------*/
TO_BASE64
TO_BASE64(bytes_expr)
Description
Converts a sequence of BYTES
into a base64-encoded STRING
. To convert a
base64-encoded STRING
into BYTES
, use FROM_BASE64.
There are several base64 encodings in common use that vary in exactly which
alphabet of 65 ASCII characters is used to encode the 64 digits and padding.
See RFC 4648 for details. This
function adds padding and uses the alphabet `[A-Za-z0-9+/=]`.
Return type
STRING
Example
SELECT TO_BASE64(b'\377\340') AS base64_string;
/*---------------*
| base64_string |
+---------------+
| /+A= |
*---------------*/
To work with an encoding that uses a different base64 alphabet, you might need
to compose `TO_BASE64` with the `REPLACE` function. For instance, the
`base64url` URL-safe and filename-safe encoding commonly used in web programming
uses `-_=` as the last characters rather than `+/=`. To produce a
`base64url`-encoded string, replace `+` and `/` with `-` and `_`
respectively.
SELECT REPLACE(REPLACE(TO_BASE64(b'\377\340'), '+', '-'), '/', '_') as websafe_base64;
/*----------------*
| websafe_base64 |
+----------------+
| _-A= |
*----------------*/
TO_CODE_POINTS
TO_CODE_POINTS(value)
Description
Takes a STRING
or BYTES
value and returns an array of INT64
values that
represent code points or extended ASCII character values.
- If `value` is a `STRING`, each element in the returned array represents a code point. Each code point falls within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF].
- If `value` is `BYTES`, each element in the array is an extended ASCII character value in the range of [0, 255].
To convert from an array of code points to a STRING
or BYTES
, see
CODE_POINTS_TO_STRING or
CODE_POINTS_TO_BYTES.
Return type
ARRAY<INT64>
Examples
The following examples get the code points for several words.
SELECT
'foo' AS word,
TO_CODE_POINTS('foo') AS code_points
/*---------+------------------------------------*
| word | code_points |
+---------+------------------------------------+
| foo | [102, 111, 111] |
*---------+------------------------------------*/
SELECT
'bar' AS word,
TO_CODE_POINTS('bar') AS code_points
/*---------+------------------------------------*
| word | code_points |
+---------+------------------------------------+
| bar | [98, 97, 114] |
*---------+------------------------------------*/
SELECT
'baz' AS word,
TO_CODE_POINTS('baz') AS code_points
/*---------+------------------------------------*
| word | code_points |
+---------+------------------------------------+
| baz | [98, 97, 122] |
*---------+------------------------------------*/
SELECT
'giraffe' AS word,
TO_CODE_POINTS('giraffe') AS code_points
/*---------+------------------------------------*
| word | code_points |
+---------+------------------------------------+
| giraffe | [103, 105, 114, 97, 102, 102, 101] |
*---------+------------------------------------*/
SELECT
'llama' AS word,
TO_CODE_POINTS('llama') AS code_points
/*---------+------------------------------------*
| word | code_points |
+---------+------------------------------------+
| llama | [108, 108, 97, 109, 97] |
*---------+------------------------------------*/
The following examples convert `BYTES` values to arrays of their
corresponding extended ASCII character values.
SELECT
b'\x66\x6f\x6f' AS bytes_value,
TO_CODE_POINTS(b'\x66\x6f\x6f') AS bytes_value_as_integer
/*------------------+------------------------*
| bytes_value | bytes_value_as_integer |
+------------------+------------------------+
| foo | [102, 111, 111] |
*------------------+------------------------*/
SELECT
b'\x00\x01\x10\xff' AS bytes_value,
TO_CODE_POINTS(b'\x00\x01\x10\xff') AS bytes_value_as_integer
/*------------------+------------------------*
| bytes_value | bytes_value_as_integer |
+------------------+------------------------+
| \x00\x01\x10\xff | [0, 1, 16, 255] |
*------------------+------------------------*/
The following example demonstrates the difference between a BYTES
result and a
STRING
result. Notice that the character Ā
is represented as a two-byte
Unicode sequence. As a result, the BYTES
version of TO_CODE_POINTS
returns
an array with two elements, while the STRING
version returns an array with a
single element.
SELECT TO_CODE_POINTS(b'Ā') AS b_result, TO_CODE_POINTS('Ā') AS s_result;
/*------------+----------*
| b_result | s_result |
+------------+----------+
| [196, 128] | [256] |
*------------+----------*/
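To reverse the conversion, pass the resulting array of code points to `CODE_POINTS_TO_STRING` (or `CODE_POINTS_TO_BYTES` for a `BYTES` result), as referenced above:

SELECT CODE_POINTS_TO_STRING([102, 111, 111]) AS word;
/*------*
| word |
+------+
| foo |
*------*/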
TO_HEX
TO_HEX(bytes)
Description
Converts a sequence of `BYTES` into a hexadecimal `STRING`. Each byte
in the value is represented as two hexadecimal characters in the range
`(0..9, a..f)`. To convert a hexadecimal-encoded
`STRING` to `BYTES`, use FROM_HEX.
Return type
STRING
Example
SELECT
b'\x00\x01\x02\x03\xAA\xEE\xEF\xFF' AS byte_string,
TO_HEX(b'\x00\x01\x02\x03\xAA\xEE\xEF\xFF') AS hex_string
/*----------------------------------+------------------*
| byte_string | hex_string |
+----------------------------------+------------------+
| \x00\x01\x02\x03\xaa\xee\xef\xff | 00010203aaeeefff |
*----------------------------------+------------------*/
TRANSLATE
TRANSLATE(expression, source_characters, target_characters)
Description
In expression
, replaces each character in source_characters
with the
corresponding character in target_characters
. All inputs must be the same
type, either STRING
or BYTES
.
- Each character in `expression` is translated at most once.
- A character in `expression` that is not present in `source_characters` is left unchanged in `expression`.
- A character in `source_characters` without a corresponding character in `target_characters` is omitted from the result.
- A duplicate character in `source_characters` results in an error.
Return type
STRING
or BYTES
Examples
SELECT TRANSLATE('This is a cookie', 'sco', 'zku') AS translate
/*------------------*
| translate |
+------------------+
| Thiz iz a kuukie |
*------------------*/
TRIM
TRIM(value_to_trim[, set_of_characters_to_remove])
Description
Takes a STRING
or BYTES
value to trim.
If the value to trim is a STRING
, removes from this value all leading and
trailing Unicode code points in set_of_characters_to_remove
.
The set of code points is optional. If it is not specified, all
whitespace characters are removed from the beginning and end of the
value to trim.
If the value to trim is BYTES
, removes from this value all leading and
trailing bytes in set_of_characters_to_remove
. The set of bytes is required.
Return type
- `STRING` if `value_to_trim` is a `STRING` value.
- `BYTES` if `value_to_trim` is a `BYTES` value.
Examples
In the following example, all leading and trailing whitespace characters are
removed from item
because set_of_characters_to_remove
is not specified.
SELECT CONCAT('#', TRIM( ' apple '), '#') AS example
/*----------*
| example |
+----------+
| #apple# |
*----------*/
In the following example, all leading and trailing *
characters are removed
from 'apple'.
SELECT TRIM('***apple***', '*') AS example
/*---------*
| example |
+---------+
| apple |
*---------*/
In the following example, all leading and trailing x
, y
, and z
characters
are removed from 'xzxapplexxy'.
SELECT TRIM('xzxapplexxy', 'xyz') as example
/*---------*
| example |
+---------+
| apple |
*---------*/
In the following example, examine how TRIM
interprets characters as
Unicode code-points. If your trailing character set contains a combining
diacritic mark over a particular letter, TRIM
might strip the
same diacritic mark from a different letter.
SELECT
TRIM('abaW̊', 'Y̊') AS a,
TRIM('W̊aba', 'Y̊') AS b,
TRIM('abaŪ̊', 'Y̊') AS c,
TRIM('Ū̊aba', 'Y̊') AS d
/*------+------+------+------*
| a | b | c | d |
+------+------+------+------+
| abaW | W̊aba | abaŪ | Ūaba |
*------+------+------+------*/
In the following example, all leading and trailing b'n'
, b'a'
, b'\xab'
bytes are removed from item
.
SELECT b'apple' AS item, TRIM(b'apple', b'na\xab') AS example
-- Note that the result of TRIM is of type BYTES, displayed as a base64-encoded string.
/*----------------------+------------------*
| item | example |
+----------------------+------------------+
| YXBwbGU= | cHBsZQ== |
*----------------------+------------------*/
UNICODE
UNICODE(value)
Description
Returns the Unicode code point for the first character in
value
. Returns 0
if value
is empty, or if the resulting Unicode code
point is 0
.
Return type
INT64
Examples
SELECT UNICODE('âbcd') as A, UNICODE('â') as B, UNICODE('') as C, UNICODE(NULL) as D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| 226 | 226 | 0 | NULL |
*-------+-------+-------+-------*/
UPPER
UPPER(value)
Description
For STRING
arguments, returns the original string with all alphabetic
characters in uppercase. Mapping between uppercase and lowercase is done
according to the
Unicode Character Database
without taking into account language-specific mappings.
For BYTES
arguments, the argument is treated as ASCII text, with all bytes
greater than 127 left intact.
Return type
STRING
or BYTES
Examples
SELECT UPPER('foo') AS example
/*---------*
| example |
+---------+
| FOO |
*---------*/
Table functions (built in)
GoogleSQL for BigQuery supports built-in table functions.
This topic includes functions that produce columns of a table.
You can only use these functions in the FROM
clause.
Function list
| Name | Summary |
|---|---|
| APPENDS | Returns all rows appended to a table for a given time range. |
| CHANGES | Returns all rows that have changed in a table for a given time range. |
| EXTERNAL_OBJECT_TRANSFORM | Produces an object table with the original columns plus one or more additional columns. |
| GAP_FILL | Finds and fills gaps in a time series. |
| RANGE_SESSIONIZE | Produces a table of session ranges. |
APPENDS
APPENDS(
TABLE table,
start_timestamp DEFAULT NULL,
end_timestamp DEFAULT NULL)
Description
The APPENDS
function returns all rows appended to a table for a given
time range.
The following operations add rows to the APPENDS
change history:
- `CREATE TABLE` DDL statement
- `INSERT` DML statement
- Data appended as part of a `MERGE` DML statement
- Loading data into BigQuery
- Streaming ingestion
Definitions
- `table`: the BigQuery table name. This must be a regular BigQuery table. This argument must be preceded by the keyword `TABLE`.
- `start_timestamp`: a `TIMESTAMP` value indicating the earliest time at which a change is included in the output. If the value is `NULL`, all changes since the table creation are returned. If the table was created after the `start_timestamp` value, the actual table creation time is used instead. An error is returned if the specified time is earlier than allowed by time travel, or, if the `start_timestamp` value is `NULL`, if the table was created earlier than allowed by time travel. For standard tables, this window is seven days, but you can configure the time travel window to be shorter.
- `end_timestamp`: a `TIMESTAMP` value indicating the latest time at which a change is included in the output. `end_timestamp` is exclusive; for example, if you specify `2023-12-31 08:00:00` for `start_timestamp` and `2023-12-31 12:00:00` for `end_timestamp`, all changes made from 8 AM December 31, 2023 up to (but not including) 12 PM December 31, 2023 are returned. If the `end_timestamp` value is `NULL`, all changes made until the start of the query are included.
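As a sketch of the timestamp arguments (the table name and timestamps here are hypothetical), the following query returns only the appends committed during a specific morning window:

SELECT *
FROM
  APPENDS(
    TABLE mydataset.my_table,
    TIMESTAMP '2023-12-31 08:00:00 UTC',  -- start_timestamp (inclusive)
    TIMESTAMP '2023-12-31 12:00:00 UTC')  -- end_timestamp (exclusive)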
Details
Records of inserted rows persist even if that data is later deleted. Deletions
are not reflected in the APPENDS
function. If a table
is copied, calling the APPENDS
function on the copied table returns every row
as inserted at the time of table creation. If a row is modified due to an
UPDATE
operation, there's no effect.
Output
The `APPENDS` function returns a table with the following columns:
- All columns of the input table at the time the query is run. If a column is added after the `end_timestamp` value, it appears with `NULL` values populated in any of the rows that were inserted before the addition of the column.
- `_CHANGE_TYPE`: a `STRING` value indicating the type of change that produced the row. For `APPENDS`, the only supported value is `INSERT`.
- `_CHANGE_TIMESTAMP`: a `TIMESTAMP` value indicating the commit time of the transaction that made the change.
Limitations
- The data returned by the `APPENDS` function is limited to the time travel window of the table.
- You can't call the `APPENDS` function within a multi-statement transaction.
- You can only use the `APPENDS` function with regular BigQuery tables. Clones, snapshots, views, materialized views, external tables, and wildcard tables aren't supported.
- Partition pseudo-columns for ingestion-time partitioned tables, such as `_PARTITIONTIME` and `_PARTITIONDATE`, aren't included in the function's output.
Example
This example shows the change history returned by the APPENDS
function as various
changes are made to a table called Produce
. First, create the table:
CREATE TABLE mydataset.Produce (product STRING, inventory INT64) AS (
SELECT 'apples' AS product, 10 AS inventory);
Next, insert two rows into the table:
INSERT INTO mydataset.Produce
VALUES
('bananas', 20),
('carrots', 30);
To view the full change history of appends, use NULL
values to get the full
history within the time travel window:
SELECT
product,
inventory,
_CHANGE_TYPE AS change_type,
_CHANGE_TIMESTAMP AS change_time
FROM
APPENDS(TABLE mydataset.Produce, NULL, NULL);
The output is similar to the following:
+---------+-----------+-------------+--------------------------------+
| product | inventory | change_type | change_time                    |
+---------+-----------+-------------+--------------------------------+
| apples  | 10        | INSERT      | 2022-04-15 20:06:00.488000 UTC |
| bananas | 20        | INSERT      | 2022-04-15 20:06:08.490000 UTC |
| carrots | 30        | INSERT      | 2022-04-15 20:06:08.490000 UTC |
+---------+-----------+-------------+--------------------------------+
Next, add a column, insert a new row of values, update the inventory, and delete
the bananas
row:
ALTER TABLE mydataset.Produce ADD COLUMN color STRING;
INSERT INTO mydataset.Produce VALUES ('grapes', 40, 'purple');
UPDATE mydataset.Produce SET inventory = inventory + 5 WHERE TRUE;
DELETE mydataset.Produce WHERE product = 'bananas';
View the new table:
SELECT * FROM mydataset.Produce;
The output is similar to the following:
+---------+-----------+--------+
| product | inventory | color  |
+---------+-----------+--------+
| apples  | 15        | NULL   |
| carrots | 35        | NULL   |
| grapes  | 45        | purple |
+---------+-----------+--------+
View the full change history of appends:
SELECT
product,
inventory,
color,
_CHANGE_TYPE AS change_type,
_CHANGE_TIMESTAMP AS change_time
FROM
APPENDS(TABLE mydataset.Produce, NULL, NULL);
The output is similar to the following:
+---------+-----------+--------+-------------+--------------------------------+
| product | inventory | color  | change_type | change_time                    |
+---------+-----------+--------+-------------+--------------------------------+
| apples  | 10        | NULL   | INSERT      | 2022-04-15 20:06:00.488000 UTC |
| bananas | 20        | NULL   | INSERT      | 2022-04-15 20:06:08.490000 UTC |
| carrots | 30        | NULL   | INSERT      | 2022-04-15 20:06:08.490000 UTC |
| grapes  | 40        | purple | INSERT      | 2022-04-15 20:07:45.751000 UTC |
+---------+-----------+--------+-------------+--------------------------------+
The `inventory` column displays the values that were set when the rows were
originally inserted into the table. It doesn't show the changes from the
`UPDATE` statement. The row with information on bananas is still present because
the `APPENDS` function only captures additions to tables, not deletions.
CHANGES
CHANGES(
TABLE table,
start_timestamp DEFAULT NULL,
end_timestamp)
Description
The CHANGES
function returns all rows that have changed in a table for a given
time range. To use the CHANGES
function on a table, you must set the table's
enable_change_history
option
to TRUE
.
The following operations add rows to the `CHANGES` change history:
- `CREATE TABLE` DDL statement
- `INSERT` DML statement
- Data appended, changed, or deleted as part of a `MERGE` DML statement
- `UPDATE` DML statement
- `DELETE` DML statement
- Loading data into BigQuery
- Streaming ingestion
- `TRUNCATE TABLE` DML statement
- Jobs configured with a `writeDisposition` of `WRITE_TRUNCATE`
- Individual table partition deletions
Definitions
- `table`: the BigQuery table name. This must be a regular BigQuery table, and must have the `enable_change_history` option set to `TRUE`. Enabling this table option has an impact on costs; for more information, see Pricing and costs. This argument must be preceded by the keyword `TABLE`.
- `start_timestamp`: a `TIMESTAMP` value indicating the earliest time at which a change is included in the output. If the value is `NULL`, all changes since the table creation are returned. If you set the `enable_change_history` option after the time specified by `start_timestamp`, the history before the enablement time might be incomplete. If the table was created after the `start_timestamp` value, the actual table creation time is used instead. An error is returned if the specified time is earlier than allowed by time travel, or, if the `start_timestamp` value is `NULL`, if the table was created earlier than allowed by time travel. For standard tables, this window is seven days, but you can configure the time travel window to be shorter.
- `end_timestamp`: a `TIMESTAMP` value indicating the latest time at which a change is included in the output. `end_timestamp` is exclusive; for example, if you specify `2023-12-31 08:00:00` for `start_timestamp` and `2023-12-31 12:00:00` for `end_timestamp`, all changes made from 8 AM December 31, 2023 up to (but not including) 12 PM December 31, 2023 are returned. The `end_timestamp` value must be at least ten minutes prior to the current time. The maximum time range allowed between `start_timestamp` and `end_timestamp` is one day.
Details
If a row is inserted, a record of the new row with an INSERT
change type is
produced.
If a row is deleted, a record of the deleted row with a DELETE
change type is
produced.
If a row is updated, a record of the old row with a DELETE
change type and a
record of the new row with an UPDATE
change type are produced.
Output
The `CHANGES` function returns a table with the following columns:
- All columns of the input table at the time that the query is run. If a column is added after the `end_timestamp` value, it appears with `NULL` values populated in any of the rows that were changed before the addition of the column.
- `_CHANGE_TYPE`: a `STRING` value indicating the type of change that produced the row. For `CHANGES`, the supported values are `INSERT`, `UPDATE`, and `DELETE`.
- `_CHANGE_TIMESTAMP`: a `TIMESTAMP` value indicating the commit time of the transaction that made the change.
Limitations
- The data returned by the `CHANGES` function is limited to the time travel window of the table.
- The maximum allowed time range between the `start_timestamp` and `end_timestamp` arguments you specify for the function is one day.
- The `CHANGES` function can't query the last ten minutes of table history. Therefore, the `end_timestamp` argument value must be at least ten minutes prior to the current time.
- You can't call the `CHANGES` function within a multi-statement transaction.
- You can't use the `CHANGES` function with tables that have had multi-statement transactions committed to them within the requested time window.
- You can only use the `CHANGES` function with regular BigQuery tables. Views, materialized views, external tables, and wildcard tables aren't supported.
- For tables that have been cloned or snapshotted, and for tables that are restored from a clone or snapshot, change history from the source table isn't carried over to the new table, clone, or snapshot.
- You can't use the `CHANGES` function with a table that has change data capture enabled.
- Partition pseudo-columns for ingestion-time partitioned tables, such as `_PARTITIONTIME` and `_PARTITIONDATE`, aren't included in the function's output.
- Change history isn't captured for table deletions made due to table partition expiration.
- Performing data manipulation language (DML) statements over recently streamed data fails on tables that have the `enable_change_history` option set to `TRUE`.
Example
This example shows the change history returned by the CHANGES
function as
various changes are made to a table called Produce
. First, create the table:
CREATE TABLE mydataset.Produce (
product STRING,
inventory INT64)
OPTIONS(enable_change_history=true);
Insert two rows into the table:
INSERT INTO mydataset.Produce
VALUES
('bananas', 20),
('carrots', 30);
Delete one row from the table:
DELETE mydataset.Produce
WHERE product = 'bananas';
Update one row of the table:
UPDATE mydataset.Produce
SET inventory = inventory - 10
WHERE product = 'carrots';
Wait for 10 minutes and view the full change history of the changes:
SELECT
product,
inventory,
_CHANGE_TYPE AS change_type,
_CHANGE_TIMESTAMP AS change_time
FROM
CHANGES(TABLE mydataset.Produce, NULL, TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 601 SECOND))
ORDER BY change_time, product;
The output is similar to the following:
+---------+-----------+-------------+---------------------+
| product | inventory | change_type | change_time         |
+---------+-----------+-------------+---------------------+
| bananas | 20        | INSERT      | 2024-01-09 17:13:58 |
| carrots | 30        | INSERT      | 2024-01-09 17:13:58 |
| bananas | 20        | DELETE      | 2024-01-09 17:14:30 |
| carrots | 30        | DELETE      | 2024-01-09 17:15:24 |
| carrots | 20        | UPDATE      | 2024-01-09 17:15:24 |
+---------+-----------+-------------+---------------------+
EXTERNAL_OBJECT_TRANSFORM
EXTERNAL_OBJECT_TRANSFORM(TABLE object_table_name, transform_types_array)
Description
This function returns a transformed object table with the original columns plus
one or more additional columns, depending on the transform_types
values
specified.
This function only supports object tables as inputs. Subqueries or any other types of tables are not supported.
object_table_name
is the name of the object table to be transformed, in
the format dataset_name.object_table_name
.
transform_types_array
is an array of STRING
literals. Currently, the only
supported transform_types_array
value is SIGNED_URL
. Specifying SIGNED_URL
creates read-only signed URLs for the objects in the identified object table,
which are returned in a signed_url
column. Generated signed URLs are
valid for 6 hours.
Return Type
TABLE
Example
Run the following query to return URIs and signed URLs for the objects in the
mydataset.myobjecttable
object table.
SELECT uri, signed_url
FROM EXTERNAL_OBJECT_TRANSFORM(TABLE mydataset.myobjecttable, ['SIGNED_URL']);
--The preceding statement returns results similar to the following:
/*-----------------------------------------------------------------------------------------------------------------------------*
| uri | signed_url |
+-----------------------------------------------------------------------------------------------------------------------------+
| gs://myobjecttable/1234_Main_St.jpeg | https://storage.googleapis.com/mybucket/1234_Main_St.jpeg?X-Goog-Algorithm=1234abcd… |
+-----------------------------------------------------------------------------------------------------------------------------+
| gs://myobjecttable/345_River_Rd.jpeg | https://storage.googleapis.com/mybucket/345_River_Rd.jpeg?X-Goog-Algorithm=2345bcde… |
*-----------------------------------------------------------------------------------------------------------------------------*/
GAP_FILL
Finds and fills gaps in a time series.
For more information, see GAP_FILL
in
Time series functions.
RANGE_SESSIONIZE
For more information, see RANGE_SESSIONIZE
in
Range functions.
Time functions
GoogleSQL for BigQuery supports the following time functions.
Function list
| Name | Summary |
|---|---|
| CURRENT_TIME | Returns the current time as a TIME value. |
| EXTRACT | Extracts part of a TIME value. |
| FORMAT_TIME | Formats a TIME value according to the specified format string. |
| PARSE_TIME | Converts a STRING value to a TIME value. |
| TIME | Constructs a TIME value. |
| TIME_ADD | Adds a specified time interval to a TIME value. |
| TIME_DIFF | Gets the number of unit boundaries between two TIME values at a particular time granularity. |
| TIME_SUB | Subtracts a specified time interval from a TIME value. |
| TIME_TRUNC | Truncates a TIME value at a particular granularity. |
CURRENT_TIME
CURRENT_TIME([time_zone])
CURRENT_TIME
Description
Returns the current time as a TIME
object. Parentheses are optional when
called with no arguments.
This function supports an optional time_zone
parameter.
See Time zone definitions for information
on how to specify a time zone.
The current time is recorded at the start of the query statement which contains this function, not when this specific function is evaluated.
Return Data Type
TIME
Example
SELECT CURRENT_TIME() as now;
/*----------------------------*
| now |
+----------------------------+
| 15:31:38.776361 |
*----------------------------*/
EXTRACT
EXTRACT(part FROM time_expression)
Description
Returns a value that corresponds to the specified part
from
a supplied time_expression
.
Allowed part
values are:
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
Returned values truncate lower order time periods. For example, when extracting
seconds, EXTRACT
truncates the millisecond and microsecond values.
Return Data Type
INT64
Example
In the following example, EXTRACT
returns a value corresponding to the HOUR
time part.
SELECT EXTRACT(HOUR FROM TIME "15:30:00") as hour;
/*------------------*
| hour |
+------------------+
| 15 |
*------------------*/
FORMAT_TIME
FORMAT_TIME(format_string, time_expr)
Description
Formats a TIME
value according to the specified format string.
Definitions
format_string
: ASTRING
value that contains the format elements to use withtime_expr
.time_expr
: ATIME
value that represents the time to format.
Return Data Type
STRING
Example
SELECT FORMAT_TIME("%R", TIME "15:30:00") as formatted_time;
/*----------------*
| formatted_time |
+----------------+
| 15:30 |
*----------------*/
PARSE_TIME
PARSE_TIME(format_string, time_string)
Description
Converts a STRING
value to a TIME
value.
Definitions
format_string
: ASTRING
value that contains the format elements to use withtime_string
.time_string
: ASTRING
value that represents the time to parse.
Details
Each element in time_string
must have a corresponding element in
format_string
. The location of each element in format_string
must match the
location of each element in time_string
.
-- This works because elements on both sides match.
SELECT PARSE_TIME("%I:%M:%S", "07:30:00");
-- This produces an error because the seconds element is in different locations.
SELECT PARSE_TIME("%S:%I:%M", "07:30:00");
-- This produces an error because one of the seconds elements is missing.
SELECT PARSE_TIME("%I:%M", "07:30:00");
-- This works because %T can find all matching elements in time_string.
SELECT PARSE_TIME("%T", "07:30:00");
The format string fully supports most format elements except for %P
.
When using PARSE_TIME
, keep the following in mind:
- Unspecified fields. Any unspecified field is initialized from `00:00:00.0`. For instance, if `seconds` is unspecified then it defaults to `00`, and so on.
- Whitespace. One or more consecutive white spaces in the format string matches zero or more consecutive white spaces in the `TIME` string. In addition, leading and trailing white spaces in the `TIME` string are always allowed, even if they are not in the format string.
- Format precedence. When two (or more) format elements have overlapping information, the last one generally overrides any earlier ones.
- Format divergence. `%p` can be used with `am`, `AM`, `pm`, and `PM`.
Return Data Type
TIME
Example
SELECT PARSE_TIME("%H", "15") as parsed_time;
/*-------------*
| parsed_time |
+-------------+
| 15:00:00 |
*-------------*/
SELECT PARSE_TIME('%I:%M:%S %p', '2:23:38 pm') AS parsed_time;
/*-------------*
| parsed_time |
+-------------+
| 14:23:38 |
*-------------*/
TIME
1. TIME(hour, minute, second)
2. TIME(timestamp, [time_zone])
3. TIME(datetime)
Description
1. Constructs a `TIME` object using `INT64` values representing the hour, minute, and second.
2. Constructs a `TIME` object using a `TIMESTAMP` object. It supports an optional parameter to specify a time zone. If no time zone is specified, the default time zone, UTC, is used.
3. Constructs a `TIME` object using a `DATETIME` object.
Return Data Type
TIME
Example
SELECT
TIME(15, 30, 00) as time_hms,
TIME(TIMESTAMP "2008-12-25 15:30:00+08", "America/Los_Angeles") as time_tstz;
/*----------+-----------*
| time_hms | time_tstz |
+----------+-----------+
| 15:30:00 | 23:30:00 |
*----------+-----------*/
SELECT TIME(DATETIME "2008-12-25 15:30:00.000000") AS time_dt;
/*----------*
| time_dt |
+----------+
| 15:30:00 |
*----------*/
TIME_ADD
TIME_ADD(time_expression, INTERVAL int64_expression part)
Description
Adds int64_expression
units of part
to the TIME
object.
TIME_ADD
supports the following values for part
:
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
This function automatically adjusts when values fall outside of the 00:00:00 to
24:00:00 boundary. For example, if you add an hour to 23:30:00
, the returned
value is 00:30:00
.
Return Data Types
TIME
Example
SELECT
TIME "15:30:00" as original_time,
TIME_ADD(TIME "15:30:00", INTERVAL 10 MINUTE) as later;
/*-----------------------------+------------------------*
| original_time | later |
+-----------------------------+------------------------+
| 15:30:00 | 15:40:00 |
*-----------------------------+------------------------*/
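The wraparound behavior described above can be seen directly when adding past the 24-hour boundary:

SELECT TIME_ADD(TIME "23:30:00", INTERVAL 1 HOUR) as wrapped;
/*----------*
| wrapped |
+----------+
| 00:30:00 |
*----------*/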
TIME_DIFF
TIME_DIFF(start_time, end_time, granularity)
Description
Gets the number of unit boundaries between two TIME
values (end_time
-
start_time
) at a particular time granularity.
Definitions
- `start_time`: The starting `TIME` value.
- `end_time`: The ending `TIME` value.
- `granularity`: The time part that represents the granularity. If you pass in `TIME` values for the first two arguments, `granularity` can be one of the following: `MICROSECOND`, `MILLISECOND`, `SECOND`, `MINUTE`, `HOUR`.
Details
If end_time
is earlier than start_time
, the output is negative.
Produces an error if the computation overflows, such as if the difference
in microseconds
between the two TIME
values overflows.
Return Data Type
INT64
Example
SELECT
TIME "15:30:00" as first_time,
TIME "14:35:00" as second_time,
TIME_DIFF(TIME "15:30:00", TIME "14:35:00", MINUTE) as difference;
/*----------------------------+------------------------+------------------------*
| first_time | second_time | difference |
+----------------------------+------------------------+------------------------+
| 15:30:00 | 14:35:00 | 55 |
*----------------------------+------------------------+------------------------*/
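The boundary-counting behavior can be sketched in Python (an illustration of the semantics, not BigQuery's implementation): truncate both times to the granularity, then take the signed difference of the truncated values, end minus start.

```python
def time_diff_minutes(end, start):
    """Model TIME_DIFF(end, start, MINUTE): count the minute boundaries
    between two (h, m, s) times as a signed difference. Seconds are
    truncated away before differencing, so only boundaries count."""
    end_minutes = end[0] * 60 + end[1]
    start_minutes = start[0] * 60 + start[1]
    return end_minutes - start_minutes

print(time_diff_minutes((15, 30, 0), (14, 35, 0)))  # 55
```

Swapping the arguments flips the sign, matching the documented behavior when end_time is earlier than start_time.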
TIME_SUB
TIME_SUB(time_expression, INTERVAL int64_expression part)
Description
Subtracts int64_expression units of part from the TIME object.
TIME_SUB supports the following values for part:
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
This function automatically adjusts when values fall outside of the 00:00:00 to
24:00:00 boundary. For example, if you subtract an hour from 00:30:00
, the
returned value is 23:30:00
.
Return Data Type
TIME
Example
SELECT
TIME "15:30:00" as original_date,
TIME_SUB(TIME "15:30:00", INTERVAL 10 MINUTE) as earlier;
/*-----------------------------+------------------------*
| original_date | earlier |
+-----------------------------+------------------------+
| 15:30:00 | 15:20:00 |
*-----------------------------+------------------------*/
TIME_TRUNC
TIME_TRUNC(time_value, time_granularity)
Description
Truncates a TIME
value at a particular granularity.
Definitions
- time_value: The TIME value to truncate.
- time_granularity: The truncation granularity for a TIME value. The
  supported time granularities are defined below.
Time granularity definitions
- MICROSECOND: If used, nothing is truncated from the value.
- MILLISECOND: The nearest lesser than or equal millisecond.
- SECOND: The nearest lesser than or equal second.
- MINUTE: The nearest lesser than or equal minute.
- HOUR: The nearest lesser than or equal hour.
Details
The resulting value is always rounded down to the beginning of the
granularity.
Return Data Type
TIME
Example
SELECT
TIME "15:30:00" as original,
TIME_TRUNC(TIME "15:30:00", HOUR) as truncated;
/*----------------------------+------------------------*
| original | truncated |
+----------------------------+------------------------+
| 15:30:00 | 15:00:00 |
*----------------------------+------------------------*/
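The round-down rule amounts to zeroing every field finer than the chosen granularity, as this Python sketch of the semantics shows (not BigQuery's implementation):

```python
def time_trunc(h, m, s, granularity):
    """Model TIME_TRUNC: round down by zeroing every field finer than
    the granularity (whole seconds are the finest field modeled here)."""
    if granularity == "HOUR":
        return (h, 0, 0)
    if granularity == "MINUTE":
        return (h, m, 0)
    return (h, m, s)  # SECOND: nothing finer to drop at this resolution

print(time_trunc(15, 30, 45, "HOUR"))    # (15, 0, 0)
print(time_trunc(15, 30, 45, "MINUTE"))  # (15, 30, 0)
```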
Time series functions
GoogleSQL for BigQuery supports the following time series functions.
Function list
Name | Summary
---|---
DATE_BUCKET | Gets the lower bound of the date bucket that contains a date.
DATETIME_BUCKET | Gets the lower bound of the datetime bucket that contains a datetime.
GAP_FILL | Finds and fills gaps in a time series.
TIMESTAMP_BUCKET | Gets the lower bound of the timestamp bucket that contains a timestamp.
DATE_BUCKET
DATE_BUCKET(date_in_bucket, bucket_width)
DATE_BUCKET(date_in_bucket, bucket_width, bucket_origin_date)
Description
Gets the lower bound of the date bucket that contains a date.
Definitions
- date_in_bucket: A DATE value that you can use to look up a date bucket.
- bucket_width: An INTERVAL value that represents the width of a date
  bucket. A single interval with date parts is supported.
- bucket_origin_date: A DATE value that represents a point in time. All
  buckets expand left and right from this point. If this argument is not
  set, 1950-01-01 is used by default.
Return type
DATE
Examples
In the following example, the origin is omitted and the default origin,
1950-01-01
is used. All buckets expand in both directions from the origin,
and the size of each bucket is two days. The lower bound of the bucket in
which my_date
belongs is returned.
WITH some_dates AS (
SELECT DATE '1949-12-29' AS my_date UNION ALL
SELECT DATE '1949-12-30' UNION ALL
SELECT DATE '1949-12-31' UNION ALL
SELECT DATE '1950-01-01' UNION ALL
SELECT DATE '1950-01-02' UNION ALL
SELECT DATE '1950-01-03'
)
SELECT DATE_BUCKET(my_date, INTERVAL 2 DAY) AS bucket_lower_bound
FROM some_dates;
/*--------------------+
| bucket_lower_bound |
+--------------------+
| 1949-12-28 |
| 1949-12-30 |
| 1949-12-30 |
| 1950-01-01 |
| 1950-01-01 |
| 1950-01-03 |
+--------------------*/
-- Some date buckets that originate from 1950-01-01:
-- + Bucket: ...
-- + Bucket: [1949-12-28, 1949-12-30)
-- + Bucket: [1949-12-30, 1950-01-01)
-- + Origin: [1950-01-01]
-- + Bucket: [1950-01-01, 1950-01-03)
-- + Bucket: [1950-01-03, 1950-01-05)
-- + Bucket: ...
In the following example, the origin has been changed to 2000-12-24
,
and all buckets expand in both directions from this point. The size of each
bucket is seven days. The lower bound of the bucket in which my_date
belongs
is returned:
WITH some_dates AS (
SELECT DATE '2000-12-20' AS my_date UNION ALL
SELECT DATE '2000-12-21' UNION ALL
SELECT DATE '2000-12-22' UNION ALL
SELECT DATE '2000-12-23' UNION ALL
SELECT DATE '2000-12-24' UNION ALL
SELECT DATE '2000-12-25'
)
SELECT DATE_BUCKET(
my_date,
INTERVAL 7 DAY,
DATE '2000-12-24') AS bucket_lower_bound
FROM some_dates;
/*--------------------+
| bucket_lower_bound |
+--------------------+
| 2000-12-17 |
| 2000-12-17 |
| 2000-12-17 |
| 2000-12-17 |
| 2000-12-24 |
| 2000-12-24 |
+--------------------*/
-- Some date buckets that originate from 2000-12-24:
-- + Bucket: ...
-- + Bucket: [2000-12-10, 2000-12-17)
-- + Bucket: [2000-12-17, 2000-12-24)
-- + Origin: [2000-12-24]
-- + Bucket: [2000-12-24, 2000-12-31)
-- + Bucket: [2000-12-31, 2001-01-07)
-- + Bucket: ...
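Both examples follow the same rule, which can be sketched in Python (a model of the semantics, not BigQuery's implementation): floor-divide the offset from the origin by the bucket width. Floor division rounds toward negative infinity, so dates before the origin land in the correct bucket.

```python
from datetime import date, timedelta

def date_bucket(d, width_days, origin=date(1950, 1, 1)):
    """Lower bound of the width_days-wide bucket containing d, with
    buckets tiling the timeline in both directions from origin."""
    n = (d - origin).days // width_days  # floor division, also for negatives
    return origin + timedelta(days=n * width_days)

print(date_bucket(date(1949, 12, 29), 2))                      # 1949-12-28
print(date_bucket(date(2000, 12, 20), 7, date(2000, 12, 24)))  # 2000-12-17
```

The same offset-from-origin formula, with a timedelta width, models DATETIME_BUCKET and TIMESTAMP_BUCKET as well.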
DATETIME_BUCKET
DATETIME_BUCKET(datetime_in_bucket, bucket_width)
DATETIME_BUCKET(datetime_in_bucket, bucket_width, bucket_origin_datetime)
Description
Gets the lower bound of the datetime bucket that contains a datetime.
Definitions
- datetime_in_bucket: A DATETIME value that you can use to look up a
  datetime bucket.
- bucket_width: An INTERVAL value that represents the width of a datetime
  bucket. A single interval with date and time parts is supported.
- bucket_origin_datetime: A DATETIME value that represents a point in
  time. All buckets expand left and right from this point. If this
  argument is not set, 1950-01-01 00:00:00 is used by default.
Return type
DATETIME
Examples
In the following example, the origin is omitted and the default origin,
1950-01-01 00:00:00
is used. All buckets expand in both directions from the
origin, and the size of each bucket is 12 hours. The lower bound of the bucket
in which my_datetime
belongs is returned:
WITH some_datetimes AS (
SELECT DATETIME '1949-12-30 13:00:00' AS my_datetime UNION ALL
SELECT DATETIME '1949-12-31 00:00:00' UNION ALL
SELECT DATETIME '1949-12-31 13:00:00' UNION ALL
SELECT DATETIME '1950-01-01 00:00:00' UNION ALL
SELECT DATETIME '1950-01-01 13:00:00' UNION ALL
SELECT DATETIME '1950-01-02 00:00:00'
)
SELECT DATETIME_BUCKET(my_datetime, INTERVAL 12 HOUR) AS bucket_lower_bound
FROM some_datetimes;
/*---------------------+
| bucket_lower_bound |
+---------------------+
| 1949-12-30T12:00:00 |
| 1949-12-31T00:00:00 |
| 1949-12-31T12:00:00 |
| 1950-01-01T00:00:00 |
| 1950-01-01T12:00:00 |
| 1950-01-02T00:00:00 |
+---------------------*/
-- Some datetime buckets that originate from 1950-01-01 00:00:00:
-- + Bucket: ...
-- + Bucket: [1949-12-30 00:00:00, 1949-12-30 12:00:00)
-- + Bucket: [1949-12-31 12:00:00, 1950-01-01 00:00:00)
-- + Origin: [1950-01-01 00:00:00]
-- + Bucket: [1950-01-01 00:00:00, 1950-01-01 12:00:00)
-- + Bucket: [1950-01-01 12:00:00, 1950-01-02 00:00:00)
-- + Bucket: ...
In the following example, the origin has been changed to 2000-12-24 12:00:00
,
and all buckets expand in both directions from this point. The size of each
bucket is seven days. The lower bound of the bucket in which my_datetime
belongs is returned:
WITH some_datetimes AS (
SELECT DATETIME '2000-12-20 00:00:00' AS my_datetime UNION ALL
SELECT DATETIME '2000-12-21 00:00:00' UNION ALL
SELECT DATETIME '2000-12-22 00:00:00' UNION ALL
SELECT DATETIME '2000-12-23 00:00:00' UNION ALL
SELECT DATETIME '2000-12-24 00:00:00' UNION ALL
SELECT DATETIME '2000-12-25 00:00:00'
)
SELECT DATETIME_BUCKET(
my_datetime,
INTERVAL 7 DAY,
DATETIME '2000-12-22 12:00:00') AS bucket_lower_bound
FROM some_datetimes;
/*--------------------+
| bucket_lower_bound |
+--------------------+
| 2000-12-15T12:00:00 |
| 2000-12-15T12:00:00 |
| 2000-12-15T12:00:00 |
| 2000-12-22T12:00:00 |
| 2000-12-22T12:00:00 |
| 2000-12-22T12:00:00 |
+--------------------*/
-- Some datetime buckets that originate from 2000-12-22 12:00:00:
-- + Bucket: ...
-- + Bucket: [2000-12-08 12:00:00, 2000-12-15 12:00:00)
-- + Bucket: [2000-12-15 12:00:00, 2000-12-22 12:00:00)
-- + Origin: [2000-12-22 12:00:00]
-- + Bucket: [2000-12-22 12:00:00, 2000-12-29 12:00:00)
-- + Bucket: [2000-12-29 12:00:00, 2001-01-05 12:00:00)
-- + Bucket: ...
GAP_FILL
GAP_FILL (
  TABLE time_series_table,
  ts_column,
  bucket_width
  [, partitioning_columns => value ]
  [, value_columns => value ]
  [, origin => value ]
  [, ignore_null_values => { TRUE | FALSE } ]
)
GAP_FILL (
  (time_series_subquery),
  ts_column,
  bucket_width
  [, partitioning_columns => value ]
  [, value_columns => value ]
  [, origin => value ]
  [, ignore_null_values => { TRUE | FALSE } ]
)
Description
Finds and fills gaps in a time series.
Definitions
- time_series_table: The name of the table that contains the time series
  data.
- time_series_subquery: The subquery that contains the time series data.
- ts_column: The name of the column in time_series_table or
  time_series_subquery that contains the time points of the time series
  data. This column must be a DATE, DATETIME, or TIMESTAMP type.
- bucket_width: The INTERVAL value that represents the selected width of
  the time buckets. The interval can represent a DATE, DATETIME, or
  TIMESTAMP type.
- partitioning_columns: A named argument with an ARRAY<STRING> value.
  Represents an array of zero or more column names used to partition data
  into individual time series (time series identity). This has the same
  column type requirements as the PARTITION BY clause.
- value_columns: A named argument with an ARRAY<STRUCT<STRING, STRING>>
  value. Represents an array of column name and gap-filling method pairs
  in the following format: [(column_name, gap_filling_method), ...]
  - column_name: A STRING value that represents a valid column from
    time_series_table. A column name can only be used once in
    value_columns.
  - gap_filling_method: A STRING value that can be one of the following
    gap-filling methods:
    - null (default): Fill in missing values with NULL values.
    - linear: Fill in missing values using linear interpolation, so that
      a new value is based on a linear slope for a specific time bucket.
      When this method is used, column_name must be a numeric data type.
    - locf: Fill in missing values by carrying the last observed value
      forward, so that a new value is based on the previous value.
- origin: An optional named argument with a DATE, DATETIME, or TIMESTAMP
  value. Represents a point in time from which all time buckets expand in
  each direction. If origin is not provided, the data type of ts_column
  determines the default value that is used:
  - DATE '1950-01-01'
  - DATETIME '1950-01-01 00:00:00'
  - TIMESTAMP '1950-01-01 00:00:00'
- ignore_null_values: A named argument with a BOOL value. Indicates
  whether the function ignores NULL values in the input data when
  performing gap filling. By default, this value is TRUE.
  - If TRUE (default), NULL values are skipped during gap filling.
    - If null is the gap-filling method for a column: If a value in the
      column is NULL, the output is NULL for that column.
    - If locf or linear is the gap-filling method for a column: The
      previous or next non-NULL value is used. As a side effect, output
      value columns are never NULL, except at the edges.
  - If FALSE, NULL values are included during gap filling.
    - If null is the gap-filling method for a column: If a value in the
      column is NULL, the output is NULL for that column.
    - If locf is the gap-filling method for a column: If the previous
      value in the column is NULL, the output is NULL for that column.
    - If linear is the gap-filling method for a column: If either of the
      endpoints in the column is NULL, the output is NULL for that
      column.
Details
Sometimes the fixed time intervals produced by time bucket functions have gaps,
either due to irregular sampling intervals or an event that caused data loss
for some time period. This can cause irregularities in reporting. For example,
a plot with irregular intervals might have visible discontinuity. You can use
the GAP_FILL
function to employ various gap-filling methods to fill in
those missing data points.
The ts_column column and the origin value must be of the same data type.
Return type
TABLE
Examples
In the following query, the locf
gap-filling method is applied to gaps:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', 78, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'locf')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:35:00 | 74 |
| 2023-11-01T09:36:00 | 77 |
| 2023-11-01T09:37:00 | 78 |
| 2023-11-01T09:38:00 | 78 |
+---------------------+--------*/
In the following query, the linear
gap-filling method is applied to gaps:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', 78, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'linear')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:35:00 | 75 |
| 2023-11-01T09:36:00 | 77 |
| 2023-11-01T09:37:00 | 78 |
| 2023-11-01T09:38:00 | 80 |
+---------------------+--------*/
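To see what the linear method computes, here is a small Python model of one series (an illustration of the semantics, not the GAP_FILL implementation; rounding interpolated values to the nearest integer is an assumption made to mirror the INT64 outputs above): emit each whole-minute bucket boundary spanned by the data, and interpolate between the observations that bracket it.

```python
from datetime import datetime, timedelta

def minute_boundaries(start, end):
    """Whole-minute bucket boundaries in [start, end]."""
    t = start.replace(second=0, microsecond=0)
    if t < start:
        t += timedelta(minutes=1)
    while t <= end:
        yield t
        t += timedelta(minutes=1)

def gap_fill_linear(samples):
    """Linear gap filling for one series of (time, value) samples."""
    samples = sorted(samples)
    out = []
    for t in minute_boundaries(samples[0][0], samples[-1][0]):
        prev = max(s for s in samples if s[0] <= t)  # latest at/before t
        nxt = min(s for s in samples if s[0] >= t)   # earliest at/after t
        if prev[0] == nxt[0]:
            out.append((t, prev[1]))  # an observation sits on the boundary
        else:
            frac = (t - prev[0]) / (nxt[0] - prev[0])
            out.append((t, round(prev[1] + frac * (nxt[1] - prev[1]))))
    return out

rows = gap_fill_linear([
    (datetime(2023, 11, 1, 9, 34, 1), 74),
    (datetime(2023, 11, 1, 9, 36, 0), 77),
    (datetime(2023, 11, 1, 9, 37, 0), 78),
    (datetime(2023, 11, 1, 9, 38, 1), 80),
])
for t, v in rows:
    print(t.time(), v)
```

Run on the sample data above, this reproduces the four rows of the linear example, including the interpolated 75 at 09:35:00.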
In the following query, the null
gap-filling method is applied to gaps:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', 78, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'null')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:35:00 | NULL |
| 2023-11-01T09:36:00 | 77 |
| 2023-11-01T09:37:00 | 78 |
| 2023-11-01T09:38:00 | NULL |
+---------------------+--------*/
In the following query, NULL
values in the input data are ignored by default:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', NULL, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'linear')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:35:00 | 75 |
| 2023-11-01T09:36:00 | 77 |
| 2023-11-01T09:37:00 | 78 |
| 2023-11-01T09:38:00 | 80 |
+---------------------+--------*/
In the following query, NULL
values in the input data are not ignored, using
the ignore_null_values
argument:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', NULL, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'linear')
],
ignore_null_values => FALSE
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:35:00 | 75 |
| 2023-11-01T09:36:00 | 77 |
| 2023-11-01T09:37:00 | NULL |
| 2023-11-01T09:38:00 | NULL |
+---------------------+--------*/
In the following query, when the value_columns
argument is not passed in,
the null
gap-filling method is used on all columns:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', 79, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE
)
ORDER BY time;
/*---------------------+-----------+--------+----------+
| time | device_id | signal | state |
+---------------------+-----------+--------+----------+
| 2023-11-01T09:35:00 | NULL | NULL | NULL |
| 2023-11-01T09:36:00 | 2 | 77 | ACTIVE |
| 2023-11-01T09:37:00 | 3 | 79 | ACTIVE |
| 2023-11-01T09:38:00 | NULL | NULL | NULL |
+---------------------+-----------+--------+----------*/
In the following query, rows (buckets) are added for gaps that are found:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:35:39', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:37:39', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:38:00', 77, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:40:00', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'locf')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:36:00 | 74 |
| 2023-11-01T09:37:00 | 74 |
| 2023-11-01T09:38:00 | 74 |
| 2023-11-01T09:39:00 | 77 |
| 2023-11-01T09:40:00 | 77 |
+---------------------+--------*/
In the following query, data is condensed when it fits in the same bucket and has the same values:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:35:39', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:50', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', 77, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:37:20', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'locf')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:36:00 | 74 |
| 2023-11-01T09:37:00 | 77 |
+---------------------+--------*/
In the following query, gap filling is applied to partitions:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(2, DATETIME '2023-11-01 09:35:07', 87, 'ACTIVE'),
STRUCT(1, DATETIME '2023-11-01 09:35:26', 82, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:35:39', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:07', 88, 'ACTIVE'),
STRUCT(1, DATETIME '2023-11-01 09:36:26', 82, 'ACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:37:07', 88, 'ACTIVE'),
STRUCT(1, DATETIME '2023-11-01 09:37:28', 80, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:39', 77, 'ACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:38:07', 86, 'ACTIVE'),
STRUCT(1, DATETIME '2023-11-01 09:38:26', 81, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:38:39', 77, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
partitioning_columns => ['device_id'],
value_columns => [
('signal', 'locf')
]
)
ORDER BY device_id;
/*---------------------+-----------+--------+
| time | device_id | signal |
+---------------------+-----------+--------+
| 2023-11-01T09:36:00 | 1 | 82 |
| 2023-11-01T09:37:00 | 1 | 82 |
| 2023-11-01T09:38:00 | 1 | 80 |
| 2023-11-01T09:36:00 | 2 | 87 |
| 2023-11-01T09:37:00 | 2 | 88 |
| 2023-11-01T09:38:00 | 2 | 88 |
| 2023-11-01T09:36:00 | 3 | 74 |
| 2023-11-01T09:37:00 | 3 | 74 |
| 2023-11-01T09:38:00 | 3 | 77 |
+---------------------+-----------+--------*/
In the following query, gap filling is applied to multiple columns, and each column uses a different gap-filling method:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'ACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'INACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:38:00', 78, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:39:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'linear'),
('state', 'locf')
]
)
ORDER BY time;
/*---------------------+--------+----------+
| time | signal | state |
+---------------------+--------+----------+
| 2023-11-01T09:35:00 | 75 | ACTIVE |
| 2023-11-01T09:36:00 | 77 | INACTIVE |
| 2023-11-01T09:37:00 | 78 | INACTIVE |
| 2023-11-01T09:38:00 | 78 | ACTIVE |
| 2023-11-01T09:39:00 | 80 | ACTIVE |
+---------------------+--------+----------*/
In the following query, the point of origin is changed in the gap-filling
results to a custom origin, using the origin
argument:
CREATE TEMP TABLE device_data AS
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'ACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'INACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:38:00', 78, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:39:01', 80, 'ACTIVE')
]);
SELECT *
FROM GAP_FILL(
TABLE device_data,
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'null')
],
origin => DATETIME '2023-11-01 09:30:01'
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:34:01 | 74 |
| 2023-11-01T09:35:01 | NULL |
| 2023-11-01T09:36:01 | NULL |
| 2023-11-01T09:37:01 | NULL |
| 2023-11-01T09:38:01 | NULL |
| 2023-11-01T09:39:01 | 80 |
+---------------------+--------*/
In the following query, a subquery is passed into the function instead of a table:
SELECT *
FROM GAP_FILL(
(
SELECT * FROM UNNEST(
ARRAY<STRUCT<device_id INT64, time DATETIME, signal INT64, state STRING>>[
STRUCT(1, DATETIME '2023-11-01 09:34:01', 74, 'INACTIVE'),
STRUCT(2, DATETIME '2023-11-01 09:36:00', 77, 'ACTIVE'),
STRUCT(3, DATETIME '2023-11-01 09:37:00', 78, 'ACTIVE'),
STRUCT(4, DATETIME '2023-11-01 09:38:01', 80, 'ACTIVE')
])
),
ts_column => 'time',
bucket_width => INTERVAL 1 MINUTE,
value_columns => [
('signal', 'linear')
]
)
ORDER BY time;
/*---------------------+--------+
| time | signal |
+---------------------+--------+
| 2023-11-01T09:35:00 | 75 |
| 2023-11-01T09:36:00 | 77 |
| 2023-11-01T09:37:00 | 78 |
| 2023-11-01T09:38:00 | 80 |
+---------------------+--------*/
TIMESTAMP_BUCKET
TIMESTAMP_BUCKET(timestamp_in_bucket, bucket_width)
TIMESTAMP_BUCKET(timestamp_in_bucket, bucket_width, bucket_origin_timestamp)
Description
Gets the lower bound of the timestamp bucket that contains a timestamp.
Definitions
- timestamp_in_bucket: A TIMESTAMP value that you can use to look up a
  timestamp bucket.
- bucket_width: An INTERVAL value that represents the width of a
  timestamp bucket. A single interval with date and time parts is
  supported.
- bucket_origin_timestamp: A TIMESTAMP value that represents a point in
  time. All buckets expand left and right from this point. If this
  argument is not set, 1950-01-01 00:00:00 is used by default.
Return type
TIMESTAMP
Examples
In the following example, the origin is omitted and the default origin,
1950-01-01 00:00:00
is used. All buckets expand in both directions from the
origin, and the size of each bucket is 12 hours. The lower bound of the bucket
in which my_timestamp
belongs is returned:
WITH some_timestamps AS (
SELECT TIMESTAMP '1949-12-30 13:00:00.00' AS my_timestamp UNION ALL
SELECT TIMESTAMP '1949-12-31 00:00:00.00' UNION ALL
SELECT TIMESTAMP '1949-12-31 13:00:00.00' UNION ALL
SELECT TIMESTAMP '1950-01-01 00:00:00.00' UNION ALL
SELECT TIMESTAMP '1950-01-01 13:00:00.00' UNION ALL
SELECT TIMESTAMP '1950-01-02 00:00:00.00'
)
SELECT TIMESTAMP_BUCKET(my_timestamp, INTERVAL 12 HOUR) AS bucket_lower_bound
FROM some_timestamps;
-- Display of results may differ, depending upon the environment and
-- time zone where this query was executed.
/*-------------------------+
| bucket_lower_bound |
+-------------------------+
| 1949-12-30 12:00:00 UTC |
| 1949-12-31 00:00:00 UTC |
| 1949-12-31 12:00:00 UTC |
| 1950-01-01 00:00:00 UTC |
| 1950-01-01 12:00:00 UTC |
| 1950-01-02 00:00:00 UTC |
+-------------------------*/
-- Some timestamp buckets that originate from 1950-01-01 00:00:00:
-- + Bucket: ...
-- + Bucket: [1949-12-30 00:00:00.00 UTC, 1949-12-30 12:00:00.00 UTC)
-- + Bucket: [1949-12-31 12:00:00.00 UTC, 1950-01-01 00:00:00.00 UTC)
-- + Origin: [1950-01-01 00:00:00.00 UTC]
-- + Bucket: [1950-01-01 00:00:00.00 UTC, 1950-01-01 12:00:00.00 UTC)
-- + Bucket: [1950-01-01 12:00:00.00 UTC, 1950-01-02 00:00:00.00 UTC)
-- + Bucket: ...
In the following example, the origin has been changed to 2000-12-24 12:00:00
,
and all buckets expand in both directions from this point. The size of each
bucket is seven days. The lower bound of the bucket in which my_timestamp
belongs is returned:
WITH some_timestamps AS (
SELECT TIMESTAMP '2000-12-20 00:00:00.00' AS my_timestamp UNION ALL
SELECT TIMESTAMP '2000-12-21 00:00:00.00' UNION ALL
SELECT TIMESTAMP '2000-12-22 00:00:00.00' UNION ALL
SELECT TIMESTAMP '2000-12-23 00:00:00.00' UNION ALL
SELECT TIMESTAMP '2000-12-24 00:00:00.00' UNION ALL
SELECT TIMESTAMP '2000-12-25 00:00:00.00'
)
SELECT TIMESTAMP_BUCKET(
my_timestamp,
INTERVAL 7 DAY,
TIMESTAMP '2000-12-22 12:00:00.00') AS bucket_lower_bound
FROM some_timestamps;
-- Display of results may differ, depending upon the environment and
-- time zone where this query was executed.
/*-------------------------+
| bucket_lower_bound |
+-------------------------+
| 2000-12-15 12:00:00 UTC |
| 2000-12-15 12:00:00 UTC |
| 2000-12-15 12:00:00 UTC |
| 2000-12-22 12:00:00 UTC |
| 2000-12-22 12:00:00 UTC |
| 2000-12-22 12:00:00 UTC |
+-------------------------*/
-- Some timestamp buckets that originate from 2000-12-22 12:00:00:
-- + Bucket: ...
-- + Bucket: [2000-12-08 12:00:00.00 UTC, 2000-12-15 12:00:00.00 UTC)
-- + Bucket: [2000-12-15 12:00:00.00 UTC, 2000-12-22 12:00:00.00 UTC)
-- + Origin: [2000-12-22 12:00:00.00 UTC]
-- + Bucket: [2000-12-22 12:00:00.00 UTC, 2000-12-29 12:00:00.00 UTC)
-- + Bucket: [2000-12-29 12:00:00.00 UTC, 2001-01-05 12:00:00.00 UTC)
-- + Bucket: ...
Timestamp functions
GoogleSQL for BigQuery supports the following timestamp functions.
IMPORTANT: Before working with these functions, you need to understand the difference between the formats in which timestamps are stored and displayed, and how time zones are used for the conversion between these formats. To learn more, see How time zones work with timestamp functions.
NOTE: These functions return a runtime error if overflow occurs; result
values are bounded by the defined DATE
range
and TIMESTAMP
range.
Function list
Name | Summary
---|---
CURRENT_TIMESTAMP | Returns the current date and time as a TIMESTAMP object.
EXTRACT | Extracts part of a TIMESTAMP value.
FORMAT_TIMESTAMP | Formats a TIMESTAMP value according to the specified format string.
PARSE_TIMESTAMP | Converts a STRING value to a TIMESTAMP value.
STRING | Converts a TIMESTAMP value to a STRING value.
TIMESTAMP | Constructs a TIMESTAMP value.
TIMESTAMP_ADD | Adds a specified time interval to a TIMESTAMP value.
TIMESTAMP_DIFF | Gets the number of unit boundaries between two TIMESTAMP values at a particular time granularity.
TIMESTAMP_MICROS | Converts the number of microseconds since 1970-01-01 00:00:00 UTC to a TIMESTAMP.
TIMESTAMP_MILLIS | Converts the number of milliseconds since 1970-01-01 00:00:00 UTC to a TIMESTAMP.
TIMESTAMP_SECONDS | Converts the number of seconds since 1970-01-01 00:00:00 UTC to a TIMESTAMP.
TIMESTAMP_SUB | Subtracts a specified time interval from a TIMESTAMP value.
TIMESTAMP_TRUNC | Truncates a TIMESTAMP or DATETIME value at a particular granularity.
UNIX_MICROS | Converts a TIMESTAMP value to the number of microseconds since 1970-01-01 00:00:00 UTC.
UNIX_MILLIS | Converts a TIMESTAMP value to the number of milliseconds since 1970-01-01 00:00:00 UTC.
UNIX_SECONDS | Converts a TIMESTAMP value to the number of seconds since 1970-01-01 00:00:00 UTC.
CURRENT_TIMESTAMP
CURRENT_TIMESTAMP()
CURRENT_TIMESTAMP
Description
Returns the current date and time as a timestamp object. The timestamp is continuous, non-ambiguous, has exactly 60 seconds per minute and does not repeat values over the leap second. Parentheses are optional.
This function handles leap seconds by smearing them across a window of 20 hours around the inserted leap second.
The current date and time is recorded at the start of the query statement which contains this function, not when this specific function is evaluated.
Supported Input Types
Not applicable
Result Data Type
TIMESTAMP
Examples
SELECT CURRENT_TIMESTAMP() AS now;
/*--------------------------------*
| now |
+--------------------------------+
| 2020-06-02 23:57:12.120174 UTC |
*--------------------------------*/
EXTRACT
EXTRACT(part FROM timestamp_expression [AT TIME ZONE time_zone])
Description
Returns a value that corresponds to the specified part from a supplied
timestamp_expression. This function supports an optional time_zone
parameter. See Time zone definitions for information on how to specify a
time zone.
Allowed part values are:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAYOFWEEK: Returns values in the range [1,7] with Sunday as the first
  day of the week.
- DAY
- DAYOFYEAR
- WEEK: Returns the week number of the date in the range [0, 53]. Weeks
  begin with Sunday, and dates prior to the first Sunday of the year are
  in week 0.
- WEEK(<WEEKDAY>): Returns the week number of timestamp_expression in the
  range [0, 53]. Weeks begin on WEEKDAY. Timestamps prior to the first
  WEEKDAY of the year are in week 0. Valid values for WEEKDAY are SUNDAY,
  MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, and SATURDAY.
- ISOWEEK: Returns the ISO 8601 week number of the timestamp_expression.
  ISOWEEKs begin on Monday. Return values are in the range [1, 53]. The
  first ISOWEEK of each ISO year begins on the Monday before the first
  Thursday of the Gregorian calendar year.
- MONTH
- QUARTER
- YEAR
- ISOYEAR: Returns the ISO 8601 week-numbering year, which is the
  Gregorian calendar year containing the Thursday of the week to which
  timestamp_expression belongs.
- DATE
- DATETIME
- TIME
Returned values truncate lower order time periods. For example, when extracting
seconds, EXTRACT
truncates the millisecond and microsecond values.
Return Data Type
INT64, except in the following cases:
- If part is DATE, the function returns a DATE object.
Examples
In the following example, EXTRACT
returns a value corresponding to the DAY
time part.
SELECT
EXTRACT(
DAY
FROM TIMESTAMP('2008-12-25 05:30:00+00') AT TIME ZONE 'UTC')
AS the_day_utc,
EXTRACT(
DAY
FROM TIMESTAMP('2008-12-25 05:30:00+00') AT TIME ZONE 'America/Los_Angeles')
AS the_day_california
/*-------------+--------------------*
| the_day_utc | the_day_california |
+-------------+--------------------+
| 25 | 24 |
*-------------+--------------------*/
In the following examples, EXTRACT
returns values corresponding to different
time parts from a column of type TIMESTAMP
.
SELECT
EXTRACT(ISOYEAR FROM TIMESTAMP("2005-01-03 12:34:56+00")) AS isoyear,
EXTRACT(ISOWEEK FROM TIMESTAMP("2005-01-03 12:34:56+00")) AS isoweek,
EXTRACT(YEAR FROM TIMESTAMP("2005-01-03 12:34:56+00")) AS year,
EXTRACT(WEEK FROM TIMESTAMP("2005-01-03 12:34:56+00")) AS week
-- Display of results may differ, depending upon the environment and
-- time zone where this query was executed.
/*---------+---------+------+------*
| isoyear | isoweek | year | week |
+---------+---------+------+------+
| 2005 | 1 | 2005 | 1 |
*---------+---------+------+------*/
SELECT
TIMESTAMP("2007-12-31 12:00:00+00") AS timestamp_value,
EXTRACT(ISOYEAR FROM TIMESTAMP("2007-12-31 12:00:00+00")) AS isoyear,
EXTRACT(ISOWEEK FROM TIMESTAMP("2007-12-31 12:00:00+00")) AS isoweek,
EXTRACT(YEAR FROM TIMESTAMP("2007-12-31 12:00:00+00")) AS year,
EXTRACT(WEEK FROM TIMESTAMP("2007-12-31 12:00:00+00")) AS week
-- Display of results may differ, depending upon the environment and time zone
-- where this query was executed.
/*---------+---------+------+------*
| isoyear | isoweek | year | week |
+---------+---------+------+------+
| 2008 | 1 | 2007 | 52 |
*---------+---------+------+------*/
SELECT
TIMESTAMP("2009-01-01 12:00:00+00") AS timestamp_value,
EXTRACT(ISOYEAR FROM TIMESTAMP("2009-01-01 12:00:00+00")) AS isoyear,
EXTRACT(ISOWEEK FROM TIMESTAMP("2009-01-01 12:00:00+00")) AS isoweek,
EXTRACT(YEAR FROM TIMESTAMP("2009-01-01 12:00:00+00")) AS year,
EXTRACT(WEEK FROM TIMESTAMP("2009-01-01 12:00:00+00")) AS week
-- Display of results may differ, depending upon the environment and time zone
-- where this query was executed.
/*---------+---------+------+------*
| isoyear | isoweek | year | week |
+---------+---------+------+------+
| 2009 | 1 | 2009 | 0 |
*---------+---------+------+------*/
SELECT
TIMESTAMP("2009-12-31 12:00:00+00") AS timestamp_value,
EXTRACT(ISOYEAR FROM TIMESTAMP("2009-12-31 12:00:00+00")) AS isoyear,
EXTRACT(ISOWEEK FROM TIMESTAMP("2009-12-31 12:00:00+00")) AS isoweek,
EXTRACT(YEAR FROM TIMESTAMP("2009-12-31 12:00:00+00")) AS year,
EXTRACT(WEEK FROM TIMESTAMP("2009-12-31 12:00:00+00")) AS week
-- Display of results may differ, depending upon the environment and time zone
-- where this query was executed.
/*---------+---------+------+------*
| isoyear | isoweek | year | week |
+---------+---------+------+------+
| 2009 | 53 | 2009 | 52 |
*---------+---------+------+------*/
SELECT
TIMESTAMP("2017-01-02 12:00:00+00") AS timestamp_value,
EXTRACT(ISOYEAR FROM TIMESTAMP("2017-01-02 12:00:00+00")) AS isoyear,
EXTRACT(ISOWEEK FROM TIMESTAMP("2017-01-02 12:00:00+00")) AS isoweek,
EXTRACT(YEAR FROM TIMESTAMP("2017-01-02 12:00:00+00")) AS year,
EXTRACT(WEEK FROM TIMESTAMP("2017-01-02 12:00:00+00")) AS week
-- Display of results may differ, depending upon the environment and time zone
-- where this query was executed.
/*---------+---------+------+------*
| isoyear | isoweek | year | week |
+---------+---------+------+------+
| 2017 | 1 | 2017 | 1 |
*---------+---------+------+------*/
SELECT
TIMESTAMP("2017-05-26 12:00:00+00") AS timestamp_value,
EXTRACT(ISOYEAR FROM TIMESTAMP("2017-05-26 12:00:00+00")) AS isoyear,
EXTRACT(ISOWEEK FROM TIMESTAMP("2017-05-26 12:00:00+00")) AS isoweek,
EXTRACT(YEAR FROM TIMESTAMP("2017-05-26 12:00:00+00")) AS year,
EXTRACT(WEEK FROM TIMESTAMP("2017-05-26 12:00:00+00")) AS week
-- Display of results may differ, depending upon the environment and time zone
-- where this query was executed.
/*---------+---------+------+------*
| isoyear | isoweek | year | week |
+---------+---------+------+------+
| 2017 | 21 | 2017 | 21 |
*---------+---------+------+------*/
In the following example, timestamp_expression
falls on a Monday. EXTRACT
calculates the first column using weeks that begin on Sunday, and it calculates
the second column using weeks that begin on Monday.
SELECT
EXTRACT(WEEK(SUNDAY) FROM TIMESTAMP("2017-11-06 00:00:00+00")) AS week_sunday,
EXTRACT(WEEK(MONDAY) FROM TIMESTAMP("2017-11-06 00:00:00+00")) AS week_monday
-- Display of results may differ, depending upon the environment and time zone
-- where this query was executed.
/*-------------+---------------*
| week_sunday | week_monday |
+-------------+---------------+
| 45 | 44 |
*-------------+---------------*/
FORMAT_TIMESTAMP
FORMAT_TIMESTAMP(format_string, timestamp_expr[, time_zone])
Description
Formats a TIMESTAMP
value according to the specified format string.
Definitions
- format_string: A STRING value that contains the format elements to use with timestamp_expr.
- timestamp_expr: A TIMESTAMP value that represents the timestamp to format.
- time_zone: A STRING value that represents a time zone. For more information about how to use a time zone with a timestamp, see Time zone definitions.
Return Data Type
STRING
Examples
SELECT FORMAT_TIMESTAMP("%c", TIMESTAMP "2050-12-25 15:30:55+00", "UTC")
AS formatted;
/*--------------------------*
| formatted |
+--------------------------+
| Sun Dec 25 15:30:55 2050 |
*--------------------------*/
SELECT FORMAT_TIMESTAMP("%b-%d-%Y", TIMESTAMP "2050-12-25 15:30:55+00")
AS formatted;
/*-------------*
| formatted |
+-------------+
| Dec-25-2050 |
*-------------*/
SELECT FORMAT_TIMESTAMP("%b %Y", TIMESTAMP "2050-12-25 15:30:55+00")
AS formatted;
/*-------------*
| formatted |
+-------------+
| Dec 2050 |
*-------------*/
SELECT FORMAT_TIMESTAMP("%Y-%m-%dT%H:%M:%SZ", TIMESTAMP "2050-12-25 15:30:55", "UTC")
AS formatted;
/*----------------------*
| formatted |
+----------------------+
| 2050-12-25T15:30:55Z |
*----------------------*/
PARSE_TIMESTAMP
PARSE_TIMESTAMP(format_string, timestamp_string[, time_zone])
Description
Converts a STRING
value to a TIMESTAMP
value.
Definitions
- format_string: A STRING value that contains the format elements to use with timestamp_string.
- timestamp_string: A STRING value that represents the timestamp to parse.
- time_zone: A STRING value that represents a time zone. For more information about how to use a time zone with a timestamp, see Time zone definitions.
Details
Each element in timestamp_string
must have a corresponding element in
format_string
. The location of each element in format_string
must match the
location of each element in timestamp_string
.
-- This works because elements on both sides match.
SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S %Y", "Thu Dec 25 07:30:00 2008");
-- This produces an error because the year element is in different locations.
SELECT PARSE_TIMESTAMP("%a %b %e %Y %I:%M:%S", "Thu Dec 25 07:30:00 2008");
-- This produces an error because one of the year elements is missing.
SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S", "Thu Dec 25 07:30:00 2008");
-- This works because %c can find all matching elements in timestamp_string.
SELECT PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008");
The format string supports most format elements; %P is not supported.
When using PARSE_TIMESTAMP
, keep the following in mind:
- Unspecified fields. Any unspecified field is initialized from 1970-01-01 00:00:00.0. This initialization value uses the time zone specified by the function's time zone argument, if present. If not, the initialization value uses the default time zone, UTC. For instance, if the year is unspecified, then it defaults to 1970, and so on.
- Case insensitivity. Names, such as Monday and February, are case insensitive.
- Whitespace. One or more consecutive white spaces in the format string matches zero or more consecutive white spaces in the timestamp string. In addition, leading and trailing white spaces in the timestamp string are always allowed, even if they are not in the format string.
- Format precedence. When two (or more) format elements have overlapping information (for example, both %F and %Y affect the year), the last one generally overrides any earlier ones, with some exceptions (see the descriptions of %s, %C, and %y).
- Format divergence. %p can be used with am, AM, pm, and PM.
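For example, in the following query, only the year is specified in the format string, so every other field comes from the initialization value:

SELECT PARSE_TIMESTAMP("%Y", "2008") AS parsed;

-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
 | parsed                  |
 +-------------------------+
 | 2008-01-01 00:00:00 UTC |
 *-------------------------*/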
Return Data Type
TIMESTAMP
Example
SELECT PARSE_TIMESTAMP("%c", "Thu Dec 25 07:30:00 2008") AS parsed;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| parsed |
+-------------------------+
| 2008-12-25 07:30:00 UTC |
*-------------------------*/
STRING
STRING(timestamp_expression[, time_zone])
Description
Converts a timestamp to a string. Supports an optional parameter to specify a time zone. See Time zone definitions for information on how to specify a time zone.
Return Data Type
STRING
Example
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00", "UTC") AS string;
/*-------------------------------*
| string |
+-------------------------------+
| 2008-12-25 15:30:00+00 |
*-------------------------------*/
TIMESTAMP
TIMESTAMP(string_expression[, time_zone])
TIMESTAMP(date_expression[, time_zone])
TIMESTAMP(datetime_expression[, time_zone])
Description
- string_expression[, time_zone]: Converts a string to a timestamp. string_expression must include a timestamp literal. If string_expression includes a time zone in the timestamp literal, don't include an explicit time_zone argument.
- date_expression[, time_zone]: Converts a date to a timestamp. The value returned is the earliest timestamp that falls within the given date.
- datetime_expression[, time_zone]: Converts a datetime to a timestamp.
This function supports an optional parameter to specify a time zone. If no time zone is specified, the default time zone, UTC, is used.
Return Data Type
TIMESTAMP
Examples
SELECT TIMESTAMP("2008-12-25 15:30:00+00") AS timestamp_str;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_str |
+-------------------------+
| 2008-12-25 15:30:00 UTC |
*-------------------------*/
SELECT TIMESTAMP("2008-12-25 15:30:00", "America/Los_Angeles") AS timestamp_str;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_str |
+-------------------------+
| 2008-12-25 23:30:00 UTC |
*-------------------------*/
SELECT TIMESTAMP("2008-12-25 15:30:00 UTC") AS timestamp_str;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_str |
+-------------------------+
| 2008-12-25 15:30:00 UTC |
*-------------------------*/
SELECT TIMESTAMP(DATETIME "2008-12-25 15:30:00") AS timestamp_datetime;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_datetime |
+-------------------------+
| 2008-12-25 15:30:00 UTC |
*-------------------------*/
SELECT TIMESTAMP(DATE "2008-12-25") AS timestamp_date;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_date |
+-------------------------+
| 2008-12-25 00:00:00 UTC |
*-------------------------*/
TIMESTAMP_ADD
TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression date_part)
Description
Adds int64_expression
units of date_part
to the timestamp, independent of
any time zone.
TIMESTAMP_ADD supports the following values for date_part:

- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR. Equivalent to 60 MINUTE parts.
- DAY. Equivalent to 24 HOUR parts.
Return Data Types
TIMESTAMP
Example
SELECT
TIMESTAMP("2008-12-25 15:30:00+00") AS original,
TIMESTAMP_ADD(TIMESTAMP "2008-12-25 15:30:00+00", INTERVAL 10 MINUTE) AS later;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+-------------------------*
| original | later |
+-------------------------+-------------------------+
| 2008-12-25 15:30:00 UTC | 2008-12-25 15:40:00 UTC |
*-------------------------+-------------------------*/
TIMESTAMP_DIFF
TIMESTAMP_DIFF(end_timestamp, start_timestamp, granularity)
Description
Gets the number of unit boundaries between two TIMESTAMP
values
(end_timestamp
- start_timestamp
) at a particular time granularity.
Definitions
- start_timestamp: The starting TIMESTAMP value.
- end_timestamp: The ending TIMESTAMP value.
- granularity: The timestamp part that represents the granularity. If you pass in TIMESTAMP values for the first two arguments, granularity can be one of the following:
  - MICROSECOND
  - MILLISECOND
  - SECOND
  - MINUTE
  - HOUR. Equivalent to 60 MINUTEs.
  - DAY. Equivalent to 24 HOURs.
Details
If end_timestamp
is earlier than start_timestamp
, the output is negative.
Produces an error if the computation overflows, such as if the difference
in microseconds
between the two TIMESTAMP
values overflows.
Return Data Type
INT64
Example
SELECT
TIMESTAMP("2010-07-07 10:20:00+00") AS later_timestamp,
TIMESTAMP("2008-12-25 15:30:00+00") AS earlier_timestamp,
TIMESTAMP_DIFF(TIMESTAMP "2010-07-07 10:20:00+00", TIMESTAMP "2008-12-25 15:30:00+00", HOUR) AS hours;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+-------------------------+-------*
| later_timestamp | earlier_timestamp | hours |
+-------------------------+-------------------------+-------+
| 2010-07-07 10:20:00 UTC | 2008-12-25 15:30:00 UTC | 13410 |
*-------------------------+-------------------------+-------*/
In the following example, the first timestamp occurs before the second timestamp, resulting in a negative output.
SELECT TIMESTAMP_DIFF(TIMESTAMP "2018-08-14", TIMESTAMP "2018-10-14", DAY) AS negative_diff;
/*---------------*
| negative_diff |
+---------------+
| -61 |
*---------------*/
In this example, the result is 0 because only the number of whole HOUR intervals is counted.
SELECT TIMESTAMP_DIFF("2001-02-01 01:00:00", "2001-02-01 00:00:01", HOUR) AS diff;
/*---------------*
| diff |
+---------------+
| 0 |
*---------------*/
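At a finer granularity, the same two timestamps are 59 whole minutes apart:

SELECT TIMESTAMP_DIFF("2001-02-01 01:00:00", "2001-02-01 00:00:01", MINUTE) AS diff;

/*------*
 | diff |
 +------+
 | 59   |
 *------*/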
TIMESTAMP_MICROS
TIMESTAMP_MICROS(int64_expression)
Description
Interprets int64_expression
as the number of microseconds since 1970-01-01
00:00:00 UTC and returns a timestamp.
Return Data Type
TIMESTAMP
Example
SELECT TIMESTAMP_MICROS(1230219000000000) AS timestamp_value;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_value |
+-------------------------+
| 2008-12-25 15:30:00 UTC |
*-------------------------*/
TIMESTAMP_MILLIS
TIMESTAMP_MILLIS(int64_expression)
Description
Interprets int64_expression
as the number of milliseconds since 1970-01-01
00:00:00 UTC and returns a timestamp.
Return Data Type
TIMESTAMP
Example
SELECT TIMESTAMP_MILLIS(1230219000000) AS timestamp_value;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_value |
+-------------------------+
| 2008-12-25 15:30:00 UTC |
*-------------------------*/
TIMESTAMP_SECONDS
TIMESTAMP_SECONDS(int64_expression)
Description
Interprets int64_expression
as the number of seconds since 1970-01-01 00:00:00
UTC and returns a timestamp.
Return Data Type
TIMESTAMP
Example
SELECT TIMESTAMP_SECONDS(1230219000) AS timestamp_value;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
| timestamp_value |
+-------------------------+
| 2008-12-25 15:30:00 UTC |
*-------------------------*/
TIMESTAMP_SUB
TIMESTAMP_SUB(timestamp_expression, INTERVAL int64_expression date_part)
Description
Subtracts int64_expression
units of date_part
from the timestamp,
independent of any time zone.
TIMESTAMP_SUB supports the following values for date_part:

- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR. Equivalent to 60 MINUTE parts.
- DAY. Equivalent to 24 HOUR parts.
Return Data Type
TIMESTAMP
Example
SELECT
TIMESTAMP("2008-12-25 15:30:00+00") AS original,
TIMESTAMP_SUB(TIMESTAMP "2008-12-25 15:30:00+00", INTERVAL 10 MINUTE) AS earlier;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+-------------------------*
| original | earlier |
+-------------------------+-------------------------+
| 2008-12-25 15:30:00 UTC | 2008-12-25 15:20:00 UTC |
*-------------------------+-------------------------*/
TIMESTAMP_TRUNC
TIMESTAMP_TRUNC(timestamp_value, timestamp_granularity[, time_zone])
TIMESTAMP_TRUNC(datetime_value, datetime_granularity)
Description
Truncates a TIMESTAMP
or DATETIME
value at a particular granularity.
Definitions
- timestamp_value: A TIMESTAMP value to truncate.
- timestamp_granularity: The truncation granularity for a TIMESTAMP value. Date granularities and time granularities can be used.
- time_zone: A time zone to use with the TIMESTAMP value. Time zone parts can be used. Use this argument if you want to use a time zone other than the default time zone, UTC, as part of the truncate operation.
- datetime_value: A DATETIME value to truncate.
- datetime_granularity: The truncation granularity for a DATETIME value. Date granularities and time granularities can be used.
Date granularity definitions
- DAY: The day in the Gregorian calendar year that contains the value to truncate.
- WEEK: The first day in the week that contains the value to truncate. Weeks begin on Sundays. WEEK is equivalent to WEEK(SUNDAY).
- WEEK(WEEKDAY): The first day in the week that contains the value to truncate. Weeks begin on WEEKDAY. WEEKDAY must be one of the following: SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, or SATURDAY.
- ISOWEEK: The first day in the ISO 8601 week that contains the value to truncate. The ISO week begins on Monday. The first ISO week of each ISO year contains the first Thursday of the corresponding Gregorian calendar year.
- MONTH: The first day in the month that contains the value to truncate.
- QUARTER: The first day in the quarter that contains the value to truncate.
- YEAR: The first day in the year that contains the value to truncate.
- ISOYEAR: The first day in the ISO 8601 week-numbering year that contains the value to truncate. The ISO year begins on the Monday of the first week whose Thursday belongs to the corresponding Gregorian calendar year.
Time granularity definitions
- MICROSECOND: If used, nothing is truncated from the value.
- MILLISECOND: The nearest millisecond that is less than or equal to the value.
- SECOND: The nearest second that is less than or equal to the value.
- MINUTE: The nearest minute that is less than or equal to the value.
- HOUR: The nearest hour that is less than or equal to the value.
Time zone part definitions
- MINUTE
- HOUR
- DAY
- WEEK
- WEEK(<WEEKDAY>)
- ISOWEEK
- MONTH
- QUARTER
- YEAR
- ISOYEAR
Details
The resulting value is always rounded down to the beginning of the specified granularity.
Return Data Type
The same data type as the first argument passed into this function.
Examples
SELECT
TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:00+00", DAY, "UTC") AS utc,
TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:00+00", DAY, "America/Los_Angeles") AS la;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+-------------------------*
| utc | la |
+-------------------------+-------------------------+
| 2008-12-25 00:00:00 UTC | 2008-12-25 08:00:00 UTC |
*-------------------------+-------------------------*/
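Time granularities work similarly within the day. For example, truncating at MINUTE granularity zeroes out the seconds:

SELECT TIMESTAMP_TRUNC(TIMESTAMP "2008-12-25 15:30:55+00", MINUTE) AS truncated;

-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------*
 | truncated               |
 +-------------------------+
 | 2008-12-25 15:30:00 UTC |
 *-------------------------*/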
In the following example, timestamp_expression
has a time zone offset of +12.
The first column shows the timestamp_expression
in UTC time. The second
column shows the output of TIMESTAMP_TRUNC
using weeks that start on Monday.
Because the timestamp_expression
falls on a Sunday in UTC, TIMESTAMP_TRUNC
truncates it to the preceding Monday. The third column shows the same function
with the optional time zone argument 'Pacific/Auckland'. Here, the function
truncates the timestamp_expression using New Zealand Daylight Time, where it
falls on a Monday.
SELECT
timestamp_value AS timestamp_value,
TIMESTAMP_TRUNC(timestamp_value, WEEK(MONDAY), "UTC") AS utc_truncated,
TIMESTAMP_TRUNC(timestamp_value, WEEK(MONDAY), "Pacific/Auckland") AS nzdt_truncated
FROM (SELECT TIMESTAMP("2017-11-06 00:00:00+12") AS timestamp_value);
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+-------------------------+-------------------------*
| timestamp_value | utc_truncated | nzdt_truncated |
+-------------------------+-------------------------+-------------------------+
| 2017-11-05 12:00:00 UTC | 2017-10-30 00:00:00 UTC | 2017-11-05 11:00:00 UTC |
*-------------------------+-------------------------+-------------------------*/
In the following example, the original timestamp_expression
is in the
Gregorian calendar year 2015. However, TIMESTAMP_TRUNC
with the ISOYEAR
date
part truncates the timestamp_expression
to the beginning of the ISO year, not
the Gregorian calendar year. The first Thursday of the 2015 calendar year was
2015-01-01, so the ISO year 2015 begins on the preceding Monday, 2014-12-29.
Therefore the ISO year boundary preceding the timestamp_expression
2015-06-15 00:00:00+00 is 2014-12-29.
SELECT
TIMESTAMP_TRUNC("2015-06-15 00:00:00+00", ISOYEAR) AS isoyear_boundary,
EXTRACT(ISOYEAR FROM TIMESTAMP "2015-06-15 00:00:00+00") AS isoyear_number;
-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+----------------*
| isoyear_boundary | isoyear_number |
+-------------------------+----------------+
| 2014-12-29 00:00:00 UTC | 2015 |
*-------------------------+----------------*/
UNIX_MICROS
UNIX_MICROS(timestamp_expression)
Description
Returns the number of microseconds since 1970-01-01 00:00:00 UTC
.
Return Data Type
INT64
Examples
SELECT UNIX_MICROS(TIMESTAMP "2008-12-25 15:30:00+00") AS micros;
/*------------------*
| micros |
+------------------+
| 1230219000000000 |
*------------------*/
UNIX_MILLIS
UNIX_MILLIS(timestamp_expression)
Description
Returns the number of milliseconds since 1970-01-01 00:00:00 UTC
. Truncates
higher levels of precision by rounding down to the beginning of the millisecond.
Return Data Type
INT64
Examples
SELECT UNIX_MILLIS(TIMESTAMP "2008-12-25 15:30:00+00") AS millis;
/*---------------*
| millis |
+---------------+
| 1230219000000 |
*---------------*/
SELECT UNIX_MILLIS(TIMESTAMP "1970-01-01 00:00:00.0018+00") AS millis;
/*---------------*
| millis |
+---------------+
| 1 |
*---------------*/
UNIX_SECONDS
UNIX_SECONDS(timestamp_expression)
Description
Returns the number of seconds since 1970-01-01 00:00:00 UTC
. Truncates higher
levels of precision by rounding down to the beginning of the second.
Return Data Type
INT64
Examples
SELECT UNIX_SECONDS(TIMESTAMP "2008-12-25 15:30:00+00") AS seconds;
/*------------*
| seconds |
+------------+
| 1230219000 |
*------------*/
SELECT UNIX_SECONDS(TIMESTAMP "1970-01-01 00:00:01.8+00") AS seconds;
/*------------*
| seconds |
+------------+
| 1 |
*------------*/
Supplemental materials
How time zones work with timestamp functions
A timestamp represents an absolute point in time, independent of any time
zone. However, when a timestamp value is displayed, it is usually converted to
a human-readable format consisting of a civil date and time
(YYYY-MM-DD HH:MM:SS)
and a time zone. This is not the internal representation of the
TIMESTAMP
; it is only a human-understandable way to describe the point in time
that the timestamp represents.
Some timestamp functions have a time zone argument. A time zone is needed to
convert between civil time (YYYY-MM-DD HH:MM:SS) and the absolute time
represented by a timestamp.
A function like PARSE_TIMESTAMP
takes an input string that represents a
civil time and returns a timestamp that represents an absolute time. A
time zone is needed for this conversion. A function like EXTRACT
takes an
input timestamp (absolute time) and converts it to civil time in order to
extract a part of that civil time. This conversion requires a time zone.
If no time zone is specified, the default time zone, UTC,
is used.
Certain date and timestamp functions allow you to override the default time zone
and specify a different one. You can specify a time zone by either supplying
the time zone name (for example, America/Los_Angeles
)
or time zone offset from UTC (for example, -08).
To learn more about how time zones work with the TIMESTAMP
type, see
Time zones.
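For example, the same civil time string produces different absolute timestamps depending on the time zone used to interpret it (America/Los_Angeles is eight hours behind UTC in December):

SELECT
  PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2008-12-25 15:30:00", "UTC") AS utc_time,
  PARSE_TIMESTAMP("%Y-%m-%d %H:%M:%S", "2008-12-25 15:30:00", "America/Los_Angeles") AS la_time;

-- Display of results may differ, depending upon the environment and time zone where this query was executed.
/*-------------------------+-------------------------*
 | utc_time                | la_time                 |
 +-------------------------+-------------------------+
 | 2008-12-25 15:30:00 UTC | 2008-12-25 23:30:00 UTC |
 *-------------------------+-------------------------*/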
Utility functions
GoogleSQL for BigQuery supports the following utility functions.
Function list
| Name | Summary |
|---|---|
| GENERATE_UUID | Produces a random universally unique identifier (UUID) as a STRING value. |
GENERATE_UUID
GENERATE_UUID()
Description
Returns a random universally unique identifier (UUID) as a STRING
.
The returned STRING
consists of 32 hexadecimal
digits in five groups separated by hyphens in the form 8-4-4-4-12. The
hexadecimal digits represent 122 random bits and 6 fixed bits, in compliance
with RFC 4122 section 4.4.
The returned STRING
is lowercase.
Return Data Type
STRING
Example
The following query generates a random UUID.
SELECT GENERATE_UUID() AS uuid;
/*--------------------------------------*
| uuid |
+--------------------------------------+
| 4192bff0-e1e0-43ce-a4db-912808c32493 |
*--------------------------------------*/
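Because the returned value is a version 4 UUID per RFC 4122, the fixed bits appear as a 4 at the start of the third group and as one of 8, 9, a, or b at the start of the fourth group. As an informal sanity check (the regular expression below is an illustration, not part of the official documentation), the format can be verified with REGEXP_CONTAINS:

SELECT REGEXP_CONTAINS(
  GENERATE_UUID(),
  r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$") AS is_valid_v4;

/*-------------*
 | is_valid_v4 |
 +-------------+
 | true        |
 *-------------*/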