Range functions

GoogleSQL for BigQuery supports the following range functions.

Function list

Name Summary
GENERATE_RANGE_ARRAY Splits a range into an array of subranges.
RANGE Constructs a range of DATE, DATETIME, or TIMESTAMP values.
RANGE_CONTAINS Signature 1: Checks if one range is in another range.

Signature 2: Checks if a value is in a range.
RANGE_END Gets the upper bound of a range.
RANGE_INTERSECT Gets a segment of two ranges that intersect.
RANGE_OVERLAPS Checks if two ranges overlap.
RANGE_SESSIONIZE Produces a table of sessionized ranges.
RANGE_START Gets the lower bound of a range.

GENERATE_RANGE_ARRAY

GENERATE_RANGE_ARRAY(range_to_split, step_interval)
GENERATE_RANGE_ARRAY(range_to_split, step_interval, include_last_partial_range)

Description

Splits a range into an array of subranges.

Definitions

  • range_to_split: The RANGE<T> value to split.
  • step_interval: The INTERVAL value, which determines the maximum size of each subrange in the resulting array. An interval single date and time part is supported, but an interval range of date and time parts is not.

    • If range_to_split is RANGE<DATE>, these interval date parts are supported: YEAR to DAY.

    • If range_to_split is RANGE<DATETIME>, these interval date and time parts are supported: YEAR to SECOND.

    • If range_to_split is RANGE<TIMESTAMP>, these interval date and time parts are supported: DAY to SECOND.

  • include_last_partial_range: A BOOL value, which determines whether or not to include the last subrange if it's a partial subrange. If this argument is not specified, the default value is TRUE.

    • TRUE (default): The last subrange is included, even if it's smaller than step_interval.

    • FALSE: Exclude the last subrange if it's smaller than step_interval.

Details

Returns NULL if any input isNULL.

Return type

ARRAY<RANGE<T>>

Examples

In the following example, a date range between 2020-01-01 and 2020-01-06 is split into an array of subranges that are one day long. There are no partial ranges.

SELECT GENERATE_RANGE_ARRAY(
  RANGE(DATE '2020-01-01', DATE '2020-01-06'),
  INTERVAL 1 DAY) AS results;

/*----------------------------+
 | results                    |
 +----------------------------+
 | [                          |
 |  [2020-01-01, 2020-01-02), |
 |  [2020-01-02, 2020-01-03), |
 |  [2020-01-03, 2020-01-04), |
 |  [2020-01-04, 2020-01-05), |
 |  [2020-01-05, 2020-01-06), |
 | ]                          |
 +----------------------------*/

In the following examples, a date range between 2020-01-01 and 2020-01-06 is split into an array of subranges that are two days long. The final subrange is smaller than two days:

SELECT GENERATE_RANGE_ARRAY(
  RANGE(DATE '2020-01-01', DATE '2020-01-06'),
  INTERVAL 2 DAY) AS results;

/*----------------------------+
 | results                    |
 +----------------------------+
 | [                          |
 |  [2020-01-01, 2020-01-03), |
 |  [2020-01-03, 2020-01-05), |
 |  [2020-01-05, 2020-01-06)  |
 | ]                          |
 +----------------------------*/
SELECT GENERATE_RANGE_ARRAY(
  RANGE(DATE '2020-01-01', DATE '2020-01-06'),
  INTERVAL 2 DAY,
  TRUE) AS results;

/*----------------------------+
 | results                    |
 +----------------------------+
 | [                          |
 |  [2020-01-01, 2020-01-03), |
 |  [2020-01-03, 2020-01-05), |
 |  [2020-01-05, 2020-01-06)  |
 | ]                          |
 +----------------------------*/

In the following example, a date range between 2020-01-01 and 2020-01-06 is split into an array of subranges that are two days long, but the final subrange is excluded because it's smaller than two days:

SELECT GENERATE_RANGE_ARRAY(
  RANGE(DATE '2020-01-01', DATE '2020-01-06'),
  INTERVAL 2 DAY,
  FALSE) AS results;

/*----------------------------+
 | results                    |
 +----------------------------+
 | [                          |
 |  [2020-01-01, 2020-01-03), |
 |  [2020-01-03, 2020-01-05)  |
 | ]                          |
 +----------------------------*/

RANGE

RANGE(lower_bound, upper_bound)

Description

Constructs a range of DATE, DATETIME, or TIMESTAMP values.

Definitions

  • lower_bound: The range starts from this value. This can be a DATE, DATETIME, or TIMESTAMP value. If this value is NULL, the range doesn't include a lower bound.
  • upper_bound: The range ends before this value. This can be a DATE, DATETIME, or TIMESTAMP value. If this value is NULL, the range doesn't include an upper bound.

Details

lower_bound and upper_bound must be of the same data type.

Produces an error if lower_bound is greater than or equal to upper_bound. To return NULL instead, add the SAFE. prefix to the function name.

Return type

RANGE<T>, where T is the same data type as the input.

Examples

The following query constructs a date range:

SELECT RANGE(DATE '2022-12-01', DATE '2022-12-31') AS results;

/*--------------------------+
 | results                  |
 +--------------------------+
 | [2022-12-01, 2022-12-31) |
 +--------------------------*/

The following query constructs a datetime range:

SELECT RANGE(DATETIME '2022-10-01 14:53:27',
             DATETIME '2022-10-01 16:00:00') AS results;

/*---------------------------------------------+
 | results                                     |
 +---------------------------------------------+
 | [2022-10-01T14:53:27, 2022-10-01T16:00:00)  |
 +---------------------------------------------*/

The following query constructs a timestamp range:

SELECT RANGE(TIMESTAMP '2022-10-01 14:53:27 America/Los_Angeles',
             TIMESTAMP '2022-10-01 16:00:00 America/Los_Angeles') AS results;

-- Results depend upon where this query was executed.
/*------------------------------------------------------------------+
 | results                                                          |
 +------------------------------------------------------------------+
 | [2022-10-01 21:53:27.000000 UTC, 2022-10-01 23:00:00.000000 UTC) |
 +------------------------------------------------------------------*/

The following query constructs a date range with no lower bound:

SELECT RANGE(NULL, DATE '2022-12-31') AS results;

/*-------------------------+
 | results                 |
 +-------------------------+
 | [UNBOUNDED, 2022-12-31) |
 +-------------------------*/

The following query constructs a date range with no upper bound:

SELECT RANGE(DATE '2022-10-01', NULL) AS results;

/*--------------------------+
 | results                  |
 +--------------------------+
 | [2022-10-01, UNBOUNDED)  |
 +--------------------------*/

RANGE_CONTAINS

  • Signature 1: Checks if every value in one range is in another range.
  • Signature 2: Checks if a value is in a range.

Signature 1

RANGE_CONTAINS(outer_range, inner_range)

Description

Checks if the inner range is in the outer range.

Definitions

  • outer_range: The RANGE<T> value to search within.
  • inner_range: The RANGE<T> value to search for in outer_range.

Details

Returns TRUE if inner_range exists in outer_range. Otherwise, returns FALSE.

T must be of the same type for all inputs.

Return type

BOOL

Examples

In the following query, the inner range is in the outer range:

SELECT RANGE_CONTAINS(
  RANGE<DATE> '[2022-01-01, 2023-01-01)',
  RANGE<DATE> '[2022-04-01, 2022-07-01)') AS results;

/*---------+
 | results |
 +---------+
 | TRUE    |
 +---------*/

In the following query, the inner range is not in the outer range:

SELECT RANGE_CONTAINS(
  RANGE<DATE> '[2022-01-01, 2023-01-01)',
  RANGE<DATE> '[2023-01-01, 2023-04-01)') AS results;

/*---------+
 | results |
 +---------+
 | FALSE   |
 +---------*/

Signature 2

RANGE_CONTAINS(range_to_search, value_to_find)

Description

Checks if a value is in a range.

Definitions

  • range_to_search: The RANGE<T> value to search within.
  • value_to_find: The value to search for in range_to_search.

Details

Returns TRUE if value_to_find exists in range_to_search. Otherwise, returns FALSE.

The data type for value_to_find must be the same data type as Tin range_to_search.

Return type

BOOL

Examples

In the following query, the value 2022-04-01 is found in the range [2022-01-01, 2023-01-01):

SELECT RANGE_CONTAINS(
  RANGE<DATE> '[2022-01-01, 2023-01-01)',
  DATE '2022-04-01') AS results;

/*---------+
 | results |
 +---------+
 | TRUE    |
 +---------*/

In the following query, the value 2023-04-01 is not found in the range [2022-01-01, 2023-01-01):

SELECT RANGE_CONTAINS(
  RANGE<DATE> '[2022-01-01, 2023-01-01)',
  DATE '2023-04-01') AS results;

/*---------+
 | results |
 +---------+
 | FALSE   |
 +---------*/

RANGE_END

RANGE_END(range_to_check)

Description

Gets the upper bound of a range.

Definitions

  • range_to_check: The RANGE<T> value.

Details

Returns NULL if the upper bound in range_value is UNBOUNDED.

Returns NULL if range_to_check is NULL.

Return type

T in range_value

Examples

In the following query, the upper bound of the range is retrieved:

SELECT RANGE_END(RANGE<DATE> '[2022-12-01, 2022-12-31)') AS results;

/*------------+
 | results    |
 +------------+
 | 2022-12-31 |
 +------------*/

In the following query, the upper bound of the range is unbounded, so NULL is returned:

SELECT RANGE_END(RANGE<DATE> '[2022-12-01, UNBOUNDED)') AS results;

/*------------+
 | results    |
 +------------+
 | NULL       |
 +------------*/

RANGE_INTERSECT

RANGE_INTERSECT(range_a, range_b)

Description

Gets a segment of two ranges that intersect.

Definitions

  • range_a: The first RANGE<T> value.
  • range_b: The second RANGE<T> value.

Details

Returns NULL if any input isNULL.

Produces an error if range_a and range_b don't overlap. To return NULL instead, add the SAFE. prefix to the function name.

T must be of the same type for all inputs.

Return type

RANGE<T>

Examples

SELECT RANGE_INTERSECT(
  RANGE<DATE> '[2022-02-01, 2022-09-01)',
  RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;

/*--------------------------+
 | results                  |
 +--------------------------+
 | [2022-02-01, 2022-04-15) |
 +--------------------------*/
SELECT RANGE_INTERSECT(
  RANGE<DATE> '[2022-02-01, UNBOUNDED)',
  RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;

/*--------------------------+
 | results                  |
 +--------------------------+
 | [2022-02-01, 2022-04-15) |
 +--------------------------*/
SELECT RANGE_INTERSECT(
  RANGE<DATE> '[2022-02-01, UNBOUNDED)',
  RANGE<DATE> '[2021-06-15, UNBOUNDED)') AS results;

/*-------------------------+
 | results                 |
 +-------------------------+
 | [2022-02-01, UNBOUNDED) |
 +-------------------------*/

RANGE_OVERLAPS

RANGE_OVERLAPS(range_a, range_b)

Description

Checks if two ranges overlap.

Definitions

  • range_a: The first RANGE<T> value.
  • range_b: The second RANGE<T> value.

Details

Returns TRUE if a part of range_a intersects with range_b, otherwise returns FALSE.

T must be of the same type for all inputs.

To get the part of the range that overlaps, use the RANGE_INTERSECT function.

Return type

BOOL

Examples

In the following query, the first and second ranges overlap between 2022-02-01 and 2022-04-15:

SELECT RANGE_OVERLAPS(
  RANGE<DATE> '[2022-02-01, 2022-09-01)',
  RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;

/*---------+
 | results |
 +---------+
 | TRUE    |
 +---------*/

In the following query, the first and second ranges don't overlap:

SELECT RANGE_OVERLAPS(
  RANGE<DATE> '[2020-02-01, 2020-09-01)',
  RANGE<DATE> '[2021-06-15, 2022-04-15)') AS results;

/*---------+
 | results |
 +---------+
 | FALSE   |
 +---------*/

In the following query, the first and second ranges overlap between 2022-02-01 and UNBOUNDED:

SELECT RANGE_OVERLAPS(
  RANGE<DATE> '[2022-02-01, UNBOUNDED)',
  RANGE<DATE> '[2021-06-15, UNBOUNDED)') AS results;

/*---------+
 | results |
 +---------+
 | TRUE    |
 +---------*/

RANGE_SESSIONIZE

RANGE_SESSIONIZE(
  TABLE table_name,
  range_column,
  partitioning_columns
)
RANGE_SESSIONIZE(
  TABLE table_name,
  range_column,
  partitioning_columns,
  sessionize_option
)

Description

Produces a table of sessionized ranges.

Definitions

  • table_name: A table expression that represents the name of the table to construct. This can represent any relation with range_column.
  • range_column: A STRING literal that indicates which RANGE column in a table contains the data to sessionize.
  • partitioning_columns: An ARRAY<STRING> literal that indicates which columns should partition the data before the data is sessionized.
  • sessionize_option: A STRING value that describes how order-adjacent ranges are sessionized. Your choices are as follows:

    • MEETS (default): Ranges that meet or overlap are sessionized.

    • OVERLAPS: Only a range that is overlapped by another range is sessionized.

    If this argument is not provided, MEETS is used by default.

Details

This function produces a table that includes all columns in the input table and an additional RANGE column called session_range, which indicates the start and end of a session. The start and end of each session is determined by the sessionize_option argument.

Return type

TABLE

Examples

The examples in this section reference the following table called my_sessionized_range_table in a dataset called mydataset:

INSERT mydataset.my_sessionized_range_table (emp_id, dept_id, duration)
VALUES(10, 1000, RANGE<DATE> '[2010-01-10, 2010-03-10)'),
      (10, 2000, RANGE<DATE> '[2010-03-10, 2010-07-15)'),
      (10, 2000, RANGE<DATE> '[2010-06-15, 2010-08-18)'),
      (20, 2000, RANGE<DATE> '[2010-03-10, 2010-07-20)'),
      (20, 1000, RANGE<DATE> '[2020-05-10, 2020-09-20)');

SELECT * FROM mydataset.my_sessionized_range_table ORDER BY emp_id;

/*--------+---------+--------------------------+
 | emp_id | dept_id | duration                 |
 +--------+---------+--------------------------+
 | 10     | 1000    | [2010-01-10, 2010-03-10) |
 | 10     | 2000    | [2010-03-10, 2010-07-15) |
 | 10     | 2000    | [2010-06-15, 2010-08-18) |
 | 20     | 2000    | [2010-03-10, 2010-07-20) |
 | 20     | 1000    | [2020-05-10, 2020-09-20) |
 +--------+---------+--------------------------*/

In the following query, a table of sessionized data is produced for my_sessionized_range_table, and only ranges that meet or overlap are sessionized:

SELECT
  emp_id, duration, session_range
FROM
  RANGE_SESSIONIZE(
    TABLE mydataset.my_sessionized_range_table,
    'duration',
    ['emp_id'])
ORDER BY emp_id;

/*--------+--------------------------+--------------------------+
 | emp_id | duration                 | session_range            |
 +--------+--------------------------+--------------------------+
 | 10     | [2010-01-10, 2010-03-10) | [2010-01-10, 2010-08-18) |
 | 10     | [2010-03-10, 2010-07-15) | [2010-01-10, 2010-08-18) |
 | 10     | [2010-06-15, 2010-08-18) | [2010-01-10, 2010-08-18) |
 | 20     | [2010-03-10, 2010-07-20) | [2010-03-10, 2010-07-20) |
 | 20     | [2020-05-10, 2020-09-20) | [2020-05-10, 2020-09-20) |
 +--------+-----------------------------------------------------*/

In the following query, a table of sessionized data is produced for my_sessionized_range_table, and only a range that is overlapped by another range is sessionized:

SELECT
  emp_id, duration, session_range
FROM
  RANGE_SESSIONIZE(
    TABLE mydataset.my_sessionized_range_table,
    'duration',
    ['emp_id'],
    'OVERLAPS')
ORDER BY emp_id;

/*--------+--------------------------+--------------------------+
 | emp_id | duration                 | session_range            |
 +--------+--------------------------+--------------------------+
 | 10     | [2010-03-10, 2010-07-15) | [2010-03-10, 2010-08-18) |
 | 10     | [2010-06-15, 2010-08-18) | [2010-03-10, 2010-08-18) |
 | 10     | [2010-01-10, 2010-03-10) | [2010-01-10, 2010-03-10) |
 | 20     | [2020-05-10, 2020-09-20) | [2020-05-10, 2020-09-20) |
 | 20     | [2010-03-10, 2010-07-20) | [2010-03-10, 2010-07-20) |
 +--------+-----------------------------------------------------*/

If you need to normalize sessionized data, you can use a query similar to the following:

SELECT emp_id, session_range AS normalized FROM (
  SELECT emp_id, session_range
  FROM RANGE_SESSIONIZE(
    TABLE mydataset.my_sessionized_range_table,
    'duration',
    ['emp_id'],
    'MEETS')
)
GROUP BY emp_id, normalized;

/*--------+--------------------------+
 | emp_id | normalized               |
 +--------+--------------------------+
 | 20     | [2010-03-10, 2010-07-20) |
 | 10     | [2010-01-10, 2010-08-18) |
 | 20     | [2020-05-10, 2020-09-20) |
 +--------+--------------------------*/

RANGE_START

RANGE_START(range_to_check)

Description

Gets the lower bound of a range.

Definitions

  • range_to_check: The RANGE<T> value.

Details

Returns NULL if the lower bound of range_value is UNBOUNDED.

Returns NULL if range_to_check is NULL.

Return type

T in range_value

Examples

In the following query, the lower bound of the range is retrieved:

SELECT RANGE_START(RANGE<DATE> '[2022-12-01, 2022-12-31)') AS results;

/*------------+
 | results    |
 +------------+
 | 2022-12-01 |
 +------------*/

In the following query, the lower bound of the range is unbounded, so NULL is returned:

SELECT RANGE_START(RANGE<DATE> '[UNBOUNDED, 2022-12-31)') AS results;

/*------------+
 | results    |
 +------------+
 | NULL       |
 +------------*/