String functions

GoogleSQL for BigQuery supports string functions. These string functions work on two different values: STRING and BYTES data types. STRING values must be well-formed UTF-8.

Functions that return position values, such as STRPOS, encode those positions as INT64. The value 1 refers to the first character (or byte), 2 refers to the second, and so on. The value 0 indicates an invalid position. When working on STRING types, the returned positions refer to character positions.

All string comparisons are done byte-by-byte, without regard to Unicode canonical equivalence.

ASCII

ASCII(value)

Description

Returns the ASCII code for the first character or byte in value. Returns 0 if value is empty or the ASCII code is 0 for the first character or byte.

Return type

INT64

Examples

SELECT ASCII('abcd') as A, ASCII('a') as B, ASCII('') as C, ASCII(NULL) as D;

/*-------+-------+-------+-------*
 | A     | B     | C     | D     |
 +-------+-------+-------+-------+
 | 97    | 97    | 0     | NULL  |
 *-------+-------+-------+-------*/

BYTE_LENGTH

BYTE_LENGTH(value)

Description

Returns the length of the STRING or BYTES value in BYTES, regardless of whether the type of the value is STRING or BYTES.

Return type

INT64

Examples

WITH example AS
  (SELECT 'абвгд' AS characters, b'абвгд' AS bytes)

SELECT
  characters,
  BYTE_LENGTH(characters) AS string_example,
  bytes,
  BYTE_LENGTH(bytes) AS bytes_example
FROM example;

/*------------+----------------+-------+---------------*
 | characters | string_example | bytes | bytes_example |
 +------------+----------------+-------+---------------+
 | абвгд      | 10             | абвгд | 10            |
 *------------+----------------+-------+---------------*/

CHAR_LENGTH

CHAR_LENGTH(value)

Description

Returns the length of the STRING in characters.

Return type

INT64

Examples

WITH example AS
  (SELECT 'абвгд' AS characters)

SELECT
  characters,
  CHAR_LENGTH(characters) AS char_length_example
FROM example;

/*------------+---------------------*
 | characters | char_length_example |
 +------------+---------------------+
 | абвгд      |                   5 |
 *------------+---------------------*/

CHARACTER_LENGTH

CHARACTER_LENGTH(value)

Description

Synonym for CHAR_LENGTH.

Return type

INT64

Examples

WITH example AS
  (SELECT 'абвгд' AS characters)

SELECT
  characters,
  CHARACTER_LENGTH(characters) AS char_length_example
FROM example;

/*------------+---------------------*
 | characters | char_length_example |
 +------------+---------------------+
 | абвгд      |                   5 |
 *------------+---------------------*/

CHR

CHR(value)

Description

Takes a Unicode code point and returns the character that matches the code point. Each valid code point should fall within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF]. Returns an empty string if the code point is 0. If an invalid Unicode code point is specified, an error is returned.

To work with an array of Unicode code points, see CODE_POINTS_TO_STRING

Return type

STRING

Examples

SELECT CHR(65) AS A, CHR(255) AS B, CHR(513) AS C, CHR(1024)  AS D;

/*-------+-------+-------+-------*
 | A     | B     | C     | D     |
 +-------+-------+-------+-------+
 | A     | ÿ     | ȁ     | Ѐ     |
 *-------+-------+-------+-------*/
SELECT CHR(97) AS A, CHR(0xF9B5) AS B, CHR(0) AS C, CHR(NULL) AS D;

/*-------+-------+-------+-------*
 | A     | B     | C     | D     |
 +-------+-------+-------+-------+
 | a     | 例    |       | NULL  |
 *-------+-------+-------+-------*/

CODE_POINTS_TO_BYTES

CODE_POINTS_TO_BYTES(ascii_code_points)

Description

Takes an array of extended ASCII code points as ARRAY<INT64> and returns BYTES.

To convert from BYTES to an array of code points, see TO_CODE_POINTS.

Return type

BYTES

Examples

The following is a basic example using CODE_POINTS_TO_BYTES.

SELECT CODE_POINTS_TO_BYTES([65, 98, 67, 100]) AS bytes;

-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'AbCd' is the result.
/*----------*
 | bytes    |
 +----------+
 | QWJDZA== |
 *----------*/

The following example uses a rotate-by-13 places (ROT13) algorithm to encode a string.

SELECT CODE_POINTS_TO_BYTES(ARRAY_AGG(
  (SELECT
      CASE
        WHEN chr BETWEEN b'a' and b'z'
          THEN TO_CODE_POINTS(b'a')[offset(0)] +
            MOD(code+13-TO_CODE_POINTS(b'a')[offset(0)],26)
        WHEN chr BETWEEN b'A' and b'Z'
          THEN TO_CODE_POINTS(b'A')[offset(0)] +
            MOD(code+13-TO_CODE_POINTS(b'A')[offset(0)],26)
        ELSE code
      END
   FROM
     (SELECT code, CODE_POINTS_TO_BYTES([code]) chr)
  ) ORDER BY OFFSET)) AS encoded_string
FROM UNNEST(TO_CODE_POINTS(b'Test String!')) code WITH OFFSET;

-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'Grfg Fgevat!' is the result.
/*------------------*
 | encoded_string   |
 +------------------+
 | R3JmZyBGZ2V2YXQh |
 *------------------*/

CODE_POINTS_TO_STRING

CODE_POINTS_TO_STRING(unicode_code_points)

Description

Takes an array of Unicode code points as ARRAY<INT64> and returns a STRING.

To convert from a string to an array of code points, see TO_CODE_POINTS.

Return type

STRING

Examples

The following are basic examples using CODE_POINTS_TO_STRING.

SELECT CODE_POINTS_TO_STRING([65, 255, 513, 1024]) AS string;

/*--------*
 | string |
 +--------+
 | AÿȁЀ   |
 *--------*/
SELECT CODE_POINTS_TO_STRING([97, 0, 0xF9B5]) AS string;

/*--------*
 | string |
 +--------+
 | a例    |
 *--------*/
SELECT CODE_POINTS_TO_STRING([65, 255, NULL, 1024]) AS string;

/*--------*
 | string |
 +--------+
 | NULL   |
 *--------*/

The following example computes the frequency of letters in a set of words.

WITH Words AS (
  SELECT word
  FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word
)
SELECT
  CODE_POINTS_TO_STRING([code_point]) AS letter,
  COUNT(*) AS letter_count
FROM Words,
  UNNEST(TO_CODE_POINTS(word)) AS code_point
GROUP BY 1
ORDER BY 2 DESC;

/*--------+--------------*
 | letter | letter_count |
 +--------+--------------+
 | a      | 5            |
 | f      | 3            |
 | r      | 2            |
 | b      | 2            |
 | l      | 2            |
 | o      | 2            |
 | g      | 1            |
 | z      | 1            |
 | e      | 1            |
 | m      | 1            |
 | i      | 1            |
 *--------+--------------*/

COLLATE

COLLATE(value, collate_specification)

Takes a STRING and a collation specification. Returns a STRING with a collation specification. If collate_specification is empty, returns a value with collation removed from the STRING.

The collation specification defines how the resulting STRING can be compared and sorted. To learn more, see Working with collation.

  • collation_specification must be a string literal, otherwise an error is thrown.
  • Returns NULL if value is NULL.

Return type

STRING

Examples

In this example, the weight of a is less than the weight of Z. This is because the collate specification, und:ci assigns more weight to Z.

WITH Words AS (
  SELECT
    COLLATE('a', 'und:ci') AS char1,
    COLLATE('Z', 'und:ci') AS char2
)
SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z
FROM Words;

/*----------------*
 | a_less_than_Z  |
 +----------------+
 | TRUE           |
 *----------------*/

In this example, the weight of a is greater than the weight of Z. This is because the default collate specification assigns more weight to a.

WITH Words AS (
  SELECT
    'a' AS char1,
    'Z' AS char2
)
SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z
FROM Words;

/*----------------*
 | a_less_than_Z  |
 +----------------+
 | FALSE          |
 *----------------*/

CONCAT

CONCAT(value1[, ...])

Description

Concatenates one or more values into a single result. All values must be BYTES or data types that can be cast to STRING.

The function returns NULL if any input argument is NULL.

Return type

STRING or BYTES

Examples

SELECT CONCAT('T.P.', ' ', 'Bar') as author;

/*---------------------*
 | author              |
 +---------------------+
 | T.P. Bar            |
 *---------------------*/
SELECT CONCAT('Summer', ' ', 1923) as release_date;

/*---------------------*
 | release_date        |
 +---------------------+
 | Summer 1923         |
 *---------------------*/

With Employees AS
  (SELECT
    'John' AS first_name,
    'Doe' AS last_name
  UNION ALL
  SELECT
    'Jane' AS first_name,
    'Smith' AS last_name
  UNION ALL
  SELECT
    'Joe' AS first_name,
    'Jackson' AS last_name)

SELECT
  CONCAT(first_name, ' ', last_name)
  AS full_name
FROM Employees;

/*---------------------*
 | full_name           |
 +---------------------+
 | John Doe            |
 | Jane Smith          |
 | Joe Jackson         |
 *---------------------*/

CONTAINS_SUBSTR

CONTAINS_SUBSTR(expression, search_value_literal[, json_scope=>json_scope_value])

json_scope_value:
  { 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' }

Description

Performs a normalized, case-insensitive search to see if a value exists as a substring in an expression. Returns TRUE if the value exists, otherwise returns FALSE.

Before values are compared, they are normalized and case folded with NFKC normalization. Wildcard searches are not supported.

Arguments

  • search_value_literal: The value to search for. It must be a STRING literal.
  • expression: The data to search over. The expression can be a column or table reference. A table reference is evaluated as a STRUCT whose fields are the columns of the table. A column reference is evaluated as one the following data types:

    • STRING
    • INT64
    • BOOL
    • NUMERIC
    • BIGNUMERIC
    • TIMESTAMP
    • TIME
    • DATE
    • DATETIME
    • ARRAY
    • STRUCT

    When the expression is evaluated, the result is cast to a STRING</