GoogleSQL for BigQuery supports string functions.
These string functions work on two different values:
STRING
and BYTES
data types. STRING
values must be well-formed UTF-8.
Functions that return position values, such as STRPOS,
encode those positions as INT64
. The value 1
refers to the first character (or byte), 2
refers to the second, and so on.
The value 0
indicates an invalid position. When working on STRING
types, the
returned positions refer to character positions.
All string comparisons are done byte-by-byte, without regard to Unicode canonical equivalence.
ASCII
ASCII(value)
Description
Returns the ASCII code for the first character or byte in value
. Returns
0
if value
is empty or the ASCII code is 0
for the first character
or byte.
Return type
INT64
Examples
SELECT ASCII('abcd') as A, ASCII('a') as B, ASCII('') as C, ASCII(NULL) as D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| 97 | 97 | 0 | NULL |
*-------+-------+-------+-------*/
BYTE_LENGTH
BYTE_LENGTH(value)
Description
Returns the length of the STRING
or BYTES
value in BYTES
,
regardless of whether the type of the value is STRING
or BYTES
.
Return type
INT64
Examples
WITH example AS
(SELECT 'абвгд' AS characters, b'абвгд' AS bytes)
SELECT
characters,
BYTE_LENGTH(characters) AS string_example,
bytes,
BYTE_LENGTH(bytes) AS bytes_example
FROM example;
/*------------+----------------+-------+---------------*
| characters | string_example | bytes | bytes_example |
+------------+----------------+-------+---------------+
| абвгд | 10 | абвгд | 10 |
*------------+----------------+-------+---------------*/
CHAR_LENGTH
CHAR_LENGTH(value)
Description
Returns the length of the STRING
in characters.
Return type
INT64
Examples
WITH example AS
(SELECT 'абвгд' AS characters)
SELECT
characters,
CHAR_LENGTH(characters) AS char_length_example
FROM example;
/*------------+---------------------*
| characters | char_length_example |
+------------+---------------------+
| абвгд | 5 |
*------------+---------------------*/
CHARACTER_LENGTH
CHARACTER_LENGTH(value)
Description
Synonym for CHAR_LENGTH.
Return type
INT64
Examples
WITH example AS
(SELECT 'абвгд' AS characters)
SELECT
characters,
CHARACTER_LENGTH(characters) AS char_length_example
FROM example;
/*------------+---------------------*
| characters | char_length_example |
+------------+---------------------+
| абвгд | 5 |
*------------+---------------------*/
CHR
CHR(value)
Description
Takes a Unicode code point and returns
the character that matches the code point. Each valid code point should fall
within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF]. Returns an empty string
if the code point is 0
. If an invalid Unicode code point is specified, an
error is returned.
To work with an array of Unicode code points, see
CODE_POINTS_TO_STRING
Return type
STRING
Examples
SELECT CHR(65) AS A, CHR(255) AS B, CHR(513) AS C, CHR(1024) AS D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| A | ÿ | ȁ | Ѐ |
*-------+-------+-------+-------*/
SELECT CHR(97) AS A, CHR(0xF9B5) AS B, CHR(0) AS C, CHR(NULL) AS D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| a | 例 | | NULL |
*-------+-------+-------+-------*/
CODE_POINTS_TO_BYTES
CODE_POINTS_TO_BYTES(ascii_code_points)
Description
Takes an array of extended ASCII
code points
as ARRAY<INT64>
and returns BYTES
.
To convert from BYTES
to an array of code points, see
TO_CODE_POINTS.
Return type
BYTES
Examples
The following is a basic example using CODE_POINTS_TO_BYTES
.
SELECT CODE_POINTS_TO_BYTES([65, 98, 67, 100]) AS bytes;
-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'AbCd' is the result.
/*----------*
| bytes |
+----------+
| QWJDZA== |
*----------*/
The following example uses a rotate-by-13 places (ROT13) algorithm to encode a string.
SELECT CODE_POINTS_TO_BYTES(ARRAY_AGG(
(SELECT
CASE
WHEN chr BETWEEN b'a' and b'z'
THEN TO_CODE_POINTS(b'a')[offset(0)] +
MOD(code+13-TO_CODE_POINTS(b'a')[offset(0)],26)
WHEN chr BETWEEN b'A' and b'Z'
THEN TO_CODE_POINTS(b'A')[offset(0)] +
MOD(code+13-TO_CODE_POINTS(b'A')[offset(0)],26)
ELSE code
END
FROM
(SELECT code, CODE_POINTS_TO_BYTES([code]) chr)
) ORDER BY OFFSET)) AS encoded_string
FROM UNNEST(TO_CODE_POINTS(b'Test String!')) code WITH OFFSET;
-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'Grfg Fgevat!' is the result.
/*------------------*
| encoded_string |
+------------------+
| R3JmZyBGZ2V2YXQh |
*------------------*/
CODE_POINTS_TO_STRING
CODE_POINTS_TO_STRING(unicode_code_points)
Description
Takes an array of Unicode code points
as ARRAY<INT64>
and returns a STRING
.
To convert from a string to an array of code points, see TO_CODE_POINTS.
Return type
STRING
Examples
The following are basic examples using CODE_POINTS_TO_STRING
.
SELECT CODE_POINTS_TO_STRING([65, 255, 513, 1024]) AS string;
/*--------*
| string |
+--------+
| AÿȁЀ |
*--------*/
SELECT CODE_POINTS_TO_STRING([97, 0, 0xF9B5]) AS string;
/*--------*
| string |
+--------+
| a例 |
*--------*/
SELECT CODE_POINTS_TO_STRING([65, 255, NULL, 1024]) AS string;
/*--------*
| string |
+--------+
| NULL |
*--------*/
The following example computes the frequency of letters in a set of words.
WITH Words AS (
SELECT word
FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word
)
SELECT
CODE_POINTS_TO_STRING([code_point]) AS letter,
COUNT(*) AS letter_count
FROM Words,
UNNEST(TO_CODE_POINTS(word)) AS code_point
GROUP BY 1
ORDER BY 2 DESC;
/*--------+--------------*
| letter | letter_count |
+--------+--------------+
| a | 5 |
| f | 3 |
| r | 2 |
| b | 2 |
| l | 2 |
| o | 2 |
| g | 1 |
| z | 1 |
| e | 1 |
| m | 1 |
| i | 1 |
*--------+--------------*/
COLLATE
COLLATE(value, collate_specification)
Takes a STRING
and a collation specification. Returns
a STRING
with a collation specification. If collate_specification
is empty,
returns a value with collation removed from the STRING
.
The collation specification defines how the resulting STRING
can be compared
and sorted. To learn more, see
Working with collation.
collation_specification
must be a string literal, otherwise an error is thrown.- Returns
NULL
ifvalue
isNULL
.
Return type
STRING
Examples
In this example, the weight of a
is less than the weight of Z
. This
is because the collate specification, und:ci
assigns more weight to Z
.
WITH Words AS (
SELECT
COLLATE('a', 'und:ci') AS char1,
COLLATE('Z', 'und:ci') AS char2
)
SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z
FROM Words;
/*----------------*
| a_less_than_Z |
+----------------+
| TRUE |
*----------------*/
In this example, the weight of a
is greater than the weight of Z
. This
is because the default collate specification assigns more weight to a
.
WITH Words AS (
SELECT
'a' AS char1,
'Z' AS char2
)
SELECT ( Words.char1 < Words.char2 ) AS a_less_than_Z
FROM Words;
/*----------------*
| a_less_than_Z |
+----------------+
| FALSE |
*----------------*/
CONCAT
CONCAT(value1[, ...])
Description
Concatenates one or more values into a single result. All values must be
BYTES
or data types that can be cast to STRING
.
The function returns NULL
if any input argument is NULL
.
Return type
STRING
or BYTES
Examples
SELECT CONCAT('T.P.', ' ', 'Bar') as author;
/*---------------------*
| author |
+---------------------+
| T.P. Bar |
*---------------------*/
SELECT CONCAT('Summer', ' ', 1923) as release_date;
/*---------------------*
| release_date |
+---------------------+
| Summer 1923 |
*---------------------*/
With Employees AS
(SELECT
'John' AS first_name,
'Doe' AS last_name
UNION ALL
SELECT
'Jane' AS first_name,
'Smith' AS last_name
UNION ALL
SELECT
'Joe' AS first_name,
'Jackson' AS last_name)
SELECT
CONCAT(first_name, ' ', last_name)
AS full_name
FROM Employees;
/*---------------------*
| full_name |
+---------------------+
| John Doe |
| Jane Smith |
| Joe Jackson |
*---------------------*/
CONTAINS_SUBSTR
CONTAINS_SUBSTR(expression, search_value_literal[, json_scope=>json_scope_value])
json_scope_value:
{ 'JSON_VALUES' | 'JSON_KEYS' | 'JSON_KEYS_AND_VALUES' }
Description
Performs a normalized, case-insensitive search to see if a value exists as a
substring in an expression. Returns TRUE
if the value exists, otherwise
returns FALSE
.
Before values are compared, they are normalized and case folded with NFKC
normalization. Wildcard searches are not
supported.
Arguments
search_value_literal
: The value to search for. It must be aSTRING
literal.expression
: The data to search over. The expression can be a column or table reference. A table reference is evaluated as aSTRUCT
whose fields are the columns of the table. A column reference is evaluated as one the following data types:STRING
INT64
BOOL
NUMERIC
BIGNUMERIC
TIMESTAMP
TIME
DATE
DATETIME
ARRAY
STRUCT
When the expression is evaluated, the result is cast to a
STRING