GoogleSQL for BigQuery supports string functions.
These string functions work on two different values:
STRING
and BYTES
data types. STRING
values must be well-formed UTF-8.
Functions that return position values, such as STRPOS,
encode those positions as INT64
. The value 1
refers to the first character (or byte), 2
refers to the second, and so on.
The value 0
indicates an invalid position. When working on STRING
types, the
returned positions refer to character positions.
All string comparisons are done byte-by-byte, without regard to Unicode canonical equivalence.
Function list
Name | Summary |
---|---|
ASCII
|
Gets the ASCII code for the first character or byte in a STRING
or BYTES value.
|
BYTE_LENGTH
|
Gets the number of BYTES in a STRING or
BYTES value.
|
CHAR_LENGTH
|
Gets the number of characters in a STRING value.
|
CHARACTER_LENGTH
|
Synonym for CHAR_LENGTH .
|
CHR
|
Converts a Unicode code point to a character. |
CODE_POINTS_TO_BYTES
|
Converts an array of extended ASCII code points to a
BYTES value.
|
CODE_POINTS_TO_STRING
|
Converts an array of extended ASCII code points to a
STRING value.
|
COLLATE
|
Combines a STRING value and a collation specification into a
collation specification-supported STRING value.
|
CONCAT
|
Concatenates one or more STRING or BYTES
values into a single result.
|
CONTAINS_SUBSTR
|
Performs a normalized, case-insensitive search to see if a value exists as a substring in an expression. |
EDIT_DISTANCE
|
Computes the Levenshtein distance between two STRING
or BYTES values.
|
ENDS_WITH
|
Checks if a STRING or BYTES value is the suffix
of another value.
|
FORMAT
|
Formats data and produces the results as a STRING value.
|
FROM_BASE32
|
Converts a base32-encoded STRING value into a
BYTES value.
|
FROM_BASE64
|
Converts a base64-encoded STRING value into a
BYTES value.
|
FROM_HEX
|
Converts a hexadecimal-encoded STRING value into a
BYTES value.
|
INITCAP
|
Formats a STRING as proper case, which means that the first
character in each word is uppercase and all other characters are lowercase.
|
INSTR
|
Finds the position of a subvalue inside another value, optionally starting the search at a given offset or occurrence. |
LEFT
|
Gets the specified leftmost portion from a STRING or
BYTES value.
|
LENGTH
|
Gets the length of a STRING or BYTES value.
|
LOWER
|
Formats alphabetic characters in a STRING value as
lowercase.
Formats ASCII characters in a BYTES value as
lowercase.
|
LPAD
|
Prepends a STRING or BYTES value with a pattern.
|
LTRIM
|
Identical to the TRIM function, but only removes leading
characters.
|
NORMALIZE
|
Case-sensitively normalizes the characters in a STRING value.
|
NORMALIZE_AND_CASEFOLD
|
Case-insensitively normalizes the characters in a STRING value.
|
OCTET_LENGTH
|
Alias for BYTE_LENGTH .
|
REGEXP_CONTAINS
|
Checks if a value is a partial match for a regular expression. |
REGEXP_EXTRACT
|
Produces a substring that matches a regular expression. |
REGEXP_EXTRACT_ALL
|
Produces an array of all substrings that match a regular expression. |
REGEXP_INSTR
|
Finds the position of a regular expression match in a value, optionally starting the search at a given offset or occurrence. |
REGEXP_REPLACE
|
Produces a STRING value where all substrings that match a
regular expression are replaced with a specified value.
|
REGEXP_SUBSTR
|
Synonym for REGEXP_EXTRACT .
|
REPEAT
|
Produces a STRING or BYTES value that consists of
an original value, repeated.
|
REPLACE
|
Replaces all occurrences of a pattern with another pattern in a
STRING or BYTES value.
|
REVERSE
|
Reverses a STRING or BYTES value.
|
RIGHT
|
Gets the specified rightmost portion from a STRING or
BYTES value.
|
RPAD
|
Appends a STRING or BYTES value with a pattern.
|
RTRIM
|
Identical to the TRIM function, but only removes trailing
characters.
|
SAFE_CONVERT_BYTES_TO_STRING
|
Converts a BYTES value to a STRING value and
replace any invalid UTF-8 characters with the Unicode replacement character,
U+FFFD .
|
SOUNDEX
|
Gets the Soundex codes for words in a STRING value.
|
SPLIT
|
Splits a STRING or BYTES value, using a delimiter.
|
STARTS_WITH
|
Checks if a STRING or BYTES value is a
prefix of another value.
|
STRPOS
|
Finds the position of the first occurrence of a subvalue inside another value. |
SUBSTR
|
Gets a portion of a STRING or BYTES value.
|
SUBSTRING
|
Alias for SUBSTR |
TO_BASE32
|
Converts a BYTES value to a
base32-encoded STRING value.
|
TO_BASE64
|
Converts a BYTES value to a
base64-encoded STRING value.
|
TO_CODE_POINTS
|
Converts a STRING or BYTES value into an array of
extended ASCII code points.
|
TO_HEX
|
Converts a BYTES value to a
hexadecimal STRING value.
|
TRANSLATE
|
Within a value, replaces each source character with the corresponding target character. |
TRIM
|
Removes the specified leading and trailing Unicode code points or bytes
from a STRING or BYTES value.
|
UNICODE
|
Gets the Unicode code point for the first character in a value. |
UPPER
|
Formats alphabetic characters in a STRING value as
uppercase.
Formats ASCII characters in a BYTES value as
uppercase.
|
ASCII
ASCII(value)
Description
Returns the ASCII code for the first character or byte in value
. Returns
0
if value
is empty or the ASCII code is 0
for the first character
or byte.
Return type
INT64
Examples
SELECT ASCII('abcd') as A, ASCII('a') as B, ASCII('') as C, ASCII(NULL) as D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| 97 | 97 | 0 | NULL |
*-------+-------+-------+-------*/
BYTE_LENGTH
BYTE_LENGTH(value)
Description
Gets the number of BYTES
in a STRING
or BYTES
value,
regardless of whether the value is a STRING
or BYTES
type.
Return type
INT64
Examples
WITH example AS
(SELECT 'абвгд' AS characters, b'абвгд' AS bytes)
SELECT
characters,
BYTE_LENGTH(characters) AS string_example,
bytes,
BYTE_LENGTH(bytes) AS bytes_example
FROM example;
/*------------+----------------+-------+---------------*
| characters | string_example | bytes | bytes_example |
+------------+----------------+-------+---------------+
| абвгд | 10 | абвгд | 10 |
*------------+----------------+-------+---------------*/
CHAR_LENGTH
CHAR_LENGTH(value)
Description
Gets the number of characters in a STRING
value.
Return type
INT64
Examples
WITH example AS
(SELECT 'абвгд' AS characters)
SELECT
characters,
CHAR_LENGTH(characters) AS char_length_example
FROM example;
/*------------+---------------------*
| characters | char_length_example |
+------------+---------------------+
| абвгд | 5 |
*------------+---------------------*/
CHARACTER_LENGTH
CHARACTER_LENGTH(value)
Description
Synonym for CHAR_LENGTH.
Return type
INT64
Examples
WITH example AS
(SELECT 'абвгд' AS characters)
SELECT
characters,
CHARACTER_LENGTH(characters) AS char_length_example
FROM example;
/*------------+---------------------*
| characters | char_length_example |
+------------+---------------------+
| абвгд | 5 |
*------------+---------------------*/
CHR
CHR(value)
Description
Takes a Unicode code point and returns
the character that matches the code point. Each valid code point should fall
within the range of [0, 0xD7FF] and [0xE000, 0x10FFFF]. Returns an empty string
if the code point is 0
. If an invalid Unicode code point is specified, an
error is returned.
To work with an array of Unicode code points, see
CODE_POINTS_TO_STRING
Return type
STRING
Examples
SELECT CHR(65) AS A, CHR(255) AS B, CHR(513) AS C, CHR(1024) AS D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| A | ÿ | ȁ | Ѐ |
*-------+-------+-------+-------*/
SELECT CHR(97) AS A, CHR(0xF9B5) AS B, CHR(0) AS C, CHR(NULL) AS D;
/*-------+-------+-------+-------*
| A | B | C | D |
+-------+-------+-------+-------+
| a | 例 | | NULL |
*-------+-------+-------+-------*/
CODE_POINTS_TO_BYTES
CODE_POINTS_TO_BYTES(ascii_code_points)
Description
Takes an array of extended ASCII
code points
as ARRAY<INT64>
and returns BYTES
.
To convert from BYTES
to an array of code points, see
TO_CODE_POINTS.
Return type
BYTES
Examples
The following is a basic example using CODE_POINTS_TO_BYTES
.
SELECT CODE_POINTS_TO_BYTES([65, 98, 67, 100]) AS bytes;
-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'AbCd' is the result.
/*----------*
| bytes |
+----------+
| QWJDZA== |
*----------*/
The following example uses a rotate-by-13 places (ROT13) algorithm to encode a string.
SELECT CODE_POINTS_TO_BYTES(ARRAY_AGG(
(SELECT
CASE
WHEN chr BETWEEN b'a' and b'z'
THEN TO_CODE_POINTS(b'a')[offset(0)] +
MOD(code+13-TO_CODE_POINTS(b'a')[offset(0)],26)
WHEN chr BETWEEN b'A' and b'Z'
THEN TO_CODE_POINTS(b'A')[offset(0)] +
MOD(code+13-TO_CODE_POINTS(b'A')[offset(0)],26)
ELSE code
END
FROM
(SELECT code, CODE_POINTS_TO_BYTES([code]) chr)
) ORDER BY OFFSET)) AS encoded_string
FROM UNNEST(TO_CODE_POINTS(b'Test String!')) code WITH OFFSET;
-- Note that the result of CODE_POINTS_TO_BYTES is of type BYTES, displayed as a base64-encoded string.
-- In BYTES format, b'Grfg Fgevat!' is the result.
/*------------------*
| encoded_string |
+------------------+
| R3JmZyBGZ2V2YXQh |
*------------------*/
CODE_POINTS_TO_STRING
CODE_POINTS_TO_STRING(unicode_code_points)
Description
Takes an array of Unicode code points
as ARRAY<INT64>
and returns a STRING
.
To convert from a string to an array of code points, see TO_CODE_POINTS.
Return type
STRING
Examples
The following are basic examples using CODE_POINTS_TO_STRING
.
SELECT CODE_POINTS_TO_STRING([65, 255, 513, 1024]) AS string;
/*--------*
| string |
+--------+
| AÿȁЀ |
*--------*/
SELECT CODE_POINTS_TO_STRING([97, 0, 0xF9B5]) AS string;
/*--------*
| string |
+--------+
| a例 |
*--------*/
SELECT CODE_POINTS_TO_STRING([65, 255, NULL, 1024]) AS string;
/*--------*
| string |
+--------+
| NULL |
*--------*/
The following example computes the frequency of letters in a set of words.
WITH Words AS (
SELECT word
FROM UNNEST(['foo', 'bar', 'baz', 'giraffe', 'llama']) AS word
)
SELECT
CODE_POINTS_TO_STRING([code_point]) AS letter,
COUNT(*) AS letter_count
FROM Words,
UNNEST(TO_CODE_POINTS(word)) AS code_point
GROUP BY 1
ORDER BY 2 DESC;
/*--------+--------------*
| letter | letter_count |
+--------+--------------+
| a | 5 |
| f | 3 |
| r | 2 |
| b | 2 |
| l | 2 |
| o | 2 |
| g | 1 |
| z | 1 |
| e | 1 |
| m | 1 |
| i | 1 |
*--------+--------------*/
COLLATE
COLLATE(value, collate_specification)
Takes a STRING
and a collation specification. Returns
a STRING
with a collation specification. If collate_specification
is empty,
returns a value with collation removed from the STRING
.
The collation specification defines how the resulting STRING
can be compared
and sorted. To learn more, see
Working with collation.
collation_specification
must be a string literal, otherwise an error is thrown.- Returns
NULL
ifvalue
isNULL
.
Return type
STRING
Examples
In this example, the weight of a
is less than the