在 BigQuery 中,数组是由零个或零个以上相同数据类型的值组成的有序列表。您可以构建简单数据类型(如 INT64
)的数组和复杂数据类型(如 STRUCT
)的数组。目前,ARRAY
数据类型除外:不支持数组的数组。数组可以包含 NULL
值。
借助 BigQuery,您可以构建数组字面量、使用 ARRAY
函数基于子查询构建数组,还可以使用 ARRAY_AGG
函数将值聚合到数组中。
您可以使用 ARRAY_CONCAT()
等函数合并数组,也可以使用 ARRAY_TO_STRING()
将数组转换为字符串。
构建数组
使用数组字面量
您可以使用英文方括号([
和 ]
)在 BigQuery 中构建数组字面量。数组中的每个元素都用英文逗号分隔。
SELECT [1, 2, 3] as numbers;
SELECT ["apple", "pear", "orange"] as fruit;
SELECT [true, false, true] as booleans;
您也可以基于任何具有兼容类型的表达式创建数组。例如:
SELECT [a, b, c]
FROM
(SELECT 5 AS a,
37 AS b,
406 AS c);
SELECT [a, b, c]
FROM
(SELECT CAST(5 AS INT64) AS a,
CAST(37 AS FLOAT64) AS b,
406 AS c);
注意,第二个示例包含三个表达式:第一个返回 INT64
,第二个返回 FLOAT64
,第三个声明一个字面量。此表达式之所以有效,是因为所有这三个表达式都具有 FLOAT64
这一超类型。
如需为数组声明特定数据类型,请使用英文尖括号(<
和 >
)。例如:
SELECT ARRAY<FLOAT64>[1, 2, 3] as floats;
大多数数据类型(例如 INT64
或 STRING
)的数组不需要事先进行声明。
SELECT [1, 2, 3] as numbers;
您可以使用 ARRAY<type>[]
编写一个特定类型的空数组。此外,您还可以使用 []
编写一个无类型的空数组,在这种情况下,BigQuery 会尝试通过上下文环境推断数组类型。如果 BigQuery 无法推断出类型,则会使用默认类型 ARRAY<INT64>
。
使用生成的值
您还可以使用生成的值构造 ARRAY
。
生成整数数组
GENERATE_ARRAY
基于起始值和结束值以及步进值生成值数组。例如,以下查询用于生成一个包含从 11 到 33(含边界值)的所有奇整数的数组:
SELECT GENERATE_ARRAY(11, 33, 2) AS odds;
+--------------------------------------------------+
| odds |
+--------------------------------------------------+
| [11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33] |
+--------------------------------------------------+
您还可以通过提供负步进值按照降序生成值数组:
SELECT GENERATE_ARRAY(21, 14, -1) AS countdown;
+----------------------------------+
| countdown |
+----------------------------------+
| [21, 20, 19, 18, 17, 16, 15, 14] |
+----------------------------------+
生成日期数组
GENERATE_DATE_ARRAY
基于起始和结束 DATE
以及步进 INTERVAL
生成 DATE
数组。
您可以使用 GENERATE_DATE_ARRAY
生成一组 DATE
值。例如,以下查询以 1 WEEK
的间隔返回当前 DATE
和之后的若干 DATE
,直到以后的某一 DATE
(包含该日期):
SELECT
GENERATE_DATE_ARRAY('2017-11-21', '2017-12-31', INTERVAL 1 WEEK)
AS date_array;
+--------------------------------------------------------------------------+
| date_array |
+--------------------------------------------------------------------------+
| [2017-11-21, 2017-11-28, 2017-12-05, 2017-12-12, 2017-12-19, 2017-12-26] |
+--------------------------------------------------------------------------+
访问数组元素
请参考下表 sequences
:
+---------------------+
| some_numbers |
+---------------------+
| [0, 1, 1, 2, 3, 5] |
| [2, 4, 8, 16, 32] |
| [5, 10] |
+---------------------+
该表包含 ARRAY
数据类型的列 some_numbers
。如需访问此列中数组的元素,您必须指定要使用的索引类型:OFFSET
(从 0 开始的索引)或 ORDINAL
(从 1 开始的索引)。
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers)
SELECT some_numbers,
some_numbers[OFFSET(1)] AS offset_1,
some_numbers[ORDINAL(1)] AS ordinal_1
FROM sequences;
+--------------------+----------+-----------+
| some_numbers | offset_1 | ordinal_1 |
+--------------------+----------+-----------+
| [0, 1, 1, 2, 3, 5] | 1 | 0 |
| [2, 4, 8, 16, 32] | 4 | 2 |
| [5, 10] | 10 | 5 |
+--------------------+----------+-----------+
查找长度
ARRAY_LENGTH()
函数用于返回数组的长度。
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers)
SELECT some_numbers,
ARRAY_LENGTH(some_numbers) AS len
FROM sequences;
+--------------------+--------+
| some_numbers | len |
+--------------------+--------+
| [0, 1, 1, 2, 3, 5] | 6 |
| [2, 4, 8, 16, 32] | 5 |
| [5, 10] | 2 |
+--------------------+--------+
将数组展平为表
如需将 ARRAY
转换为一组行(即执行展平操作),请使用 UNNEST
运算符。UNNEST
获取一个 ARRAY
,然后返回一个表,ARRAY
中的每个元素均占该表的一行。
由于 UNNEST
破坏了 ARRAY
元素的顺序,您可能需要恢复表中的顺序。为此,请使用可选的 WITH OFFSET
子句返回另一个包含各数组元素偏移量的列,然后使用 ORDER BY
子句按偏移量对行进行排序。
示例
SELECT *
FROM UNNEST(['foo', 'bar', 'baz', 'qux', 'corge', 'garply', 'waldo', 'fred'])
AS element
WITH OFFSET AS offset
ORDER BY offset;
+----------+--------+
| element | offset |
+----------+--------+
| foo | 0 |
| bar | 1 |
| baz | 2 |
| qux | 3 |
| corge | 4 |
| garply | 5 |
| waldo | 6 |
| fred | 7 |
+----------+--------+
如需展平一整列 ARRAY
,同时保留每行中其他列的值,请使用 CROSS JOIN
将带有 ARRAY
列的表联接到该 ARRAY
列的 UNNEST
输出。
这是一种相互关联的交叉联接:UNNEST
运算符引用了源表中每一行的 ARRAY
列,该列之前曾出现在 FROM
子句中。对于源表中的每一行 N
,UNNEST
将行 N
的 ARRAY
展平成一组包含 ARRAY
元素的行,然后 CROSS JOIN
将这组新行与源表的单行 N
联接起来。
示例
下述示例使用 UNNEST
为数组列中的每个元素返回一行。使用 CROSS JOIN
后,id
列显示 sequences
中包含每个数字的行的 id
值。
WITH sequences AS
(SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers)
SELECT id, flattened_numbers
FROM sequences
CROSS JOIN UNNEST(sequences.some_numbers) AS flattened_numbers;
+------+-------------------+
| id | flattened_numbers |
+------+-------------------+
| 1 | 0 |
| 1 | 1 |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 5 |
| 2 | 2 |
| 2 | 4 |
| 2 | 8 |
| 2 | 16 |
| 2 | 32 |
| 3 | 5 |
| 3 | 10 |
+------+-------------------+
请注意,对于相关交叉联接,UNNEST
运算符为可选运算符,CROSS JOIN
可表示为逗号联接。上述示例使用此简写表示法后变为:
WITH sequences AS
(SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers)
SELECT id, flattened_numbers
FROM sequences, sequences.some_numbers AS flattened_numbers;
+------+-------------------+
| id | flattened_numbers |
+------+-------------------+
| 1 | 0 |
| 1 | 1 |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 5 |
| 2 | 2 |
| 2 | 4 |
| 2 | 8 |
| 2 | 16 |
| 2 | 32 |
| 3 | 5 |
| 3 | 10 |
+------+-------------------+
查询嵌套数组
如果一个表包含 STRUCT
的 ARRAY
,则可展平 ARRAY
以查询 STRUCT
的字段。您也可以展平 STRUCT
值的 ARRAY
类型字段。
查询 ARRAY 中的 STRUCT 元素
以下示例将 UNNEST
与 CROSS JOIN
结合使用,以展平 STRUCT
的 ARRAY
。
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
AS participants)
SELECT
race,
participant
FROM races r
CROSS JOIN UNNEST(r.participants) as participant;
+------+---------------------------------------+
| race | participant |
+------+---------------------------------------+
| 800M | {Rudisha, [23.4, 26.3, 26.4, 26.1]} |
| 800M | {Makhloufi, [24.5, 25.4, 26.6, 26.1]} |
| 800M | {Murphy, [23.9, 26, 27, 26]} |
| 800M | {Bosse, [23.6, 26.2, 26.5, 27.1]} |
| 800M | {Rotich, [24.7, 25.6, 26.9, 26.4]} |
| 800M | {Lewandowski, [25, 25.7, 26.3, 27.2]} |
| 800M | {Kipketer, [23.2, 26.1, 27.3, 29.4]} |
| 800M | {Berian, [23.7, 26.1, 27, 29.3]} |
+------+---------------------------------------+
您可以从重复字段中找到特定信息。例如,以下查询用于返回 800 米赛跑中跑得最快的参赛选手。
此示例并不涉及数组展平操作,但展示了从重复字段中获取信息的一种常用方法。
示例
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
AS participants)
SELECT
race,
(SELECT name
FROM UNNEST(participants)
ORDER BY (
SELECT SUM(duration)
FROM UNNEST(splits) AS duration) ASC
LIMIT 1) AS fastest_racer
FROM races;
+------+---------------+
| race | fastest_racer |
+------+---------------+
| 800M | Rudisha |
+------+---------------+
查询 STRUCT 中的 ARRAY 类型字段
您也可以从嵌套的重复字段中获取信息。例如,以下语句用于返回在 800 米赛跑中圈速最快的参赛选手。
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
AS participants)
SELECT
race,
(SELECT name
FROM UNNEST(participants),
UNNEST(splits) AS duration
ORDER BY duration ASC LIMIT 1) AS runner_with_fastest_lap
FROM races;
+------+-------------------------+
| race | runner_with_fastest_lap |
+------+-------------------------+
| 800M | Kipketer |
+------+-------------------------+
请注意,前面的查询使用英文逗号运算符 (,
) 来执行隐式 CROSS JOIN
。这一做法的效果与以下示例相同,后者使用了显式 CROSS JOIN
。
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits)]
AS participants)
SELECT
race,
(SELECT name
FROM UNNEST(participants)
CROSS JOIN UNNEST(splits) AS duration
ORDER BY duration ASC LIMIT 1) AS runner_with_fastest_lap
FROM races;
+------+-------------------------+
| race | runner_with_fastest_lap |
+------+-------------------------+
| 800M | Kipketer |
+------+-------------------------+
使用 CROSS JOIN
展平数组会排除具有空数组或 NULL 数组的行。如果要包含这些行,请使用 LEFT JOIN
。
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as splits),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as splits),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as splits),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as splits),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as splits),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as splits),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as splits),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as splits),
STRUCT("Nathan" as name, ARRAY<FLOAT64>[] as splits),
STRUCT("David" as name, NULL as splits)]
AS participants)
SELECT
name, sum(duration) AS finish_time
FROM races, races.participants LEFT JOIN participants.splits duration
GROUP BY name;
+-------------+--------------------+
| name | finish_time |
+-------------+--------------------+
| Murphy | 102.9 |
| Rudisha | 102.19999999999999 |
| David | NULL |
| Rotich | 103.6 |
| Makhloufi | 102.6 |
| Berian | 106.1 |
| Bosse | 103.4 |
| Kipketer | 106 |
| Nathan | NULL |
| Lewandowski | 104.2 |
+-------------+--------------------+
基于子查询创建数组
使用数组时,一个常见的任务是将子查询结果转换为数组。在 BigQuery 中,您可以使用 ARRAY()
函数来实现此目的。
例如,对 sequences
表执行以下操作:
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers)
SELECT some_numbers,
ARRAY(SELECT x * 2
FROM UNNEST(some_numbers) AS x) AS doubled
FROM sequences;
+--------------------+---------------------+
| some_numbers | doubled |
+--------------------+---------------------+
| [0, 1, 1, 2, 3, 5] | [0, 2, 2, 4, 6, 10] |
| [2, 4, 8, 16, 32] | [4, 8, 16, 32, 64] |
| [5, 10] | [10, 20] |
+--------------------+---------------------+
此示例以一个名为 sequences 的表开始。该表包含一个类型为 ARRAY<INT64>
的列 some_numbers
。
该查询本身包含一个子查询。该子查询会选择 some_numbers
列中的每一行,并使用 UNNEST
将数组返回为一组行。接下来,它将每个值乘以 2,然后使用 ARRAY()
运算符将这些行重新组合到一个数组中。
过滤数组
以下示例在 ARRAY()
运算符的子查询中使用 WHERE
子句,以过滤返回的行。
注意:在下列示例中,返回的结果行未进行排序。
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers)
SELECT
ARRAY(SELECT x * 2
FROM UNNEST(some_numbers) AS x
WHERE x < 5) AS doubled_less_than_five
FROM sequences;
+------------------------+
| doubled_less_than_five |
+------------------------+
| [0, 2, 2, 4, 6] |
| [4, 8] |
| [] |
+------------------------+
请注意,第三行包含一个空数组,这是因为与之对应的原始行 ([5, 10]
) 中的元素不符合 x < 5
的过滤条件。
您也可以使用 SELECT DISTINCT
来过滤数组,这样可以仅返回数组中的唯一元素。
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers)
SELECT ARRAY(SELECT DISTINCT x
FROM UNNEST(some_numbers) AS x) AS unique_numbers
FROM sequences;
+-----------------+
| unique_numbers |
+-----------------+
| [0, 1, 2, 3, 5] |
+-----------------+
您还可以使用 IN
关键字过滤数组行。此关键字可过滤包含数组的行,方法是确定特定值是否与数组中的元素匹配。
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers)
SELECT
ARRAY(SELECT x
FROM UNNEST(some_numbers) AS x
WHERE 2 IN UNNEST(some_numbers)) AS contains_two
FROM sequences;
+--------------------+
| contains_two |
+--------------------+
| [0, 1, 1, 2, 3, 5] |
| [2, 4, 8, 16, 32] |
| [] |
+--------------------+
另请注意,第三行包含一个空数组,这是因为与之对应的原始行 ([5, 10]
) 中的数组不包含 2
。
扫描数组
如需检查数组是否包含特定值,请结合使用 IN
运算符和 UNNEST
。如需检查数组是否包含与某条件匹配的值,请结合使用 EXISTS
函数和 UNNEST
。
扫描特定值
如需扫描数组中的特定值,请将 IN
运算符与 UNNEST
结合使用。
示例
以下示例在数组中包含数字 2 时返回 true
。
SELECT 2 IN UNNEST([0, 1, 1, 2, 3, 5]) AS contains_value;
+----------------+
| contains_value |
+----------------+
| true |
+----------------+
对于一个表,要返回数组列中包含特定值的行,请使用 WHERE
子句过滤 IN UNNEST
的结果。
示例
以下示例返回数组列中包含值 2 的行的 id
值。
WITH sequences AS
(SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE 2 IN UNNEST(sequences.some_numbers)
ORDER BY matching_rows;
+---------------+
| matching_rows |
+---------------+
| 1 |
| 2 |
+---------------+
扫描满足条件的值
如需扫描数组以查找与条件相匹配的值,请使用 UNNEST
返回包含数组中元素的表,使用 WHERE
过滤子查询中的结果表,然后使用 EXISTS
检查过滤后的表是否包含任何行。
示例
以下示例返回包含大于 5 的值的数组列中行的 id
值。
WITH sequences AS
(SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers)
SELECT id AS matching_rows FROM sequences
WHERE EXISTS (SELECT *
FROM UNNEST(some_numbers) AS x
WHERE x > 5);
+---------------+
| matching_rows |
+---------------+
| 2 |
| 3 |
+---------------+
扫描满足条件的 STRUCT 字段值
如需搜索 STRUCT
数组中值与条件相匹配的字段,请使用 UNNEST
返回一个每列均与一个 STRUCT
字段相对应的表,然后使用 WHERE EXISTS
从表中过滤掉不匹配的行。
示例
以下示例返回数组列中包含 STRUCT
且其字段 b
的值大于 3 的行。
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS a, 4 AS b)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS a, 3 AS b), STRUCT (7 AS a, 4 AS b)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
+---------------+
| matching_rows |
+---------------+
| 2 |
| 3 |
+---------------+
数组和聚合
借助 BigQuery,您可以使用 ARRAY_AGG()
将值聚合到数组中。
WITH fruits AS
(SELECT "apple" AS fruit
UNION ALL SELECT "pear" AS fruit
UNION ALL SELECT "banana" AS fruit)
SELECT ARRAY_AGG(fruit) AS fruit_basket
FROM fruits;
+-----------------------+
| fruit_basket |
+-----------------------+
| [apple, pear, banana] |
+-----------------------+
由 ARRAY_AGG()
返回的数组以任意顺序排列,这是因为该函数连接值的顺序无法保证。如需对数组元素进行排序,请使用 ORDER BY
。例如:
WITH fruits AS
(SELECT "apple" AS fruit
UNION ALL SELECT "pear" AS fruit
UNION ALL SELECT "banana" AS fruit)
SELECT ARRAY_AGG(fruit ORDER BY fruit) AS fruit_basket
FROM fruits;
+-----------------------+
| fruit_basket |
+-----------------------+
| [apple, banana, pear] |
+-----------------------+
您还可以将聚合函数(如 SUM()
)应用于数组中的元素。例如,以下查询返回 sequences
表中每一行的数组元素的总和。
WITH sequences AS
(SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT [5, 10] AS some_numbers)
SELECT some_numbers,
(SELECT SUM(x)
FROM UNNEST(s.some_numbers) x) AS sums
FROM sequences s;
+--------------------+------+
| some_numbers | sums |
+--------------------+------+
| [0, 1, 1, 2, 3, 5] | 12 |
| [2, 4, 8, 16, 32] | 62 |
| [5, 10] | 15 |
+--------------------+------+
BigQuery 还支持聚合函数 ARRAY_CONCAT_AGG()
,该函数可以跨行连接数组列的元素。
WITH aggregate_example AS
(SELECT [1,2] AS numbers
UNION ALL SELECT [3,4] AS numbers
UNION ALL SELECT [5, 6] AS numbers)
SELECT ARRAY_CONCAT_AGG(numbers) AS count_to_six_agg
FROM aggregate_example;
+--------------------------------------------------+
| count_to_six_agg |
+--------------------------------------------------+
| [1, 2, 3, 4, 5, 6] |
+--------------------------------------------------+
注意:ARRAY_CONCAT_AGG()
连接值的顺序无法保证,因此该函数返回的数组是不确定的。
将数组转换为字符串
利用 ARRAY_TO_STRING()
函数,您可以将 ARRAY<STRING>
转换为单个 STRING
值,或者将 ARRAY<BYTES>
转换为单个 BYTES
值,其中,生成的值是有序连接的数组元素。
第二个参数是分隔符,相应函数将在输入之间插入该分隔符以生成输出;第二个参数必须与第一个参数的元素具有相同类型。
示例:
WITH greetings AS
(SELECT ["Hello", "World"] AS greeting)
SELECT ARRAY_TO_STRING(greeting, " ") AS greetings
FROM greetings;
+-------------+
| greetings |
+-------------+
| Hello World |
+-------------+
第三个参数(可选)取代输入数组中的 NULL
值。
如果您省略此参数,则该函数将忽略
NULL
数组元素。如果您提供空字符串,则该函数会为
NULL
数组元素插入分隔符。
示例:
SELECT
ARRAY_TO_STRING(arr, ".", "N") AS non_empty_string,
ARRAY_TO_STRING(arr, ".", "") AS empty_string,
ARRAY_TO_STRING(arr, ".") AS omitted
FROM (SELECT ["a", NULL, "b", NULL, "c", NULL] AS arr);
+------------------+--------------+---------+
| non_empty_string | empty_string | omitted |
+------------------+--------------+---------+
| a.N.b.N.c.N | a..b..c. | a.b.c |
+------------------+--------------+---------+
合并数组
在某些情况下,您可能需要将多个数组合并为单个数组。
您可以使用 ARRAY_CONCAT()
函数来完成此操作。
SELECT ARRAY_CONCAT([1, 2], [3, 4], [5, 6]) as count_to_six;
+--------------------------------------------------+
| count_to_six |
+--------------------------------------------------+
| [1, 2, 3, 4, 5, 6] |
+--------------------------------------------------+
压缩数组
假定两个大小相同的数组,您可以将其合并到一个数组中,该数组由输入数组中的元素对(取自其相应的位置)组成。此操作有时称为压缩。
您可以使用 UNNEST
和 WITH OFFSET
压缩数组。在此示例中,每个值对都以 STRUCT
形式存储在数组中。
WITH combinations AS (
SELECT
['a', 'b'] AS letters,
[1, 2, 3] AS numbers
)
SELECT ARRAY_AGG(
STRUCT(letter, numbers[OFFSET(letters_offset)] AS number)
) AS pairs
FROM combinations, UNNEST(letters) AS letter WITH OFFSET AS letters_offset;
+------------------------------+
| pairs |
+------------------------------+
| [{ letter: "a", number: 1 }, |
| { letter: "b", number: 2 }] |
+------------------------------+
您可以使用长度不同的输入数组,只要第一个数组的长度等于或小于第二个数组的长度。压缩后的数组将是最短输入数组的长度。
构建数组的数组
BigQuery 不支持直接构建数组的数组。正确做法是,您必须创建一个结构体数组,每个结构体包含一个 ARRAY
类型的字段。为便于说明,请参考以下 points
表:
+----------+
| point |
+----------+
| [1, 5] |
| [2, 8] |
| [3, 7] |
| [4, 1] |
| [5, 7] |
+----------+
现在,假设您要创建一个由 points
表中的每个 point
组成的数组。如需完成此操作,请将各行返回的数组封装在一个 STRUCT
中,如下所示。
WITH points AS
(SELECT [1, 5] as point
UNION ALL SELECT [2, 8] as point
UNION ALL SELECT [3, 7] as point
UNION ALL SELECT [4, 1] as point
UNION ALL SELECT [5, 7] as point)
SELECT ARRAY(
SELECT STRUCT(point)
FROM points)
AS coordinates;
+-------------------+
| coordinates |
+-------------------+
| [{point: [1,5]}, |
| {point: [2,8]}, |
| {point: [5,7]}, |
| {point: [3,7]}, |
| {point: [4,1]}] |
+--------------------+