標準 SQL 中的 HyperLogLog++ 函式

BigQuery 支援使用 HyperLogLog++ 演算法的下列 approximate aggregate 函式。如需 approximate aggregate 函式運作方式的說明,請參閱近似匯總

HLL_COUNT.INIT

HLL_COUNT.INIT(input [, precision])

說明

這個匯總函式會取用一或多個 input 值,並將這些值匯總到 HyperLogLog++ 草圖。每個草圖都以 BYTES 資料類型表示。然後您可以使用 HLL_COUNT.MERGEHLL_COUNT.MERGE_PARTIAL 合併這些草圖。如果不需要合併,則可使用 HLL_COUNT.EXTRACT 從草圖中擷取不同值的最終計數。

input 可以是下列其中一項:

  • INT64
  • STRING
  • BYTES

這個函式支援選用參數 precision。這個參數定義了估計處理草圖或將草圖儲存到磁碟上所需要之額外記憶體成本的準確率。下表顯示允許的精確度值、每個群組的最大草圖大小,以及典型精確度的信賴區間 (CI):

精確度 最大草圖大小 (KiB) CI CI CI
10 1 ±1.63% ±3.25% ±6.50%
11 2 ±1.15% ±2.30% ±4.60%
12 4 ±0.81% ±1.63% ±3.25%
13 8 ±0.57% ±1.15% ±1.72%
14 16 ±0.41% ±0.81% ±1.22%
15 (預設) 32 ±0.29% ±0.57% ±0.86%
16 64 ±0.20% ±0.41% ±0.61%
17 128 ±0.14% ±0.29% ±0.43%
18 256 ±0.10% ±0.20% ±0.41%
19 512 ±0.07% ±0.14% ±0.29%
20 1024 ±0.05% ±0.10% ±0.20%
21 2048 ±0.04% ±0.07% ±0.14%
22 4096 ±0.03% ±0.05% ±0.10%
23 8192 ±0.02% ±0.04% ±0.07%
24 16384 ±0.01% ±0.03% ±0.05%

如果輸入是 NULL,這個函式會傳回 NULL。

如需詳細資訊,請參閱 HyperLogLog 實踐:頂尖基數估計演算法的演算工程

支援的輸入類型

BYTES

傳回類型

BYTES

範例

SELECT
  HLL_COUNT.INIT(respondent) AS respondents_hll,
  flavor,
  country
FROM UNNEST([
  STRUCT(1 AS respondent, "Vanilla" AS flavor, "CH" AS country),
  (1, "Chocolate", "CH"),
  (2, "Chocolate", "US"),
  (2, "Strawberry", "US")])
GROUP BY flavor, country;

HLL_COUNT.MERGE

HLL_COUNT.MERGE(sketch)

說明

透過計算聯集來傳回數個 HyperLogLog++ 組草圖基數的 aggregate 函式。

每個 sketch 都必須具有相同的精確度,並對相同類型進行初始化。嘗試合併具有不同精確度的草圖,或針對不同的類型合併草圖,會導致產生錯誤。例如,您無法合併從 INT64 資料初始化的草圖與從 STRING 資料初始化的草圖。

合併草圖時,這個函式會忽略 NULL 值。如果零個資料列或只有 NULL 值發生合併,函式會傳回 0

支援的輸入類型

BYTES

傳回類型

INT64

範例

SELECT HLL_COUNT.MERGE(respondents_hll) AS num_respondents, flavor
FROM (
  SELECT
    HLL_COUNT.INIT(respondent) AS respondents_hll,
    flavor,
    country
  FROM UNNEST([
    STRUCT(1 AS respondent, "Vanilla" AS flavor, "CH" AS country),
    (1, "Chocolate", "CH"),
    (2, "Chocolate", "US"),
    (2, "Strawberry", "US")])
  GROUP BY flavor, country)
GROUP BY flavor;

HLL_COUNT.MERGE_PARTIAL

HLL_COUNT.MERGE_PARTIAL(sketch)

說明

這個匯總函式會取用一或多個 HyperLogLog++ sketch 輸入,並將這些輸入匯總到新草圖。

如果沒有輸入,或所有輸入都是 NULL,這個函式會傳回 NULL。

支援的輸入類型

BYTES

傳回類型

BYTES

範例

SELECT HLL_COUNT.MERGE_PARTIAL(respondents_hll) AS num_respondents, flavor
FROM (
  SELECT
    HLL_COUNT.INIT(respondent) AS respondents_hll,
    flavor,
    country
  FROM UNNEST([
    STRUCT(1 AS respondent, "Vanilla" AS flavor, "CH" AS country),
    (1, "Chocolate", "CH"),
    (2, "Chocolate", "US"),
    (2, "Strawberry", "US")])
  GROUP BY flavor, country)
GROUP BY flavor;

HLL_COUNT.EXTRACT

HLL_COUNT.EXTRACT(sketch)

說明

擷取單一 HyperLogLog++ 草圖之基數估計的純量函式。

如果 sketch 是 NULL,這個函式會傳回 0 的基數估計。

支援的輸入類型

BYTES

傳回類型

INT64

範例

SELECT
  flavor,
  country,
  HLL_COUNT.EXTRACT(respondents_hll) AS num_respondents
FROM (
  SELECT
    HLL_COUNT.INIT(respondent) AS respondents_hll,
    flavor,
    country
  FROM UNNEST([
    STRUCT(1 AS respondent, "Vanilla" AS flavor, "CH" AS country),
    (1, "Chocolate", "CH"),
    (2, "Chocolate", "US"),
    (2, "Strawberry", "US")])
  GROUP BY flavor, country);

+------------+---------+-----------------+
| flavor     | country | num_respondents |
+------------+---------+-----------------+
| Vanilla    | CH      | 1               |
| Chocolate  | CH      | 1               |
| Chocolate  | US      | 1               |
| Strawberry | US      | 1               |
+------------+---------+-----------------+
本頁內容對您是否有任何幫助?請提供意見:

傳送您對下列選項的寶貴意見...

這個網頁