The ML.HASH_BUCKETIZE function

This document describes the ML.HASH_BUCKETIZE function, which lets you convert a string expression to a deterministic hash and then bucketize it by the modulo value of that hash.

Syntax

ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)

Arguments

ML.HASH_BUCKETIZE takes the following arguments:

  • string_expression: the STRING expression to bucketize.
  • hash_bucket_size: an INT64 value that specifies the number of buckets to create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without bucketizing the hashed value.

Output

ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.

Example

The following example bucketizes string expressions into three buckets:

SELECT
  f, ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;

The output looks similar to the following:

+---+--------+
| f | bucket |
+---+--------+
| a |   0    |
+---+--------+
| b |   1    |
+---+--------+
| c |   1    |
+---+--------+
| d |   2    |
+------------+

What's next