The ML.HASH_BUCKETIZE function

This document describes the ML.HASH_BUCKETIZE function, which lets you convert a string expression to a deterministic hash and then bucketize it by the modulo value of that hash.

You can use this function with models that support manual feature preprocessing. For more information, see the following documents:

Syntax

ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)

Arguments

ML.HASH_BUCKETIZE takes the following arguments:

  • string_expression: the STRING expression to bucketize.
  • hash_bucket_size: an INT64 value that specifies the number of buckets to create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without bucketizing the hashed value.

Output

ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.

Example

The following example bucketizes string expressions into three buckets:

SELECT
  f, ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;

The output looks similar to the following:

+---+--------+
| f | bucket |
+---+--------+
| a |   0    |
+---+--------+
| b |   1    |
+---+--------+
| c |   1    |
+---+--------+
| d |   2    |
+------------+

What's next