The ML.HASH_BUCKETIZE function
This document describes the ML.HASH_BUCKETIZE function, which lets you
convert a string expression to a deterministic hash and then bucketize it by the
modulo value of that hash.
You can use this function with models that support
manual feature preprocessing. For more
information, see the Feature preprocessing overview and the
End-to-end user journey for each model.
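Syntax
ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)
Arguments
ML.HASH_BUCKETIZE takes the following arguments:
string_expression: the STRING expression to bucketize.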
hash_bucket_size: an INT64 value that specifies the number of buckets to
create. This value must be greater than or equal to 0. If
hash_bucket_size equals 0, the function only hashes the string without
bucketizing the hashed value.
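To make the effect of hash_bucket_size concrete, the following sketch calls the function twice on the same inputs, once with 0 and once with 3. The input strings and the column aliases hash_only and bucket are illustrative choices, not part of the function. No output is shown because the actual hash values are not stated in this document.

```sql
-- Compare the hash-only result (hash_bucket_size = 0) with the
-- bucket identifier produced when three buckets are requested.
SELECT
  f,
  ML.HASH_BUCKETIZE(f, 0) AS hash_only,  -- hashes the string, no bucketizing
  ML.HASH_BUCKETIZE(f, 3) AS bucket      -- hashes, then bucketizes into 3 buckets
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```

Because the hash is deterministic, rerunning the query on the same strings should return the same values in both columns.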
Output
ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.
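Because the number of buckets is usually smaller than the number of distinct strings, several strings can receive the same bucket identifier. One way to inspect this is to group a sample of inputs by the returned value, as in the following sketch. The sample strings and the aliases num_strings and strings are assumptions made for illustration.

```sql
-- Group sample strings by the bucket identifier they are assigned to,
-- to see how many strings share each bucket.
SELECT
  bucket,
  COUNT(*) AS num_strings,
  ARRAY_AGG(f ORDER BY f) AS strings
FROM (
  SELECT f, ML.HASH_BUCKETIZE(f, 3) AS bucket
  FROM UNNEST(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']) AS f
)
GROUP BY bucket
ORDER BY bucket;
```

A query like this can help when choosing hash_bucket_size: if too many unrelated strings share a bucket, a larger value may be worth trying.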
Example
The following example bucketizes string expressions into three buckets:
```sql
SELECT
  f, ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```

The output looks similar to the following:

```
+---+--------+
| f | bucket |
+---+--------+
| a | 0      |
+---+--------+
| b | 1      |
+---+--------+
| c | 1      |
+---+--------+
| d | 2      |
+---+--------+
```

What's next
- For information about feature preprocessing, see [Feature preprocessing overview](/bigquery/docs/preprocess-overview).
- For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey).

Last updated 2025-09-04 UTC.