The ML.BUCKETIZE function

This document describes the ML.BUCKETIZE function, which lets you split a numerical expression into buckets.

Syntax

ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries])

Arguments

ML.BUCKETIZE takes the following arguments:

  • numerical_expression: the numerical expression to bucketize.
  • array_split_points: an array of numerical values that provide the points at which to split the numerical_expression value. The numerical values in the array must be finite, so not -inf, inf, or NaN. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if the array_split_points value is [1, 2, 3, 4], then there are five potential buckets that the numerical_expression value can be bucketized into.
  • exclude_boundaries: a BOOL value that determines whether the upper and lower boundaries from array_split_points are used. If TRUE, then the boundary values aren't used to create buckets. For example, if the array_split_points value is [1, 2, 3, 4] and exclude_boundaries is TRUE, then there are three potential buckets that the numerical_expression value can be bucketized into. The default value is FALSE.

Output

ML.BUCKETIZE returns a STRING value that contains the name of the bucket. The returned bucket names are in the format of bin_<bucket_index>, with bucket_index starting at 1.

Example

The following example bucketizes a numerical expression both with and without boundaries:

SELECT
  ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket,
  ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries;

The output looks similar to the following:

+---------+----------------------------+
| bucket  | bucket_without_boundaries  |
+---------+----------------------------+
| bin_3   | bin_2                      |
+---------+----------------------------+

What's next