The ML.BUCKETIZE function
This document describes the ML.BUCKETIZE
function, which lets you split
a numerical expression into buckets.
Syntax
ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries])
Arguments
ML.BUCKETIZE
takes the following arguments:
numerical_expression
: the numerical expression to bucketize.array_split_points
: an array of numerical values that provide the points at which to split thenumerical_expression
value. The numerical values in the array must be finite, so not-inf
,inf
, orNaN
. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if thearray_split_points
value is[1, 2, 3, 4]
, then there are five potential buckets that thenumerical_expression
value can be bucketized into.exclude_boundaries
: aBOOL
value that determines whether the upper and lower boundaries fromarray_split_points
are used. IfTRUE
, then the boundary values aren't used to create buckets. For example, if thearray_split_points
value is[1, 2, 3, 4]
andexclude_boundaries
isTRUE
, then there are three potential buckets that thenumerical_expression
value can be bucketized into. The default value isFALSE
.
Output
ML.BUCKETIZE
returns a STRING
value that contains the name of the bucket.
The returned bucket names are in the format of bin_<bucket_index>
, with
bucket_index
starting at 1
.
Example
The following example bucketizes a numerical expression both with and without boundaries:
SELECT ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket, ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries;
The output looks similar to the following:
+---------+----------------------------+ | bucket | bucket_without_boundaries | +---------+----------------------------+ | bin_3 | bin_2 | +---------+----------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.