The ML.QUANTILE_BUCKETIZE function
This document describes the ML.QUANTILE_BUCKETIZE
function, which lets you
break a continuous numerical feature into buckets based on quantiles.
When used in the
TRANSFORM
clause,
the same quantiles are automatically used in prediction.
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.QUANTILE_BUCKETIZE(numerical_expression, num_buckets [, output_format]) OVER()
Arguments
ML.QUANTILE_BUCKETIZE
takes the following arguments:
numerical_expression
: the numerical expression to bucketize.num_buckets
: anINT64
value that specifies the number of buckets to splitnumerical_expression
into.output_format
: aSTRING
value that specifies the output format of the bucket. Valid output formats are as follows:bucket_names
: returns aSTRING
value in the formatbin_<bucket_index>
. For example,bin_3
. Thebucket_index
value starts at 1. This is the default bucket format.bucket_ranges
: returns aSTRING
value in the format[lower_bound, upper_bound)
in interval notation. For example,(-inf, 2.5)
,[2.5, 4.6)
,[4.6, +inf)
.bucket_ranges_json
: returns a JSON-formattedSTRING
value in the format{"start": "lower_bound", "end": "upper_bound"}
. For example,{"start": "-Infinity", "end": "2.5"}
,{"start": "2.5", "end": "4.6"}
,{"start": "4.6", "end": "Infinity"}
. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as thebucket_ranges
option.
Output
ML.QUANTILE_BUCKETIZE
returns a STRING
value that contains the name of the bucket, in the format specified by the output_format
argument.
Example
The following example breaks a numerical expression of five elements into three buckets:
SELECT f, ML.QUANTILE_BUCKETIZE(f, 3) OVER() AS bucket, ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges") OVER() AS bucket_ranges, ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges_json") OVER() AS bucket_ranges_json FROM UNNEST([1,2,3,4,5]) AS f ORDER BY f;
The output looks similar to the following:
+---+--------+---------------+------------------------------------+ | f | bucket | bucket_ranges | bucket_ranges_json | |---|--------|---------------|------------------------------------| | 1 | bin_1 | (-inf, 2) | {"start": "-Infinity", "end": "2"} | | 2 | bin_2 | [2, 4) | {"start": "2", "end": "4"} | | 3 | bin_2 | [2, 4) | {"start": "2", "end": "4"} | | 4 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} | | 5 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} | +---+--------+---------------+------------------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.