The ML.QUANTILE_BUCKETIZE function
This document describes the ML.QUANTILE_BUCKETIZE
function, which lets you
break a continuous numerical feature into buckets based on quantiles.
When used in the
TRANSFORM
clause,
the same quantiles are automatically used in prediction.
Syntax
ML.QUANTILE_BUCKETIZE(numerical_expression, num_buckets [, output_format]) OVER()
Arguments
ML.QUANTILE_BUCKETIZE
takes the following arguments:
numerical_expression
: the numerical expression to bucketize.num_buckets
: anINT64
value that specifies the number of buckets to splitnumerical_expression
into.output_format
: aSTRING
value that specifies the output format of the bucket. Valid output formats are as follows:bucket_names
: returns aSTRING
value in the formatbin_<bucket_index>
. For example,bin_3
. Thebucket_index
value starts at 1. This is the default bucket format.bucket_ranges
: returns aSTRING
value in the format[lower_bound, upper_bound)
in interval notation. For example,(-inf, 2.5)
,[2.5, 4.6)
,[4.6, +inf)
.bucket_ranges_json
: returns a JSON-formattedSTRING
value in the format{"start": "lower_bound", "end": "upper_bound"}
. For example,{"start": "-Infinity", "end": "2.5"}
,{"start": "2.5", "end": "4.6"}
,{"start": "4.6", "end": "Infinity"}
. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as thebucket_ranges
option.
Output
ML.QUANTILE_BUCKETIZE
returns a STRING
value that contains the name of the bucket, in the format specified by the output_format
argument.
Example
The following example breaks a numerical expression of five elements into three buckets:
SELECT f, ML.QUANTILE_BUCKETIZE(f, 3) OVER() AS bucket, ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges") OVER() AS bucket_ranges, ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges_json") OVER() AS bucket_ranges_json FROM UNNEST([1,2,3,4,5]) AS f ORDER BY f;
The output looks similar to the following:
+---+--------+---------------+------------------------------------+ | f | bucket | bucket_ranges | bucket_ranges_json | |---|--------|---------------|------------------------------------| | 1 | bin_1 | (-inf, 2) | {"start": "-Infinity", "end": "2"} | | 2 | bin_2 | [2, 4) | {"start": "2", "end": "4"} | | 3 | bin_2 | [2, 4) | {"start": "2", "end": "4"} | | 4 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} | | 5 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} | +---+--------+---------------+------------------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.