Manual feature preprocessing
You can use the TRANSFORM clause of the CREATE MODEL statement in combination with manual preprocessing functions to define custom data preprocessing. You can also use these manual preprocessing functions outside of the TRANSFORM clause.
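As a minimal sketch of combining the TRANSFORM clause with manual preprocessing functions, the following statement trains a model on preprocessed columns. The dataset, table, model, and column names (mydataset, customers, churn_model, age, income, churned) are hypothetical, and the split points are arbitrary.

```sql
-- Hypothetical dataset, table, and column names; substitute your own.
CREATE OR REPLACE MODEL `mydataset.churn_model`
  TRANSFORM(
    -- Scalar function: assigns each age to a bucket using fixed split points.
    ML.BUCKETIZE(age, [18, 35, 65]) AS age_bucket,
    -- Analytic function: must be followed by an empty OVER() clause.
    ML.STANDARD_SCALER(income) OVER() AS income_scaled,
    -- Pass the label column through unchanged.
    churned
  )
  OPTIONS(
    model_type = 'LOGISTIC_REG',
    input_label_cols = ['churned']
  )
AS
SELECT age, income, churned
FROM `mydataset.customers`;
```

Only the columns listed in the TRANSFORM clause are passed to training, so the label column has to appear there as well.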
If you want to decouple data preprocessing from model training, you can create a transform-only model that only performs data transformations by using the TRANSFORM clause.
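A transform-only model might look like the following sketch, which assumes the TRANSFORM_ONLY model type and uses hypothetical dataset, table, and column names. No training takes place; the model only stores the transformations.

```sql
-- Hypothetical names; this model stores preprocessing logic only.
CREATE OR REPLACE MODEL `mydataset.customer_transform`
  TRANSFORM(
    ML.BUCKETIZE(age, [18, 35, 65]) AS age_bucket,
    ML.MIN_MAX_SCALER(income) OVER() AS income_scaled
  )
  OPTIONS(model_type = 'TRANSFORM_ONLY')
AS
SELECT age, income
FROM `mydataset.customers`;
```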
You can use the ML.TRANSFORM function to increase the transparency of feature preprocessing. This function returns the preprocessed data from a model's TRANSFORM clause, so that you can see the actual training data that goes into model training, as well as the actual prediction data that goes into model serving.
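For illustration, the following query inspects the preprocessed output of a hypothetical model that was created with a TRANSFORM clause. The model, table, and column names are assumptions.

```sql
-- Returns the input rows as they look after the model's TRANSFORM clause
-- has been applied. All names here are hypothetical.
SELECT *
FROM ML.TRANSFORM(
  MODEL `mydataset.churn_model`,
  (SELECT age, income, churned FROM `mydataset.customers`)
);
```

Running the same query over a table of prediction inputs shows what the model would receive at serving time.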
For information about feature preprocessing support in BigQuery ML, see Feature preprocessing overview.
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.
Types of preprocessing functions
There are several types of manual preprocessing functions:
- Scalar functions operate on a single row. For example, ML.BUCKETIZE.
- Table-valued functions operate on all rows and output a table. For example, ML.FEATURES_AT_TIME.
- Analytic functions operate on all rows, and output the result for each row based on the statistics collected across all rows. For example, ML.QUANTILE_BUCKETIZE. You must always use an empty OVER() clause with ML analytic functions. When you use ML analytic functions inside the TRANSFORM clause during training, the same statistics are automatically applied to the input in prediction.
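The difference between scalar and analytic functions can be sketched with a self-contained query over inline values. The input values, split points, and bucket count are arbitrary choices for illustration.

```sql
SELECT
  x,
  -- Scalar: the result depends only on this row's value and the fixed split points.
  ML.BUCKETIZE(x, [2, 4]) AS fixed_bucket,
  -- Analytic: bucket boundaries are derived from quantiles computed over all rows,
  -- so the empty OVER() clause is required.
  ML.QUANTILE_BUCKETIZE(x, 2) OVER() AS quantile_bucket
FROM UNNEST([1, 2, 3, 4, 5]) AS x;
```

Changing the set of input rows can change quantile_bucket for a given x, but never fixed_bucket.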
The following sections describe the available preprocessing functions.
General functions
Use the following function on string or numerical expressions to do data cleanup:
Numerical functions
Use the following functions on numerical expressions to regularize data:
- ML.BUCKETIZE
- ML.MAX_ABS_SCALER
- ML.MIN_MAX_SCALER
- ML.NORMALIZER
- ML.POLYNOMIAL_EXPAND
- ML.QUANTILE_BUCKETIZE
- ML.ROBUST_SCALER
- ML.STANDARD_SCALER
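As a rough sketch, three of the scaling functions can be compared side by side on a small inline sample. The input values are arbitrary, and the query uses each function outside of a TRANSFORM clause.

```sql
SELECT
  x,
  -- Scales values to the range [0, 1].
  ML.MIN_MAX_SCALER(x) OVER() AS min_max_scaled,
  -- Scales values to the range [-1, 1] by the maximum absolute value.
  ML.MAX_ABS_SCALER(x) OVER() AS max_abs_scaled,
  -- Standardizes values (z-score).
  ML.STANDARD_SCALER(x) OVER() AS standard_scaled
FROM UNNEST([-2, 0, 1, 4]) AS x;
```

All three are analytic functions, so each call carries an empty OVER() clause.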
Categorical functions
Use the following functions to categorize data:
Text functions
Use the following functions on text string expressions:
Image functions
Use the following functions on image data:
Known limitations
- BigQuery ML supports both automatic preprocessing and manual preprocessing in the model export. See the supported data types and functions for exporting models trained with the BigQuery ML TRANSFORM clause.