The ML.TRANSFORM function

This document describes the ML.TRANSFORM function, which you can use to preprocess feature data. This function processes input data by applying the data transformations captured in the TRANSFORM clause of an existing model. The statistics that were calculated for data transformation during model training are applied to the input data of the function.

For more information about which models support feature preprocessing, see End-to-end user journey for each model.

Syntax

ML.TRANSFORM(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) }
)

Arguments

ML.TRANSFORM takes the following arguments:

  • project_id: Your project ID.
  • dataset: a STRING value that specifies the BigQuery dataset that contains the model.
  • model: The name of a model. The model must have been created by using a CREATE MODEL statement that includes a TRANSFORM clause to manually preprocess feature data. You can check to see if a model uses a TRANSFORM clause by using the bq show command to look at the model's metadata. If the model was trained using a TRANSFORM clause, the model metadata contains a section about the transform columns. The function returns an error if you specify a model that was trained without a TRANSFORM clause.
  • table: The name of the input table that contains the feature data to preprocess.

    If table is specified, the input column names in the table must match the input column names in the model's TRANSFORM clause, and their types should be compatible according to BigQuery implicit coercion rules. You can get the input column names and data types from the model's metadata, in the section about the feature columns.

  • query_statement: A query that generates the feature data to preprocess. For the supported SQL syntax of the query_statement clause, see GoogleSQL query syntax.

    If query_statement is specified, the input column names from the query must match the input column names in the model's TRANSFORM clause, and their types should be compatible according to BigQuery implicit coercion rules. You can get the input column names and data types from the model's metadata, in the section about the feature columns.

Output

ML.TRANSFORM returns the columns specified in the model's TRANSFORM clause.

Example

The following example returns feature data that has been preprocessed by using the TRANSFORM clause included in the model named mydataset.mymodel in your default project.

Create the model that contains the TRANSFORM clause:

CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    species,
    island,
    ML.MAX_ABS_SCALER(culmen_length_mm) OVER () AS culmen_length_mm,
    ML.MAX_ABS_SCALER(flipper_length_mm) OVER () AS flipper_length_mm,
    sex,
    body_mass_g)
  OPTIONS (
    model_type = 'linear_reg',
    input_label_cols = ['body_mass_g'])
AS (
  SELECT *
  FROM `bigquery-public-data.ml_datasets.penguins`
  WHERE body_mass_g IS NOT NULL
);

Return feature data preprocessed by the model's TRANSFORM clause:

SELECT
  *
FROM
  ML.TRANSFORM(
    MODEL `mydataset.mymodel`,
    TABLE `bigquery-public-data.ml_datasets.penguins`);

The result is similar to the following:

+-------------------------------------+--------+---------------------+---------------------+--------+-----------------+-------------+
| species                             | island | culmen_length_mm    | flipper_length_mm   | sex    | culmen_depth_mm | body_mass_g |
--------------------------------------+--------+ ------------------- +---------------------+--------+-----------------+-------------+
| Adelie Penguin (Pygoscelis adeliae) | Dream  | 0.61409395973154368 | 0.79653679653679654 | Female | 18.4            | 3475.0      |
| Adelie Penguin (Pygoscelis adeliae) | Dream  | 0.66778523489932884 | 0.79653679653679654 | Male   | 19.1            | 4650.0      |
+-------------------------------------+--------+---------------------+---------------------+--------+-----------------+-------------+

What's next