The ML.WEIGHTS function
This document describes the ML.WEIGHTS function, which lets you see the underlying weights that a model uses during prediction. This function applies to linear and logistic regression models and matrix factorization models.
For matrix factorization models, you can use the ML.GENERATE_EMBEDDING function as an alternative to the ML.WEIGHTS function. ML.GENERATE_EMBEDDING returns the same factor weights and intercept data as ML.WEIGHTS, but as an array in a single column rather than in two columns. Having all of the embeddings in a single column lets you use the VECTOR_SEARCH function directly on the ML.GENERATE_EMBEDDING output.
Syntax
ML.WEIGHTS(
  MODEL `project_id.dataset.model`
  [, STRUCT(standardize AS standardize)])
Arguments
ML.WEIGHTS takes the following arguments:
- project_id: Your project ID.
- dataset: The BigQuery dataset that contains the model.
- model: The name of the model.
- standardize: a BOOL value that specifies whether the model weights should be standardized to assume that all features have a mean of 0 and a standard deviation of 1. Standardizing the weights allows the absolute magnitude of the weights to be compared to each other. The default value is FALSE. This argument only applies to linear and logistic regression models.
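As a rough sketch of how the standardize argument might be used to compare features, the following query ranks numerical features by the absolute value of their standardized weights. It assumes a linear or logistic regression model named mymodel in mydataset (placeholder names). It covers only numerical features, because one-hot encoded features report their weights in the category_weights column instead.

```sql
-- Sketch only: `mydataset.mymodel` is a placeholder model name.
-- Ranks numerical features by the magnitude of their standardized weight.
SELECT
  processed_input,
  weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(TRUE AS standardize))
WHERE
  weight IS NOT NULL  -- skip one-hot encoded features, whose weight is NULL
ORDER BY
  ABS(weight) DESC
```

Without standardization, weights reflect the original scale of each feature, so ordering them by magnitude is less meaningful.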
Output
ML.WEIGHTS has different output columns for different model types.
Linear and logistic regression models
For linear and logistic regression models, ML.WEIGHTS returns the following columns:
- trial_id: an INT64 value that contains the hyperparameter tuning trial ID. This column is only returned if you ran hyperparameter tuning when creating the model.
- processed_input: a STRING value that contains the name of the feature input column. The value of this column matches the name of the feature column provided in the query_statement clause that was used when the model was trained.
- weight: if the column identified by the processed_input value is numerical, weight contains a FLOAT64 value and the category_weights column contains NULL values. If the column identified by the processed_input value is non-numerical and has been converted to one-hot encoding, the weight column is NULL and the category_weights column contains the category names and weights for each category.
- category_weights.category: a STRING value that contains the category name if the column identified by the processed_input value is non-numeric.
- category_weights.weight: a FLOAT64 value that contains the category's weight if the column identified by the processed_input value is non-numeric.
- class_label: a STRING value that contains the label for a given weight. Only used for multiclass models. The output includes one row per <class_label, processed_input> combination.
If you used the TRANSFORM clause in the CREATE MODEL statement that created the model, ML.WEIGHTS outputs the weights of TRANSFORM output features. The weights are denormalized by default, with the option to get normalized weights, exactly like models that are created without TRANSFORM.
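Because numerical features use the weight column and one-hot encoded features use the nested category_weights column, it can be convenient to flatten both into a single result. The following is a minimal sketch of one way to do that, assuming a linear or logistic regression model named mymodel in mydataset (placeholder names) that was trained without hyperparameter tuning.

```sql
-- Sketch only: `mydataset.mymodel` is a placeholder model name.
-- Returns one row per numerical feature and one row per category
-- of each one-hot encoded feature.
SELECT
  w.processed_input,
  cw.category,                            -- NULL for numerical features
  COALESCE(cw.weight, w.weight) AS weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`) AS w
LEFT JOIN
  UNNEST(w.category_weights) AS cw        -- keeps rows with no categories
ORDER BY
  w.processed_input,
  cw.category
```

The LEFT JOIN keeps the rows for numerical features, whose category_weights value has nothing to unnest. For a multiclass model, adding w.class_label to the SELECT list distinguishes the rows for each class.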
Matrix factorization models
For matrix factorization models, ML.WEIGHTS returns the following columns:
- trial_id: an INT64 value that contains the hyperparameter tuning trial ID. This column is only returned if you ran hyperparameter tuning when creating the model.
- processed_input: a STRING value that contains the name of the user or item column. The value of this column matches the name of the user or item column provided in the query_statement clause that was used when the model was trained.
- feature: a STRING value that contains the names of the specific users or items used during training.
- factor_weights: an ARRAY<STRUCT> value that contains the factors and the weights for each factor.
- factor_weights.factor: an INT64 value that contains the latent factor from training. This value can be between 1 and the value of the NUM_FACTORS option.
- factor_weights.weight: a FLOAT64 value that contains the weight of the respective factor and feature.
- intercept: a FLOAT64 value that contains the intercept or bias term for a feature.
There is an additional row in the output that contains the global__intercept__ value calculated from the input data. This row has NULL values for the processed_input and factor_weights columns. For implicit feedback models, global__intercept__ is always 0.
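To illustrate the output shape described above, the following sketch expands the factor_weights array so that each latent factor appears on its own row. The model name my_mf_model is hypothetical and stands in for any matrix factorization model in mydataset.

```sql
-- Sketch only: `mydataset.my_mf_model` is a hypothetical
-- matrix factorization model.
-- Returns one row per (user or item, latent factor) pair.
SELECT
  w.processed_input,
  w.feature,
  fw.factor,
  fw.weight,
  w.intercept
FROM
  ML.WEIGHTS(MODEL `mydataset.my_mf_model`) AS w
LEFT JOIN
  UNNEST(w.factor_weights) AS fw  -- keeps the row whose factor_weights is NULL
ORDER BY
  w.processed_input,
  w.feature,
  fw.factor
```

A LEFT JOIN is used here so that the additional global intercept row, which has a NULL factor_weights value, is not dropped from the result.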
Examples
The following examples show how to use ML.WEIGHTS with and without the standardize argument.
Without standardization
The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project. It returns the weights that are associated with each one-hot encoded category for the input column input_col.
SELECT
  category,
  weight
FROM
  UNNEST((
    SELECT category_weights
    FROM ML.WEIGHTS(MODEL `mydataset.mymodel`)
    WHERE processed_input = 'input_col'))
This command uses the UNNEST function because the category_weights column is a nested repeated column.
With standardization
The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project. It retrieves standardized weights, which assume all features have a mean of 0 and a standard deviation of 1.
SELECT *
FROM ML.WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(true AS standardize))
What's next
- For information about model weights support in BigQuery ML, see BigQuery ML model weights overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.