ML.ADVANCED_WEIGHTS function
The ML.ADVANCED_WEIGHTS function lets you see the underlying weights that a linear or binary logistic regression model uses during prediction, together with the standard error and p-value of each weight. ML.ADVANCED_WEIGHTS can only be used if the model was trained with calculate_p_values=TRUE and category_encoding_method=DUMMY_ENCODING.
ML.ADVANCED_WEIGHTS is an extended version of ML.WEIGHTS for linear and binary logistic regression models.
For information about supported model types of each SQL statement and function, and all supported SQL statements and functions for each model type, read End-to-end user journey for each model.
ML.ADVANCED_WEIGHTS syntax
In the following example syntax, standardize is an optional parameter that determines whether the model weights are standardized, that is, expressed as if all features had a mean of zero and a standard deviation of one. Standardized weights can be compared with each other by absolute magnitude. The default value is false. The value must be supplied as the only field in a STRUCT.
ML.ADVANCED_WEIGHTS(MODEL `project-id.dataset.model` [, STRUCT(<T> AS standardize)])
Replace the following:
project-id: Your project ID.
dataset: The BigQuery dataset that contains the model.
model: The name of the model.
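The syntax above can also be assembled programmatically. The following minimal Python sketch only builds the query text. The helper name advanced_weights_query is hypothetical, and running the query (for example, with a BigQuery client library) is not shown.

```python
from typing import Optional


def advanced_weights_query(model_ref: str, standardize: Optional[bool] = None) -> str:
    """Build a SELECT over ML.ADVANCED_WEIGHTS (hypothetical helper).

    model_ref is "project-id.dataset.model", or "dataset.model" for a model in
    the default project. standardize=None omits the optional STRUCT, so the
    documented default (false) applies.
    """
    if "`" in model_ref:
        # The reference is wrapped in backticks, so it must not contain any itself.
        raise ValueError("model_ref must not contain backticks")
    args = f"MODEL `{model_ref}`"
    if standardize is not None:
        # The optional value must be the only field in a STRUCT.
        flag = "true" if standardize else "false"
        args += f", STRUCT({flag} AS standardize)"
    return f"SELECT * FROM ML.ADVANCED_WEIGHTS({args})"
```

Calling advanced_weights_query("mydataset.mymodel", True) produces the same query as the standardization example in this document.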
ML.ADVANCED_WEIGHTS output
ML.ADVANCED_WEIGHTS returns the following columns:
processed_input: The name of the model feature input. The value of this column matches the name of the column in the SELECT statement used during training. If the feature is non-numeric, there are multiple rows with the same processed_input string, one for each category of the feature.
category: The category name if the input column is non-numeric; NULL for numeric columns.
weight: The weight of the feature, or of the category for non-numeric features.
standard_error: The standard error of the weight.
p_value: The two-sided p-value for the null hypothesis that the weight is zero. The p-value for feature $j$ is calculated using the following formula:
\( p(j) = 2\left(1 - \Phi\left(\frac{|\hat\beta_j|}{\sigma_j}\right)\right) \)
where $\hat\beta_j$ is the weight of feature $j$ after training, $\sigma_j$ is its standard error, and $\Phi$ is the standard normal cumulative distribution function. In SciPy notation this is 2 * (1 - stats.norm.cdf(abs(beta_j), loc=0, scale=sigma_j)).
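As a check on the p_value column, the formula can be transcribed directly into standard-library Python. This is a minimal sketch, and weight_p_value is a hypothetical helper name. It uses the identity 2(1 - Φ(z)) = erfc(z/√2) instead of SciPy, so it has no third-party dependencies.

```python
import math


def weight_p_value(weight: float, standard_error: float) -> float:
    """Two-sided p-value for H0: weight == 0 with a normal reference distribution.

    Equivalent to 2 * (1 - scipy.stats.norm.cdf(abs(weight), loc=0, scale=standard_error)).
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = abs(weight) / standard_error
    # 2 * (1 - Phi(z)) == erfc(z / sqrt(2)); erfc avoids cancellation when z is large.
    return math.erfc(z / math.sqrt(2.0))
```

For example, a weight of 0.3 with a standard error of 0.1 gives z = 3 and a p-value of about 0.0027.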
If the TRANSFORM clause was present in the CREATE MODEL statement that created the model, ML.ADVANCED_WEIGHTS outputs the weights of the TRANSFORM output features. The weights are denormalized by default, and you can optionally get normalized weights, exactly as for models created without TRANSFORM.
ML.ADVANCED_WEIGHTS permissions
Both the bigquery.models.create and bigquery.models.getData permissions are required to run ML.ADVANCED_WEIGHTS.
ML.ADVANCED_WEIGHTS syntax examples
The following examples demonstrate ML.ADVANCED_WEIGHTS with and without standardization.
ML.ADVANCED_WEIGHTS without standardization
The following example retrieves weight information from mymodel in mydataset, where the dataset is in your default project. The query returns unstandardized weights, including the weight associated with each dummy-encoded category of the input column input_col.
SELECT * FROM ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(false AS standardize))
ML.ADVANCED_WEIGHTS with standardization
The following example retrieves weight information from mymodel in mydataset, where the dataset is in your default project. The query retrieves standardized weights, which assume that all features have a mean of zero and a standard deviation of one.
SELECT * FROM ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(true AS standardize))
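A short calculation shows why standardized weights can be compared by magnitude. The sketch below assumes that standardization rescales each numeric feature x to (x - mean) / standard deviation and leaves the label unchanged. Under that assumption, the standardized weight is the raw weight multiplied by the feature's standard deviation. BigQuery ML's exact procedure, such as whether it uses the population or the sample standard deviation, is not specified here. The function and feature names are hypothetical.

```python
import statistics


def standardized_weights(weights: dict[str, float],
                         columns: dict[str, list[float]]) -> dict[str, float]:
    """Rescale raw weights to the scale of standardized features.

    ASSUMPTION: each feature x is standardized as (x - mean) / stdev and the
    label is left unchanged. Because x == stdev * x_std + mean,
    w * x == (w * stdev) * x_std + w * mean, so the standardized weight is
    w * stdev and the w * mean term is absorbed into the intercept.
    """
    standardized = {}
    for name, raw_weight in weights.items():
        # pstdev is the population standard deviation (an assumption; see lead-in).
        spread = statistics.pstdev(columns[name])
        standardized[name] = raw_weight * spread
    return standardized
```

Consider age_years = [20, 30, 40, 50] with a raw weight of 0.5, and income_usd = [20000, 40000, 60000, 80000] with a raw weight of 0.001. The raw weights suggest that age matters more. The standardized weights, however, are about 5.59 and 22.36, so income has the larger effect per standard deviation.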
ML.ADVANCED_WEIGHTS usage requirements
ML.ADVANCED_WEIGHTS can only be used on linear and binary logistic regression models. Multiclass logistic regression models are not supported. ML.ADVANCED_WEIGHTS also requires the model to be trained with the following options:
- calculate_p_values must be TRUE.
- l1_reg must be zero. A nonzero l1_reg makes the model a penalized regression. Standard errors and p-values for the regression coefficients, or for other estimated quantities, are commonly requested for penalized regression too. In principle, such standard errors can be calculated, for example by using the bootstrap. In practice this is not done, for reasons that the authors of the R package explain as follows:
[S]tandard errors are not very meaningful for strongly biased estimates such as arise from penalized estimation methods. Penalized estimation is a procedure that reduces the variance of estimators by introducing substantial bias. The bias of each estimator is therefore a major component of its mean squared error, whereas its variance may contribute only a small part. Unfortunately, in most applications of penalized regression it is impossible to obtain a sufficiently precise estimate of the bias. Any bootstrap-based calculations can only give an assessment of the variance of the estimates. Reliable estimates of the bias are only available if reliable unbiased estimates are available, which is typically not the case in situations in which penalized estimates are used.
- category_encoding_method must be set to DUMMY_ENCODING.
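The requirements above can be expressed as a pre-flight check before training. This sketch is hypothetical: TrainingOptions and advanced_weights_problems are not part of any BigQuery API. Their fields simply mirror the CREATE MODEL options this section names: the model_type values LINEAR_REG and LOGISTIC_REG, calculate_p_values, l1_reg, and category_encoding_method.

```python
from dataclasses import dataclass


@dataclass
class TrainingOptions:
    """Hypothetical record of the CREATE MODEL options relevant to ML.ADVANCED_WEIGHTS."""
    model_type: str                # "LINEAR_REG" or "LOGISTIC_REG"
    num_label_classes: int         # 2 for binary logistic regression; ignored for LINEAR_REG
    calculate_p_values: bool
    l1_reg: float
    category_encoding_method: str  # e.g. "DUMMY_ENCODING" or "ONE_HOT_ENCODING"


def advanced_weights_problems(opts: TrainingOptions) -> list[str]:
    """Return the usage requirements that opts violates (an empty list if none)."""
    problems = []
    model_type = opts.model_type.upper()
    if model_type not in ("LINEAR_REG", "LOGISTIC_REG"):
        problems.append("model must be linear or binary logistic regression")
    elif model_type == "LOGISTIC_REG" and opts.num_label_classes != 2:
        problems.append("multiclass logistic regression is not supported")
    if not opts.calculate_p_values:
        problems.append("calculate_p_values must be TRUE")
    if opts.l1_reg != 0:
        problems.append("l1_reg must be zero")
    if opts.category_encoding_method.upper() != "DUMMY_ENCODING":
        problems.append("category_encoding_method must be DUMMY_ENCODING")
    return problems
```

Returning a list of problems, rather than raising on the first one, reports every option that needs to change in a single pass.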
ML.ADVANCED_WEIGHTS limitations
The limitations of ML.ADVANCED_WEIGHTS result from the limitations of computing p-values and standard errors when calculate_p_values=TRUE is set for training:
- Total cardinality of training features must be less than 1,000.
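Before training, you might estimate whether a feature set stays under this limit. The sketch below rests on an assumption that this page does not confirm: each numeric feature contributes 1 to the total cardinality, and each non-numeric feature contributes its number of distinct categories. The function names are hypothetical.

```python
# Limit stated above: total cardinality must be less than 1,000.
MAX_TOTAL_CARDINALITY = 1000


def total_feature_cardinality(numeric_features: list[str],
                              categorical_values: dict[str, list[str]]) -> int:
    """Estimate the total cardinality of the training features.

    ASSUMPTION (not confirmed by this page): a numeric feature counts as 1,
    and a non-numeric feature counts as its number of distinct categories.
    """
    numeric = len(numeric_features)
    categorical = sum(len(set(values)) for values in categorical_values.values())
    return numeric + categorical


def within_cardinality_limit(numeric_features: list[str],
                             categorical_values: dict[str, list[str]]) -> bool:
    """True if the estimated total cardinality is strictly below the limit."""
    total = total_feature_cardinality(numeric_features, categorical_values)
    return total < MAX_TOTAL_CARDINALITY
```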