The ML.ADVANCED_WEIGHTS function
This document describes the ML.ADVANCED_WEIGHTS function, which lets you see the underlying weights that a linear or binary logistic regression model uses during prediction, along with the p-value and standard error associated with each weight. ML.ADVANCED_WEIGHTS is an extended version of ML.WEIGHTS for linear and binary logistic regression models.
Usage requirements
You can only use ML.ADVANCED_WEIGHTS on linear and binary logistic regression models that are trained with the following option settings:
- The CALCULATE_P_VALUES value is TRUE.
- The CATEGORY_ENCODING_METHOD value is DUMMY_ENCODING.
- The L1_REG value is 0.
It's common to require standard errors or p-values for the regression coefficients or other estimated quantities of penalized regression methods such as L1 regularization. In principle, such standard errors can be calculated, for example by using the bootstrap. In practice, this calculation isn't done, for reasons that the authors of the R package explain.
Multiclass logistic regression models aren't supported.
Syntax
ML.ADVANCED_WEIGHTS( MODEL `project_id.dataset.model`, STRUCT( [standardize AS standardize]))
Arguments
ML.ADVANCED_WEIGHTS takes the following arguments:
- project_id: Your project ID.
- dataset: The BigQuery dataset that contains the model.
- model: The name of the model.
- standardize: a BOOL value that specifies whether the model weights should be standardized to assume that all features have a mean of zero and a standard deviation of one. Standardizing the weights lets you compare their absolute magnitudes to each other. The default value is FALSE.
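To make the effect of standardize concrete, the following Python sketch applies the usual algebraic rescaling of a linear model to features with a mean of zero and a standard deviation of one. It is an illustration only, not a description of what BigQuery does internally: the function names, the sample columns, and the choice of the population standard deviation are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def standardize_weights(intercept, weights, columns):
    """Rewrite a linear model so it applies to features scaled to mean 0, std 1.

    weights: {feature_name: raw_weight}
    columns: {feature_name: list of training values for that feature}
    """
    std_weights = {}
    std_intercept = intercept
    for name, raw_weight in weights.items():
        mean = fmean(columns[name])
        std = pstdev(columns[name])  # population std dev: an assumption
        # x = mean + std * z, so w * x = w * mean + (w * std) * z
        std_weights[name] = raw_weight * std
        std_intercept += raw_weight * mean
    return std_intercept, std_weights


def predict(intercept, weights, row):
    """Linear prediction: intercept + sum(weight * feature value)."""
    return intercept + sum(weights[name] * row[name] for name in weights)


# Hypothetical training columns on very different scales.
columns = {"age": [20.0, 30.0, 40.0], "income": [1000.0, 2000.0, 6000.0]}
raw_intercept, raw_weights = 1.5, {"age": 0.2, "income": 0.001}

std_intercept, std_weights = standardize_weights(raw_intercept, raw_weights, columns)

# The same row, expressed in raw units and in standardized units.
raw_row = {"age": 40.0, "income": 2000.0}
z_row = {
    name: (raw_row[name] - fmean(columns[name])) / pstdev(columns[name])
    for name in raw_row
}

# Both parameterizations describe the same model and give the same prediction.
raw_prediction = predict(raw_intercept, raw_weights, raw_row)
std_prediction = predict(std_intercept, std_weights, z_row)
assert abs(raw_prediction - std_prediction) < 1e-9
```

In this made-up data, income ends up with the larger standardized weight even though its raw weight is 200 times smaller than the weight for age. Comparisons like this are what standardized weights make possible.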
Output
ML.ADVANCED_WEIGHTS returns the following columns:
- processed_input: a STRING value that contains the name of the feature column. The value of this column is the name of the feature column that's provided in the query_statement clause used during model training. If the feature is non-numeric, then there are multiple rows with the same processed_input value, one for each category of the feature.
- category: a STRING value that contains the category name if the column identified in the processed_input value is non-numeric. Returns a NULL value for numeric columns.
- weight: a FLOAT64 value that contains the weight of each feature.
- standard_error: a FLOAT64 value that contains the standard error of the weight.
- p_value: a FLOAT64 value that contains the p-value that was tested against the null hypothesis. The p-value for feature $j$ is calculated using the following formula:
  $$ p(j) = 2 * (1 - stats.norm.cdf(abs(\hat\beta_j), loc=0, scale=\sigma_j)) $$
  such that $\hat\beta_j$ is the weight of feature $j$ after training and $\sigma_j$ is its standard error.
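The p-value formula is written in SciPy notation, so it can be reproduced outside BigQuery. The following minimal sketch uses `statistics.NormalDist` from the Python standard library in place of `stats.norm.cdf`. The function name `two_sided_p_value` is hypothetical, and the sketch only mirrors the formula as documented. It isn't BigQuery's implementation.

```python
from statistics import NormalDist


def two_sided_p_value(weight: float, standard_error: float) -> float:
    """Two-sided p-value of a weight under a zero-mean normal distribution."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    # Same quantity as stats.norm.cdf(abs(weight), loc=0, scale=standard_error).
    cdf = NormalDist(mu=0.0, sigma=standard_error).cdf(abs(weight))
    return 2.0 * (1.0 - cdf)


# A weight about 1.96 standard errors from zero sits near the 0.05 level.
p_near_threshold = two_sided_p_value(1.96, 1.0)

# A weight of exactly zero gives the largest possible p-value.
p_zero_weight = two_sided_p_value(0.0, 1.0)
```

Only the ratio of the weight to its standard error matters: a weight of 3.92 with a standard error of 2.0 yields the same p-value as a weight of 1.96 with a standard error of 1.0. The sign of the weight doesn't affect the result.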
If the TRANSFORM clause was used in the CREATE MODEL statement that created the model, ML.ADVANCED_WEIGHTS outputs the weights of the TRANSFORM output features. The weights are denormalized by default, with the option to get normalized weights, exactly like models that are created without TRANSFORM.
Permissions
You must have the bigquery.models.create and bigquery.models.getData Identity and Access Management (IAM) permissions in order to run ML.ADVANCED_WEIGHTS.
Limitations
The total cardinality of training features must be less than 1,000. This limitation results from the constraints on computing p-values and standard errors when the CALCULATE_P_VALUES option is set to TRUE during model training.
Examples
The following examples demonstrate ML.ADVANCED_WEIGHTS with and without standardization.
Without standardization
The following example retrieves weight information from mymodel in mydataset, where the dataset is in your default project. The query returns the weights associated with each dummy-encoded category for the input column input_col.
SELECT * FROM ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(FALSE AS standardize))
With standardization
The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project. The query retrieves standardized weights, which assume all features have a mean of 0 and a standard deviation of 1.0.
SELECT * FROM ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(TRUE AS standardize))
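After the rows from either query are loaded into Python, a common next step is to keep only the weights whose p-value falls below a chosen significance level. The following sketch assumes each row is a plain dictionary keyed by the output column names. The helper names, the 0.05 threshold, and the sample rows are all made up for illustration and don't come from a real model.

```python
def feature_label(row):
    """Readable name: 'feature' for numeric columns, 'feature=category' otherwise."""
    if row["category"] is None:
        return row["processed_input"]
    return f'{row["processed_input"]}={row["category"]}'


def significant_rows(rows, alpha=0.05):
    """Return rows with p_value below alpha, ordered from smallest p-value up."""
    kept = [row for row in rows if row["p_value"] < alpha]
    return sorted(kept, key=lambda row: row["p_value"])


# Hypothetical rows shaped like the ML.ADVANCED_WEIGHTS output columns.
rows = [
    {"processed_input": "city", "category": "Paris",
     "weight": 0.3, "standard_error": 0.3, "p_value": 0.3173},
    {"processed_input": "city", "category": "Rome",
     "weight": -0.9, "standard_error": 0.3, "p_value": 0.0027},
    {"processed_input": "age", "category": None,
     "weight": 0.8, "standard_error": 0.2, "p_value": 0.000063},
]

significant_labels = [feature_label(row) for row in significant_rows(rows)]
```

A non-numeric feature appears once per category, so the label combines processed_input and category to keep the rows distinguishable.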
What's next
- For information about Explainable AI, see BigQuery Explainable AI overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.