The ML.ADVANCED_WEIGHTS function

This document describes the ML.ADVANCED_WEIGHTS function, which lets you see the underlying weights that a linear or binary logistic regression model uses during prediction, along with the associated p-values and standard errors for that weight. ML.ADVANCED_WEIGHTS is an extended version of ML.WEIGHTS for linear and binary logistic regression models.

Usage requirements

You can only use ML.ADVANCED_WEIGHTS on linear and binary logistic regression models that are trained with the following option settings:

It's common to require standard errors or p-values for either the regression coefficients or other estimated quantities for these penalized regression methods. In principle, such standard errors can be calculated—for example, using the bootstrap. In practice, this calculation isn't done for reasons that the authors of the R package explain as follows:

Standard errors are not very meaningful for strongly biased estimates such as arise from penalized estimation methods. Penalized estimation is a procedure that reduces the variance of estimators by introducing substantial bias. The bias of each estimator is therefore a major component of its mean squared error, whereas its variance may contribute only a small part. Unfortunately, in most applications of penalized regression it is impossible to obtain a sufficiently precise estimate of the bias. Any bootstrap-based calculations can only give an assessment of the variance of the estimates. Reliable estimates of the bias are only available if reliable unbiased estimates are available, which is typically not the case in situations in which penalized estimates are used.

Multiclass logistic regression models aren't supported.

Syntax

ML.ADVANCED_WEIGHTS(
  MODEL `project_id.dataset.model`,
  STRUCT(
    [standardize AS standardize]))

Arguments

ML.ADVANCED_WEIGHTS takes the following arguments:

  • project_id: Your project ID.
  • dataset: The BigQuery dataset that contains the model.
  • model: The name of the model.
  • standardize: a BOOL value that specifies whether the model weights should be standardized to assume that all features have a mean of zero and a standard deviation of one. Standardizing the weights allows the absolute magnitude of the weights to be compared to each other. The default value is FALSE.

Output

ML.ADVANCED_WEIGHTS returns the following columns:

  • processed_input: a STRING value that contains the name of the feature column. The value of this column is the name of the feature column that's provided in the query_statement clause used during model training. If the feature is non-numeric, then there are multiple rows with the same processed_input value, one for each category of the feature.
  • category: a STRING value that contains the category name if the column identified in the processed_input value is non-numeric. Returns a NULL value for numeric columns.
  • weight: a FLOAT64 value that contains the weight of each feature.
  • standard_error: a FLOAT64 value that contains the standard error of the weight.
  • p_value: a FLOAT64 value that contains the p-value that was tested against the null hypothesis. The p-value for feature $j$ is calculated using the following formula:

    $$ p(j) = 2 * (1 - stats.norm.cdf(abs(\hat\beta_j), loc=0, scale=\sigma_j)) $$

    such that $\hat\beta_j$ is the weight of feature $j$ after training and $\sigma_j$ is its standard error.

If the TRANSFORM clause was used in the CREATE MODEL statement that created the model, ML.ADVANCED_WEIGHTS outputs the weights of the TRANSFORM output features. The weights are denormalized by default, with the option to get normalized weights, exactly like models that are created without TRANSFORM.

Permissions

You must have the bigquery.models.create andbigquery.models.getData Identity and Access Management (IAM) permissions in order to run ML.ADVANCED_WEIGHTS.

Limitations

The total cardinality of training features must be less than 1,000. This limitation is the result of the limitations of computing p-values and standard error when you set the CALCULATE_P_VALUES option to TRUE when training the model.

Examples

The following examples demonstrate ML.ADVANCED_WEIGHTS with and without standardization.

Without standardization

The following example retrieves weight information from mymodel in mydataset where the dataset is in your default project.

The query returns the weights associated with each one-hot encoded category for the input column input_col.

SELECT
  *
FROM
  ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`,
    STRUCT(FALSE AS standardize))

With standardization

The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project.

The query retrieves standardized weights, which assume all features have a mean of 0 and a standard deviation of 1.0.

SELECT
  *
FROM
  ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`,
    STRUCT(TRUE AS standardize))

What's next