The ML.ADVANCED_WEIGHTS function

The ML.ADVANCED_WEIGHTS function lets you see the underlying weights that a linear or binary logistic regression model uses during prediction, along with the p-value and standard error of each weight. ML.ADVANCED_WEIGHTS can be used only if the model was trained with calculate_p_values=TRUE and category_encoding_method=DUMMY_ENCODING. ML.ADVANCED_WEIGHTS is an extended version of ML.WEIGHTS for linear and binary logistic regression models.

For information about supported model types of each SQL statement and function, and all supported SQL statements and functions for each model type, read End-to-end user journey for each model.

ML.ADVANCED_WEIGHTS syntax

In the following example syntax, standardize is an optional parameter that determines whether the model weights should be standardized to assume that all features have a mean of zero and a standard deviation of one. Standardizing the weights allows the absolute magnitude of the weights to be compared to each other. The default value is false. The value that is supplied must be the only field in a STRUCT.

ML.ADVANCED_WEIGHTS(MODEL `project-id.dataset.model`
          [, STRUCT(<T> as standardize)])

Replace the following:

  • project-id: Your project ID.
  • dataset: The BigQuery dataset that contains the model.
  • model: The name of the model.

ML.ADVANCED_WEIGHTS output

ML.ADVANCED_WEIGHTS returns the following columns:

  • processed_input — The name of the model feature input. The value of this column matches the name of the column in the SELECT statement used during training. If the feature is non-numeric, then there will be multiple rows with the same processed_input string, one for each category of the feature.
  • category — The category name if the input column is non-numeric. NULL for numeric columns.
  • weight — The weight of each feature.
  • standard_error — The standard error of the weight.
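  • p_value — The p-value of the weight, tested against the null hypothesis that the weight is zero. The p-value for feature $j$ is calculated using the following formula:
    \( p_j = 2 \left(1 - \Phi\left(|\hat\beta_j| / \sigma_j\right)\right) \)
    where $\hat\beta_j$ is the weight of feature $j$ after training, $\sigma_j$ is its standard error, and $\Phi$ is the cumulative distribution function of the standard normal distribution. This is equivalent to 2 * (1 - stats.norm.cdf(abs(weight), loc=0, scale=standard_error)) in SciPy.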
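
The p-value formula above can be reproduced outside BigQuery as a sanity check. The following is a minimal Python sketch that uses only the standard library. The function name two_sided_p_value and the sample values are illustrative assumptions and are not part of BigQuery ML.

```python
import math


def two_sided_p_value(weight: float, standard_error: float) -> float:
    """Two-sided p-value of a weight under a zero-mean normal null hypothesis.

    Equivalent to 2 * (1 - norm.cdf(abs(weight), loc=0, scale=standard_error)).
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = abs(weight) / standard_error
    # 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)), so 2 * (1 - Phi(z)) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical weight and standard error, for illustration only.
p = two_sided_p_value(weight=0.42, standard_error=0.20)
print(f"p-value: {p:.4f}")
```

Using erfc instead of computing 1 - cdf directly avoids loss of precision when the weight is many standard errors away from zero.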

If the TRANSFORM clause was present in the CREATE MODEL statement that created the model, ML.ADVANCED_WEIGHTS outputs the weights of the TRANSFORM output features. The weights are denormalized by default, with the option to get normalized weights, exactly like models that are created without TRANSFORM.

ML.ADVANCED_WEIGHTS permissions

Both bigquery.models.create and bigquery.models.getData are required to run ML.ADVANCED_WEIGHTS.

ML.ADVANCED_WEIGHTS syntax examples

The following examples demonstrate ML.ADVANCED_WEIGHTS with and without standardization.

ML.ADVANCED_WEIGHTS without standardization

The following example retrieves weight information from mymodel in mydataset where the dataset is in your default project.

The query returns the weight, standard error, and p-value of each feature. For a non-numeric input column such as input_col, the query returns one row for each dummy-encoded category.

SELECT
  *
FROM
  ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`,
    STRUCT(false AS standardize))

ML.ADVANCED_WEIGHTS with standardization

The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project.

The query retrieves standardized weights, which assume all features have a mean of zero and a standard deviation of one.

SELECT
  *
FROM
  ML.ADVANCED_WEIGHTS(MODEL `mydataset.mymodel`,
    STRUCT(true AS standardize))
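
To illustrate why standardized weights can be compared by magnitude, the following Python sketch rescales the weights of a linear model by each feature's standard deviation. This is the usual textbook relationship between raw and standardized coefficients; it is an assumption that BigQuery ML computes its standardized weights in exactly this way. The function name, feature names, and numbers are hypothetical.

```python
def standardize_weights(weights, feature_means, feature_stds, intercept):
    """Convert raw linear-model weights to weights on standardized features.

    With z_j = (x_j - mean_j) / std_j, the prediction
        intercept + sum(w_j * x_j)
    equals
        new_intercept + sum((w_j * std_j) * z_j),
    where new_intercept = intercept + sum(w_j * mean_j).
    """
    std_weights = {name: w * feature_stds[name] for name, w in weights.items()}
    new_intercept = intercept + sum(
        w * feature_means[name] for name, w in weights.items()
    )
    return std_weights, new_intercept


# Hypothetical model: the raw weights differ by orders of magnitude only
# because the features are measured on very different scales.
raw_weights = {"age": 0.5, "income": 0.0001}
means = {"age": 40.0, "income": 60000.0}
stds = {"age": 10.0, "income": 20000.0}

std_weights, std_intercept = standardize_weights(raw_weights, means, stds, 1.0)
print(std_weights, std_intercept)
```

After rescaling, each weight describes the effect of a one-standard-deviation change in its feature, so the magnitudes are on a common scale.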

ML.ADVANCED_WEIGHTS usage requirements

ML.ADVANCED_WEIGHTS can be used only on linear and binary logistic regression models. Multiclass logistic regression models are not supported. ML.ADVANCED_WEIGHTS also requires the following training options:

  • l1_reg must be zero.

    • Standard errors and p-values for the regression coefficients or other estimated quantities are commonly requested for penalized regression. In principle, such standard errors can be calculated, for example by using the bootstrap. In practice this is not done, for reasons that the authors of the R package explained as follows:

    [S]tandard errors are not very meaningful for strongly biased estimates such as arise from penalized estimation methods. Penalized estimation is a procedure that reduces the variance of estimators by introducing substantial bias. The bias of each estimator is therefore a major component of its mean squared error, whereas its variance may contribute only a small part. Unfortunately, in most applications of penalized regression it is impossible to obtain a sufficiently precise estimate of the bias. Any bootstrap-based calculations can only give an assessment of the variance of the estimates. Reliable estimates of the bias are only available if reliable unbiased estimates are available, which is typically not the case in situations in which penalized estimates are used.

  • CATEGORY_ENCODING_METHOD must be set to DUMMY_ENCODING.
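
The requirements stated in this document can be summarized as a simple checklist. The following Python sketch validates a dictionary of training options against them. The dictionary layout and the function name are hypothetical and do not correspond to any BigQuery API; the sketch also assumes that the linear and logistic regression model types are named LINEAR_REG and LOGISTIC_REG and that an omitted l1_reg is treated as zero. It cannot tell binary from multiclass logistic regression, because that depends on the training labels.

```python
def check_advanced_weights_requirements(options):
    """Return a list of problems; an empty list means the options qualify."""
    problems = []

    model_type = str(options.get("model_type", "")).upper()
    if model_type not in ("LINEAR_REG", "LOGISTIC_REG"):
        problems.append("model_type must be LINEAR_REG or LOGISTIC_REG")

    if options.get("calculate_p_values") is not True:
        problems.append("calculate_p_values must be TRUE")

    encoding = str(options.get("category_encoding_method", "")).upper()
    if encoding != "DUMMY_ENCODING":
        problems.append("category_encoding_method must be DUMMY_ENCODING")

    # Assumption: an omitted l1_reg is treated as zero.
    if float(options.get("l1_reg", 0)) != 0:
        problems.append("l1_reg must be zero")

    return problems


# Hypothetical training options, for illustration only.
candidate = {
    "model_type": "logistic_reg",
    "calculate_p_values": True,
    "category_encoding_method": "DUMMY_ENCODING",
    "l1_reg": 0.1,
}
for problem in check_advanced_weights_requirements(candidate):
    print(problem)
```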

ML.ADVANCED_WEIGHTS limitations

The limitations of ML.ADVANCED_WEIGHTS result from the limitations of computing p-values and standard errors when calculate_p_values=TRUE is set for training:

  • Total cardinality of training features must be less than 1,000.