The ML.WEIGHTS function

The ML.WEIGHTS function allows you to see the underlying weights used by a model during prediction.

ML.WEIGHTS returns the following columns:

  • processed_input — The name of the model feature input. The value of this column matches the name of the column in the SELECT statement used during training.
  • weight — The weight of each feature. For numerical columns, weight contains a value and the category_weights column is NULL. For non-numeric columns that are converted to one-hot encoding, the weight column is NULL and the category_weights column is an ARRAY of category names and weights for each category.
  • category_weights.category — The category name if the input column is non-numeric.
  • category_weights.weight — The category's weight if the input column is non-numeric.
  • class_label — For multiclass models, class_label is the label for a given weight. The output includes one row per <class_label, processed_input> combination.
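As a minimal sketch of how these columns can be read, the query below lists only the numerical features, relying on the rule above that weight is populated and category_weights is NULL for them. The model name mydataset.mymodel is a placeholder borrowed from the examples later on this page.

```sql
-- Sketch: list the weights of numerical features only.
-- `mydataset.mymodel` is a placeholder model name.
SELECT
  processed_input,
  weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`)
WHERE
  -- Numerical features carry a value in weight; one-hot encoded
  -- features have NULL here and use category_weights instead.
  weight IS NOT NULL
```

For a multiclass model, class_label could presumably be added to the SELECT list to see which class each weight belongs to.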

If the CREATE MODEL statement that created the model included a TRANSFORM clause, ML.WEIGHTS outputs the weights of the TRANSFORM output features. By default the weights are denormalized, with the option to get normalized weights, exactly as for models created without TRANSFORM.

processed_input results for TIMESTAMPs

When BigQuery ML encounters a TIMESTAMP column, it extracts a set of components from the TIMESTAMP and applies a mix of standardization and one-hot encoding to the extracted components. You can see the transformation results in the processed_input column when you use the ML.WEIGHTS function.

The following table shows the components extracted from TIMESTAMPs and the corresponding transformation method. For processed_input values, [COLUMN_NAME] is the name of the TIMESTAMP column.

TIMESTAMP component                  | processed_input result  | Transformation method
-------------------------------------|-------------------------|----------------------
Unix time in seconds                 | [COLUMN_NAME]           | Standardization
Day of month                         | _TS_DOM_[COLUMN_NAME]   | One-hot encoding
Day of week                          | _TS_DOW_[COLUMN_NAME]   | One-hot encoding
Month of year                        | _TS_MOY_[COLUMN_NAME]   | One-hot encoding
Hour of day                          | _TS_HOD_[COLUMN_NAME]   | One-hot encoding
Minute of hour                       | _TS_MOH_[COLUMN_NAME]   | One-hot encoding
Week of year (weeks begin on Sunday) | _TS_WOY_[COLUMN_NAME]   | One-hot encoding
Year                                 | _TS_YEAR_[COLUMN_NAME]  | One-hot encoding
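To see these rows for a single TIMESTAMP feature, a query along the following lines may help. It is a sketch only: event_ts is a hypothetical TIMESTAMP column name and mydataset.mymodel is a placeholder model. STARTS_WITH and ENDS_WITH are used instead of LIKE because the underscore is a wildcard character in LIKE patterns.

```sql
-- Sketch: show every weight derived from a hypothetical TIMESTAMP
-- column named event_ts in the placeholder model mydataset.mymodel.
WITH weights AS (
  SELECT
    *
  FROM
    ML.WEIGHTS(MODEL `mydataset.mymodel`)
)
SELECT
  w.processed_input,
  w.weight AS numeric_weight,    -- set for the standardized Unix time row
  cw.category,                   -- set for the one-hot encoded components
  cw.weight AS category_weight
FROM
  weights AS w
-- LEFT JOIN keeps the standardized row, whose category_weights is NULL.
LEFT JOIN
  UNNEST(w.category_weights) AS cw
WHERE
  w.processed_input = 'event_ts'
  OR (STARTS_WITH(w.processed_input, '_TS_')
      AND ENDS_WITH(w.processed_input, '_event_ts'))
ORDER BY
  w.processed_input,
  cw.category
```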

ML.WEIGHTS syntax

ML.WEIGHTS(MODEL `project_id.dataset.model`
          [, STRUCT(<T> AS standardize)])

Where:

  • project_id is your project ID.
  • dataset is the BigQuery dataset that contains the model.
  • model is the name of the model.
  • standardize is an optional parameter that determines whether the model weights should be standardized to assume all features have a mean of zero and a standard deviation of one. Standardizing the weights allows the absolute magnitude of the weights to be compared to each other. The default value is false. The value that is supplied must be the only field in a STRUCT.

ML.WEIGHTS examples

ML.WEIGHTS without standardization

The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project.

The query returns the weights associated with each one-hot encoded category for the input column input_col.

SELECT
  category,
  weight
FROM
  UNNEST((
    SELECT
      category_weights
    FROM
      ML.WEIGHTS(MODEL `mydataset.mymodel`)
    WHERE
      processed_input = 'input_col'))

This query uses the UNNEST function because the category_weights column is a nested repeated column.

ML.WEIGHTS with standardization

The following example retrieves weight information from mymodel in mydataset. The dataset is in your default project.

The query retrieves standardized weights, which assume all features have a mean of zero and a standard deviation of one.

SELECT
  *
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`,
    STRUCT(true AS standardize))
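Because standardized weights share a common scale, one possible use is to rank numerical features by the absolute magnitude of their weights. The following sketch assumes the same placeholder model, mydataset.mymodel. It covers numerical features only, since one-hot encoded features report their weights in category_weights.

```sql
-- Sketch: rank numerical features by the magnitude of their
-- standardized weights (placeholder model name).
SELECT
  processed_input,
  weight
FROM
  ML.WEIGHTS(MODEL `mydataset.mymodel`,
    STRUCT(true AS standardize))
WHERE
  weight IS NOT NULL      -- skip one-hot encoded features
ORDER BY
  ABS(weight) DESC        -- largest magnitude first
```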

ML.WEIGHTS limitations

The ML.WEIGHTS function is subject to the following limitations:
