The ML.FEATURE_IMPORTANCE function
This document describes the ML.FEATURE_IMPORTANCE function, which lets you see feature importance scores. These scores indicate how useful or valuable each feature was in the construction of a boosted tree or random forest model during training. For more information, see the feature_importances property in the XGBoost library.
Syntax
ML.FEATURE_IMPORTANCE(MODEL `project_id.dataset.model`)
Arguments
ML.FEATURE_IMPORTANCE takes the following arguments:

project_id
: Your project ID.

dataset
: The BigQuery dataset that contains the model.

model
: The name of the model.
Output
ML.FEATURE_IMPORTANCE returns the following columns:

feature
: a STRING value that contains the name of the feature column in the input training data.

importance_weight
: a FLOAT64 value that contains the number of times a feature is used to split the data across all trees.

importance_gain
: a FLOAT64 value that contains the average gain across all splits the feature is used in.

importance_cover
: a FLOAT64 value that contains the average coverage across all splits the feature is used in.
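For example, to list the most influential features first, you can order the function's output by one of the importance columns. The following query is a minimal sketch; `mydataset.mymodel` is a placeholder model name:

SELECT
  feature,
  importance_weight,
  importance_gain,
  importance_cover
FROM
  ML.FEATURE_IMPORTANCE(MODEL `mydataset.mymodel`)
ORDER BY
  importance_gain DESC;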
If the TRANSFORM clause was used in the CREATE MODEL statement that created the model, ML.FEATURE_IMPORTANCE returns the information for the pre-transform columns from the query_statement clause of the CREATE MODEL statement.
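As an illustration, suppose the model was created with a TRANSFORM clause like the following. This is a hypothetical sketch; the table, the columns, and the ML.QUANTILE_BUCKETIZE preprocessing step are example choices, not requirements:

CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    ML.QUANTILE_BUCKETIZE(f1, 4) OVER () AS bucketized_f1,
    f2,
    label_col
  )
  OPTIONS (
    model_type = 'BOOSTED_TREE_CLASSIFIER',
    input_label_cols = ['label_col']
  )
AS SELECT f1, f2, label_col FROM `mydataset.mytable`;

For a model like this, ML.FEATURE_IMPORTANCE reports importance for the pre-transform columns f1 and f2 from the query_statement clause, not for the derived bucketized_f1 column.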
Permissions
You must have the bigquery.models.create and bigquery.models.getData Identity and Access Management (IAM) permissions to run ML.FEATURE_IMPORTANCE.
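One way to get these permissions is through a predefined IAM role that includes them, such as roles/bigquery.dataEditor. The following GRANT statement is a minimal sketch, assuming you manage access at the dataset level; the dataset name and user email are placeholders:

GRANT `roles/bigquery.dataEditor`
ON SCHEMA `mydataset`
TO "user:user@example.com";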
Limitations
ML.FEATURE_IMPORTANCE is supported only with boosted tree models and random forest models.
Example
This example retrieves feature importance from mymodel in mydataset. The dataset is in your default project.
SELECT * FROM ML.FEATURE_IMPORTANCE(MODEL `mydataset.mymodel`)
What's next
- For information about Explainable AI, see BigQuery Explainable AI overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.