BigQuery Explainable AI overview
This document describes how BigQuery ML supports Explainable artificial intelligence (AI), sometimes called XAI.
Explainable AI helps you understand the results that your predictive machine learning model generates for classification and regression tasks by defining how each feature in a row of data contributed to the predicted result. This information is often referred to as feature attribution. You can use this information to verify that the model is behaving as expected, to recognize biases in your models, and to inform ways to improve your model and your training data.
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.
Local versus global explainability
There are two types of explainability: local explainability and global explainability. These are also known respectively as local feature importance and global feature importance.
- Local explainability returns feature attribution values for each explained example. These values describe how much a particular feature affected the prediction relative to the baseline prediction.
- Global explainability returns the feature's overall influence on the model and is often obtained by aggregating the feature attributions over the entire dataset. A higher absolute value indicates the feature had a greater influence on the model's predictions.
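As a toy illustration (not BigQuery ML code), the relationship between the two can be sketched directly: given hypothetical per-row local attributions, the global importance of a feature is the mean of the absolute attribution values across rows.

```python
# Toy illustration: deriving global feature importance from local
# feature attributions. The attribution values below are made up;
# they are not output from a real BigQuery ML model.

# Local explainability: one attribution per feature, per explained row.
local_attributions = [
    {"age": 0.8, "income": -1.2, "tenure": 0.1},
    {"age": -0.5, "income": 2.0, "tenure": 0.0},
    {"age": 0.3, "income": -0.6, "tenure": 0.2},
]

# Global explainability: mean absolute attribution per feature over all rows.
def global_importance(rows):
    features = rows[0].keys()
    return {
        f: sum(abs(r[f]) for r in rows) / len(rows)
        for f in features
    }

print(global_importance(local_attributions))
```

Here `income` ends up with the largest mean absolute attribution, so it has the greatest overall influence on the model's predictions, even though its per-row attributions vary in sign.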
Explainable AI offerings in BigQuery ML
Explainable AI in BigQuery ML supports a variety of machine learning models, including both time series and non-time series models. Each model type uses a different explainability method.
If you want to use Explainable AI on BigQuery ML models you've registered to the Model Registry, there are separate requirements to follow. To learn more, see Apply Explainable AI on BigQuery ML models.
Model category | Model types | Explainability method | Basic explanation of the method | Local explain functions | Global explain functions
---|---|---|---|---|---
Supervised models | Linear & logistic regression | Shapley values | Shapley values for linear models are equal to `model weight * feature value`, where feature values are standardized and model weights are trained with the standardized feature values. | ML.EXPLAIN_PREDICT ¹ | ML.GLOBAL_EXPLAIN ²
Supervised models | Linear & logistic regression | Standard errors and p-values | Standard errors and p-values are used for significance testing against the model weights. | NA | ML.ADVANCED_WEIGHTS ⁴
Supervised models | Boosted trees, random forest | Tree SHAP | Tree SHAP is an algorithm that computes exact SHAP values for decision tree-based models. | ML.EXPLAIN_PREDICT ¹ | ML.GLOBAL_EXPLAIN ²
Supervised models | Boosted trees, random forest | Approximate feature contribution | Approximates the feature contribution values. It is faster and simpler than Tree SHAP. | ML.EXPLAIN_PREDICT ¹ | ML.GLOBAL_EXPLAIN ²
Supervised models | Boosted trees, random forest | Gini index-based feature importance | A global feature importance score that indicates how useful or valuable each feature was in the construction of the boosted tree or random forest model during training. | NA | ML.FEATURE_IMPORTANCE
Supervised models | Deep neural network (DNN), Wide-and-Deep | Integrated gradients | A gradients-based method that efficiently computes feature attributions with the same axiomatic properties as the Shapley value. It provides a sampling approximation of exact feature attributions; its accuracy is controlled by the `integrated_gradients_num_steps` parameter. | ML.EXPLAIN_PREDICT ¹ | ML.GLOBAL_EXPLAIN ²
Supervised models | AutoML Tables | Sampled Shapley | Sampled Shapley assigns credit for the model's outcome to each feature and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values. | NA | ML.GLOBAL_EXPLAIN ²
Time series models | ARIMA_PLUS | Time series decomposition | Decomposes the time series into multiple components if those components are present in the time series. The components include trend, seasonal, holiday, step changes, and spikes and dips. See the ARIMA_PLUS modeling pipeline for more details. | ML.EXPLAIN_FORECAST ³ | NA
Time series models | ARIMA_PLUS_XREG | Time series decomposition and Shapley values | Decomposes the time series into multiple components, including trend, seasonal, holiday, step changes, and spikes and dips (similar to ARIMA_PLUS). The attribution of each external regressor is calculated with Shapley values, which are equal to `model weight * feature value`. | ML.EXPLAIN_FORECAST ³ | NA
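For linear models, the Shapley values described in the table can be reproduced by hand: standardize each feature and multiply by its trained weight. The attributions then sum to the prediction's offset from the baseline prediction (the prediction at the feature means, where every standardized value is zero). A minimal sketch with made-up weights and data, not a trained BigQuery ML model:

```python
# Sketch of Shapley values for a linear model:
# attribution = model weight * standardized feature value.
# Weights, intercept, and rows are hypothetical, for illustration only.

rows = [
    [2.0, 10.0],
    [4.0, 20.0],
    [6.0, 60.0],
]
weights = [1.5, -0.4]   # assumed trained on standardized features
intercept = 3.0

# Standardize each column: z = (x - mean) / stddev (population stddev).
def standardize(rows):
    cols = list(zip(*rows))
    stats = []
    for col in cols:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, var ** 0.5))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

z_rows = standardize(rows)

# Local attribution for each row: weight * standardized feature value.
attributions = [[w * z for w, z in zip(weights, zr)] for zr in z_rows]

# Completeness check: attributions sum to prediction - baseline prediction,
# where the baseline prediction is the intercept (all z = 0 at the mean).
for zr, attr in zip(z_rows, attributions):
    pred = intercept + sum(w * z for w, z in zip(weights, zr))
    assert abs(sum(attr) - (pred - intercept)) < 1e-9
```

A row whose feature value sits exactly at the column mean gets a standardized value of zero, and therefore an attribution of zero for that feature: it moved the prediction no distance from the baseline.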
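The integrated gradients method used for DNN and Wide-and-Deep models can also be sketched on a toy differentiable function: accumulate the gradient at `num_steps` points along the straight path from a baseline to the input, then scale by the input's offset from the baseline. This mirrors the role of the `integrated_gradients_num_steps` parameter; the function and values below are made up for illustration.

```python
# Sketch of integrated gradients for a toy differentiable function:
#   IG_i(x) = (x_i - b_i) * integral over a in [0, 1] of
#             dF/dx_i evaluated at b + a * (x - b)
# approximated with num_steps gradient evaluations (midpoint rule).
# The function f below is hypothetical, not a trained model.

def f(x):
    return x[0] ** 2 + 3.0 * x[1]

def grad(func, x, eps=1e-6):
    # Central-difference numerical gradient.
    g = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g.append((func(hi) - func(lo)) / (2 * eps))
    return g

def integrated_gradients(func, x, baseline, num_steps=100):
    accum = [0.0] * len(x)
    for step in range(1, num_steps + 1):
        alpha = (step - 0.5) / num_steps  # midpoint of each sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(func, point)
        for i in range(len(x)):
            accum[i] += g[i]
    return [(xi - b) * a / num_steps
            for xi, b, a in zip(x, baseline, accum)]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attrs = integrated_gradients(f, x, baseline)

# Completeness: attributions sum to f(x) - f(baseline).
assert abs(sum(attrs) - (f(x) - f(baseline))) < 1e-3
```

Increasing `num_steps` tightens the approximation of the path integral, which is why the table describes the parameter as controlling accuracy; in a real DNN the analytic gradients come from backpropagation rather than finite differences.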
¹ ML.EXPLAIN_PREDICT is an extended version of ML.PREDICT.
² ML.GLOBAL_EXPLAIN returns the global explainability obtained by taking the mean absolute attribution that each feature receives across all rows in the evaluation dataset.
³ ML.EXPLAIN_FORECAST is an extended version of ML.FORECAST.
⁴ ML.ADVANCED_WEIGHTS is an extended version of ML.WEIGHTS.