BigQuery explainable AI overview


Explainable artificial intelligence (XAI) helps you understand the results that your predictive machine learning model generates for classification and regression tasks by showing how each feature in a row of data contributed to the predicted result. This is often referred to as feature attribution. You can use this information to verify that the model is behaving as expected, to recognize biases in your models, and to find ways to improve your model and your training data.

For information about the supported model types of each SQL statement and function, and all of the supported SQL statements and functions for each model type, read the End-to-end user journey for each model.

Local vs. global explainability

There are two types of explainability: local and global. These are also known as local and global feature importance, respectively.

  • Local explainability: Returns feature attribution values for each explained example. These values describe how much a particular feature affected the prediction relative to the baseline prediction.
  • Global explainability: Returns the feature's overall influence on the model, often obtained by aggregating the feature attributions over the entire dataset. A higher absolute value indicates the feature had a greater influence on the model's predictions.
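
To make the relationship between the two concrete, the following Python sketch aggregates local attributions into a global score by taking the mean absolute attribution per feature, which is one common aggregation. The feature names and attribution values are made up for illustration; they are not output from any BigQuery ML function, and `global_importance` is a hypothetical helper.

```python
# Hypothetical local attributions: one dict per explained row, mapping
# feature name -> attribution relative to the baseline prediction.
local_attributions = [
    {"age": 3.0, "income": -1.0, "tenure": 0.5},
    {"age": -4.0, "income": 3.0, "tenure": 0.5},
    {"age": 2.0, "income": -2.0, "tenure": -0.5},
]


def global_importance(rows):
    """Return the mean absolute attribution per feature across all rows."""
    totals = {}
    for row in rows:
        for feature, value in row.items():
            # Use the absolute value so positive and negative
            # contributions do not cancel each other out.
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    return {feature: total / len(rows) for feature, total in totals.items()}


importance = global_importance(local_attributions)

# Rank features from most to least influential.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
for feature, score in ranked:
    print(f"{feature}: {score:.2f}")
```

In this toy data, the signed attributions for `age` nearly cancel out across rows, yet the feature moves individual predictions more than any other. Averaging absolute values keeps that influence visible in the global score.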

Explainable AI offerings in BigQuery ML

Explainable AI in BigQuery ML supports a variety of machine learning models, including both time series and non-time series models. Different model types use different explainability methods, as shown in the following table.

If you want to use Explainable AI on BigQuery ML models you've registered to the Vertex AI Model Registry, there is a separate set of requirements to follow. To learn more, see Apply Explainable AI on BigQuery ML models.

| Model category | Model types | Explainability method | Basic explanation of the method | Local explain functions | Global explain functions |
| --- | --- | --- | --- | --- | --- |
| Supervised models | Linear and logistic regression | Shapley values | Shapley values for linear models are equal to "model weight * feature value", where feature values are standardized and model weights are trained with the standardized feature values. | ML.EXPLAIN_PREDICT (1) | ML.GLOBAL_EXPLAIN (2) |
| Supervised models | Linear and logistic regression | Standard errors and p-values | Standard errors and p-values are used for significance testing of the model weights. | NA | ML.ADVANCED_WEIGHTS (4) |
| Supervised models | Boosted trees, random forest | Tree SHAP | Tree SHAP is an algorithm that computes exact SHAP values for decision-tree-based models. | ML.EXPLAIN_PREDICT (1) | ML.GLOBAL_EXPLAIN (2) |
| Supervised models | Boosted trees, random forest | Gini index-based feature importance | A global feature importance score that indicates how useful or valuable each feature was in the construction of the boosted tree or random forest model during training. | NA | ML.FEATURE_IMPORTANCE |
| Supervised models | Deep neural network (DNN) | Integrated gradients | A gradients-based method that efficiently computes feature attributions with the same axiomatic properties as the Shapley value. It provides a sampling approximation of exact feature attributions. Its accuracy is controlled by the num_integral_steps parameter. | ML.EXPLAIN_PREDICT (1) | ML.GLOBAL_EXPLAIN (2) |
| Time series models | ARIMA_PLUS | Time series decomposition | Decomposes the time series into multiple components if they are present in the time series. The components include trend, seasonal, holiday, step changes, and spikes and dips. See ARIMA_PLUS modeling pipeline for more details. | ML.EXPLAIN_FORECAST (3) | NA |

(1) ML.EXPLAIN_PREDICT is an extended version of ML.PREDICT.

(2) ML.GLOBAL_EXPLAIN returns the global explainability obtained by taking the mean absolute attribution that each feature receives for all the rows in the evaluation dataset.

(3) ML.EXPLAIN_FORECAST is an extended version of ML.FORECAST.

(4) ML.ADVANCED_WEIGHTS is an extended version of ML.WEIGHTS.
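
Two of the methods in the table can be illustrated with a short Python sketch. It is not BigQuery ML's implementation: the function names `linear_shapley` and `integrated_gradients`, the toy weights, and the toy model f(x) = x0^3 + 3*x1 are all assumptions made for illustration. The first function applies the "model weight * feature value" rule to standardized inputs. The second approximates integrated gradients with a midpoint Riemann sum, where `num_steps` plays a role similar to the step-count parameter mentioned in the table.

```python
def linear_shapley(weights, standardized_row):
    """Attribution per feature for a linear model: weight * standardized value."""
    return {name: weights[name] * standardized_row[name] for name in weights}


def integrated_gradients(grad_fn, x, baseline, num_steps):
    """Approximate integrated gradients along the straight path
    from `baseline` to `x` using a midpoint Riemann sum."""
    attributions = [0.0] * len(x)
    for step in range(num_steps):
        alpha = (step + 0.5) / num_steps  # midpoint of the current sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        for i, grad in enumerate(grads):
            attributions[i] += grad * (x[i] - baseline[i]) / num_steps
    return attributions


# Linear model with hypothetical weights, applied to one standardized row.
weights = {"age": 0.5, "income": -2.0}
row = {"age": 1.2, "income": 0.3}
linear_attr = linear_shapley(weights, row)
print(linear_attr)


# Toy nonlinear model f(x) = x0**3 + 3*x1 and its analytic gradient.
def toy_gradient(point):
    return [3.0 * point[0] ** 2, 3.0]


x, baseline = [2.0, 1.0], [0.0, 0.0]
for steps in (1, 10, 100):
    print(steps, integrated_gradients(toy_gradient, x, baseline, steps))
```

For the toy model, the exact attributions are 8 for `x0` and 3 for `x1`, which sum to f(x) - f(baseline) = 11. The approximation for `x0` moves closer to 8 as `num_steps` grows, which shows why a larger step count trades compute time for accuracy.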