The ML.GET_INSIGHTS function
This document describes the ML.GET_INSIGHTS function, which you can use to retrieve information about changes to key metrics in your multi-dimensional data from a contribution analysis model.
You can use a CREATE MODEL statement to create a contribution analysis model in BigQuery.
Syntax
ML.GET_INSIGHTS( MODEL `project_id.dataset.model_name` )
Arguments
ML.GET_INSIGHTS takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model_name`: the name of the contribution analysis model.
Output
Some of the ML.GET_INSIGHTS
output columns contain metrics that compare the
values for a given segment in either the test or control dataset against the
values for the population, which is all segments in the same dataset. The
metric values calculated for the entire population except for the given segment
are referred to as ambient values.
Output for summable metric contribution analysis models
ML.GET_INSIGHTS
returns the following output columns for contribution
analysis models that use
summable metrics, in
addition to any input data columns specified in the query_statement
of the
contribution analysis model:
- `contributors`: an `ARRAY<STRING>` value that contains the dimension values for a given segment. The other output metrics returned in the same row apply to the segment described by these dimensions.
- `metric_test`: a `NUMERIC` value that contains the sum of the metric column values in the test dataset for the given segment. The metric column is specified in the `CONTRIBUTION_METRIC` option of the contribution analysis model.
- `metric_control`: a `NUMERIC` value that contains the sum of the metric column values in the control dataset for the given segment. The metric column is specified in the `CONTRIBUTION_METRIC` option of the contribution analysis model.
- `difference`: a `NUMERIC` value that contains the difference between the `metric_test` and `metric_control` values, calculated as `metric_test - metric_control`.
- `relative_difference`: a `NUMERIC` value that contains the relative change in the segment value between the test and control datasets, calculated as `difference / metric_control`.
- `unexpected_difference`: a `NUMERIC` value that contains the difference between the segment's actual `metric_test` value and the segment's expected `metric_test` value, which is determined by comparing the ratio of change for this segment against the ambient ratio of change. The `unexpected_difference` value is calculated as follows:
  1. Determine the `metric_test` value for all segments except the given segment, referred to here as `ambient_test_change`: `ambient_test_change = sum(metric_test for the population) - metric_test`.
  2. Determine the `metric_control` value for all segments except the given segment, referred to here as `ambient_control_change`: `ambient_control_change = sum(metric_control for the population) - metric_control`.
  3. Determine the ratio between the `ambient_test_change` and `ambient_control_change` values, referred to here as `ambient_change_ratio`: `ambient_change_ratio = ambient_test_change / ambient_control_change`.
  4. Determine the expected `metric_test` value for the given segment, referred to here as `expected_metric_test`: `expected_metric_test = metric_control * ambient_change_ratio`.
  5. Determine the `unexpected_difference` value: `unexpected_difference = metric_test - expected_metric_test`.
- `relative_unexpected_difference`: a `NUMERIC` value that contains the ratio between the `unexpected_difference` value and the `expected_metric_test` value, calculated as `unexpected_difference / expected_metric_test`. You can use the `relative_unexpected_difference` value to determine whether the change to this segment is smaller than expected compared to the change in all of the other segments.
- `apriori_support`: a `NUMERIC` value that contains the apriori support value for the segment. The apriori support value is either the ratio between the `metric_test` value for the segment and the `metric_test` value for the population, or the ratio between the `metric_control` value for the segment and the `metric_control` value for the population, whichever is greater. The calculation is expressed as `GREATEST(metric_test / sum(metric_test for the population), metric_control / sum(metric_control for the population))`. If the `apriori_support` value is less than the `MIN_APRIORI_SUPPORT` option value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.
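The steps above can be traced numerically. The following Python sketch walks through the `unexpected_difference`, `relative_unexpected_difference`, and `apriori_support` calculations for one segment; all of the input numbers are hypothetical, and the variable names simply mirror the output column names:

```python
# Hypothetical population totals across all segments in each dataset.
population_test = 1000.0
population_control = 800.0

# Hypothetical values for one segment.
metric_test = 300.0
metric_control = 100.0

# 1. Test metric summed over every segment except this one.
ambient_test_change = population_test - metric_test            # 700.0
# 2. Control metric summed over every segment except this one.
ambient_control_change = population_control - metric_control   # 700.0
# 3. Ratio of change in the ambient population.
ambient_change_ratio = ambient_test_change / ambient_control_change  # 1.0
# 4. Expected test value if this segment had changed like the rest.
expected_metric_test = metric_control * ambient_change_ratio   # 100.0
# 5. Unexpected difference and its relative form.
unexpected_difference = metric_test - expected_metric_test
relative_unexpected_difference = unexpected_difference / expected_metric_test

# Apriori support: the larger of the segment's share of the test
# population and its share of the control population.
apriori_support = max(metric_test / population_test,
                      metric_control / population_control)

print(unexpected_difference, relative_unexpected_difference, apriori_support)
# → 200.0 2.0 0.3
```

Here the segment tripled while the rest of the population was flat, so 200.0 of its 300.0 test value is unexpected, and its apriori support of 0.3 would survive any `MIN_APRIORI_SUPPORT` threshold at or below that value.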
You might find it useful to order the output by the unexpected_difference
column, in order to quickly determine the contributors associated with the
largest differences in your data between the test and control sets.
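As a sketch of that ordering, applied to hypothetical result rows rather than actual BigQuery output, sorting on the absolute `unexpected_difference` surfaces the largest contributors in either direction first:

```python
# Hypothetical ML.GET_INSIGHTS result rows (contributors and values are
# made up for illustration).
rows = [
    {"contributors": ["us", "mobile"], "unexpected_difference": -40.0},
    {"contributors": ["eu", "desktop"], "unexpected_difference": 125.0},
    {"contributors": ["us", "desktop"], "unexpected_difference": 5.0},
]

# Largest absolute unexpected difference first.
rows.sort(key=lambda r: abs(r["unexpected_difference"]), reverse=True)
print([r["contributors"] for r in rows])
# → [['eu', 'desktop'], ['us', 'mobile'], ['us', 'desktop']]
```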
Output for summable ratio metric contribution analysis models
ML.GET_INSIGHTS
returns the following output columns for contribution
analysis models that use
summable ratio metrics, in
addition to any input data columns specified in the query_statement
of the
contribution analysis model:
- `contributors`: an `ARRAY<STRING>` value that contains the dimension values for a given segment. The other output metrics returned in the same row apply to the segment described by these dimensions.
- `ratio_test`: a `NUMERIC` value that contains the ratio between the two metrics that you are evaluating, in the test dataset for the given segment. These two metrics are specified in the `CONTRIBUTION_METRIC` option of the contribution analysis model. The `ratio_test` value is calculated as `sum(numerator_metric_column_name) / sum(denominator_metric_column_name)`.
- `ratio_control`: a `NUMERIC` value that contains the ratio between the two metrics that you are evaluating, in the control dataset for the given segment. These two metrics are specified in the `CONTRIBUTION_METRIC` option of the contribution analysis model. The `ratio_control` value is calculated as `sum(numerator_metric_column_name) / sum(denominator_metric_column_name)`.
- `regional_relative_ratio`: a `NUMERIC` value that contains the ratio between the `ratio_test` value and the `ratio_control` value, calculated as `ratio_test / ratio_control`.
- `ambient_relative_ratio_test`: a `NUMERIC` value that contains the ratio between the `ratio_test` value for this segment and the ambient `ratio_test` value, calculated as `ratio_test / sum(ratio_test for the population)`. You can use the `ambient_relative_ratio_test` value to compare the size of this segment to the size of the other segments.

  For example, consider the following table of test data:

  | dim1 | dim2 | dim3 | metric_a | metric_b |
  |------|------|------|----------|----------|
  | 1    | 10   | 20   | 50       | 100      |
  | 1    | 15   | 30   | 100      | 200      |
  | 5    | 20   | 40   | 1        | 10       |

  Assume that the `CONTRIBUTION_METRIC` value is `sum(metric_a)/sum(metric_b)`. Using the data in the preceding table, the `metric_a` value for the population is `151`, while the `metric_b` value is `310`. The `ambient_relative_ratio_test` value for the first segment in the table is calculated as `(50/100) / (101/210) = .50/.48 = 1.04`. This `ambient_relative_ratio_test` value indicates that the size of this segment is fairly close to the size of all of the other segments combined. Alternatively, the `ambient_relative_ratio_test` value for the last segment in the table is calculated as `(1/10) / (150/300) = .10/.50 = 0.2`. This `ambient_relative_ratio_test` value indicates that the size of this segment is smaller than the combined size of the rest of the segments.
- `ambient_relative_ratio_control`: a `NUMERIC` value that contains the ratio between the `ratio_control` value for this segment and the ambient `ratio_control` value, calculated as `ratio_control / sum(ratio_control for the population)`. You can use the `ambient_relative_ratio_control` value to compare the size of this segment to the size of the other segments.
- `aumann_shapley_attribution`: a `NUMERIC` value that contains the Aumann-Shapley value for this segment. The Aumann-Shapley value measures the contribution of the segment ratio relative to the population ratio. You can use the Aumann-Shapley value to determine how much a feature contributes to the prediction value. In the context of contribution analysis, BigQuery ML uses the Aumann-Shapley value to measure the attribution of the segment relative to the population. When calculating this measurement, the service considers the segment ratio changes and the ambient population changes between the test and control datasets.
- `apriori_support`: a `NUMERIC` value that contains the apriori support value for the segment. The apriori support value is calculated using the numerator column specified in the model's `CONTRIBUTION_METRIC` option. The calculation is expressed as `numerator column value for the given segment / sum(numerator column value for the population)`. If the `apriori_support` value is less than the `MIN_APRIORI_SUPPORT` option value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.
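The worked `ambient_relative_ratio_test` example above can be reproduced in a few lines of Python. This sketch follows the example's arithmetic, where the ambient ratio is computed from the population totals minus the given segment; it also computes the numerator-based `apriori_support` for the first segment:

```python
# Test data from the example table above: (dim1, dim2, dim3, metric_a, metric_b).
rows = [
    (1, 10, 20, 50, 100),
    (1, 15, 30, 100, 200),
    (5, 20, 40, 1, 10),
]

# Population totals for the numerator and denominator metrics.
pop_a = sum(r[3] for r in rows)  # 151
pop_b = sum(r[4] for r in rows)  # 310

def ambient_relative_ratio_test(metric_a, metric_b):
    """Segment ratio divided by the ambient ratio (population minus segment)."""
    ratio_test = metric_a / metric_b
    ambient_ratio = (pop_a - metric_a) / (pop_b - metric_b)
    return ratio_test / ambient_ratio

first = ambient_relative_ratio_test(50, 100)  # (50/100) / (101/210)
last = ambient_relative_ratio_test(1, 10)     # (1/10) / (150/300)

# Apriori support for ratio metrics uses only the numerator column
# (metric_a here): segment numerator / population numerator.
apriori_first = 50 / pop_a

print(round(first, 2), round(last, 2), round(apriori_first, 2))
# → 1.04 0.2 0.33
```

The first segment's value near 1 confirms that it moves roughly in line with the rest of the population, while the last segment's 0.2 marks it as much smaller than the combined remainder.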
You might find it useful to order the output by the aumann_shapley_attribution
column, in order to quickly determine the contributors associated with the
largest differences in your data between the test and control sets.
What's next
Get data insights from a contribution analysis model.