This page describes the AML AI output data model. AML AI outputs are written to BigQuery.
Prediction outputs
Prediction outputs include risk scores and explainability and are generated when you create a PredictionResult resource. For more information, see Understand prediction outputs.
Risk scores
Risk scores are written to the BigQuery table specified in the outputs.predictionDestination field.
Column | Type | Description |
---|---|---|
party_id | STRING | Unique party ID string |
risk_period_end_time | TIMESTAMP | The end of the target period, in the timezone of the dataset |
risk_score | FLOAT64 | Prediction value between 0 and 1. A higher score means higher risk. |
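
A common way to consume this table is to rank parties by risk_score within each risk period and hand the top of the list to investigators. The Python sketch below assumes the rows have already been read from BigQuery into dicts keyed by column name. The rows, party IDs, the capacity n, and the use of UTC are hypothetical placeholders; real risk_period_end_time values are in the timezone of the dataset.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical rows shaped like the risk score table. In practice you would read
# them from the table named in outputs.predictionDestination.
rows = [
    {"party_id": "p1", "risk_period_end_time": datetime(2024, 1, 31, tzinfo=timezone.utc), "risk_score": 0.91},
    {"party_id": "p2", "risk_period_end_time": datetime(2024, 1, 31, tzinfo=timezone.utc), "risk_score": 0.15},
    {"party_id": "p3", "risk_period_end_time": datetime(2024, 1, 31, tzinfo=timezone.utc), "risk_score": 0.64},
    {"party_id": "p1", "risk_period_end_time": datetime(2024, 2, 29, tzinfo=timezone.utc), "risk_score": 0.20},
    {"party_id": "p2", "risk_period_end_time": datetime(2024, 2, 29, tzinfo=timezone.utc), "risk_score": 0.77},
]


def top_parties_per_period(rows, n):
    """For each risk period, return the n party IDs with the highest risk_score."""
    by_period = defaultdict(list)
    for row in rows:
        by_period[row["risk_period_end_time"]].append(row)
    return {
        period: [r["party_id"] for r in sorted(group, key=lambda r: r["risk_score"], reverse=True)[:n]]
        for period, group in by_period.items()
    }


top = top_parties_per_period(rows, n=2)
print(top)
```

Grouping by period first keeps scores from different target periods from being compared against each other.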
Explainability
Explainability is written to the BigQuery table specified in the outputs.explainabilityDestination field.
Column | Type | Description |
---|---|---|
party_id | STRING | Unique party ID string |
risk_period_end_time | TIMESTAMP | The end of the target period, in the timezone of the dataset |
attributions | STRUCT | Repeated record of feature families and their attribution values |
attributions.feature | STRING | Name of the feature family |
attributions.attribution | FLOAT64 | The feature family's attribution score |
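
Because attributions is a repeated record, each row carries several feature/attribution pairs. The following Python sketch assumes the repeated record has been read into a list of dicts, one per feature family, and returns the families with the highest attribution for a single party. The row contents and the family name hypothetical_family are made up, and ranking by the signed attribution value (rather than its magnitude) is a choice of this sketch, not documented behavior.

```python
# Hypothetical row shaped like the explainability table (only the columns used
# here). The repeated attributions record is represented as a list of dicts.
row = {
    "party_id": "p1",
    "attributions": [
        {"feature": "unusual_wire_credit_activity", "attribution": 0.42},
        {"feature": "party_supplementary_data_id_3", "attribution": 0.03},
        {"feature": "hypothetical_family", "attribution": 0.10},
    ],
}


def top_attributions(row, k):
    """Return the k (feature family, attribution) pairs with the highest attribution."""
    ranked = sorted(row["attributions"], key=lambda a: a["attribution"], reverse=True)
    return [(a["feature"], a["attribution"]) for a in ranked[:k]]


print(top_attributions(row, k=2))
```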
Exported registered parties
The following registered parties information is exported from an
instance
to the BigQuery table specified in the
dataset
field.
Column | Type | Description |
---|---|---|
party_id | STRING | Unique identifier of the party in the instance's datasets |
party_size | STRING | Specifies the tier for commercial customers (large versus small). This field does not apply to retail customers. All values are case sensitive. |
earliest_remove_time | STRING | The earliest time at which the party can be removed |
party_with_prediction_intent | STRING | Indicates whether the party has been predicted on since it was registered |
registration_or_uptier_time | STRING | The time at which the party was registered or uptiered |
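
The Python sketch below lists parties whose earliest_remove_time has already passed. It rests on an assumption the table does not state: that the STRING timestamp columns hold RFC 3339 values such as 2024-03-01T00:00:00Z. Check the actual format in your exported table before relying on it; the rows are hypothetical.

```python
from datetime import datetime, timezone


def parse_time(value):
    # Assumption: timestamp columns hold RFC 3339 strings such as
    # "2024-03-01T00:00:00Z". "Z" is rewritten because datetime.fromisoformat
    # only accepts it from Python 3.11 onward.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def removable_parties(rows, now):
    """Return IDs of parties whose earliest_remove_time is at or before now."""
    return [r["party_id"] for r in rows if parse_time(r["earliest_remove_time"]) <= now]


# Hypothetical exported rows (only the columns used here).
rows = [
    {"party_id": "p1", "earliest_remove_time": "2024-03-01T00:00:00Z"},
    {"party_id": "p2", "earliest_remove_time": "2024-09-01T00:00:00Z"},
]
print(removable_parties(rows, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))
```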
Exported metadata
Exported metadata varies based on the AML AI resource.
Engine config
The following metadata is output from an engine config.
Column | Type | Description |
---|---|---|
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results |
resource_id | STRING | Name of the resource |
name | STRING | Name of the metadata entry, such as a metric (see the following table) |
value | JSON | Value of the metadata entry |
Metric name | Metric description | Example metric value |
---|---|---|
ExpectedRecallPreTuning | Recall metric measured on a test set when using the default hyperparameters of the engine version. This recall measurement assumes the number of investigations per month specified in | { "recallValues": [ { "partyInvestigationsPerPeriod": 5000, "recallValue": 0.72, "scoreThreshold": 0.42 } ] } |
ExpectedRecallPostTuning | Recall metric measured on a test set when using tuned hyperparameters. This recall measurement assumes the number of investigations per month specified in | { "recallValues": [ { "partyInvestigationsPerPeriod": 5000, "recallValue": 0.80, "scoreThreshold": 0.43 } ] } |
Missingness | Share of missing values across all features in each feature family. Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ... { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] } |
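
Each metadata row stores its metric in the JSON value column, keyed by name. As a minimal sketch, the following Python decodes those rows and compares ExpectedRecallPostTuning with ExpectedRecallPreTuning at each partyInvestigationsPerPeriod value present in both. The resource_type and resource_id values are placeholders, and the helper accepts the value column either as a JSON string or as an already-decoded object, since which one you get depends on how you read the table.

```python
import json

# Hypothetical metadata rows for one engine config, shaped like the columns above.
rows = [
    {"resource_type": "engine_config", "resource_id": "my-engine-config",
     "name": "ExpectedRecallPreTuning",
     "value": '{"recallValues": [{"partyInvestigationsPerPeriod": 5000, "recallValue": 0.72, "scoreThreshold": 0.42}]}'},
    {"resource_type": "engine_config", "resource_id": "my-engine-config",
     "name": "ExpectedRecallPostTuning",
     "value": '{"recallValues": [{"partyInvestigationsPerPeriod": 5000, "recallValue": 0.80, "scoreThreshold": 0.43}]}'},
]


def metrics_by_name(rows):
    """Map each metadata entry name to its decoded JSON value."""
    metrics = {}
    for row in rows:
        value = row["value"]
        metrics[row["name"]] = json.loads(value) if isinstance(value, str) else value
    return metrics


def recall_by_capacity(metric):
    """Index recallValues entries by partyInvestigationsPerPeriod."""
    return {e["partyInvestigationsPerPeriod"]: e for e in metric["recallValues"]}


def tuning_gain(metrics):
    """Recall gained by tuning at each investigation count present in both metrics."""
    pre = recall_by_capacity(metrics["ExpectedRecallPreTuning"])
    post = recall_by_capacity(metrics["ExpectedRecallPostTuning"])
    return {n: post[n]["recallValue"] - pre[n]["recallValue"] for n in pre.keys() & post.keys()}


metrics = metrics_by_name(rows)
gain = tuning_gain(metrics)
print(gain)
```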
Model
The following metadata is output from a model.
Column | Type | Description |
---|---|---|
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results |
resource_id | STRING | Name of the resource |
name | STRING | Name of the metadata entry, such as a metric (see the following table) |
value | JSON | Value of the metadata entry |
Metric name | Metric description | Example metric value |
---|---|---|
Missingness | Share of missing values across all features in each feature family. Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ... { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] } |
Importance | A metric that shows the importance of a feature family to the model. Higher values indicate more significant use of the feature family in the model. A feature family that is not used in the model has zero importance. Use importance values to prioritize which family skew results to act on. For example, a given skew value is more urgent to resolve for a family with higher importance to the model. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "importanceValue": 459761000000 }, ... { "featureFamily": "party_supplementary_data_id_3", "importanceValue": 27492 } ] } |
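
Importance values have no stated scale (the example ranges from 459761000000 down to 27492), so comparing families within one model is easier after normalizing each value to a share of the total. That normalization is a choice made for this Python sketch, not something AML AI reports; the family hypothetical_unused_family is invented to show the zero-importance case.

```python
# Example Importance value using the two families shown in the table plus one
# hypothetical unused family (importance 0).
importance = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "importanceValue": 459761000000},
        {"featureFamily": "party_supplementary_data_id_3", "importanceValue": 27492},
        {"featureFamily": "hypothetical_unused_family", "importanceValue": 0},
    ]
}


def importance_shares(metric):
    """Return (family, share of total importance) pairs, most important first."""
    families = metric["featureFamilies"]
    total = sum(f["importanceValue"] for f in families)
    ranked = sorted(families, key=lambda f: f["importanceValue"], reverse=True)
    return [(f["featureFamily"], f["importanceValue"] / total if total else 0.0) for f in ranked]


def unused_families(metric):
    """Families with zero importance are not used by the model."""
    return [f["featureFamily"] for f in metric["featureFamilies"] if f["importanceValue"] == 0]


shares = importance_shares(importance)
print(shares)
print(unused_families(importance))
```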
Backtest results
The following metadata is output from backtest results.
Column | Type | Description |
---|---|---|
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results |
resource_id | STRING | Name of the resource |
name | STRING | Name of the metadata entry, such as a metric (see the following table) |
value | JSON | Value of the metadata entry |
Metric name | Metric description | Example metric value |
---|---|---|
ObservedRecallValues | Recall metric measured on the dataset specified for backtesting. The API includes 20 of these measurements at different operating points, evenly distributed from 0 (exclusive) up to 2 * partyInvestigationsPerPeriodHint. The API adds a final recall measurement at partyInvestigationsPerPeriodHint. | { "recallValues": [ { "partyInvestigationsPerPeriod": 5000, "recallValue": 0.80, "scoreThreshold": 0.42 }, ... { "partyInvestigationsPerPeriod": 8000, "recallValue": 0.85, "scoreThreshold": 0.30 } ] } |
Missingness | Share of missing values across all features in each feature family. Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ... { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] } |
Skew | Metrics showing skew between training and prediction or backtest datasets. Family skew indicates changes in the distribution of feature values within a feature family, weighted by the importance of each feature within that family. Max skew indicates the maximum skew of any feature within that family. Skew values range from 0, representing no significant change in the distribution of values of features in the family, to 1 for the most significant change. A large value for either family skew or max skew indicates a significant change in the structure of your data that might affect model performance. Family skew takes the value -1 when no features in the family are used by the model. For large skew values, you should do one of the following:
Set thresholds for acting on family and max skew values based on the natural variation in skew metrics observed over several months. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10, "maxSkewValue": 0.14 }, ... { "featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.11, "maxSkewValue": 0.11 } ] } |
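
The operating points for ObservedRecallValues can be derived from partyInvestigationsPerPeriodHint. This Python sketch reads "evenly distributed from 0 (exclusive) up to 2 * partyInvestigationsPerPeriodHint" as the points k * 2 * hint / 20 for k = 1 to 20; that reading is an assumption. With a hint of 5000 it yields 500, 1000, and so on up to 10000 (including the 8000 shown in the example), followed by the extra measurement at 5000.

```python
def backtest_operating_points(hint, count=20):
    """Investigation counts at which ObservedRecallValues is measured.

    Assumes `count` points evenly spaced over (0, 2 * hint], that is
    k * 2 * hint / count for k = 1..count, followed by one final point at hint.
    """
    points = [k * 2 * hint / count for k in range(1, count + 1)]
    points.append(hint)
    return points


points = backtest_operating_points(5000)
print(points)
```

Under this reading the hint value also coincides with the tenth evenly spaced point; the table does not say whether the API deduplicates it.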
Prediction results
The following metadata is output from prediction results.
Column | Type | Description |
---|---|---|
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results |
resource_id | STRING | Name of the resource |
name | STRING | Name of the metadata entry, such as a metric (see the following table) |
value | JSON | Value of the metadata entry |
Metric name | Metric description | Example metric value |
---|---|---|
Missingness | Share of missing values across all features in each feature family. Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ... { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] } |
Skew | Metrics showing skew between training and prediction or backtest datasets. Family skew indicates changes in the distribution of feature values within a feature family, weighted by the importance of each feature within that family. Max skew indicates the maximum skew of any feature within that family. Skew values range from 0, representing no significant change in the distribution of values of features in the family, to 1 for the most significant change. A large value for either family skew or max skew indicates a significant change in the structure of your data that might affect model performance. Family skew takes the value -1 when no features in the family are used by the model. For large skew values, you should do one of the following:
Set thresholds for acting on family and max skew values based on the natural variation in skew metrics observed over several months. | { "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10, "maxSkewValue": 0.14 }, ... { "featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.11, "maxSkewValue": 0.11 } ] } |
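
The skew guidance implies two steps: derive thresholds from the natural month-to-month variation of each metric, then act on families that exceed them, with the most important families first. The following Python sketch uses mean plus k standard deviations of past values as one possible threshold rule (an assumption; AML AI does not prescribe a rule), skips families whose familySkewValue is -1, and orders flagged families by a caller-supplied importance mapping. The history values and hypothetical_unused_family are invented.

```python
import statistics

# Example Skew value using the two families shown in the table plus one
# hypothetical family that the model does not use (familySkewValue == -1).
skew = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10, "maxSkewValue": 0.14},
        {"featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.11, "maxSkewValue": 0.11},
        {"featureFamily": "hypothetical_unused_family", "familySkewValue": -1, "maxSkewValue": 0.30},
    ]
}


def threshold_from_history(history, k=3.0):
    """One possible rule (an assumption): mean + k * sample stdev of past skew values."""
    return statistics.mean(history) + k * statistics.stdev(history)


def flag_skew(metric, family_threshold, max_threshold):
    """Return families whose family skew or max skew exceeds its threshold.

    Families with familySkewValue == -1 are skipped because the model uses
    none of their features.
    """
    flagged = []
    for f in metric["featureFamilies"]:
        if f["familySkewValue"] == -1:
            continue
        if f["familySkewValue"] > family_threshold or f["maxSkewValue"] > max_threshold:
            flagged.append(f["featureFamily"])
    return flagged


def prioritize(flagged, importance_by_family):
    """Order flagged families by importance, highest first (unknown families last)."""
    return sorted(flagged, key=lambda fam: importance_by_family.get(fam, 0), reverse=True)


# Hypothetical monthly histories for the two metrics.
family_thr = threshold_from_history([0.08, 0.10, 0.12], k=2.0)
max_thr = threshold_from_history([0.09, 0.10, 0.11], k=2.0)
print(flag_skew(skew, family_thr, max_thr))
```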