AML output data model

This page describes the AML output data model. AML outputs are sent to BigQuery.

Prediction outputs

Prediction outputs include risk scores and explainability and are generated when you create a PredictionResult resource. For more information, see Understand prediction outputs.

Risk scores

Risk scores are written to the BigQuery table specified in the outputs.predictionDestination field.

Column	Type	Description
`party_id`	`STRING`	Unique party ID string
`risk_period_end_time`	`TIMESTAMP`	The end of the target period, in the timezone of the dataset
`risk_score`	`FLOAT64`	Prediction value between 0 and 1. A higher score means higher risk.
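
As an illustration of how rows with these three columns might be consumed, the following minimal sketch selects parties whose score meets a cutoff. The rows, the party IDs, the `parties_above_threshold` helper, and the 0.5 cutoff are all hypothetical and are not part of the AML AI output.

```python
from datetime import datetime, timezone

# Hypothetical rows shaped like the risk score table (made-up values).
period_end = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    {"party_id": "party-001", "risk_period_end_time": period_end, "risk_score": 0.91},
    {"party_id": "party-002", "risk_period_end_time": period_end, "risk_score": 0.07},
    {"party_id": "party-003", "risk_period_end_time": period_end, "risk_score": 0.62},
]


def parties_above_threshold(rows, threshold):
    """Return (party_id, risk_score) pairs with score >= threshold, highest first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    selected = [
        (row["party_id"], row["risk_score"])
        for row in rows
        if row["risk_score"] >= threshold
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# The 0.5 cutoff is an arbitrary illustration, not a recommended threshold.
flagged = parties_above_threshold(rows, 0.5)
print(flagged)
```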

Explainability

Explainability is written to the BigQuery table specified in the outputs.explainabilityDestination field.

Column	Type	Description
`party_id`	`STRING`	Unique party ID string
`risk_period_end_time`	`TIMESTAMP`	The end of the target period, in the timezone of the dataset
`attributions`	`STRUCT`	(repeated) Record of feature families and their attribution value
`attributions.feature`	`STRING`	Name of feature family
`attributions.attribution`	`FLOAT64`	Feature family's attribution score
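
The repeated `attributions` record can be read as a list of feature and attribution pairs per party. The sketch below ranks feature families for one row. The attribution values, the `hypothetical_family` name, and the `top_attributions` helper are assumptions for illustration. Ranking by absolute value is a design choice made here in case attributions can be negative, which this page does not state.

```python
# Hypothetical explainability row; attribution values are made up.
explainability_row = {
    "party_id": "party-001",
    "risk_period_end_time": "2024-01-01T00:00:00Z",
    "attributions": [
        {"feature": "unusual_wire_credit_activity", "attribution": 0.31},
        {"feature": "party_supplementary_data_id_3", "attribution": -0.04},
        {"feature": "hypothetical_family", "attribution": 0.12},
    ],
}


def top_attributions(row, n=2):
    """Return the n (feature, attribution) pairs with the largest magnitude."""
    ranked = sorted(
        row["attributions"],
        key=lambda entry: abs(entry["attribution"]),
        reverse=True,
    )
    return [(entry["feature"], entry["attribution"]) for entry in ranked[:n]]


print(top_attributions(explainability_row))
```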

Exported registered parties

The following registered-party information is exported from an instance to the BigQuery table specified in the dataset field.

Column	Type	Description
`party_id`	`STRING`	Unique identifier of the party in the instance's datasets
`party_size`	`STRING`	Specifies the tier for commercial customers (large versus small). This field does not apply to retail customers. Possible values: `NULL` for all retail customers; `SMALL` for small commercial parties with fewer than 500 average monthly transactions; `LARGE` for large commercial parties with 500 or more average monthly transactions. All values are case sensitive.
`earliest_remove_time`	`STRING`	The earliest time at which the party can be removed
`party_with_prediction_intent`	`STRING`	Indicates whether predictions have been run for the party since registration
`registration_or_uptier_time`	`STRING`	The time at which the party was registered or uptiered
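
Because `party_size` values are case sensitive and `NULL` has a specific meaning, it can help to interpret the column explicitly. The following sketch is one way to do that. The `interpret_party_size` helper and its return strings are hypothetical. A `NULL` from BigQuery is assumed to arrive as Python `None`.

```python
from typing import Optional


def interpret_party_size(party_size: Optional[str]) -> str:
    """Map an exported party_size value to a human-readable tier.

    None (NULL) is treated as a retail party. Any other value must match
    SMALL or LARGE exactly, because the values are case sensitive.
    """
    if party_size is None:
        return "retail"
    if party_size == "SMALL":
        return "small commercial (fewer than 500 average monthly transactions)"
    if party_size == "LARGE":
        return "large commercial (500 or more average monthly transactions)"
    raise ValueError(f"unexpected party_size value: {party_size!r}")


print(interpret_party_size(None))
print(interpret_party_size("LARGE"))
```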

Exported metadata

Exported metadata varies based on the AML AI resource.

Engine config

The following metadata is output from an engine config.

Column	Type	Description
`resource_type`	`STRING`	Type of AML AI resource, such as an engine config or prediction results
`resource_id`	`STRING`	Name of the resource
`name`	`STRING`	Name of the metadata entry, such as a metric (see the following table)
`value`	`JSON`	Value of the metadata entry

Metric name	Metric description	Example metric value
ExpectedRecallPreTuning	Recall metric measured on a test set when using default hyperparameters of the engine version. This recall measurement assumes the number of investigations per month specified in `partyInvestigationsPerPeriodHint`.	{ "recallValues": [ { "partyInvestigationsPerPeriod": 5000, "recallValue": 0.72, "scoreThreshold": 0.42 } ] }
ExpectedRecallPostTuning	Recall metric measured on a test set when using tuned hyperparameters. This recall measurement assumes the number of investigations per month specified in `partyInvestigationsPerPeriodHint`.	{ "recallValues": [ { "partyInvestigationsPerPeriod": 5000, "recallValue": 0.80, "scoreThreshold": 0.43 } ] }
Missingness	Share of missing values across all features in each feature family. Ideally, all AML AI feature families should have a Missingness near 0. Exceptions may occur where the data underlying those feature families is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ..., { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] }
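
To show how the `name` and `value` columns fit together, the following sketch parses two metadata rows and compares expected recall before and after tuning. The `resource_type` and `resource_id` strings and the helper names are placeholders, not values that the API is known to return. The metric values reuse the examples from the table. The `value` column is assumed to arrive either as a JSON string or as an already-parsed dictionary, depending on the client.

```python
import json

# Hypothetical metadata rows; resource_type and resource_id are placeholders.
metadata_rows = [
    {
        "resource_type": "hypothetical-resource-type",
        "resource_id": "hypothetical-engine-config",
        "name": "ExpectedRecallPreTuning",
        "value": '{"recallValues": [{"partyInvestigationsPerPeriod": 5000,'
                 ' "recallValue": 0.72, "scoreThreshold": 0.42}]}',
    },
    {
        "resource_type": "hypothetical-resource-type",
        "resource_id": "hypothetical-engine-config",
        "name": "ExpectedRecallPostTuning",
        "value": '{"recallValues": [{"partyInvestigationsPerPeriod": 5000,'
                 ' "recallValue": 0.80, "scoreThreshold": 0.43}]}',
    },
]


def load_metric(rows, metric_name):
    """Return the parsed JSON value of the first row with the given name."""
    for row in rows:
        if row["name"] == metric_name:
            value = row["value"]
            return json.loads(value) if isinstance(value, str) else value
    raise KeyError(metric_name)


def recall_at(metric, investigations):
    """Return the recall recorded at a given investigations-per-period point."""
    for point in metric["recallValues"]:
        if point["partyInvestigationsPerPeriod"] == investigations:
            return point["recallValue"]
    raise ValueError(f"no recall value at {investigations} investigations")


pre = recall_at(load_metric(metadata_rows, "ExpectedRecallPreTuning"), 5000)
post = recall_at(load_metric(metadata_rows, "ExpectedRecallPostTuning"), 5000)
recall_uplift = round(post - pre, 2)
print(recall_uplift)
```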

Model

The following metadata is output from a model.

Column	Type	Description
`resource_type`	`STRING`	Type of AML AI resource, such as an engine config or prediction results
`resource_id`	`STRING`	Name of the resource
`name`	`STRING`	Name of the metadata entry, such as a metric (see the following table)
`value`	`JSON`	Value of the metadata entry

Metric name	Metric description	Example metric value
Missingness	Share of missing values across all features in each feature family. Ideally, all AML AI feature families should have a Missingness near 0. Exceptions may occur where the data underlying those feature families is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ..., { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] }
Importance	A metric that shows the importance of a feature family to the model. Higher values indicate more significant use of the feature family in the model. A feature family that is not used in the model has zero importance. Use importance values to prioritize action on family skew results. For example, for the same skew value, a family with higher importance to the model is more urgent to resolve.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "importanceValue": 459761000000 }, ..., { "featureFamily": "party_supplementary_data_id_3", "importanceValue": 27492 } ] }
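
Raw importance values can differ by many orders of magnitude, as in the Importance example. One way to compare them is to convert them to shares of the total. This normalization is an illustrative choice, not something AML AI prescribes, and the `importance_shares` helper is hypothetical. The input includes only the two families shown in the example, so the shares are relative to those two families only.

```python
# Importance value limited to the two families shown in the example.
importance = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "importanceValue": 459761000000},
        {"featureFamily": "party_supplementary_data_id_3", "importanceValue": 27492},
    ]
}


def importance_shares(value):
    """Return each family's importance as a fraction of the listed total."""
    families = value["featureFamilies"]
    total = sum(family["importanceValue"] for family in families)
    if total == 0:
        # No listed family is used by the model, so every share is zero.
        return {family["featureFamily"]: 0.0 for family in families}
    return {
        family["featureFamily"]: family["importanceValue"] / total
        for family in families
    }


shares = importance_shares(importance)
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.6%}")
```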

Backtest results

The following metadata is output from backtest results.

Column	Type	Description
`resource_type`	`STRING`	Type of AML AI resource, such as an engine config or prediction results
`resource_id`	`STRING`	Name of the resource
`name`	`STRING`	Name of the metadata entry, such as a metric (see the following table)
`value`	`JSON`	Value of the metadata entry

Metric name	Metric description	Example metric value
ObservedRecallValues	Recall metric measured on the dataset specified for backtesting. The API includes 20 of these measurements, at different operating points, evenly distributed from 0 (not included) up to 2 * `partyInvestigationsPerPeriodHint`. The API adds a final recall measurement at `partyInvestigationsPerPeriodHint`.	{ "recallValues": [ { "partyInvestigationsPerPeriod": 5000, "recallValue": 0.80, "scoreThreshold": 0.42 }, ..., { "partyInvestigationsPerPeriod": 8000, "recallValue": 0.85, "scoreThreshold": 0.30 } ] }
Missingness	Share of missing values across all features in each feature family. Ideally, all AML AI feature families should have a Missingness near 0. Exceptions may occur where the data underlying those feature families is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ..., { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] }
Skew	Metrics showing skew between training and prediction or backtest datasets. Family skew indicates changes in the distribution of feature values within a feature family, weighted by the importance of each feature within that family. Max skew indicates the maximum skew of any feature within that family. Skew values range from 0, representing no significant change in the distribution of values of features in the family, to 1 for the most significant change. A large value for either family skew or max skew indicates a significant change in the structure of your data in a way that may impact model performance. Family skew takes the value -1 when no features in the family are used by the model. For large skew values, do one of the following: (1) investigate changes in the data used by that feature family (see model governance support materials) and fix any input data issues, or (2) retrain a model on more recent data. Set thresholds for acting on family and max skew values based on the natural variation that you observe in skew metrics over several months.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10, "maxSkewValue": 0.14 }, ..., { "featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.11, "maxSkewValue": 0.11 } ] }
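
The operating points described for ObservedRecallValues can be sketched as follows. This is one reading of "evenly distributed": 20 points spaced 2 * hint / 20 apart, followed by one extra point at the hint itself. The `operating_points` helper is hypothetical, and the exact rounding and ordering that the API uses are not stated on this page.

```python
def operating_points(hint, count=20):
    """Return count evenly spaced points in (0, 2 * hint], plus the hint itself.

    Assumes equal spacing of 2 * hint / count, which is an interpretation of
    the description and not a confirmed API behavior.
    """
    if hint <= 0 or count <= 0:
        raise ValueError("hint and count must be positive")
    step = 2 * hint / count
    points = [step * i for i in range(1, count + 1)]
    points.append(float(hint))  # the final measurement at the hint itself
    return points


points = operating_points(5000)
print(points[:3], points[-2:])
```

With a hint of 5000, this reading produces points every 500 investigations up to 10000, which is consistent with the 5000 and 8000 entries in the example value.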

Prediction results

The following metadata is output from prediction results.

Column	Type	Description
`resource_type`	`STRING`	Type of AML AI resource, such as an engine config or prediction results
`resource_id`	`STRING`	Name of the resource
`name`	`STRING`	Name of the metadata entry, such as a metric (see the following table)
`value`	`JSON`	Value of the metadata entry

Metric name	Metric description	Example metric value
Missingness	Share of missing values across all features in each feature family. Ideally, all AML AI feature families should have a Missingness near 0. Exceptions may occur where the data underlying those feature families is unavailable for integration. A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00 }, ..., { "featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45 } ] }
Skew	Metrics showing skew between training and prediction or backtest datasets. Family skew indicates changes in the distribution of feature values within a feature family, weighted by the importance of each feature within that family. Max skew indicates the maximum skew of any feature within that family. Skew values range from 0, representing no significant change in the distribution of values of features in the family, to 1 for the most significant change. A large value for either family skew or max skew indicates a significant change in the structure of your data in a way that may impact model performance. Family skew takes the value -1 when no features in the family are used by the model. For large skew values, do one of the following: (1) investigate changes in the data used by that feature family (see model governance support materials) and fix any input data issues, or (2) retrain a model on more recent data. Set thresholds for acting on family and max skew values based on the natural variation that you observe in skew metrics over several months.	{ "featureFamilies": [ { "featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10, "maxSkewValue": 0.14 }, ..., { "featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.11, "maxSkewValue": 0.11 } ] }
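
As a minimal sketch of acting on the Skew metric, the following code flags feature families whose family skew or max skew meets a threshold and skips families that report -1. The thresholds of 0.12, the `hypothetical_unused_family` entry (including its `maxSkewValue`), and the `families_to_review` helper are assumptions for illustration. Real thresholds should come from the natural variation that you observe over several months.

```python
# Skew value based on the example, plus one hypothetical unused family.
skew = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity",
         "familySkewValue": 0.10, "maxSkewValue": 0.14},
        {"featureFamily": "party_supplementary_data_id_3",
         "familySkewValue": 0.11, "maxSkewValue": 0.11},
        # Hypothetical entry: -1 means no features in the family are used.
        {"featureFamily": "hypothetical_unused_family",
         "familySkewValue": -1, "maxSkewValue": 0.0},
    ]
}


def families_to_review(value, family_threshold, max_threshold):
    """Return names of used families whose skew meets either threshold."""
    flagged = []
    for family in value["featureFamilies"]:
        if family["familySkewValue"] == -1:
            continue  # family is not used by the model, so skip it
        if (family["familySkewValue"] >= family_threshold
                or family["maxSkewValue"] >= max_threshold):
            flagged.append(family["featureFamily"])
    return flagged


# Both thresholds are arbitrary illustrations, not recommended values.
to_review = families_to_review(skew, family_threshold=0.12, max_threshold=0.12)
print(to_review)
```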