AML output data model

This page describes the AML output data model. AML outputs are sent to BigQuery.

Prediction outputs

Prediction outputs include risk scores and explainability and are generated when you create a PredictionResult resource. For more information, see Understand prediction outputs.

Risk scores

Risk scores are written to the BigQuery table specified in the outputs.predictionDestination field.

Column | Type | Description
party_id | STRING | Unique party ID string
risk_period_end_time | TIMESTAMP | The end of the target period, in the timezone of the dataset
risk_score | FLOAT64 | Prediction value. Between 0 and 1. Higher score means higher risk.
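
As a rough illustration of how rows with these three columns might be consumed, the following sketch ranks parties by risk_score. The RiskScoreRow type, the top_risk_parties function, and the sample rows are hypothetical and made up for this example; in practice the rows would come from a query against the table named in outputs.predictionDestination.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RiskScoreRow:
    """One row of the risk score table (hypothetical helper type)."""
    party_id: str
    risk_period_end_time: datetime
    risk_score: float  # documented as a value between 0 and 1


def top_risk_parties(rows, limit):
    """Return the `limit` highest-scoring rows, highest risk first."""
    for row in rows:
        if not 0.0 <= row.risk_score <= 1.0:
            raise ValueError(f"risk_score out of range for {row.party_id}")
    return sorted(rows, key=lambda r: r.risk_score, reverse=True)[:limit]


# Made-up sample rows, for illustration only.
period_end = datetime(2024, 1, 1, tzinfo=timezone.utc)
sample_rows = [
    RiskScoreRow("party-a", period_end, 0.12),
    RiskScoreRow("party-b", period_end, 0.91),
    RiskScoreRow("party-c", period_end, 0.47),
]
ranked = top_risk_parties(sample_rows, limit=2)
print([row.party_id for row in ranked])  # ['party-b', 'party-c']
```

The range check reflects the documented 0 to 1 bounds and fails loudly on unexpected input rather than silently ranking it.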

Explainability

Explainability is written to the BigQuery table specified in the outputs.explainabilityDestination field.

Column | Type | Description
party_id | STRING | Unique party ID string
risk_period_end_time | TIMESTAMP | The end of the target period, in the timezone of the dataset
attributions | STRUCT (repeated) | Record of feature families and their attribution value
attributions.feature | STRING | Name of feature family
attributions.attribution | FLOAT64 | Feature family's attribution score
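
To make the repeated attributions record more concrete, the sketch below picks the feature families with the largest attribution values for one party. It assumes the row has already been read into a Python dict in which attributions is a list of dicts with feature and attribution keys. The top_feature_families function, the sample values, and the family named hypothetical_family are invented for this example.

```python
def top_feature_families(attributions, limit=3):
    """Return (feature, attribution) pairs with the largest attribution values."""
    pairs = [(item["feature"], item["attribution"]) for item in attributions]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up sample row; the attribution values are illustrative only.
sample_row = {
    "party_id": "party-b",
    "attributions": [
        {"feature": "unusual_wire_credit_activity", "attribution": 0.31},
        {"feature": "party_supplementary_data_id_3", "attribution": 0.02},
        {"feature": "hypothetical_family", "attribution": 0.17},
    ],
}

for feature, attribution in top_feature_families(sample_row["attributions"], limit=2):
    print(f"{feature}: {attribution}")
```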

Exported registered parties

The following information about registered parties is exported from an instance to the BigQuery table specified in the dataset field.

Column | Type | Description
party_id | STRING | Unique identifier of the party in the instance's datasets
party_size | STRING | Specifies the tier for commercial customers (large versus small). This field does not apply to retail customers. All values are case sensitive.
  • NULL for all retail customers
  • SMALL for small commercial parties with fewer than 500 average monthly transactions
  • LARGE for large commercial parties with 500 or more average monthly transactions
earliest_remove_time | STRING | The earliest time at which the party can be removed
party_with_prediction_intent | STRING | Indicates whether the party has been predicted on since registration
registration_or_uptier_time | STRING | The time at which the party was registered or uptiered
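
The party_size tiers can be restated as a small function, which may be useful for sanity-checking exported rows against your own transaction counts. This is only a sketch of the documented rule, not the service's implementation. The function name, its parameters, and the use of None to stand in for NULL are assumptions.

```python
from typing import Optional

# Threshold taken from the party_size description: 500 average monthly transactions.
LARGE_THRESHOLD = 500


def expected_party_size(is_retail: bool, avg_monthly_transactions: float) -> Optional[str]:
    """Return the tier the party_size description implies (None stands in for NULL)."""
    if is_retail:
        return None  # party_size does not apply to retail customers
    if avg_monthly_transactions >= LARGE_THRESHOLD:
        return "LARGE"
    return "SMALL"


print(expected_party_size(False, 499))  # SMALL
print(expected_party_size(False, 500))  # LARGE
print(expected_party_size(True, 10_000))  # None
```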

Exported metadata

Exported metadata varies based on the AML AI resource.

Engine config

The following metadata is output from an engine config.

Column | Type | Description
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results
resource_id | STRING | Name of the resource
name | STRING | Name of the metadata entry, such as a metric (see the following table)
value | JSON | Value of the metadata entry

Metric name | Metric description | Example metric value

ExpectedRecallPreTuning

Recall metric measured on a test set when using default hyperparameters of the engine version.

This recall measurement assumes the number of investigations per month specified in partyInvestigationsPerPeriodHint.

{
  "recallValues": [
    {
      "partyInvestigationsPerPeriod": 5000,
      "recallValue": 0.72,
      "scoreThreshold": 0.42
    }
  ]
}

ExpectedRecallPostTuning

Recall metric measured on a test set when using tuned hyperparameters.

This recall measurement assumes the number of investigations per month specified in partyInvestigationsPerPeriodHint.

{
  "recallValues": [
    {
      "partyInvestigationsPerPeriod": 5000,
      "recallValue": 0.80,
      "scoreThreshold": 0.43
    }
  ]
}

Missingness

Share of missing values across all features in each feature family.

Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration.

A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "missingnessValue": 0.00
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "missingnessValue": 0.45
    }
  ]
}
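
One way to use the ExpectedRecallPreTuning and ExpectedRecallPostTuning entries together is to compare recall at the same number of investigations per period. The sketch below assumes the value column has been read as a JSON string. The recall_by_investigations helper is hypothetical, and the sample strings reuse the example values from this section, written as strict JSON.

```python
import json


def recall_by_investigations(value_json):
    """Map partyInvestigationsPerPeriod to recallValue for one metadata value."""
    parsed = json.loads(value_json)
    return {
        entry["partyInvestigationsPerPeriod"]: entry["recallValue"]
        for entry in parsed["recallValues"]
    }


# Example metric values from this section, as strict JSON strings.
pre_tuning = (
    '{"recallValues": [{"partyInvestigationsPerPeriod": 5000,'
    ' "recallValue": 0.72, "scoreThreshold": 0.42}]}'
)
post_tuning = (
    '{"recallValues": [{"partyInvestigationsPerPeriod": 5000,'
    ' "recallValue": 0.80, "scoreThreshold": 0.43}]}'
)

pre = recall_by_investigations(pre_tuning)
post = recall_by_investigations(post_tuning)

# Difference in expected recall at each shared investigation volume.
uplift = {n: round(post[n] - pre[n], 4) for n in pre if n in post}
print(uplift)  # {5000: 0.08}
```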

Model

The following metadata is output from a model.

Column | Type | Description
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results
resource_id | STRING | Name of the resource
name | STRING | Name of the metadata entry, such as a metric (see the following table)
value | JSON | Value of the metadata entry

Metric name | Metric description | Example metric value

Missingness

Share of missing values across all features in each feature family.

Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration.

A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "missingnessValue": 0.00
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "missingnessValue": 0.45
    }
  ]
}

Importance

A metric that shows the importance of a feature family to the model. Higher values indicate more significant use of the feature family in the model. A feature family that is not used in the model has zero importance.

Use importance values to prioritize action on family skew results. For example, of two families with the same skew value, the one with higher importance to the model is more urgent to resolve.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "importanceValue": 459761000000
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "importanceValue": 27492
    }
  ]
}
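
The prioritization described for Importance can be sketched as a sort over skew results that breaks ties with importance. Ordering first by family skew and then by importance is an assumed policy chosen for this example, not one the API prescribes. The prioritize_skewed_families function is hypothetical, and the skew values are made up so that both families tie.

```python
def prioritize_skewed_families(skew_value, importance_value):
    """Order families by family skew, breaking ties by model importance."""
    importance = {
        family["featureFamily"]: family["importanceValue"]
        for family in importance_value["featureFamilies"]
    }
    return sorted(
        skew_value["featureFamilies"],
        key=lambda f: (f["familySkewValue"], importance.get(f["featureFamily"], 0)),
        reverse=True,
    )


# Importance values from the example in this section.
importance_value = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "importanceValue": 459761000000},
        {"featureFamily": "party_supplementary_data_id_3", "importanceValue": 27492},
    ]
}
# Made-up skew values, set equal so that importance decides the order.
skew_value = {
    "featureFamilies": [
        {"featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.10},
        {"featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10},
    ]
}

ordered = prioritize_skewed_families(skew_value, importance_value)
print([family["featureFamily"] for family in ordered])
```

With equal skew values, the family with the larger importance value is listed first.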

Backtest results

The following metadata is output from backtest results.

Column | Type | Description
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results
resource_id | STRING | Name of the resource
name | STRING | Name of the metadata entry, such as a metric (see the following table)
value | JSON | Value of the metadata entry

Metric name | Metric description | Example metric value

ObservedRecallValues

Recall metric measured on the dataset specified for backtesting. The API includes 20 of these measurements at different operating points, evenly distributed from 0 (not included) up to 2 * partyInvestigationsPerPeriodHint. The API adds a final recall measurement at partyInvestigationsPerPeriodHint.

{
  "recallValues": [
    {
      "partyInvestigationsPerPeriod": 5000,
      "recallValue": 0.80,
      "scoreThreshold": 0.42
    },
    ...
    ...
    {
      "partyInvestigationsPerPeriod": 8000,
      "recallValue": 0.85,
      "scoreThreshold": 0.30
    }
  ]
}

Missingness

Share of missing values across all features in each feature family.

Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration.

A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "missingnessValue": 0.00
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "missingnessValue": 0.45
    }
  ]
}

Skew

Metrics showing skew between training and prediction or backtest datasets. Family skew indicates changes in the distribution of feature values within a feature family, weighted by importance of the feature within that family. Max skew indicates the maximum skew of any feature within that family.

Skew values range from 0, representing no significant change in the distribution of values of features in the family, to 1 for the most significant change. A large value for either family skew or max skew indicates a significant change in the structure of your data in a way that may impact model performance. Family skew takes the value -1 when no features in the family are used by the model.

For large skew values, you should do one of the following:

  • Investigate changes in the data used by that feature family (see model governance support materials) and fix any input data issues
  • Retrain a model on more recent data

You should set thresholds for acting on family and max skew values based on observing the natural variation in skew metrics over several months.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "familySkewValue": 0.10,
      "maxSkewValue": 0.14
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "familySkewValue": 0.11,
      "maxSkewValue": 0.11
    }
  ]
}
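
The guidance on acting on large skew values could be automated along the following lines. This is a minimal sketch: the flag_skewed_families function and both threshold values are hypothetical, and real thresholds should come from observing your own skew metrics over time. Skipping families whose family skew is -1 is an assumption based on the statement that such families are not used by the model. The last sample family is invented to show that case.

```python
UNUSED_FAMILY_SKEW = -1  # family skew reported when the model uses no feature in the family


def flag_skewed_families(value, family_threshold, max_threshold):
    """Return names of families whose family skew or max skew reaches a threshold."""
    flagged = []
    for family in value["featureFamilies"]:
        if family["familySkewValue"] == UNUSED_FAMILY_SKEW:
            continue  # assumption: skew in an unused family needs no action
        if (
            family["familySkewValue"] >= family_threshold
            or family["maxSkewValue"] >= max_threshold
        ):
            flagged.append(family["featureFamily"])
    return flagged


skew_value = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "familySkewValue": 0.10, "maxSkewValue": 0.14},
        {"featureFamily": "party_supplementary_data_id_3", "familySkewValue": 0.11, "maxSkewValue": 0.11},
        # Hypothetical family that the model does not use.
        {"featureFamily": "hypothetical_unused_family", "familySkewValue": -1, "maxSkewValue": 0.0},
    ]
}

# Illustrative thresholds only.
print(flag_skewed_families(skew_value, family_threshold=0.2, max_threshold=0.12))
```

With these illustrative thresholds, only unusual_wire_credit_activity is flagged, because its max skew of 0.14 reaches the max skew threshold of 0.12.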

Prediction results

The following metadata is output from prediction results.

Column | Type | Description
resource_type | STRING | Type of AML AI resource, such as an engine config or prediction results
resource_id | STRING | Name of the resource
name | STRING | Name of the metadata entry, such as a metric (see the following table)
value | JSON | Value of the metadata entry

Metric name | Metric description | Example metric value

Missingness

Share of missing values across all features in each feature family.

Ideally, every AML AI feature family should have a Missingness value near 0. Exceptions can occur when the data underlying a feature family is unavailable for integration.

A significant change in this value for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "missingnessValue": 0.00
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "missingnessValue": 0.45
    }
  ]
}

Skew

Metrics showing skew between training and prediction or backtest datasets. Family skew indicates changes in the distribution of feature values within a feature family, weighted by importance of the feature within that family. Max skew indicates the maximum skew of any feature within that family.

Skew values range from 0, representing no significant change in the distribution of values of features in the family, to 1 for the most significant change. A large value for either family skew or max skew indicates a significant change in the structure of your data in a way that may impact model performance. Family skew takes the value -1 when no features in the family are used by the model.

For large skew values, you should do one of the following:

  • Investigate changes in the data used by that feature family (see model governance support materials) and fix any input data issues
  • Retrain a model on more recent data

You should set thresholds for acting on family and max skew values based on observing the natural variation in skew metrics over several months.

{
  "featureFamilies": [
    {
      "featureFamily": "unusual_wire_credit_activity",
      "familySkewValue": 0.10,
      "maxSkewValue": 0.14
    },
    ...
    ...
    {
      "featureFamily": "party_supplementary_data_id_3",
      "familySkewValue": 0.11,
      "maxSkewValue": 0.11
    }
  ]
}
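
Because a significant change in Missingness between resources can indicate inconsistent datasets, it can help to compare the Missingness value of a prediction result against that of the model it uses. The sketch below assumes both values have been parsed into Python dicts. The missingness_changes function and the tolerance are hypothetical, the baseline reuses the example values from this page, and the current values are made up.

```python
def missingness_changes(baseline, current, tolerance):
    """Return families whose missingness moved by more than `tolerance`."""
    base = {
        family["featureFamily"]: family["missingnessValue"]
        for family in baseline["featureFamilies"]
    }
    changes = {}
    for family in current["featureFamilies"]:
        name = family["featureFamily"]
        if name not in base:
            continue  # no baseline to compare against
        delta = family["missingnessValue"] - base[name]
        if abs(delta) > tolerance:
            changes[name] = round(delta, 4)
    return changes


# Baseline: example Missingness values from this page (for instance, from a model).
baseline = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.00},
        {"featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.45},
    ]
}
# Current: made-up values standing in for a later prediction result.
current = {
    "featureFamilies": [
        {"featureFamily": "unusual_wire_credit_activity", "missingnessValue": 0.02},
        {"featureFamily": "party_supplementary_data_id_3", "missingnessValue": 0.70},
    ]
}

print(missingness_changes(baseline, current, tolerance=0.05))
# {'party_supplementary_data_id_3': 0.25}
```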