AML AI glossary

This glossary defines terms specific to AML AI. For general machine learning terms, see Machine Learning Glossary.

B

backtesting

Backtesting uses historical data to evaluate the performance (observed recall) of a model by comparing the risk scores it generates against the actual outcomes of historical investigations.

backtest results

An AML AI BacktestResult resource (also known as "backtest results") is created to test the performance of a model on a dataset. For more information, see Evaluate a model.

C

core banking data

Core banking data includes data on parties, transactions, and account holdings. It helps AML AI to understand your customers and their banking activity to detect risky characteristics and behaviors.

core time window

Core time window refers to the time range used in an AML AI operation (engine configuration, training, backtesting, and prediction) to generate training examples, evaluation examples, or model outputs. This time range must be covered by all tables in the dataset.

Different API operations have different requirements for the core time window to generate features and labels. For more information, see Understand data scope and duration.

D

dataset

An AML AI Dataset resource (or just "dataset") specifies data conforming to the AML input data model. This data can be used to generate a model, evaluate the performance of a model, and generate risk scores and explainability per party. For more information, see Understand the AML data model and requirements.

data validation

AML AI conducts data validation checks when creating a dataset, engine config, model, backtest results, or prediction results. If the specified dataset does not pass data validation, then the resource is not created and data validation errors are produced (indicating the nature of the issue). For more information, see Data validation errors.

E

end time

AML AI operations that use a dataset require you to specify an end time. This field is used to control which months in the dataset are used for generating training or evaluation examples and model outputs.

The end time and all months used for an operation must fall within the date range of the associated dataset. For example, a training operation requires a core time window of 15 months. If you use a dataset with a date range from October 15, 2021 to May 21, 2023 and an end time of April 12, 2023, then training uses examples from the calendar months of January 2022 to March 2023, which falls within the date range of the dataset.
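The month arithmetic in this example can be sketched as follows. This is not AML AI's published logic, only an illustration assuming that the core time window counts complete calendar months before the end time:

```python
from datetime import date

def core_window_months(end_time: date, num_months: int) -> list[str]:
    """Return the calendar months (oldest first) covered by a core time
    window, assuming only complete months before end_time count."""
    # Step back to the last complete calendar month before the end time.
    year, month = end_time.year, end_time.month - 1
    if month == 0:
        year, month = year - 1, 12
    months = []
    for _ in range(num_months):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(months))

# The glossary's example: a 15-month training window, end time April 12, 2023.
months = core_window_months(date(2023, 4, 12), 15)
print(months[0], months[-1])  # 2022-01 2023-03
```

This reproduces the example above: with an end time of April 12, 2023, the 15 months run from January 2022 through March 2023.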

engine config

An AML AI EngineConfig resource (also known as an "engine config") specifies parameters used in generating and evaluating an AML AI model and in generating risk scores and explainability.

Some of these parameters are specified in the API call to create an engine config, such as engine version and the expected investigation volume. Other parameters are automatically generated by AML AI using a specified dataset, for example, tuned hyperparameters. For more information, see Configure an engine.

engine version

An AML AI EngineVersion resource (also known as an "engine version") defines aspects of how AML AI detects risk, which encompasses model tuning, training, and evaluation as well as the overall AML data model and feature families.

Configuring an AML AI engine requires you to specify an engine version to use. The engine version is then used to train and evaluate models with that engine config and to generate risk scores and explainability.

Engine version names combine an engine type, which expresses the supported line of business, with engine subtype, tuning, major, and minor version components, which are updated as new behaviors are implemented. Example versions include aml-retail.default.v004.000.202312-000 and aml-commercial.default.v004.000.202312-000.

For more information on managing engine versions, see Manage engine versions.
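The naming structure can be illustrated by splitting an example version string into its components. The mapping of the last three dot-separated parts to tuning, major, and minor versions is an assumption based on the description above, and the dictionary keys are informal labels, not official API fields:

```python
def parse_engine_version(version: str) -> dict[str, str]:
    """Split an engine version name into its dot-separated components."""
    engine_type, subtype, tuning, major, minor = version.split(".")
    return {
        "engine_type": engine_type,    # supported line of business, e.g. aml-retail
        "engine_subtype": subtype,     # e.g. default
        "tuning_version": tuning,      # assumed mapping, e.g. v004
        "major_version": major,        # assumed mapping, e.g. 000
        "minor_version": minor,        # assumed mapping, e.g. 202312-000
    }

print(parse_engine_version("aml-retail.default.v004.000.202312-000"))
```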

evaluation

See backtesting.

explainability

AML AI models are used to identify parties exhibiting behaviors or characteristics with high risk for money laundering. Explainability indicates which behaviors or characteristics contributed the most to a high-risk score for a given party. For more information, see Understand prediction outputs.

export metadata

Several AML AI resources store additional information relating to performance and data quality which can be accessed using the export metadata operation. For more information, see AML output data model.

F

feature family

Feature families are collections of related ML features, providing a simple, human-understandable categorization to inform investigators and internal audit teams.

I

immutable entity

AML AI needs to be able to recreate views of the data at different points in time for tuning, training, and backtesting. To achieve this, AML AI differentiates between mutable entities, that is, entities that can change values over time, and immutable entities, such as events, which, after they come into existence or occur, do not reasonably change.

In the AML input data model, tables representing immutable entities do not have the fields validity_start_time and is_entity_deleted. This includes the RiskCaseEvent table. For more information, see Understanding how data changes over time.

See also mutable entity.

instance

An AML AI Instance resource (also known as an "instance") sits at the root of all other AML AI resources and must be created before you can work with other AML AI resources. Multiple instances can be created in the same region within a project. For more information, see Create an AML AI instance.

investigation process

An investigation process covers the entire investigation or sequence of investigations triggered by an alert. The process starts when the first part of an investigation starts and ends when no further outcomes are expected from this investigation. For more information, see Lifecycle of a risk case.

L

line of business (LOB)

The line of business distinguishes retail and commercial banking customers in AML AI. Datasets, engine versions, and party registration are linked to a specific line of business, retail or commercial.

long-running operation (LRO)

Several AML AI operations, including engine configuration, training, backtesting, and prediction, initiate a long-running operation (LRO). For more information, see Manage long-running operations.

lookback window

In addition to the core time window, AML AI operations require that datasets include a lookback window to allow generation of features that track behavior over time. For more information, see Understand data scope and duration.

M

Missingness

The Missingness metric is computed for all feature families when creating the following AML AI resources: engine config, model, backtest results, and prediction results.

This metric shows the share of missing values across all features in a feature family. A significant change in Missingness for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.
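As a minimal sketch of what "share of missing values across all features in a feature family" means, using hypothetical feature values (the real metric is computed internally by AML AI):

```python
# Hypothetical feature values for one feature family; None marks a missing value.
family = {
    "feature_a": [1.0, None, 3.0, None],
    "feature_b": [None, 2.0, 2.5, 4.0],
}

def missingness(feature_family: dict[str, list]) -> float:
    """Share of missing values pooled across all features in a feature family."""
    values = [v for column in feature_family.values() for v in column]
    return sum(v is None for v in values) / len(values)

print(missingness(family))  # 0.375 (3 of 8 values are missing)
```

Comparing this share between tuning, training, evaluation, and prediction datasets is what surfaces the inconsistencies described above.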

model

An AML AI Model resource (also known as a "model") represents a trained model that can be used to generate risk scores and explainability.

mutable entity

AML AI needs to be able to recreate views of the data at different points in time for tuning, training, and backtesting. To achieve this, AML AI differentiates between mutable entities, that is, entities that can change values over time, and immutable entities, such as events, which, after they come into existence or occur, do not reasonably change.

In the AML input data model, tables representing mutable entities have the fields validity_start_time and is_entity_deleted. This includes the Party, AccountPartyLink, Transaction, and PartySupplementaryData tables. For more information, see Understanding how data changes over time.

See also immutable entity.

O

observed recall

AML AI measures model performance on historical data using the Observed Recall metric.

This metric shows the proportion of positive labeled parties (for example, customer exits) from a selected period that the model being evaluated would have identified as high risk during a suspicious activity period.
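The proportion described above amounts to a recall calculation over party identifiers. A simplified sketch with hypothetical party IDs (AML AI's actual computation also conditions on the suspicious activity period, which is omitted here):

```python
def observed_recall(positive_parties, flagged_high_risk):
    """Proportion of positive-labeled parties that the model flagged as
    high risk. Inputs are iterables of party identifiers."""
    positive = set(positive_parties)
    caught = positive & set(flagged_high_risk)
    return len(caught) / len(positive)

# Four confirmed positives; the model flagged three of them (plus one other party).
print(observed_recall(["p1", "p2", "p3", "p4"], ["p1", "p3", "p4", "p9"]))  # 0.75
```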

P

party

In the AML input data model, a party represents a customer of the bank. A party can be a natural person or a legal entity. For more information, see the Party table. See also registered party.

prediction

Prediction is the use of a model to generate risk scores and explainability, which can then be used in your AML investigation process.

prediction results

An AML AI PredictionResult resource (also known as "prediction results") is the result of using a model to create predictions. For more detail on how to generate risk scores and explainability, and how to use these in your investigative process, see the pages in section Generate risk scores and explainability.

R

registered party

Before a party can be used to create prediction results (for example, party-level risk scores and explainability), the party must be registered for the corresponding line of business.

risk case

A risk case covers an investigation process or a group of related investigation processes for different parties.

See the RiskCaseEvent table.

risk investigation data

Risk investigation data is used by AML AI to understand your risk investigation process and outcomes and to generate training labels.

risk score

AML AI models are used to identify parties exhibiting behaviors or characteristics with high risk for money laundering. This is done through a risk score.

Risk scores vary from 0 to 1. A higher score indicates higher risk. However, risk scores should not be interpreted directly as a probability of money laundering activity. For more information, see Understand prediction outputs.

risk typology

AML AI can identify money laundering risk across five core AML risk typologies related to transaction monitoring.

With sufficient investigation and supplementary party data (see Supplementary data tables), AML AI can cover more typologies.

S

supplementary data

Supplementary data is additional data, beyond what is contained in the core banking data and risk investigation data areas of the AML AI schema, which is relevant to predicting risk of money laundering. For example, you might identify and add a risk indicator that helps models better predict a risk typology that is not otherwise well covered.

Supplementary data can be added to a dataset using the PartySupplementaryData table.

suspicious activity period

A suspicious activity period is a period of time in which you believe an investigated party exhibited suspicious behavior. This is used in model evaluation (for example, the recall metric for backtest results) to confirm that high-risk customers are identified during months when they had suspicious activity. For more information, see Lifecycle of a risk case.

T

training

AML AI performs training as part of creating a model, using hyperparameters (see tuning) from a specified engine config.

tuning

Tuning is the optimization of model hyperparameters. AML AI performs tuning as part of creating an engine config.

V

validity start time

Validity start time for a mutable entity is used by AML AI to construct a view of what was known by the bank at a given point in time. This allows AML AI to accurately train models that can be reused on the latest data (that is, what is currently known by the bank) to produce high-fidelity risk scores. The validity start time for a given row represents the earliest time that the data in this row was known to the bank and correct. For more information, see Understanding how data changes over time.
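The point-in-time reconstruction described above can be sketched as follows. The rows and field values are hypothetical, and this is only an illustration of the idea, not AML AI's internal logic: for each party, keep the newest row whose validity_start_time is at or before the chosen time, and drop parties whose newest such row marks them as deleted.

```python
from datetime import date

# Hypothetical Party rows: each change produces a new row with a later
# validity_start_time; is_entity_deleted marks removal of the entity.
rows = [
    {"party_id": "p1", "validity_start_time": date(2022, 1, 1),
     "is_entity_deleted": False, "risk_rating": "low"},
    {"party_id": "p1", "validity_start_time": date(2022, 6, 1),
     "is_entity_deleted": False, "risk_rating": "high"},
    {"party_id": "p2", "validity_start_time": date(2022, 3, 1),
     "is_entity_deleted": False, "risk_rating": "low"},
    {"party_id": "p2", "validity_start_time": date(2022, 9, 1),
     "is_entity_deleted": True, "risk_rating": "low"},
]

def view_as_of(rows, as_of):
    """Reconstruct the bank's view of each party at a point in time."""
    latest = {}
    # Process rows oldest first so each party's newest valid row wins.
    for row in sorted(rows, key=lambda r: r["validity_start_time"]):
        if row["validity_start_time"] <= as_of:
            latest[row["party_id"]] = row
    return {pid: r for pid, r in latest.items() if not r["is_entity_deleted"]}

view = view_as_of(rows, date(2022, 7, 1))
print(view["p1"]["risk_rating"])  # high; p2 is still present, deleted only later
```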