This glossary defines terms specific to AML AI. For general machine learning terms, see Machine Learning Glossary.
core banking data
Core banking data includes data on parties, transactions, and account holdings. It helps AML AI to understand your customers and their banking activity to detect risky characteristics and behaviors.
core time window
The core time window is the time range used in an AML AI operation (engine configuration, training, backtesting, and prediction) to generate training examples, evaluation examples, or model outputs. This time range must be covered by all tables in the dataset.
Different API operations have different requirements for the core time window to generate features and labels. For more information, see Understand data scope and duration.
dataset
An AML AI Dataset resource (or just "dataset") specifies data, conforming to the AML input data model, that can be used to generate a model, evaluate a model's performance, and generate risk scores and explainability per party. For more information, see Understand the AML data model and requirements.
data validation
AML AI runs data validation checks when creating a dataset, engine config, model, backtest results, or prediction results. If the specified dataset does not pass data validation, the resource is not created and data validation errors are produced, indicating the nature of the issue. For more information, see Data validation errors.
end time
AML AI operations that use a dataset require you to specify an end time. This field controls which months in the dataset are used to generate training or evaluation examples and model outputs.
The end time and all months used for an operation must fall within the date range of the associated dataset. For example, a training operation requires a core time window of 15 months. If you use a dataset with a date range from October 15, 2021 to May 21, 2023 and an end time of April 12, 2023, then training uses examples from the calendar months of January 2022 to March 2023, which fall within the date range of the dataset.
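The month arithmetic in this example can be sketched as follows. This is a hypothetical helper for illustration, not part of the AML AI API; it assumes that only complete calendar months before the end time are used.

```python
from datetime import date

def core_time_window_months(end_time: date, window_months: int = 15):
    """Return the first and last calendar months (as first-of-month
    dates) used by an operation with the given end time. Only complete
    months before the end time count, so a mid-month end time excludes
    its own month."""
    # The last complete month is the month before the end time's month.
    last_year, last_month = end_time.year, end_time.month - 1
    if last_month == 0:
        last_year, last_month = last_year - 1, 12
    # Step back window_months - 1 further months to find the first month.
    months_since_epoch = last_year * 12 + (last_month - 1) - (window_months - 1)
    first_year, first_month = divmod(months_since_epoch, 12)
    return date(first_year, first_month + 1, 1), date(last_year, last_month, 1)

# An end time of April 12, 2023 with a 15-month training window covers
# the calendar months January 2022 through March 2023, as described above.
first, last = core_time_window_months(date(2023, 4, 12))
```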
engine config
An AML AI EngineConfig resource (also known as an "engine config") specifies the parameters used to generate and evaluate an AML AI model and to generate risk scores and explainability.
Some of these parameters are specified in the API call to create an engine config, such as engine version and the expected investigation volume. Other parameters are automatically generated by AML AI using a specified dataset, for example, tuned hyperparameters. For more information, see Configure an engine.
engine version
An AML AI EngineVersion resource (also known as an "engine version") defines aspects of how AML AI detects risk, encompassing model tuning, training, and evaluation as well as the overall AML data model and feature families.
Configuring an AML AI engine requires you to specify an engine version to use. The engine version is then used to train and evaluate models with that engine config and to generate risk scores and explainability.
Engine version naming is structured so that the engine type expresses the supported line of business, while the engine subtype, tuning, major, and minor versions are updated as new behaviors are implemented.
For more information on managing engine versions, see Manage engine versions.
explainability
AML AI models identify parties exhibiting behaviors or characteristics with a high risk of money laundering. Explainability indicates which behaviors or characteristics contributed most to a party's high risk score. For more information, see Understand prediction outputs.
export metadata
Several AML AI resources store additional information about performance and data quality, which can be accessed using the export metadata operation. For more information, see AML output data model.
feature family
Feature families are collections of related ML features, providing a simple, human-understandable categorization that informs investigators and internal audit teams.
immutable entity
AML AI needs to be able to recreate views of the data at different points in time for tuning, training, and backtesting. To achieve this, AML AI differentiates between mutable entities, that is, entities whose values can change over time, and immutable entities, such as events, which do not reasonably change after they come into existence or occur.
In the AML input data model, tables representing immutable entities do not have the validity_start_time and is_entity_deleted fields. This includes the RiskCaseEvent table. For more information, see Understanding how data changes over time.
See also mutable entity.
instance
An AML AI Instance resource (also known as an "instance") sits at the root of all other AML AI resources; you must create an instance before you can work with other AML AI resources. You can create multiple instances in the same region within a project. For more information, see Create an AML AI instance.
investigation process
An investigation process covers the entire investigation, or sequence of investigations, triggered by an alert. The process starts when the first part of an investigation begins and ends when no further outcomes are expected from the investigation. For more information, see Lifecycle of a risk case.
line of business (LOB)
The line of business distinguishes retail and commercial banking customers in AML AI. Datasets, engine versions, and party registration are linked to a specific line of business, retail or commercial.
long-running operation (LRO)
Several AML AI operations, including engine configuration, training, backtesting, and prediction, initiate a long-running operation (LRO). For more information, see Manage long-running operations.
lookback window
In addition to the core time window, AML AI operations require that datasets include a lookback window to allow the generation of features that track behavior over time. For more information, see Understand data scope and duration.
missingness
The Missingness metric is computed for all feature families when creating the following AML AI resources: engine config, model, backtest results, and prediction results.
This metric shows the share of missing values across all features in a feature family. A significant change in Missingness for any feature family between tuning, training, evaluation, and prediction can indicate inconsistency in the datasets used.
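As an illustration, the share-of-missing-values computation might look like the following sketch. The feature names and family grouping here are hypothetical, not the actual AML AI feature families.

```python
# Hypothetical feature values per party; None marks a missing value.
features = {
    "txn_volume_1m": [10.0, None, 3.0, 7.0],
    "txn_volume_6m": [55.0, None, None, 40.0],
    "party_age":     [34.0, 51.0, None, 29.0],
}
# Illustrative grouping of features into families.
families = {
    "transaction_volume": ["txn_volume_1m", "txn_volume_6m"],
    "party_profile":      ["party_age"],
}

def family_missingness(family_cols):
    """Share of missing values across all of a family's features and rows."""
    values = [v for col in family_cols for v in features[col]]
    return sum(v is None for v in values) / len(values)

missingness = {name: family_missingness(cols) for name, cols in families.items()}
# transaction_volume: 3 of 8 values missing -> 0.375
# party_profile:      1 of 4 values missing -> 0.25
```

Comparing these per-family shares across tuning, training, evaluation, and prediction runs is what surfaces dataset inconsistencies.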
model
An AML AI Model resource (also known as a "model") represents a trained model that can be used to generate risk scores and explainability.
mutable entity
AML AI needs to be able to recreate views of the data at different points in time for tuning, training, and backtesting. To achieve this, AML AI differentiates between mutable entities, that is, entities whose values can change over time, and immutable entities, such as events, which do not reasonably change after they come into existence or occur.
In the AML input data model, tables representing mutable entities have the validity_start_time and is_entity_deleted fields. This includes the Transaction and PartySupplementaryData tables.
For more information, see Understanding how data changes over time.
See also immutable entity.
observed recall
AML AI measures model performance on historical data using the Observed Recall metric.
This metric shows the proportion of positive-labeled parties (for example, customer exits) from a selected period that the model being evaluated would have identified as high risk during a suspicious activity period.
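A minimal sketch of this computation, with hypothetical parties and months rather than the actual AML AI evaluation pipeline:

```python
# Each positive-labeled party (for example, a customer exit) with the
# months of its suspicious activity period and the months in which the
# evaluated model scored it as high risk. All names are illustrative.
positives = {
    "party_a": {"suspicious": {"2023-01", "2023-02"}, "high_risk": {"2023-02"}},
    "party_b": {"suspicious": {"2023-03"}, "high_risk": set()},
    "party_c": {"suspicious": {"2023-01"}, "high_risk": {"2023-01", "2023-04"}},
}

# A party counts as identified if the model flagged it as high risk in at
# least one month of its suspicious activity period.
identified = sum(1 for p in positives.values() if p["suspicious"] & p["high_risk"])
observed_recall = identified / len(positives)
# party_a and party_c are identified, party_b is missed -> recall of 2/3
```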
prediction results
An AML AI PredictionResult resource (also known as "prediction results") is the result of using a model to create predictions. For more detail on how to generate risk scores and explainability, and how to use them in your investigative process, see the pages in the section Generate risk scores and explainability.
risk case
A risk case covers an investigation process, or a group of related investigation processes for different parties.
See the RiskCaseEvent table.
risk investigation data
Risk investigation data is used by AML AI to understand your risk investigation process and outcomes and to generate training labels.
risk score
AML AI models identify parties exhibiting behaviors or characteristics with a high risk of money laundering. This is expressed through a risk score.
Risk scores range from 0 to 1. A higher score indicates higher risk. However, risk scores should not be interpreted directly as a probability of money laundering activity. For more information, see Understand prediction outputs.
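Because scores rank risk rather than state probabilities, a common pattern is to alert on the highest-scoring parties up to investigation capacity. A hypothetical sketch (the score values and capacity are invented):

```python
# Hypothetical monthly risk scores per party, each in [0, 1].
scores = {"p1": 0.91, "p2": 0.15, "p3": 0.78, "p4": 0.40, "p5": 0.86}

# Rank parties by score and alert on the top ones up to investigation
# capacity. The scores order parties by risk; they are not probabilities.
capacity = 3
alerts = sorted(scores, key=scores.get, reverse=True)[:capacity]
# alerts -> ["p1", "p5", "p3"]
```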
risk typology
AML AI can identify money laundering risk across five core AML risk typologies related to transaction monitoring.
With sufficient investigation and supplementary party data (see Supplementary data tables), AML AI can cover more typologies.
supplementary data
Supplementary data is additional data, beyond what is contained in the core banking data and risk investigation data areas of the AML AI schema, that is relevant to predicting money laundering risk. For example, you might identify and add a risk indicator that helps models better predict a risk typology that is not otherwise well covered.
Supplementary data can be added to a dataset using the PartySupplementaryData table.
suspicious activity period
A suspicious activity period is a period of time in which you believe an investigated party exhibited suspicious behavior. This is used in model evaluation (for example, the recall metric for backtest results) to confirm that high-risk customers are identified during months when they had suspicious activity. For more information, see Lifecycle of a risk case.
training
AML AI performs training as part of creating a model, using hyperparameters (see tuning) from a specified engine config.
tuning
Tuning is the optimization of model hyperparameters. AML AI performs tuning as part of creating an engine config.
validity start time
The validity start time for a mutable entity is used by AML AI to construct a view of what was known by the bank at a given point in time. This allows AML AI to accurately train models that can be reused on the latest data (that is, what is currently known by the bank) to produce high-fidelity risk scores. The validity start time for a given row represents the earliest time at which the data in the row was known to the bank and correct. For more information, see Understanding how data changes over time.
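Reconstructing a point-in-time view from versioned rows can be sketched as follows. The table rows, party IDs, and ratings are hypothetical, and is_entity_deleted handling is omitted for brevity.

```python
from datetime import datetime

# Hypothetical rows from a mutable-entity table: several versions of the
# same party, each stamped with a validity_start_time.
party_rows = [
    {"party_id": "p1", "validity_start_time": datetime(2022, 1, 10), "risk_rating": "low"},
    {"party_id": "p1", "validity_start_time": datetime(2023, 2, 1),  "risk_rating": "high"},
    {"party_id": "p2", "validity_start_time": datetime(2022, 6, 15), "risk_rating": "medium"},
]

def view_at(rows, snapshot):
    """Reconstruct what the bank knew at `snapshot`: for each party, keep
    the latest row whose validity_start_time is not after the snapshot."""
    view = {}
    for row in sorted(rows, key=lambda r: r["validity_start_time"]):
        if row["validity_start_time"] <= snapshot:
            view[row["party_id"]] = row
    return view

# At the end of 2022 the bank still knew p1 as "low" risk; the "high"
# rating only appears in views taken from February 2023 onward.
view_2022 = view_at(party_rows, datetime(2022, 12, 31))
```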