Understand data scope and duration

AML AI is set up to assess money laundering risk for one line of business (LoB). An LoB is associated with either your retail customers or your commercial customers.

When creating a dataset for use with an LoB, first determine the time range that the dataset should cover.

Dataset time range

The time range is composed of three parts:

[Figure: Historical data requirements chart]

  • Core time window: This time range must be covered by all tables in the dataset. Different API operations have different requirements for the core time window to generate features and labels:

    • Creating an engine config (for tuning): minimum of 18 months
    • Creating a model (for training): minimum of 15 months
    • Creating prediction results (for scoring): minimum of 1 month
    • Creating backtest results (for backtesting or model evaluation): minimum of 3 months; include more months for a more precise evaluation

  • Lookback window: Up to 24 additional months of data before the core time window are needed to support model features that trace activity over time. Minimum lookback window requirements vary by table.

  • Additional risk case events: Data on risk cases more recent than the dataset end time can be included to have more complete labels for training and evaluation of models.

For example, you must create an engine config before you can use the rest of AML AI. To do so, you need a dataset that covers at least 42 months of transaction data (18 months of core time window plus 24 months of lookback window).
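The same arithmetic applies to the other operations. The following is a minimal sketch, not part of AML AI, that derives the start of the core time window and the earliest date the Transaction table would need to cover. The names `CORE_MONTHS`, `subtract_months`, and `required_span` are hypothetical, and the sketch assumes that the core time window ends on the first day of a month (treated as an exclusive end) and that the full 24-month lookback applies.

```python
from datetime import date

# Minimum core time window in months per operation (values from the list above).
CORE_MONTHS = {
    "engine_config": 18,  # tuning
    "model": 15,          # training
    "prediction": 1,      # scoring
    "backtest": 3,        # evaluation
}
LOOKBACK_MONTHS = 24  # full lookback; some tables need less


def subtract_months(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` before d's month."""
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def required_span(operation: str, end: date) -> tuple[date, date, int]:
    """Return (dataset_start, core_start, total_months) for a core window ending at `end`."""
    core = CORE_MONTHS[operation]
    core_start = subtract_months(end, core)
    dataset_start = subtract_months(core_start, LOOKBACK_MONTHS)
    return dataset_start, core_start, core + LOOKBACK_MONTHS


if __name__ == "__main__":
    for op in CORE_MONTHS:
        print(op, required_span(op, date(2024, 7, 1)))
```

With a core time window ending on 2024-07-01, the engine config case gives a core start of 2023-01-01 and a dataset start of 2021-01-01, which is the 42-month span described above.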

Tables to use

For a given core time window and LoB, the BigQuery dataset used with AML AI should contain the following tables:

  • Party: All parties relevant to that LoB for the entire core time window — no lookback window is required
    • Retail LoB: All retail banking customers that have held accounts at any point in the core time window
    • Commercial LoB: All commercial banking customers (legal and natural entities) that have held accounts at any point in the core time window
  • AccountPartyLink: Full history of which accounts were held by which parties for the entire core time window as well as a 24-month lookback window. This should cover all accounts for products and services for which a party in the Party table is (or was) the primary account holder
  • Transaction: All transactions for accounts in the AccountPartyLink table for the entire core time window as well as the 24-month lookback window
  • RiskCaseEvent: All risk case events (see event type values) for any risk case and party in the Party table with an AML_PROCESS_START (start of investigation) in the core time window and a minimum 12-month lookback window. Some of these events might have an event time earlier or later than the core time window and lookback window.
  • PartySupplementaryData: (If used) For 0 to 100 unique party_supplementary_data_id values, include a full history of the values of these fields for all parties in the Party table for the core time window — no lookback window is required.
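The per-table lookback requirements above can be turned into a simple pre-flight check. The sketch below is illustrative only: `TABLE_LOOKBACK_MONTHS`, `months_before`, and `coverage_gaps` are hypothetical names, and it assumes you have already determined the earliest date each table covers (for example, with a query against your BigQuery dataset). It uses the 12-month minimum for RiskCaseEvent and works at month granularity.

```python
from datetime import date

# Lookback in months before the core time window, per table (from the list above).
TABLE_LOOKBACK_MONTHS = {
    "Party": 0,
    "AccountPartyLink": 24,
    "Transaction": 24,
    "RiskCaseEvent": 12,  # minimum lookback
    "PartySupplementaryData": 0,
}


def months_before(d: date, months: int) -> date:
    """Return the first day of the month that lies `months` before d's month."""
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def coverage_gaps(core_start: date, earliest_by_table: dict[str, date]) -> dict[str, date]:
    """Map each table that starts too late to the date its data should start by."""
    gaps = {}
    for table, lookback in TABLE_LOOKBACK_MONTHS.items():
        if table not in earliest_by_table:
            continue  # table not supplied, e.g. optional PartySupplementaryData
        required_start = months_before(core_start, lookback)
        if earliest_by_table[table] > required_start:
            gaps[table] = required_start
    return gaps
```

For instance, with a core time window starting on 2023-01-01, a Transaction table whose earliest data is from 2021-06-01 would be reported as needing data back to 2021-01-01, while a RiskCaseEvent table starting on 2022-01-01 would pass.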

Using additional data

See Supplementary data if you have additional data on parties (not otherwise covered in the schema) that is relevant to identifying money laundering risk.