Anomaly detection is the task of detecting instances that differ significantly from the rest of the group they belong to. Humans are usually very good at spotting anomalies visually or acoustically, such as a differently colored sheep in a flock or a misplaced note in a melody, but they can have a very hard time doing the same with tabular data. While it may still be feasible when the dataset is small enough to be inspected manually, it becomes increasingly difficult for larger collections of data.
Take for example the case of a financial institution such as a bank: clients perform millions of transactions every week, among which only a tiny percentage are tied to illegal activities. The bank is morally and legally obliged to flag these types of transactions and pass them to the relevant authorities. However, identifying this small subset of transactions is by no means a trivial task.
Rule-based approaches, i.e. flagging all transactions that exceed a specified set of thresholds, are typically quite inefficient since they risk generating large numbers of false positives and/or false negatives. Creating and tuning these kinds of rules also requires deep knowledge of how the various criminal schemes (fraud, money laundering, terrorist financing…) are tied to transaction data.
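To make the limitation concrete, here is a minimal sketch of a threshold rule in Python. The threshold value, the field names and the sample transactions are all hypothetical and chosen purely for illustration; they do not describe any real monitoring rule.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str      # hypothetical identifier
    amount: float   # transaction amount in an arbitrary currency


# Hypothetical threshold, chosen only for this illustration.
AMOUNT_THRESHOLD = 10_000.0


def rule_based_flag(tx: Transaction) -> bool:
    """Flag a transaction when its amount exceeds the fixed threshold."""
    return tx.amount > AMOUNT_THRESHOLD


transactions = [
    Transaction("legitimate-house-deposit", 50_000.0),  # flagged: a false positive
    Transaction("just-below-threshold", 9_999.0),       # not flagged, whatever its purpose
    Transaction("ordinary-groceries", 42.5),            # not flagged
]

flagged = [tx.tx_id for tx in transactions if rule_based_flag(tx)]
print(flagged)
```

A fixed threshold flags the large but legitimate payment and misses anything that stays just below the limit, which is exactly the false positive and false negative trade-off described above.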
In recent years, the increase in computational power and the migration to cloud-based technologies have led to the rise of machine learning models as valid alternatives to more business-driven practices. Among the countless models there are also anomaly detection algorithms. Typically an anomaly detection model starts by defining some computable quantity that measures how different a given data point is from the rest of the dataset. The metric used depends on the algorithm: common anomaly detection models can rely on the data density in feature space, decision trees or dimensionality reduction techniques. Models then predict an anomaly score for each instance in the dataset, and instances whose score exceeds a predetermined threshold are labeled as anomalies.
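The score-then-threshold pattern can be sketched in a few lines of Python. The example below is only an illustration, not the model used in any of the projects mentioned here: it assumes a simple density-style score (the mean distance to the k nearest neighbours), and the function names, the toy data and the threshold are all hypothetical.

```python
import math


def knn_anomaly_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    Points in sparse regions of feature space get a high score.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(scores, threshold):
    """Label every instance whose score exceeds the threshold as an anomaly."""
    return [score > threshold for score in scores]


# Toy two-dimensional feature space: five similar points and one far away.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 0.95), (1.05, 1.05), (5.0, 4.8)]

scores = knn_anomaly_scores(points, k=3)
flags = flag_anomalies(scores, threshold=1.0)  # hypothetical threshold

for point, score, is_anomaly in zip(points, scores, flags):
    print(point, round(score, 2), is_anomaly)
```

In practice the choice of score and threshold is where most of the modelling effort goes; the brute-force neighbour search used here is also only suitable for small datasets.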
At RiskQuest, over the course of different projects, the data science pillar has acquired the know-how to develop and productionize anomaly detection models, as well as a deep understanding of the inner workings of these algorithms. We have deployed several anomaly detection models, all aimed at detecting both known and new types of money laundering schemes within our clients’ transaction data. If you are interested in working for our Data Science team, look here.
Anomaly detection models are an excellent tool to spot unusual customer activity in a high-dimensional feature space, complementing and supporting the business rule approaches and supervised ML models.