Imagine creating a model that reviews thousands of transactions every second and flags those related to money laundering. Advances in artificial intelligence in recent years have made this possible, and for banks that process huge volumes of transactions every day it even appears to be a necessity. The challenge of fighting money laundering is growing, and RiskQuest is thrilled to help banks design and build Anti-Money Laundering (AML) models.
While fighting money laundering, other financial crimes such as terrorism financing, fraudulent transactions and corruption can also be detected. Even better, banks can decide not only to monitor money laundering but also to actively fight other financial crime. New models may be designed, each aimed at detecting a specific type of criminal behavior. Examples are human trafficking, illegal wildlife trade, corona fraud and healthcare fraud. In this blog the latter will serve as an example of how to develop such a transaction monitoring model and thus fight financial crime!
Before creating a model to fight a specific form of financial crime there must be sufficient reason to believe that this behavior occurs often and that this behavior could be captured by transactional data. Let’s tackle the first subject and dive into the health expenditures landscape of the Netherlands.
In 2015 the healthcare system of the Netherlands was decentralized, meaning that responsibility for certain duties was shifted from the central government to the municipalities. Municipalities are responsible for, among other things, work and income, youth care, and care for the long-term sick and the elderly. This decentralization was the starting signal for an increase in fraudulent healthcare declarations, often through healthcare providers (‘zorgbureaus’). These ‘providers’ for instance receive insurance money for declared home care that is never provided, or they cheat with the personal budget (PGB), the so-called PGB fraud. Unfortunately the patient is often the victim, since they do not receive the care they are entitled to. However, sometimes the (fake) patient is involved in the fraud as well.
Since the decentralization, it has been fairly easy to become a healthcare provider; one does not need a diploma or patients to register. Consequently, it is very difficult for the municipalities to check whether everything is correct and non-fraudulent. Some even say that fraud is currently risk-free for such a healthcare provider, since no sanction follows.
In the meantime, the Public Prosecution Service (‘Openbaar Ministerie’ (OM)) estimates that hundreds of millions of euros per year are lost to healthcare fraud, partly because organized crime focuses on this ‘field’. The government and insurers are gradually seeing valuable healthcare funds being spent on sports cars and hotel stays in Ibiza. This leak of hundreds of millions of euros each year should be fixed!
Investigative journalists from Follow The Money (FTM) thought it was time for action. They built a model that checked the annual financial statements of healthcare providers and filtered out the companies with high margins. Margins above 10% and high dividend payments were found to be indicators of fraud.
However, while developing our own tool RiskNavigator we found that analyzing a client on the basis of annual financial statements is time-consuming and old-fashioned, and that the statements are out of date by the time they are available. Besides, annual financial statements can themselves be manipulated, which is a big problem in the healthcare fraud scenario. Client transactions, in contrast, do not lie. Hence a model based on client transactions is more accurate and less sensitive to manipulation.
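The kind of screen FTM describes can be sketched in a few lines. This is a minimal illustration and not FTM's actual model: the field names, the sample figures, the dividend cut-off (dividends above half of net profit) and the choice to require both indicators at once are assumptions. Only the 10% margin threshold comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class AnnualStatement:
    """Hypothetical, simplified annual financial statement of a provider."""
    name: str
    revenue: float
    net_profit: float
    dividends_paid: float


MARGIN_THRESHOLD = 0.10          # margins above 10% count as an indicator
DIVIDEND_SHARE_THRESHOLD = 0.50  # assumed cut-off: dividends above 50% of profit


def profit_margin(statement: AnnualStatement) -> float:
    """Net profit as a fraction of revenue (0.0 when there is no revenue)."""
    if statement.revenue <= 0:
        return 0.0
    return statement.net_profit / statement.revenue


def is_flagged(statement: AnnualStatement) -> bool:
    """Flag a provider that shows both a high margin and a high dividend payout."""
    if statement.net_profit <= 0:
        return False
    high_margin = profit_margin(statement) > MARGIN_THRESHOLD
    dividend_share = statement.dividends_paid / statement.net_profit
    return high_margin and dividend_share > DIVIDEND_SHARE_THRESHOLD


# Made-up example figures.
statements = [
    AnnualStatement("Provider A", revenue=1_000_000, net_profit=40_000, dividends_paid=0),
    AnnualStatement("Provider B", revenue=800_000, net_profit=240_000, dividends_paid=200_000),
]

flagged = [s.name for s in statements if is_flagged(s)]
print(flagged)
```

With these figures only Provider B is flagged: its margin is 30% and most of its profit is paid out as dividend.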
Reason enough to start making a transaction monitoring model, but how?
When creating a transaction monitoring model, one should first decide on the client scope. In the case of the healthcare fraud detection model one could for instance include only those accounts that are registered under a certain SBI code (the Dutch standard industrial classification), or those accounts that receive funding from insurance companies. If you stay focused on the behavior you would like to capture, a good prior selection may lead to a better performing model.
The second step in creating a transaction-based model is finding the right data. Not only transactional data, but also client or company information, network information and other enrichments may be useful. Now is also the time to find your label (or target) information. In the case of the healthcare fraud detection model, try to gather all historically known fraud cases.
Once you have gathered all the usable sources, it is time to create the data pipeline. In the data pipeline we combine all the input sources and create understandable features that our model can train on. For the healthcare fraud detection model, good features to start with are for instance margin and dividend features, since FTM already found these to be good risk indicators. Since the quantity of transactional data can be immense, consider aggregating the transactions for each client. This can for instance be a daily, weekly, monthly, quarterly or yearly aggregation.
Make sure to create your data pipeline in a structured way, such that the development and production environments can both make use of the same underlying library. Git is a version control tool that may be convenient during development of the data pipeline. It keeps track of multiple versions of the pipeline and allows contributors to work together on the same project. One step further is to train the final model in the production environment, such that no alignment test is needed between the production and development environments.
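A scope selection along these lines could look as follows. This is only a sketch: the `Account` and `Transaction` structures, the insurer identifiers and the SBI prefixes are illustrative assumptions, and the actual scope should be agreed with the business.

```python
from dataclasses import dataclass

# Assumed scope definition; prefixes and insurer identifiers are placeholders.
HEALTHCARE_SBI_PREFIXES = ("86", "87", "88")
INSURER_IDS = {"INSURER-1", "INSURER-2"}


@dataclass
class Account:
    account_id: str
    sbi_code: str


@dataclass
class Transaction:
    account_id: str       # account that is being monitored
    counterparty_id: str  # other side of the transaction
    amount: float         # positive = incoming, negative = outgoing


def in_scope(account: Account, transactions: list[Transaction]) -> bool:
    """An account is in scope if it has a healthcare SBI code
    or receives money from a known insurer."""
    has_healthcare_sbi = account.sbi_code.startswith(HEALTHCARE_SBI_PREFIXES)
    receives_insurer_funds = any(
        t.account_id == account.account_id
        and t.counterparty_id in INSURER_IDS
        and t.amount > 0
        for t in transactions
    )
    return has_healthcare_sbi or receives_insurer_funds


# Made-up example data.
accounts = [
    Account("ACC-1", "88101"),  # healthcare SBI code
    Account("ACC-2", "47110"),  # other sector, but paid by an insurer
    Account("ACC-3", "62010"),  # other sector, no insurer funding
]
transactions = [
    Transaction("ACC-2", "INSURER-1", 2_500.0),
    Transaction("ACC-3", "SHOP-9", -40.0),
]

scope = [a.account_id for a in accounts if in_scope(a, transactions)]
print(scope)
```

Combining both criteria with "or" keeps the scope wide; using "and" instead would give a narrower and more targeted population.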
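As an illustration of the aggregation step, the sketch below rolls individual transactions up to one row per client per month. The tuple layout, the feature names and the sample data are assumptions made for the example; a real pipeline would typically do the same in a dataframe or SQL engine.

```python
from collections import defaultdict
from datetime import date

# Made-up transactions: (client id, booking date, amount); negative = outgoing.
transactions = [
    ("ACC-1", date(2023, 1, 5), 1200.0),
    ("ACC-1", date(2023, 1, 20), -300.0),
    ("ACC-1", date(2023, 2, 3), 500.0),
    ("ACC-2", date(2023, 1, 11), -50.0),
]


def aggregate_monthly(transactions):
    """Aggregate transactions into features per (client, month)."""
    features = defaultdict(
        lambda: {"n_transactions": 0, "total_in": 0.0, "total_out": 0.0}
    )
    for client_id, booking_date, amount in transactions:
        key = (client_id, booking_date.strftime("%Y-%m"))
        bucket = features[key]
        bucket["n_transactions"] += 1
        if amount >= 0:
            bucket["total_in"] += amount
        else:
            bucket["total_out"] += -amount
    return dict(features)


monthly = aggregate_monthly(transactions)
for key in sorted(monthly):
    print(key, monthly[key])
```

Changing how the key is built (for example by year, quarter or ISO week) gives the other aggregation levels without touching the rest of the function.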
Once you have your features, you need to combine them with historic label information and decide whether sufficient labels are available for supervised machine learning, or whether unsupervised machine learning techniques are the better choice. Sometimes explainable non-machine-learning models or even clever business rules may be preferred to capture certain behavior. It may also be preferable to use a combination of techniques (an ensemble). This part of the project requires a lot of testing, for instance of different features, different models, different hyperparameters and different scopes. To make a well-considered choice of the final model one can use MLflow. This tooling may be convenient during development, since it helps to record all findings and the performance of the tested models, but also in production, since it helps with model registry and automatic deployment.
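To make the idea of combining techniques concrete, the sketch below joins a simple business rule with a basic unsupervised outlier score. The feature names, thresholds and data are all assumptions, and the outlier score (distance from the median, scaled by the median absolute deviation) is merely a stand-in for whichever unsupervised technique is eventually chosen.

```python
from statistics import median


def mad_scores(values):
    """Distance of each value from the median, in units of the
    median absolute deviation (MAD). Returns zeros if the MAD is zero."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [(v - center) / mad for v in values]


# Made-up client features.
clients = {
    "ACC-1": {"monthly_inflow": 10_000.0, "cash_withdrawal_share": 0.05},
    "ACC-2": {"monthly_inflow": 12_000.0, "cash_withdrawal_share": 0.10},
    "ACC-3": {"monthly_inflow": 11_000.0, "cash_withdrawal_share": 0.02},
    "ACC-4": {"monthly_inflow": 95_000.0, "cash_withdrawal_share": 0.70},
    "ACC-5": {"monthly_inflow": 9_000.0, "cash_withdrawal_share": 0.08},
}

CASH_RULE_THRESHOLD = 0.50  # assumed business rule: over half withdrawn in cash
SCORE_THRESHOLD = 3.0       # assumed cut-off for the outlier score

ids = list(clients)
scores = mad_scores([clients[i]["monthly_inflow"] for i in ids])

alerts = {}
for client_id, score in zip(ids, scores):
    rule_hit = clients[client_id]["cash_withdrawal_share"] > CASH_RULE_THRESHOLD
    anomaly_hit = abs(score) > SCORE_THRESHOLD
    # Ensemble logic: alert when either component fires.
    if rule_hit or anomaly_hit:
        alerts[client_id] = {"rule_hit": rule_hit, "anomaly_hit": anomaly_hit}

print(alerts)
```

Keeping track of which component fired is useful later on, because it tells the analyst why the alert was raised.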
After choosing a final model, one could also determine whether automatic retraining is preferred each time the model is executed. When new labels are available why not use them as soon as possible?
During model development analysts are often available to give feedback on the model performance. In the case of creating a healthcare fraud detection model one could try to retrieve specific feedback on the chosen client scope or certain features used in the model. Precious information is for instance behavior that analysts never find risky, or behavior that is always risky.
But we need to give something back to the analysts. Using machine learning techniques comes at a price. As previously discussed in the blog ‘What can Credit Risk modelling learn from Counter Financial Crime (Models) and vice versa?’, the models themselves are somewhat of a ‘black box’. Hence we should make sure that our models and outcomes are understandable and explainable for analysts.
Documentation is a big part of this, and so is spreading the word: explaining how the models work, showing the performance of the models and providing insight into the use of the models. Another important aspect is training the analysts to interpret model alerts correctly. Feature importance or transaction importance per alert may be a big help for the analyst.
Another thing that can’t be forgotten is bias quantification. For more information on this topic, please read our previous blog ‘Are you mindful of discrimination in (your) models?’ and wait for our upcoming blog on Algorithmic Fairness.
Once we have checked the performance of the healthcare fraud detection model and made sure our outcomes are understandable and non-discriminatory, it is time to put the model into production. Let's find those fraudsters!
In this blog we aimed to inform you on how to develop a transaction monitoring model. We used the healthcare fraud detection model as an example, but many other transaction models can follow the same setup. At RiskQuest we have gained experience in creating models that fight financial crime and are fully aware of the importance of widespread knowledge sharing within an organization. We are happy to share our knowledge and support you with the challenges you face in this area. Let us fight financial crime together!
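For a linear scoring model, feature importance per alert can be approximated as the weight of each feature times its deviation from the population average. The sketch below shows this for a single alert; the weights, averages and feature values are hypothetical, and more complex models would need a dedicated explanation technique instead.

```python
# Hypothetical linear model weights and population averages per feature.
WEIGHTS = {
    "profit_margin": 4.0,
    "dividend_share": 2.0,
    "cash_withdrawal_share": 1.0,
}
POPULATION_MEANS = {
    "profit_margin": 0.05,
    "dividend_share": 0.20,
    "cash_withdrawal_share": 0.10,
}


def explain_alert(features):
    """Return (feature, contribution) pairs, largest absolute contribution first.

    The contribution of a feature is its weight times the difference between
    the client's value and the population average.
    """
    contributions = {
        name: WEIGHTS[name] * (features[name] - POPULATION_MEANS[name])
        for name in WEIGHTS
    }
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up feature values of one alerted client.
alert_features = {
    "profit_margin": 0.30,
    "dividend_share": 0.80,
    "cash_withdrawal_share": 0.05,
}

explanation = explain_alert(alert_features)
for name, contribution in explanation:
    print(f"{name:>24}: {contribution:+.2f}")
```

In this example the dividend share pushes the score up the most, followed by the profit margin, which gives the analyst a starting point for the investigation.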