To date, institutions in the world of credit risk have been hesitant to use 'black box' Machine Learning (ML) techniques in the context of Internal Ratings-Based (IRB) models to estimate regulatory capital requirements. But why are banks so reluctant to adopt ML models in this area? Compared to the 'traditionally' applied models, ML models are quite complex, which makes it harder to interpret the results and justify them to management and supervisors. Especially for IRB models, in which the fundamentals of the model must be robust, this 'lack of explainability' is a major disadvantage. In other areas within credit risk, however, such as credit decisions and pricing, this increase in complexity can lead to higher predictive power. If the same holds for IRB models, not using ML techniques is currently a missed opportunity for institutions.
On 11 November 2021, the European Banking Authority (EBA) published a discussion paper that initiated the debate on the implementation of ML models in the IRB model space (EBA, 2021). The paper lists practical issues and benefits relating to this famous interpretability-versus-performance trade-off and aims to set supervisory expectations on how machine learning models can coexist with and adhere to the Capital Requirements Regulation (CRR) when used in the context of IRB models (EBA, 2021).
The term Machine Learning covers a wide range of models with different levels of complexity. A simple ML model would be, for example, a logistic regression, as already often used as an IRB model, while a more complex model could be a neural network. The EBA paper focuses on the more complex models, since these are harder to interpret and have therefore not been used as IRB models so far.
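To make this contrast concrete, the sketch below fits both ends of the complexity spectrum on synthetic default data: a logistic regression and a small neural network. All data, coefficients and model settings are our own illustrative assumptions, not taken from the EBA paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))  # 5 synthetic risk drivers
# Default probability includes a non-linear interaction term that a
# linear scorecard cannot capture but a neural network can.
logit = -2 + 0.8 * X[:, 0] + 1.5 * X[:, 1] * X[:, 2]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

logreg = LogisticRegression().fit(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

auc_logreg = roc_auc_score(y_te, logreg.predict_proba(X_te)[:, 1])
auc_ann = roc_auc_score(y_te, ann.predict_proba(X_te)[:, 1])
print(f"logistic regression AUC: {auc_logreg:.3f}")
print(f"neural network AUC:      {auc_ann:.3f}")
```

On data with genuine non-linear structure like this, the network tends to rank defaults better; the price is that its fitted weights no longer admit a scorecard-style reading.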
With this in mind, the EBA identifies the following main areas in which CRR requirements for IRB models are difficult to comply with when using ML techniques:
Risk differentiation and quantification
For the assignment of ratings or pools in IRB models, we expect to find a clear economic link between the risk drivers and the target variable (e.g. the default indicator) in the statistical model (see Article 171 CRR). The complex structure of an ML model makes it hard to establish this clear relationship and to arrive at intuitive estimates during risk quantification. Let's assume that we have an Artificial Neural Network (ANN). During the training phase, the layers and connections in the network create non-linear combinations of the input data. This makes it almost impossible to trace back what the actual contribution of each 'input' risk driver was. This increase in complexity also makes it harder to meet the documentation requirements, under which all modelling assumptions and the theory behind the model must be explained (see Article 175 CRR). For a traditional model, we can describe the functional form in a single line, but with an ML model this becomes a far more complex story.
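A minimal numpy sketch of this contrast, with purely illustrative coefficients and weights: the logistic regression's functional form fits on one line and each coefficient is directly attributable to one risk driver, while even a tiny ANN mixes the same inputs beyond easy attribution.

```python
import numpy as np

# 'Traditional' PD model: one line, one coefficient per risk driver.
beta0, beta = -2.0, np.array([0.4, -0.7, 0.9])  # illustrative coefficients
x = np.array([1.2, 0.5, -0.3])                  # one obligor's risk drivers
pd_logit = 1 / (1 + np.exp(-(beta0 + beta @ x)))

# A small ANN: the same three inputs pass through a non-linear hidden
# layer, so no single number isolates one risk driver's contribution.
W1 = np.array([[0.3, -0.2, 0.5],
               [0.1, 0.8, -0.4]])               # illustrative weights
b1 = np.array([0.1, -0.1])
w2, b2 = np.array([0.6, -0.9]), 0.2
hidden = np.tanh(W1 @ x + b1)                   # inputs mixed non-linearly
pd_ann = 1 / (1 + np.exp(-(w2 @ hidden + b2)))

print(f"logistic regression PD: {pd_logit:.3f}")
print(f"ANN PD:                 {pd_ann:.3f}")
```

In the first model, the effect of raising risk driver 0 by one unit is simply its coefficient on the log-odds scale; in the second, that effect depends on all the other inputs through the hidden layer.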
Robust rating systems cannot be developed on the basis of a statistical model alone; to a certain extent they have to involve expert judgement. By doing so, we make sure that the model is appropriate for its current and expected portfolio conditions and is acceptable to the business side (see Article 172 CRR) (EBA, 2017). However, the 'lack of explainability' in ML models makes it more difficult to complement the model with expert judgement. For example, let's again replace the 'traditional' Probability of Default (PD) ranking model (e.g. a logistic regression) with an ANN. In a traditional model, an expert would give input on which risk drivers should be included or on their weights in the model. But now they would have to form an opinion on the number of nodes, the activation functions and the weight of each connection. How would an expert manage this?
Representativeness and data quality
Complex ML models take big, possibly unstructured, data as input. Hence, they can use hundreds of risk drivers instead of the typical 5-10 used by 'traditional' models. Currently, the CRR requires that the estimates for PD and LGD are based on a period of at least five years. This requirement could be hard to meet when using big data, since such data might not be available over a long-term horizon.
The validation process should challenge the model design, assumptions and methodology. The increased complexity of ML models makes it all the more important to be critical of the core model performance. For example, ML models can easily overfit, which distorts the in-sample performance. For this reason, validation should focus on the composition of the Out-of-Sample (OOS) and Out-of-Time (OOT) samples. Moreover, resolving any deficiencies identified by the validation process will be complicated: if there are material differences between the predicted and observed default rates, how will the modelling team properly defend these to validation?
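As an illustration of how overfitting distorts in-sample performance, the sketch below grows an unconstrained decision tree on synthetic default data and compares its in-sample discrimination with a pseudo-OOT split; the data and settings are illustrative assumptions of ours, not from the EBA paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 4))           # 4 synthetic risk drivers
logit = -2 + 0.9 * X[:, 0]            # only driver 0 carries real signal
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Chronological split: first 70% 'in-time', last 30% as a pseudo-OOT sample.
cut = int(0.7 * n)
X_tr, y_tr, X_oot, y_oot = X[:cut], y[:cut], X[cut:], y[cut:]

# Unconstrained depth: the tree is free to memorise the training sample.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
auc_in = roc_auc_score(y_tr, tree.predict_proba(X_tr)[:, 1])
auc_oot = roc_auc_score(y_oot, tree.predict_proba(X_oot)[:, 1])
print(f"in-sample AUC: {auc_in:.3f}")   # near-perfect: memorisation
print(f"OOT AUC:       {auc_oot:.3f}")  # materially lower
```

The gap between the two numbers is exactly what a validation focused only on in-sample fit would miss, and why the OOS/OOT composition deserves extra scrutiny for ML models.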
First and foremost, to overcome these obstacles the EBA paper emphasizes that more understanding of the model design needs to be created. To assess whether the economic relationship between each risk driver and the target variable is logical, the paper proposes using graphical tools, Shapley values, etc. Secondly, experts, management, validation and supervisory committees should develop an appropriate level of understanding of the model. For this purpose, detailed documentation has to be created, explaining the most important drivers for the assignment of exposures to grades or pools. Thirdly, the institution is required to monitor the ML model closely. This is particularly important for ML models that need more regular updating than 'traditional' models. IRB models should be robust against changes in economic conditions, so we would expect the parameters to be stable. ML algorithms could introduce Point-in-Time (PiT) elements into the model, which may affect this stability. Banks should therefore be alert when an update of the model leads to a material change. Lastly, thorough validation of the models should be carried out, checking for biases (e.g. overfitting), the model design, the representativeness and quality of the data, and the stability of the estimates.
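As a sketch of one such model-agnostic tool, the snippet below computes permutation importances (a simpler cousin of Shapley values) against a 'black box' PD function. Here the black box is a transparent stand-in of our own making so the result can be checked; the data and coefficients are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 3))
# Synthetic defaults driven by drivers 0 and 1; driver 2 is pure noise.
logit = -1.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

def black_box_pd(X):
    """Stand-in for an opaque ML model's PD output."""
    return 1 / (1 + np.exp(-(-1.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1])))

def auc(y, s):
    """Mann-Whitney AUC: share of (default, non-default) pairs ranked right."""
    pos, neg = s[y == 1], s[y == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * eq

# Permutation importance: drop in AUC when one driver is shuffled.
base = auc(y, black_box_pd(X))
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(base - auc(y, black_box_pd(Xp)))

for j, imp in enumerate(importance):
    print(f"risk driver {j}: AUC drop {imp:.3f}")
```

Drivers 0 and 1 show a clear AUC drop while the unused driver 2 shows none, which is the kind of economic-plausibility check the paper asks for; for real models, dedicated Shapley-value tooling serves the same purpose with per-obligor attributions.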
At RiskQuest, we believe that the adoption of complex ML models as the primary IRB model will take significant time, or may never happen. Traditional models perform quite well in determining regulatory capital, so why make them unnecessarily complex? However, we envision that ML models will gain importance as preprocessing layers that ingest complex and unstructured data and transform these into risk drivers for traditional models. For example, ML models could compose more informative risk drivers based on customer transaction data - something that is not happening yet. In that case, it would even be possible to simultaneously estimate the complicated ML models, which construct the payment-behaviour risk drivers, and the final traditional model, leading to an optimally risk-differentiating model. Stay tuned for our detailed vision on this!
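A toy sketch of this two-stage idea: a hypothetical ML layer (here just a volatility score standing in for a trained model) condenses an obligor's raw transaction history into a single payment-behaviour risk driver, which then feeds a familiar logistic scorecard alongside classic drivers. All names and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def ml_payment_feature(transactions):
    """Stand-in for a hypothetical ML preprocessing layer that condenses raw
    transaction amounts into one payment-behaviour risk driver (here a
    simple volatility score; a real layer could be a trained network)."""
    return np.std(transactions) / (np.abs(np.mean(transactions)) + 1.0)

# One obligor: classic risk drivers plus a raw 36-month transaction history.
loan_to_value, years_in_business = 0.8, 6.0
transactions = rng.normal(loc=1500.0, scale=400.0, size=36)

# Stage 2: a traditional logistic scorecard over classic and ML-derived drivers.
x = np.array([loan_to_value, years_in_business,
              ml_payment_feature(transactions)])
beta0, beta = -3.0, np.array([2.0, -0.1, 1.5])  # illustrative coefficients
pd = 1 / (1 + np.exp(-(beta0 + beta @ x)))
print(f"PD from the two-stage model: {pd:.3f}")
```

The final model stays a one-line logistic regression that supervisors can read, while the complexity lives in the feature-construction stage, where it can be monitored and validated separately.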