Contribution Details

Type Master's Thesis
Scope Discipline-based scholarship
Title Comparative analysis of Machine Learning methods for the estimation of Probability of Default
Organization Unit
Authors
  • Silvia Forcina Barrero
Supervisors
  • Erich Walter Farkas
  • Thomas Horger
  • Urban Ulrych
Language
  • English
Institution University of Zurich
Faculty Faculty of Business, Economics and Informatics
Number of Pages 68
Date 2021
Abstract Text Machine Learning (ML) is gaining prominence in financial risk management application studies by providing improved modelling flexibility compared to the current state-of-the-art parametric approaches. Under the supervised learning framework, various classifiers may contribute to a more accurate estimation of risk parameters in Internal Rating-Based models developed by financial institutions. The main objective of this thesis is to construct and compare various classification models used in credit scoring applications and in the estimation of Probability of Default (PD). In particular, this study compares the performance of Random Forest (RF), k-Nearest Neighbors (k-NN), XGBoost, and AdaBoost on a real-world credit scoring portfolio made available by Credit Suisse. The portfolio considered in the analysis ranges from 2000 to 2014 and includes all counterparties in Credit Suisse’s corporate portfolio, consisting of Swiss small and medium-sized enterprises (SMEs) and large enterprises (LEs). Common issues in credit scoring portfolios, such as the low default problem and feature selection, are addressed in the analysis by employing oversampling techniques and hybrid feature selection procedures. Models built specifically for SMEs and models covering both SMEs and LEs simultaneously are constructed and compared using the AUROC and Brier Score performance metrics. The performance of these models is also compared to logistic regression, which is the industry benchmark model for such applications. This study confirms the literature findings that ML models outperform traditional approaches (e.g., logistic regression) and supports the superior performance of these models on the Swiss corporate portfolio specifically. Among the ML models, the best performing model in terms of AUROC is the RF, while the boosting models provide the most accurate probability predictions. k-NN performs worse than the rest of the ML models, but still outperforms logistic regression.
Finally, the effect of model averaging on model performance is assessed and compared to the performance of the single models. Averaging the three best ML models results in increased performance and reduced model risk. The results suggest that ML techniques prove to be important aids in credit risk modelling and should be considered as serious competitors of classical approaches.
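The two metrics named in the abstract and the model averaging step can be illustrated with a minimal sketch. This is not the thesis code: the function names (`brier_score`, `auroc`, `average_predictions`) are hypothetical, the labels and probabilities are made-up toy values rather than results from the Credit Suisse portfolio, and averaging is assumed here to be a simple unweighted mean of the predicted PDs, which the abstract does not specify.

```python
from typing import List, Sequence


def brier_score(y_true: Sequence[int], pd_hat: Sequence[float]) -> float:
    """Mean squared difference between predicted PD and the 0/1 default flag."""
    return sum((p - y) ** 2 for y, p in zip(y_true, pd_hat)) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random defaulter is scored above a random
    non-defaulter, counting ties as one half (pairwise definition)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one default and one non-default")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_predictions(model_pds: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of the PDs from several models, per counterparty
    (assumed averaging scheme)."""
    n_models = len(model_pds)
    return [sum(column) / n_models for column in zip(*model_pds)]


if __name__ == "__main__":
    # Toy default flags (1 = default) and PDs from three hypothetical models.
    y = [0, 0, 1, 0, 1, 0]
    model_a = [0.10, 0.30, 0.70, 0.20, 0.40, 0.05]
    model_b = [0.20, 0.10, 0.60, 0.35, 0.80, 0.10]
    model_c = [0.05, 0.25, 0.55, 0.15, 0.65, 0.30]

    ensemble = average_predictions([model_a, model_b, model_c])
    for name, pds in [("A", model_a), ("B", model_b),
                      ("C", model_c), ("avg", ensemble)]:
        print(name, "AUROC:", auroc(y, pds), "Brier:", brier_score(y, pds))
```

AUROC depends only on how the counterparties are ranked, whereas the Brier Score penalises miscalibrated probabilities, which is why one model can lead on the first metric while another leads on the second.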