Comparison of bankruptcy prediction models using statistical learning at multiple times
Journal of the Korean Data & Information Science Society 2021;32:487-99
Published online May 31, 2021.
© 2021 Korean Data and Information Science Society.

Kyung In Cho1 · Young Min Kim2

1,2 Department of Statistics, Kyungpook National University
This research was supported by the Human Resources Program in Energy Technology of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20204010600060).
1 Graduate student, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
2 Corresponding author: Assistant professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
Received January 13, 2021; Revised February 19, 2021; Accepted February 23, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This research examines how the performance of bankruptcy prediction models changes over time. The study covers companies listed on the Korea Exchange from January 1, 2000 to January 31, 2016. As predictors, we consider accounting variables indicating growth rate, together with the Index of All Industrial Productions (IAIP) as a macroeconomic variable. Additional market variables are derived from each company's market capitalization: volatility, skewness, and kurtosis. The first three years are designated as training data, and the following year is used as testing data. For each such three-year/one-year pair, we predict corporate bankruptcy and evaluate all trials using logistic regression, partial least squares (PLS), and random forest. Because the bankruptcy data are imbalanced, logistic regression and random forest assign weights to bankrupt companies, while PLS uses the Synthetic Minority Over-sampling Technique (SMOTE) to handle the imbalance.
Keywords: Accounting variables indicating growth rate, IAIP, imbalanced data, PLS, random forest.
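
The three-year training / one-year testing design described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `rolling_windows` and `balanced_class_weights`, the one-year sliding step, the use of calendar years 2000 to 2015, the toy labels, and the "balanced" weighting heuristic (each class weighted by n_samples / (n_classes * class_count)). The paper's actual weighting scheme may differ.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple


def rolling_windows(first_year: int, last_year: int,
                    train_len: int = 3, test_len: int = 1
                    ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_years, test_years) pairs.

    Assumption: the window slides forward one year per trial.
    """
    start = first_year
    while start + train_len + test_len - 1 <= last_year:
        train = list(range(start, start + train_len))
        test = list(range(start + train_len, start + train_len + test_len))
        yield train, test
        start += 1


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class inversely to its frequency (hypothetical heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical year range; the study period is January 2000 to January 2016.
windows = list(rolling_windows(2000, 2015))

# Made-up toy labels: 1 = bankrupt, 0 = non-bankrupt.
toy_labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(toy_labels)

for train_years, test_years in windows[:2]:
    print("train:", train_years, "test:", test_years)
print("class weights:", weights)
```

In such a setup, the weights would be passed to a classifier that supports per-class weighting, so that the rare bankrupt class contributes more to the fitting criterion than its raw frequency would allow.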