Developing a practical college dropout prediction system using machine learning
Journal of the Korean Data & Information Science Society 2024;35:641-52
Published online September 30, 2024;  https://doi.org/10.7465/jkdi.2024.35.5.641
© 2024 Korean Data and Information Science Society.

Jin Baek Kwon1

1Department of Computer Science and Engineering, Sun Moon University
Correspondence to: 1 Professor, Department of Computer Science and Engineering, Chungnam 31460, Korea. E-mail: jbkwon@sunmoon.ac.kr
Received July 10, 2024; Revised August 8, 2024; Accepted August 9, 2024.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Student dropout is one of the major challenges in higher education. An effective way to prevent dropouts is to identify at-risk students early and provide them with focused management and support programs. Previous studies have predicted dropouts using machine learning algorithms, but limitations such as small dataset size and insufficient performance have hindered their practical application across universities. In this paper, we propose a college student dropout prediction system that can be used in practice. To ensure that the proposed model is universally applicable, training features are selected from basic academic data. A dataset of 54,771 samples was used, and prediction models were developed for prediction periods of one to four semesters, so that long-term prediction can support early detection of at-risk students. The random forest and gradient boosting models achieved AUC scores of about 0.92 to 0.98. Short-term predictions showed excellent performance, with a precision of more than 0.83 at a recall of 0.8, and long-term predictions also showed high predictive power, with F1 scores of 0.64 to 0.72, confirming the system's practical applicability. In addition, we synthesized the prediction results by period and suggested practical applications in a student information system.
Keywords: Classification models, college dropout, machine learning, prediction models, supervised learning
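
The abstract reports precision at a fixed recall of 0.8. As a rough illustration of how such a figure can be read off a model's predicted dropout probabilities, the following is a minimal sketch in plain Python. The function name `precision_at_recall` and the toy labels and scores are hypothetical and are not taken from the paper; the paper's own evaluation procedure may differ.

```python
def precision_at_recall(y_true, scores, target_recall=0.8):
    """Best precision over all thresholds whose recall is >= target_recall.

    y_true: list of 0/1 labels (1 = student dropped out).
    scores: predicted dropout probabilities from any classifier.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank samples from highest to lowest score; lowering the threshold
    # step by step flags more and more students as at risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    best = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume every sample tied at this score, since a threshold
        # cannot separate samples with identical scores.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        if recall >= target_recall:
            best = max(best, tp / (tp + fp))
    return best


# Hypothetical toy data (not from the paper): 10 students, 5 dropouts.
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]

p_at_r80 = precision_at_recall(labels, scores, target_recall=0.8)
print(f"precision at recall >= 0.8: {p_at_r80:.3f}")
```

On this toy data, flagging the six highest-scoring students captures four of the five dropouts (recall 0.8) with two false alarms, so the reported precision is 4/6. In practice the scores would come from a trained classifier such as a random forest or gradient boosting model evaluated on held-out data.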