Prediction of hospital readmission for type 2 diabetes patients using machine learning models
Journal of the Korean Data & Information Science Society 2024;35:99-110
Published online January 31, 2024; https://doi.org/10.7465/jkdi.2024.35.1.99
© 2024 Korean Data and Information Science Society.

Yujeong Song¹ · Jisu Park² · Dongwook Kim³ · Chansoo Kim⁴

¹,²,⁴ Department of Applied Mathematics, Kongju National University
³ Department of Information Statistics, Gyeongsang National University
Correspondence to: 1 Master course student, Department of Applied Mathematics, Kongju National University, Gongju 32588, Korea.
2 Master course student, Department of Applied Mathematics, Kongju National University, Gongju 32588, Korea.
3 Associate professor, Department of Information Statistics, Gyeongsang National University, Jinju 52828, Korea.
4 Corresponding author: Professor, Department of Applied Mathematics, Kongju National University, Gongju 32588, Korea. E-mail: chanskim@kongju.ac.kr
Received November 24, 2023; Revised December 27, 2023; Accepted December 29, 2023.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Diabetes, a chronic condition with rising global prevalence, frequently leads to patient readmission due to various complications. Unplanned readmission causes serious health problems for patients and increases the financial burden on healthcare systems, so governments and healthcare institutions are working to reduce it. This study aims to predict the likelihood of readmission within 30 days for patients with diabetes using machine learning models. The dataset comprises patient records collected over a decade from hospitals across the United States, and its class distribution is imbalanced. We therefore applied Random Under Sampling to address the class imbalance and used seven machine learning models to predict the likelihood of readmission. We also preprocessed the data by removing multiple admission records and generating a derived variable from nominal variables. The results indicate that preprocessing improved the overall predictive performance of the machine learning models. Specifically, the LightGBM model showed the best predictive performance, with an AUC of 0.7238. According to the feature importance analysis, factors such as the number of lab tests and medications and the duration of hospitalization were found to significantly influence the predictive model.
Keywords: Diabetes, imbalance, machine learning model, preprocessing, readmission.
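
The Random Under Sampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_under_sample`, the field name `n_lab_tests`, and the toy data are hypothetical, and the sketch assumes the simplest variant, in which every class is reduced to the size of the rarest class.

```python
import random
from collections import defaultdict


def random_under_sample(records, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Size of the rarest class; every class is sampled down to this size.
    n_min = min(len(indices) for indices in by_class.values())

    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    kept.sort()  # keep the original row order

    return [records[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 10 readmitted (label 1) versus 90 not readmitted (label 0).
labels = [1 if i % 10 == 0 else 0 for i in range(100)]
records = [{"n_lab_tests": i} for i in range(100)]

X_bal, y_bal = random_under_sample(records, labels)
n_pos = sum(y_bal)
n_neg = len(y_bal) - n_pos
print(n_pos, n_neg)
```

In a setting like the one described, resampling of this kind would normally be applied to the training split only, so that the evaluation data keeps its original class distribution.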