Prediction model for clustered survival data with missing covariates using decision tree
Journal of the Korean Data & Information Science Society 2018;29:1119-26
Published online September 30, 2018
© 2018 Korean Data and Information Science Society.

Hanna Yoo¹

¹Department of Computer Software, Busan University of Foreign Studies
Correspondence to: Professor, Department of Computer Software, Busan University of Foreign Studies, 65 Geumsaem-ro 485 beon-gil, Geumjeong-gu, Busan, Korea. Email:
This work was supported by a research grant from the Busan University of Foreign Studies in 2018 and by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1C1B5076671).
Received August 3, 2018; Revised September 4, 2018; Accepted September 10, 2018.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In this study, patients with Crohn’s disease (CD) are clustered based on similar characteristics, and the risk factors for recurrence after the first abdominal surgery are assessed using decision tree analysis for each cluster. Missing covariates in the data are imputed using a single imputation method. Cluster analysis classified the patients into two clusters, and survival analysis showed a significant difference in recurrence time between the two clusters (p < 0.001). In the decision tree analysis, different risk factors were chosen as the optimal partitioning variables in each cluster: type of surgery, indication for surgery, and time interval in the first cluster, but only time interval in the second. The results of this study suggest that different prediction models are necessary for different clusters. Personalized medicine, which takes into account a patient’s gender, disease status, family history, and similar characteristics, is in the limelight. Building a prediction model for each cluster of patients who have similar characteristics can help lower the risk of disease recurrence for new potential patients, and it can also be expected to help medical doctors diagnose and treat accurately and effectively.
Keywords: cluster analysis, Crohn’s disease, decision tree analysis, imputation method.
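
The four steps named in the abstract (single imputation, clustering into two groups, comparing recurrence time between clusters, and choosing a partitioning variable within each cluster) can be illustrated with a minimal sketch. The abstract does not state which imputation method, clustering algorithm, or splitting criterion was used, so everything below is an assumption made for illustration: mean imputation, k-means with k = 2, and the two-group log-rank statistic as both the between-cluster test and the split criterion. All function names and the toy data are hypothetical.

```python
from statistics import mean


def impute_mean(rows):
    """Single imputation (assumed: column mean) for values given as None."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def two_means(rows, n_iter=20):
    """Plain k-means with k = 2, seeded with the first and last rows."""
    centers = [list(rows[0]), list(rows[-1])]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each row to the nearer center (squared Euclidean distance).
        labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(r, centers[k])))
            for r in rows
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_split(rows, times, events):
    """Return (statistic, column, threshold) of the split with the largest log-rank statistic."""
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows})[:-1]:
            groups = [1 if r[j] > threshold else 0 for r in rows]
            stat = logrank_chi2(times, events, groups)
            if stat > best[0]:
                best = (stat, j, threshold)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: two covariates per patient, one value missing.
    covariates = [[25, 1.0], [30, None], [62, 8.0], [58, 7.5]]
    times = [5, 7, 30, 28]   # time to recurrence or censoring
    events = [1, 1, 1, 0]    # 1 = recurrence observed, 0 = censored

    complete = impute_mean(covariates)
    clusters = two_means(complete)
    print("clusters:", clusters)
    print("between-cluster log-rank statistic:", logrank_chi2(times, events, clusters))
    for k in (0, 1):
        idx = [i for i, c in enumerate(clusters) if c == k]
        split = best_split([complete[i] for i in idx],
                           [times[i] for i in idx],
                           [events[i] for i in idx])
        print("cluster", k, "best split:", split)
```

Fitting a separate split within each cluster is what lets different variables emerge as the partitioning variable in different clusters, which is the pattern the abstract reports. A real analysis would use a full survival tree with pruning and far more data than this toy example.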