Review of the survival analysis methods for genetic data
Journal of the Korean Data & Information Science Society 2018;29:1391-408
Published online November 30, 2018
© 2018 Korean Data and Information Science Society.

Seungyeoun Lee¹

¹Department of Mathematics and Statistics, Sejong University
Correspondence to: Professor, Department of Mathematics and Statistics, Sejong University, Seoul, 05006, Korea. E-mail :
Received October 15, 2018; Revised November 15, 2018; Accepted November 16, 2018.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Survival analysis concerns statistical inference on the time to an event of interest, which often cannot be observed completely because of censoring. To account for censored data, traditional survival analysis methods have been developed for estimation, testing, and building models that predict patient survival time from clinical data. High-throughput genomic technologies, especially microarrays, now produce large-scale data, and combining these data with survival time poses challenging statistical issues. Many statistical methods have been developed to incorporate high-dimensional genomic information into prediction models that were previously built from clinical data alone. Recently, many studies have addressed methods for integrating different types of genomic data generated by various advanced biological techniques, with the aim of enabling early disease prediction and developing personalized medicine. There has also been considerable interest in applying machine learning techniques to analyse the large volume of complex genomic data associated with censored survival outcomes. In this paper, we review the basic concepts of survival analysis, traditional statistical methods based on clinical data, statistical methods suited to genomic data, and machine learning methods extended to survival analysis.
Keywords : Censoring, machine learning, nonparametric methods, penalty function, statistical predictive model, survival time.