Development of prediction models using high dimensional RNA sequencing data for the prognosis of pancreatic ductal adenocarcinoma
Journal of the Korean Data & Information Science Society 2018;29:1409-19
Published online November 30, 2018
© 2018 Korean Data and Information Science Society.

Seokho Jeong1 · Lydia Mok2 · Taesung Park3

1,3 Department of Statistics, Seoul National University
2,3 Interdisciplinary Program in Bioinformatics, Seoul National University
Correspondence to: Professor, Department of Statistics, Seoul National University, Seoul 08826, Korea. E-mail:
Received October 11, 2018; Revised October 12, 2018; Accepted November 19, 2018.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Pancreatic cancer is a well-known disease with a high mortality rate, and accurately predicting prognosis from clinical information alone has proven difficult. A better prediction model that uses genetic information together with clinical information is therefore needed. RNA sequencing data contain tens of thousands of gene expression variables, so the number of variables far exceeds the sample size. In this study, we developed a prognosis prediction model that integrates high-dimensional RNA sequencing data with clinical data in three steps: (1) gene filtering, (2) selection of candidate genetic markers, and (3) final marker selection using a penalized Cox model. The model development procedure introduced here is expected to be applicable to prognosis prediction for other types of cancer as well.
Keywords: Cox regression, elastic net, prognosis prediction, lasso penalty.
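The abstract names the three steps but not the filtering criterion, the screening statistic, or the tuning values, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) a variance filter on raw expression, (2) ranking of the filtered genes by a univariate Cox score test at beta = 0 with Breslow handling of ties, and (3) an elastic-net penalized Cox fit, solved here by proximal gradient descent with the glmnet-style penalty lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2). All function and parameter names (`filter_genes`, `select_markers`, `n_filter`, `n_candidates`, `lam`, `alpha`) are hypothetical, clinical covariates are omitted, and `simulate_cohort` produces hypothetical toy data rather than study data.

```python
import math
import random
from statistics import mean, pstdev, pvariance


def filter_genes(X, n_keep):
    """Step 1 (assumed criterion): keep the n_keep genes with the largest variance."""
    p = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(p)]
    return sorted(sorted(range(p), key=lambda j: variances[j], reverse=True)[:n_keep])


def standardize(X):
    """Center each column and scale it to unit (population) standard deviation."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def risk_set_groups(time):
    """Sample indices grouped by distinct follow-up time, latest time first."""
    groups = {}
    for i, t in enumerate(time):
        groups.setdefault(t, []).append(i)
    return [groups[t] for t in sorted(groups, reverse=True)]


def cox_score_stat(x, time, event):
    """Step 2 (assumed): univariate Cox score test at beta = 0 (Breslow ties), U^2 / I."""
    n_risk, s1, s2, U, I = 0, 0.0, 0.0, 0.0, 0.0
    for grp in risk_set_groups(time):
        for i in grp:  # subjects with t_j >= current time are at risk
            n_risk += 1
            s1 += x[i]
            s2 += x[i] ** 2
        xbar = s1 / n_risk
        var = s2 / n_risk - xbar ** 2
        for i in grp:
            if event[i]:
                U += x[i] - xbar
                I += var
    return U * U / I if I > 0 else 0.0


def penalized_cox(X, time, event, lam, alpha, step=0.1, n_iter=300):
    """Step 3: elastic-net Cox fit by proximal gradient descent.

    Minimizes -(1/n) * Breslow log partial likelihood
    + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    groups = risk_set_groups(time)
    for _ in range(n_iter):
        w = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
        grad, s0, s1 = [0.0] * p, 0.0, [0.0] * p
        for grp in groups:
            for i in grp:  # accumulate risk-set sums from the latest time backwards
                s0 += w[i]
                for k in range(p):
                    s1[k] += w[i] * X[i][k]
            for i in grp:
                if event[i]:
                    for k in range(p):
                        grad[k] -= (X[i][k] - s1[k] / s0) / n
        for k in range(p):
            # gradient step on the smooth part (loss + ridge), then soft-threshold (lasso)
            z = beta[k] - step * (grad[k] + lam * (1 - alpha) * beta[k])
            beta[k] = math.copysign(max(abs(z) - step * lam * alpha, 0.0), z)
    return beta


def select_markers(X, time, event, n_filter, n_candidates, lam, alpha=0.5):
    """Three-step pipeline; returns {original gene index: nonzero coefficient}."""
    keep = filter_genes(X, n_filter)
    Xf = standardize([[row[j] for j in keep] for row in X])
    scores = [cox_score_stat([row[k] for row in Xf], time, event) for k in range(len(keep))]
    top = sorted(range(len(keep)), key=lambda k: scores[k], reverse=True)[:n_candidates]
    beta = penalized_cox([[row[k] for k in top] for row in Xf], time, event, lam, alpha)
    return {keep[k]: b for k, b in zip(top, beta) if b != 0.0}


def simulate_cohort(n, p, seed):
    """Hypothetical toy data: gene 0 raises the hazard, genes 1..p-1 are noise."""
    rng = random.Random(seed)
    sds = [3.0] + [rng.uniform(0.5, 2.0) for _ in range(p - 1)]
    X = [[rng.gauss(0.0, sd) for sd in sds] for _ in range(n)]
    time, event = [], []
    for row in X:
        t_event = rng.expovariate(math.exp(row[0] / 3.0))  # log-HR of 1 per SD of gene 0
        t_cens = rng.expovariate(0.3)  # independent censoring
        time.append(min(t_event, t_cens))
        event.append(1 if t_event <= t_cens else 0)
    return X, time, event


if __name__ == "__main__":
    X, time, event = simulate_cohort(n=200, p=50, seed=1)
    print(select_markers(X, time, event, n_filter=20, n_candidates=5, lam=0.05))
```

On the toy cohort, the informative gene 0 survives all three steps with a positive coefficient, while a very large `lam` shrinks every coefficient to exactly zero, which is the sparsity the lasso part of the penalty provides. The fixed step size and the single hand-picked `lam` are simplifications; in practice the penalty strength would typically be chosen by cross-validation, for example with `cv.glmnet(..., family = "cox")` in R.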