Survival analysis for tomato big data in smart farming
Journal of the Korean Data & Information Science Society 2021;32:361-74
Published online March 31, 2021;
© 2021 Korean Data and Information Science Society.

Jun Cheol Kim1 · Sookhee Kwon2 · Il Do Ha3 · Myung Hwan Na4

1,2,3Department of Statistics, Pukyong National University
3Department of Artificial Intelligence Convergence, Pukyong National University
4Department of Mathematics/Statistics, Chonnam National University
This work was supported by the Research Program of the Rural Development Administration (Project No. PJ015361012020).
1Graduate student, Department of Statistics, Pukyong National University, Busan, 48513, South Korea.
2Corresponding author: Graduate student, Department of Statistics, Pukyong National University, Busan, 48513, South Korea. e-mail:
3Professor, Department of Statistics, Department of Artificial Intelligence Convergence, Pukyong National University, Busan, 48513, Korea.
4Professor, Department of Mathematics/Statistics, Chonnam National University, Gwangju, 61186, Korea.
Received January 28, 2021; Revised March 16, 2021; Accepted March 17, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
With the technological convergence and innovation of the fourth industrial revolution, smart farms, which apply information and communication technology to agriculture in order to maintain an appropriate cultivation environment, are spreading. In this paper, we present the results of a survival analysis of weekly tomato data collected from smart farm big data. The survival time for the event of interest is defined as the harvest time, that is, the time from fruiting to harvesting. Using the calculated harvest times, we present nonparametric estimates of the cumulative harvest probability for groups defined by important environmental factors: internal temperature, internal humidity, CO2 concentration, and cumulative insolation. Furthermore, using the accelerated failure time (AFT) model with a penalized likelihood, we identify the factors that have an important influence on the survival time. We use LASSO, ALASSO, SCAD, and HL (hierarchical likelihood) as penalty functions. In particular, we evaluate the predictive performance of the models obtained from the four penalized variable selection methods via the MSE (mean squared error) and the C-index (concordance index).
Keywords: Accelerated failure time model, harvest time, penalized likelihood, prediction evaluation, smart farm.
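
As an illustration of two quantities named in the abstract, the following minimal Python sketch computes a nonparametric (Kaplan-Meier) estimate of the cumulative harvest probability and Harrell's C-index. It is not the authors' implementation: the function names, the toy data, and the choice of the Kaplan-Meier estimator as the nonparametric method are assumptions made here for illustration only.

```python
from itertools import combinations


def cumulative_event_probability(times, events):
    """Kaplan-Meier estimate of F(t) = 1 - S(t) at each distinct event time.

    times  : observed durations (hypothetically, weeks from fruiting)
    events : 1 if the harvest was observed, 0 if the record is censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        harvested = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - harvested / at_risk
        curve.append((t, 1.0 - survival))
    return curve


def concordance_index(times, events, predicted):
    """Harrell's C-index for predicted event times.

    A pair is comparable when the shorter observed time is an event.
    It is concordant when the shorter observed time also has the shorter
    predicted time; tied predictions count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this simple sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if events[first] != 1:
            continue  # earlier time is censored, so the true order is unknown
        comparable += 1
        if predicted[first] < predicted[second]:
            concordant += 1.0
        elif predicted[first] == predicted[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Toy data, invented for illustration (not taken from the paper).
    harvest_weeks = [5, 6, 6, 7, 8]
    observed = [1, 1, 0, 1, 1]
    predicted_weeks = [5.5, 6.5, 7.5, 6.0, 9.0]

    for week, prob in cumulative_event_probability(harvest_weeks, observed):
        print(f"week {week}: cumulative harvest probability {prob:.2f}")
    print("C-index:", concordance_index(harvest_weeks, observed, predicted_weeks))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of harvest times. In practice, group-wise curves would be obtained by calling the first function separately on each level of an environmental factor.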