Insolation prediction using air pollutants and meteorological variables
Journal of the Korean Data & Information Science Society 2021;32:997-1005
Published online September 30, 2021;
© 2021 Korean Data and Information Science Society.

Yeongeun Hwang1 · Dayoung Kang2 · Myunghwan Na3 · Sanghoo Yoon4

1,2Department of Statistics, Daegu University
3Department of Statistics, Chonnam National University
4Division of Mathematics and Big Data Science, Daegu University
Correspondence to: 1 Master’s course, Department of Statistics, Daegu University, Gyeongbuk 38453, Korea.
2 Master’s course, Department of Statistics, Daegu University, Gyeongbuk 38453, Korea.
3 Professor, Department of Statistics, Chonnam National University, Gwangju 61186, Korea.
4 Corresponding author: Assistant professor, Division of Mathematics and Big Data Science, Daegu University, Gyeongbuk 38453, Korea. E-mail:
This work was supported by the research program of Rural Development Administration (Project No. PJ0153372021).
Received June 22, 2021; Revised July 22, 2021; Accepted July 28, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The quality of fruits and vegetables is affected by exposure to insolation at each stage of growth. A machine-learning study was performed using a time series model that reflects the time series characteristics of insolation and of the meteorological variables affecting it. This study presents a tree-based ensemble model for predicting insolation that considers both atmospheric pollutant concentrations and meteorological variables that can affect surface insolation. The research data were collected from the Korea Meteorological Administration and Air Korea and cover the period from 2015 to 2019. Daily insolation was predicted with three machine learning methods: random forest (RF), gradient boosting model (GBM), and XGBoost. Five-fold cross-validation was used for model validation, and prediction performance was compared using mean absolute error, root mean square error, and the coefficient of determination. GBM showed the best predictive performance among the models under 5-fold cross-validation, but it overfitted. After the optimized parameters were applied, RF gave the best predictions. Sunshine time and duration were both very important variables, whereas cloud amount and fine dust concentration were not important for prediction.
Keywords: Extreme gradient boosting machine, gradient boosting, insolation, random forest.
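
The validation scheme named in the abstract (5-fold cross-validation scored by mean absolute error, root mean square error, and the coefficient of determination) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the variable names (`sunshine_hours`, `insolation`) are hypothetical, and a one-feature least-squares line stands in for the tree-based ensembles so that the example runs with the standard library alone. In practice, `fit_line` would be replaced by a wrapper around an RF, GBM, or XGBoost regressor.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Stand-in model (assumption): one-feature least squares.

    Returns a function that predicts y for a list of x values.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda xs: [intercept + slope * a for a in xs]


def cross_validate(x, y, fit, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, average over folds."""
    scores = {"mae": [], "rmse": [], "r2": []}
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in held_out]
        y_pred = predict([x[i] for i in held_out])
        scores["mae"].append(mae(y_true, y_pred))
        scores["rmse"].append(rmse(y_true, y_pred))
        scores["r2"].append(r2(y_true, y_pred))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


# Synthetic data (assumption): insolation rises linearly with sunshine hours plus noise.
rng = random.Random(42)
sunshine_hours = [rng.uniform(0, 12) for _ in range(200)]
insolation = [2.0 + 1.5 * h + rng.gauss(0, 1.0) for h in sunshine_hours]

cv_scores = cross_validate(sunshine_hours, insolation, fit_line)
print(cv_scores)
```

Comparing the cross-validated scores with the scores on the training folds is one simple way to notice the kind of overfitting the abstract reports for GBM: a large gap between the two suggests the model does not generalize.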