Developing a XGBoost trading system based on N-period volatility labeling in the stock market
Journal of the Korean Data & Information Science Society 2021;32:1049-70
Published online September 30, 2021.
© 2021 Korean Data and Information Science Society.

Yechan Han1 · Jaeyun Kim2

1Department of Future Convergence Technology, Soonchunhyang University
2Department of Big Data Engineering, Soonchunhyang University
Correspondence to: 1 Graduate student, Department of Future Convergence Technology, Soonchunhyang University, 22, Soonchunhyang-ro, Asan-si, Chungcheongnam-do, 31538, Korea.
2 Corresponding author: Assistant professor, Department of Big Data Engineering, Soonchunhyang University, 22, Soonchunhyang-ro, Asan-si, Chungcheongnam-do, 31538, Korea. E-mail:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2018R1C1B5085049).
Received July 29, 2021; Revised August 17, 2021; Accepted August 27, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
It is difficult to predict the direction of stock market prices because they are affected by many factors, such as political events, general economic conditions, and traders' expectations. In recent years, machine learning algorithms have been shown to predict stock price direction with high accuracy. Although numerous studies address the prediction of stock price movement, relatively few examine how the data used to train these algorithms are labeled. When a data set is labeled incorrectly, a machine learning model learns the wrong associations from it, which makes it harder for the trained model to predict correct results. An inappropriate labeling method can therefore cause information loss and degrade the performance of a prediction model. To address this problem, this study proposes N-period volatility labeling, which derives labels from the volatility of an N-period look-back window and is therefore more dynamic. To assess its usefulness, the proposed method is compared against the conventional labeling approach. The proposed model is evaluated empirically through simulation on the Nasdaq stock market.
Keywords: data labeling, N-period volatility, trading system, XGBoost.
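The abstract describes N-period volatility labeling only in outline, so the following is a minimal sketch of one way such a rule could look, not the authors' published definition. It assumes that the label for each period is obtained by comparing the next-period return with a threshold equal to a multiple of the standard deviation of the preceding N returns. The function name `n_period_volatility_labels`, the multiplier `k`, and the three-class scheme (1 for up, -1 for down, 0 for no significant move) are all hypothetical choices made for illustration.

```python
from statistics import pstdev


def n_period_volatility_labels(prices, n, k=1.0):
    """Label each period by comparing its return with recent volatility.

    Hypothetical rule (an assumption, not taken from the paper):
      1  if the return exceeds  +k * volatility of the previous n returns
     -1  if the return is below -k * volatility of the previous n returns
      0  otherwise
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(prices) < n + 2:
        raise ValueError("need at least n + 2 prices to produce a label")

    # Simple one-period returns; returns[j] covers prices[j] -> prices[j + 1].
    returns = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]

    labels = []
    for t in range(n, len(returns)):
        window = returns[t - n:t]          # the n returns before period t
        threshold = k * pstdev(window)     # volatility of the look-back window
        if returns[t] > threshold:
            labels.append(1)
        elif returns[t] < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Illustrative prices only; these are not data from the study.
example_prices = [100.0, 102.0, 100.0, 102.0, 102.1, 95.0, 110.0]
example_labels = n_period_volatility_labels(example_prices, n=3)
```

Because the threshold is recomputed from each look-back window, the same return can be labeled as a significant move in a calm market and as noise in a turbulent one, which is the sense in which such a rule is more dynamic than a fixed cutoff. Labels of this kind could then serve as the target variable for a classifier such as XGBoost.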