Identification of Celtis species using random forest with infrared spectroscopy and analysis of spectral feature importance
Journal of the Korean Data & Information Science Society 2021;32:1183-94
Published online November 30, 2021.
© 2021 Korean Data and Information Science Society.

Tae-Im Heo1 · Dong-Hyun Kim2 · Sung-Wook Hwang3

1Division of Forest Bioresources Conservation, Baekdudaegan National Arboretum
2Department of Wood Science and Technology, Kyungpook National University
3Institute of Agriculture and Life Sciences, Seoul National University
Correspondence to: 1 Researcher, Division of Forest Bioresources Conservation, Baekdudaegan National Arboretum, Gyeongsangbuk-do, 36209, Korea.
2 Graduate student, Department of Wood Science and Technology, Kyungpook National University, Daegu 41566, Korea.
3 Corresponding author: Research assistant professor, Institute of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Korea. E-mail:
This study was carried out with the support of the ‘R&D Program for Forest Science Technology (Project No. FTIS-2019149A00-2123-0301)’ provided by the Korea Forest Service (Korea Forestry Promotion Institute).
Received October 5, 2021; Revised October 25, 2021; Accepted October 28, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study aims to accumulate data for taxonomic profiling of Celtis species with high potential usefulness. This paper describes a process for wood identification using machine learning techniques with infrared spectroscopy. A spectral dataset was built by acquiring infrared spectra in the 4000-400 cm−1 region from six species of the Cannabaceae family. Random forest, support vector machine, and artificial neural network models were established for wood identification. In addition, feature importance based on the mean decrease in impurity was computed from the random forest model. Because the spectral characteristics of the six species were very similar, the identification accuracies of the models trained with the original infrared spectra were as low as 0.533-0.733. Preprocessing the data with the Savitzky-Golay algorithm improved the accuracy to 0.800-0.867. The random forest model indicated that the feature importance of the 1800-700 cm−1 region was relatively high. The identification performance of all models trained with spectral data from the 1800-700 cm−1 region improved to 0.867-0.933, demonstrating that the selected region is highly important for the identification of the Celtis species.
Keywords: Artificial neural network, Celtis, feature importance measures, infrared spectroscopy, random forest, wood identification.
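
The workflow summarized in the abstract (Savitzky-Golay preprocessing, a random forest classifier, and impurity-based feature importance) can be illustrated with a minimal sketch. This is not the authors' code. It assumes SciPy's `savgol_filter` and scikit-learn's `RandomForestClassifier`, and it uses synthetic spectra rather than the Celtis dataset. The window length, polynomial order, number of trees, sample counts, and band positions are all hypothetical choices made for illustration; the abstract does not state the settings that were used.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the spectral dataset (NOT the Celtis data):
# 901 points from 4000 down to 400 cm^-1, six classes, 20 spectra each.
wavenumbers = np.linspace(4000, 400, 901)
n_species, n_per_species = 6, 20


def synthetic_spectrum(species):
    """Return one noisy spectrum with a class-dependent band (hypothetical)."""
    # Broad band shared by every class, so it carries no class information.
    shared = np.exp(-((wavenumbers - 3300) / 200) ** 2)
    # Small class-specific band placed inside the 1800-700 cm^-1 region.
    center = 900 + 150 * species
    specific = 0.3 * np.exp(-((wavenumbers - center) / 40) ** 2)
    noise = rng.normal(0.0, 0.02, wavenumbers.size)
    return shared + specific + noise


X = np.array([synthetic_spectrum(s)
              for s in range(n_species)
              for _ in range(n_per_species)])
y = np.repeat(np.arange(n_species), n_per_species)

# Savitzky-Golay smoothing along the wavenumber axis.
# window_length and polyorder are assumed values, not taken from the paper.
X_smooth = savgol_filter(X, window_length=11, polyorder=2, axis=1)

# Random forest; feature_importances_ is scikit-learn's impurity-based
# (mean decrease in impurity) importance, normalized to sum to 1.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_smooth, y)
importance = forest.feature_importances_

# Share of total importance that falls inside the 1800-700 cm^-1 region.
in_band = (wavenumbers <= 1800) & (wavenumbers >= 700)
band_share = importance[in_band].sum()
print(f"Importance share in 1800-700 cm^-1: {band_share:.2f}")
```

On real spectra, the same `in_band` mask could be used to subset the columns before retraining, which mirrors the region-restricted models described in the abstract. Accuracy should then be estimated on held-out samples, which this sketch omits.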