How many trees in a random forest?
Journal of the Korean Data & Information Science Society 2022;33:325-35
Published online March 31, 2022; https://doi.org/10.7465/jkdi.2022.33.2.325
© 2022 Korean Data and Information Science Society.

Cheolyong Park1 · Fred W. Huffer2

1Major in Statistics, Keimyung University
2Department of Statistics, Florida State University
Correspondence to: 1 Professor, Major in Statistics, Keimyung University, Daegu 42601, Korea. E-mail: cypark1@kmu.ac.kr
2 Professor, Department of Statistics, Florida State University, FL 32304, U.S.A.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1F1A1058723).
Received January 25, 2022; Revised February 27, 2022; Accepted February 27, 2022.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
We propose diagnostic statistics that may assist in choosing the size of a random forest for classification. The statistics are used sequentially as the forest is constructed. They are computed from out-of-bag or test set votes and estimate the expected disagreement between the current forest and an infinite forest. Simulation studies illustrate the performance of these statistics and compare them with other methods for choosing the size of a random forest.
Keywords: Binary classification, diagnostic statistics, measure of disagreement, number of trees, random forest.