A measure of discrepancy based on margin of victory useful for the determination of random forest size
Journal of the Korean Data & Information Science Society 2017;28:515-24
Published online May 31, 2017
© 2017 Korean Data & Information Science Society.

Cheolyong Park¹

¹Major in Statistics, Keimyung University
Correspondence to: Cheolyong Park
Professor, Major in Statistics, Keimyung University, Daegu 42601, Korea. E-mail: cypark1@kmu.ac.kr
Received April 17, 2017; Revised May 15, 2017; Accepted May 16, 2017.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
This study proposes a measure of discrepancy based on the margin of victory (MV) that may be useful in determining the size of a random forest for classification. Here, MV is a scaled difference between the votes, in the infinite random forest, of the two most popular classes of the current random forest. Because a negative MV value indicates a discrepancy in the two most popular classes between the current and infinite random forests, max(-MV, 0) is proposed as a reasonable measure of discrepancy. We propose a diagnostic statistic based on this measure for determining the random forest size and derive its asymptotic distribution. Finally, a simulation study compares the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.
Keywords: Determination of random forest size, diagnostic statistic, margin of victory, measure of discrepancy.
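To make the definition in the abstract concrete, the following is a minimal Python sketch of MV and of the discrepancy max(-MV, 0). It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the infinite-forest vote proportions are treated as known inputs (in practice they would have to be estimated), and the scaling used here, the standard deviation of a single tree's vote difference between the two classes, is an assumed choice because the abstract does not state the scaling.

```python
import math


def top_two(votes):
    """Return the indices of the most and second most voted classes."""
    order = sorted(range(len(votes)), key=lambda k: votes[k], reverse=True)
    return order[0], order[1]


def margin_of_victory(current_votes, limit_props):
    """Scaled difference, in the infinite forest, between the two classes
    that lead the current forest.

    current_votes: vote counts per class in the current (finite) forest.
    limit_props:   vote proportions per class in the infinite forest
                   (assumed known here, for illustration only).
    """
    a, b = top_two(current_votes)
    diff = limit_props[a] - limit_props[b]
    # ASSUMPTION: scale by the standard deviation of one tree's vote
    # difference between classes a and b. The abstract does not give the
    # scaling, so this choice is illustrative.
    var = limit_props[a] + limit_props[b] - diff ** 2
    if var <= 0:
        # Degenerate case: no variability in the vote difference.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(var)


def discrepancy(current_votes, limit_props):
    """max(-MV, 0): positive only when MV is negative."""
    return max(-margin_of_victory(current_votes, limit_props), 0.0)


# The current forest ranks class 0 ahead of class 1.
print(discrepancy([60, 40, 0], [0.5, 0.4, 0.1]))  # infinite forest agrees
print(discrepancy([60, 40, 0], [0.4, 0.5, 0.1]))  # infinite forest disagrees
```

In the first call the infinite forest agrees with the current ranking, so MV is positive and the discrepancy is zero. In the second call the ranking of the two leading classes is reversed in the infinite forest, so MV is negative and the discrepancy is positive.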

