Evaluation of outlier detection methods for multiple linear regression model
Journal of the Korean Data & Information Science Society 2018;29:1663-77
Published online November 30, 2018
© 2018 Korean Data and Information Science Society.

Youngtae Choi1 · Chun Gun Park2 · Kyeong Eun Lee3

1,3Department of Statistics, Kyungpook National University
2Department of Mathematics, Kyonggi University
Correspondence to: Associate professor, Department of Mathematics, Kyonggi University, Suwon 16227, Korea. E-mail: cgpark@kgu.ac.kr
This work was extracted from the Master's thesis of Youngtae Choi at Kyungpook National University in August 2018.
Received September 10, 2018; Revised October 23, 2018; Accepted October 29, 2018.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
In regression analysis, outliers have serious effects on estimation, inference, and model selection. So far, many outlier detection methods have focused on detecting outliers and selecting variables simultaneously. In this paper, we examine how the number of irrelevant variables affects the detection of true outliers for several methods: Least Trimmed Squares (LTS), LAD regression with the lasso penalty (LAD-lasso), the Hard Thresholding Based Iterative Procedure (Hard-IPOD), the Soft Thresholding Based Iterative Procedure (Soft-IPOD), and the Difference-based Regression Model (ODDB). We evaluate these methods through simulation studies.
Keywords: Difference-based regression model, irrelevant variables, masking, outlier detection, swamping.