Outlier detection and variable selection via difference based regression model and penalized regression
Journal of the Korean Data & Information Science Society 2018;29:815-25
Published online May 31, 2018
© 2018 Korean Data and Information Science Society.

InHae Choi¹ · Chun Gun Park² · Kyeong Eun Lee³

¹,³Department of Statistics, Kyungpook National University
²Department of Mathematics, Kyonggi University
Correspondence to: Associate Professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea. E-mail: artlee@knu.ac.kr
Received May 3, 2018; Revised May 22, 2018; Accepted May 22, 2018.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
This paper studies an efficient procedure for simultaneous outlier detection and variable selection in linear regression. The effect of each observation's possible outlyingness enters the linear regression as a mean shift parameter, which is a nonzero constant for an outlier and zero otherwise. To fit this mean shift model, most penalized regression methods place adaptive penalties on the parameters so that most of them shrink to zero. Such penalized models select the true variables well, but they do not detect the outliers correctly. To overcome this problem, we first identify a group of suspected outliers using a difference-based regression model (DBRM) and add one mean shift parameter to the linear model for each suspected outlier. We then perform outlier detection and variable selection simultaneously by applying Lasso or Elastic net regression to this augmented linear model. The proposed method is more efficient than previous penalized regression approaches. We compare the proposed procedure with other methods in a simulation study and apply it to real data.
Keywords: Difference-based regression model, Elastic net, Lasso, outlier detection, variable selection.
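
For readers unfamiliar with the mean shift formulation, the model summarized in the abstract can be written out roughly as follows. This is only an illustrative sketch: the notation ($n$ observations, predictor vectors $x_i$, coefficient vector $\beta$, shift parameters $\gamma_i$, suspected set $S$, tuning parameters $\lambda$ and $\alpha$) is assumed here rather than taken from the paper, and using one common penalty for $\beta$ and $\gamma$ is likewise an assumption.

```latex
% Mean shift model: gamma_i is nonzero only when observation i is an outlier.
y_i = x_i^{\top}\beta + \gamma_i + \varepsilon_i, \qquad i = 1, \dots, n .

% After a screening step yields a suspected set S \subset \{1, \dots, n\},
% fix \gamma_i = 0 for i \notin S and estimate (\beta, \gamma_S) jointly:
\min_{\beta,\, \gamma_S} \;
  \frac{1}{2n} \sum_{i=1}^{n}
    \bigl( y_i - x_i^{\top}\beta - \gamma_i \,\mathbf{1}\{ i \in S \} \bigr)^2
  + \lambda \Bigl[ \alpha \bigl( \|\beta\|_1 + \|\gamma_S\|_1 \bigr)
  + \tfrac{1-\alpha}{2} \bigl( \|\beta\|_2^2 + \|\gamma_S\|_2^2 \bigr) \Bigr] .

% \alpha = 1 corresponds to the Lasso penalty; 0 < \alpha < 1 to the Elastic net.
```

Under this sketch, a nonzero estimate of $\gamma_i$ flags observation $i$ as an outlier, and a nonzero estimate of $\beta_j$ selects variable $j$, so both tasks are handled by a single penalized fit.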