Graphical method for evaluating the impact of influential observations in high-dimensional data
Journal of the Korean Data & Information Science Society 2017;28:1291-300
Published online November 30, 2017
© 2017 Korean Data & Information Science Society.

Sojin Ahn¹ · Jae Eun Lee² · Dae-Heung Jang³

¹²³Department of Statistics, Pukyong National University
Correspondence to: Dae-Heung Jang
Professor, Department of Statistics, Pukyong National University, 45, Yongso-ro, Nam-gu, Busan 48513, Korea. E-mail: dhjang@pknu.ac.k
Received November 3, 2017; Revised November 23, 2017; Accepted November 23, 2017.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
In high-dimensional data, the number of variables is much larger than the number of observations. In this case, the impact of influential observations on regression coefficient estimates can be very large. Jang and Anderson-Cook (2017) suggested the LASSO influence plot. In this paper, we propose the LASSO influence plot, the LASSO variable selection ranking plot, and the three-dimensional LASSO influence plot as graphical methods for evaluating the impact of influential observations in high-dimensional data. Using two real high-dimensional data examples, we apply these graphical methods as regression diagnostic tools for finding influential observations. We find that these graphical methods can identify influential observations.
Keywords: Influential observations, LASSO influence plot, LASSO variable selection ranking plot, three-dimensional LASSO influence plot