A study on the performance of the influence function on the t statistic depending on population distributions in big data sets
Journal of the Korean Data & Information Science Society 2019;30:573-85
Published online May 31, 2019;  https://doi.org/10.7465/jkdi.2019.30.3.573
© 2019 Korean Data and Information Science Society.

Sojung Kim¹ · Honggie Kim²

¹,²Department of Information and Statistics, Chungnam National University
Correspondence to: Professor, Department of Information and Statistics, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon 34133, Korea. E-mail: choi@dokdo.ac.kr
This research was fully supported by the 2017 CNU research fund. This paper is based on part of Sojung Kim’s master’s thesis.
Received January 17, 2019; Revised April 24, 2019; Accepted April 28, 2019.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
When analyzing a data set, we examine both how far individual observations lie from the center of the data and how those observations affect the statistics computed from it. In this paper, we focus on how that effect varies with the shape of the population distribution. Specifically, we study the effect on the t statistic used to test a hypothesis about the central parameter of the model distribution. Assuming that the population follows one of three distributions (normal, uniform, or inverted triangular), we compare and analyze how the influence function of the statistic behaves in data sets observed repeatedly, about 300 times for each distribution. We confirmed that the equation predicting the change in the t statistic held for each distribution and that it can be used regardless of the underlying distribution. We also confirmed the change in the t statistic given by the influence function at the extreme values of each distribution, and the performance of the influence function proved very satisfactory.
Keywords: influence function, inverted triangular distribution, normal distribution, t statistic, uniform distribution.
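
As a rough illustration of the kind of comparison described in the abstract, the following Python sketch measures how much the one-sample t statistic changes when a single observation is deleted, for samples drawn from the three distributions. It is not the paper's influence-function equation, which is not stated in the abstract; it computes the exact leave-one-out change instead. The sample size of 300, the null value `mu0 = 0`, the parameter ranges, the function names, and the V-shaped density f(x) = |x| on [-1, 1] used for the inverted triangular distribution are all assumptions made for illustration.

```python
import numpy as np


def t_statistic(x, mu0=0.0):
    """One-sample t statistic for H0: population mean == mu0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))


def deletion_changes(x, mu0=0.0):
    """Exact change in t (full sample minus leave-one-out) for each observation."""
    x = np.asarray(x, dtype=float)
    t_full = t_statistic(x, mu0)
    t_loo = np.array([t_statistic(np.delete(x, i), mu0) for i in range(x.size)])
    return t_full - t_loo


rng = np.random.default_rng(0)
n = 300  # assumed sample size, chosen only for illustration

samples = {
    "normal": rng.normal(0.0, 1.0, n),
    "uniform": rng.uniform(-1.0, 1.0, n),
    # Assumption: "inverted triangular" is taken to mean the V-shaped density
    # f(x) = |x| on [-1, 1]; |X| = sqrt(U) with a random sign samples from it.
    "inverted triangular": rng.choice([-1.0, 1.0], n) * np.sqrt(rng.uniform(0.0, 1.0, n)),
}

for name, x in samples.items():
    d = deletion_changes(x)
    i = np.argmax(np.abs(d))  # observation whose deletion moves t the most
    print(f"{name}: t = {t_statistic(x):.3f}, "
          f"largest change {d[i]:.3f} at x = {x[i]:.3f}")
```

Comparing these exact changes with the values predicted by an influence-function approximation is one way to check how well such an approximation performs.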