An alternative method in estimating propensity scores with conditional inference tree in multilevel data: A case study
Journal of the Korean Data & Information Science Society 2019;30:951-66
Published online July 31, 2019; https://doi.org/10.7465/jkdi.2019.30.4.951
© 2019 Korean Data and Information Science Society.

Hyunsuk Han¹ · Minho Kwak²

¹Institute of Educational Research, Korea University
²Quantitative Methodology, University of Georgia
Correspondence to: Ph.D. candidate, Quantitative Methodology, University of Georgia, Athens, Georgia, United States. E-mail: mk59520@uga.edu
Received May 16, 2019; Revised June 19, 2019; Accepted July 8, 2019.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Multilevel data structures are common in social science research. To investigate the effects of interventions, researchers often conduct observational studies that use large-scale secondary data and incorporate propensity score methods, which support causal inference in non-randomized observational studies. Propensity scores are typically estimated by logistic regression; however, this approach can be outperformed by alternative methods based on statistical learning and data mining algorithms. To date, little research has addressed the use of data mining methods in propensity score designs, especially with multilevel observational data. The purpose of this study is to examine the performance of propensity score stratification when the scores are estimated by multilevel logistic regression versus a conditional inference tree (CIT), using large-scale secondary data from the Programme for International Student Assessment (PISA). The results showed that the CIT estimated the treatment effect more conservatively. In addition, the covariate balance results showed that the CIT more closely approximated a randomized treatment/control design than did the multilevel logistic regression.
Keywords: Conditional inference tree, multilevel, multilevel logistic regression, nonrandomized, propensity score.
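
As a rough illustration of the stratification step mentioned in the abstract, the following Python sketch estimates a treatment effect by grouping units into strata of similar propensity score and averaging the within-stratum outcome differences. It is not the authors' implementation: the function name `stratified_effect`, the equal-sized strata, and the toy data are assumptions made for illustration, and the propensity scores are taken as given rather than estimated by a multilevel logistic regression or a conditional inference tree.

```python
from statistics import mean


def stratified_effect(scores, treated, outcomes, n_strata=5):
    """Estimate a treatment effect by stratifying on the propensity score.

    Hypothetical helper: units are sorted by score, cut into n_strata
    roughly equal-sized strata, and the treated-minus-control mean
    difference in each stratum is averaged, weighted by stratum size.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    weighted_sum = 0.0
    total_weight = 0
    for s in range(n_strata):
        stratum = order[s * n // n_strata:(s + 1) * n // n_strata]
        t = [outcomes[i] for i in stratum if treated[i]]
        c = [outcomes[i] for i in stratum if not treated[i]]
        # A stratum without both groups offers no contrast, so skip it.
        if not t or not c:
            continue
        weighted_sum += len(stratum) * (mean(t) - mean(c))
        total_weight += len(stratum)
    if total_weight == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / total_weight


# Toy data (assumed): units with higher scores are treated more often
# and also have higher baseline outcomes, which confounds a naive comparison.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
treated = [0, 0, 0, 1, 0, 1, 1, 1]
outcomes = [1, 1, 1, 3, 5, 7, 7, 7]

effect = stratified_effect(scores, treated, outcomes, n_strata=2)
naive = (mean(y for y, z in zip(outcomes, treated) if z)
         - mean(y for y, z in zip(outcomes, treated) if not z))
```

On this toy data the naive difference in means is 4, while the stratified estimate is 2, because comparisons are made only among units with similar propensity scores.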