A pooling strategy for Bayesian hierarchical models utilizing machine learning algorithm
Journal of the Korean Data & Information Science Society 2024;35:321-30
Published online May 31, 2024; https://doi.org/10.7465/jkdi.2024.35.3.321
© 2024 Korean Data and Information Science Society.

Ae Jeong Jo1

1Department of Data Science, Andong National University
This work was supported by a grant from the 2023 Research Fund of Andong National University.
Correspondence to: Ae Jeong Jo, Assistant professor, Department of Data Science, Andong National University, Andong 36729, Korea. E-mail: ajjo1223@anu.ac.kr
Received March 31, 2024; Revised April 30, 2024; Accepted May 7, 2024.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Small area estimation with hierarchical Bayesian pooling models provides reliable parameter estimates for small areas where direct regional estimation is infeasible because samples are scarce. The joint approach of hierarchical Bayesian pooling models increases reliability by identifying regional parameters both directly and indirectly, borrowing information from similar surrounding regions to supplement the insufficient sample data. For direct joint modeling, the non-parametric method based on the Dirichlet process samples parameters from an infinite parameter space while simultaneously assigning cluster indices, and research has shown that it outperforms parametric joint methods. However, this non-parametric Bayesian pooling model, which pools regions according to the inherent similarity exhibited by their observations, assigns cluster indices independently by region and therefore does not guarantee optimal homogeneity within clusters or heterogeneity between clusters. This study proposes a modified Bayesian pooling model that adds, as a constraint, the minimization of an objective function based on the distance to the cluster center, thereby minimizing heterogeneity within clusters based on regional similarity while maximizing heterogeneity between clusters. The proposed model was also confirmed to improve performance.
Keywords: Bayesian model, Dirichlet process, machine learning algorithm, nonparametric, small area
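
The abstract does not state the exact form of the objective function, so the following is only a minimal sketch of what an objective "based on the central distance within clusters" could look like. It assumes the familiar k-means style criterion, the sum of squared distances from each region's value to the mean of its cluster. The function name `within_cluster_objective`, the regional values, and the two cluster assignments are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def within_cluster_objective(values, labels):
    """Sum of squared distances from each value to its cluster mean.

    Assumption: a k-means style within-cluster sum of squares; the
    paper's actual objective function may differ.
    """
    # Group the regional values by their cluster index.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    total = 0.0
    for members in groups.values():
        center = sum(members) / len(members)  # cluster center (mean)
        total += sum((v - center) ** 2 for v in members)
    return total


# Hypothetical regional estimates for six small areas.
estimates = [0.10, 0.12, 0.11, 0.50, 0.52, 0.49]

# Two candidate cluster assignments for the same regions.
similar_pooled = [0, 0, 0, 1, 1, 1]   # similar regions share a cluster
mixed_pooled = [0, 1, 0, 1, 0, 1]     # dissimilar regions share a cluster

print(within_cluster_objective(estimates, similar_pooled))
print(within_cluster_objective(estimates, mixed_pooled))
```

Under this assumed criterion, an assignment that pools similar regions gives a smaller objective value than one that mixes dissimilar regions, which is the sense in which minimizing it favors homogeneous clusters.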