Bayesian hierarchical clustering for analyzing business data
Journal of the Korean Data & Information Science Society 2020;31:159-71
Published online January 31, 2020.
© 2020 Korean Data and Information Science Society.

Sung Kyun Rhyeu¹ · Beom Seuk Hwang²

¹,²Department of Applied Statistics, Chung-Ang University
Correspondence to: Assistant professor, Department of Applied Statistics, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea. E-mail:
This research was supported by the Chung-Ang University Graduate Research Scholarship in 2018, and supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2019R1C1C1011710).
Received December 9, 2019; Revised January 7, 2020; Accepted January 8, 2020.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Clustering is a data mining method that groups similar objects using the similarity or dissimilarity between them. Hierarchical clustering and k-means clustering are widely used, but both have drawbacks: they are sensitive to outliers and require options to be fixed in advance, such as the number of clusters. In contrast, Bayesian hierarchical clustering (BHC), which has been applied in microarray data analysis, decides which clusters to form by hypothesis testing and is therefore not subject to the problems mentioned above. In this study, we examine the advantages of BHC, how it differs from well-known clustering methods, and how it can be applied to business data to obtain superior clustering results.
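The hypothesis test behind BHC can be sketched in code. In the standard formulation of Heller and Ghahramani (2005), merging clusters D_i and D_j into D_k is scored by r_k = π_k p(D_k | H1) / p(D_k | T_k), the posterior probability that all points in D_k come from one distribution, where p(D_k | T_k) = π_k p(D_k | H1) + (1 − π_k) p(D_i | T_i) p(D_j | T_j), π_k = αΓ(n_k)/d_k and d_k = αΓ(n_k) + d_i d_j. The tree is cut wherever r_k < 0.5, so the number of clusters is not supplied in advance. The Python sketch below assumes a simple conjugate model that may differ from the one used in this study: features are independent Gaussians with known variance `sigma2` and a Gaussian prior N(`m0`, `tau2`) on each mean. The names `log_marginal` and `bhc` and all default hyperparameters are hypothetical, and the naive all-pairs search is only practical for small examples.

```python
import math
from itertools import combinations

import numpy as np


def log_marginal(X, sigma2=1.0, tau2=10.0, m0=0.0):
    """log p(D | H1) for the rows of X under an ASSUMED conjugate model:
    every feature is independent, x ~ N(mu, sigma2), mu ~ N(m0, tau2)."""
    n = X.shape[0]
    a = n / sigma2 + 1.0 / tau2                   # posterior precision of mu
    b = X.sum(axis=0) / sigma2 + m0 / tau2        # one entry per feature
    c = (X ** 2).sum(axis=0) / (2 * sigma2) + m0 ** 2 / (2 * tau2)
    per_feature = (-0.5 * n * math.log(2 * math.pi * sigma2)
                   - 0.5 * math.log(tau2 * a) + b ** 2 / (2 * a) - c)
    return float(per_feature.sum())


def bhc(X, alpha=1.0):
    """Build the BHC tree greedily, then cut it where r_k < 0.5."""
    X = np.asarray(X, dtype=float)
    # Leaves: d_i = alpha and pi_i = 1, so p(D_i | T_i) = p(D_i | H1).
    nodes = [{"idx": [i], "log_d": math.log(alpha),
              "log_pt": log_marginal(X[[i]]), "r": 1.0, "kids": None}
             for i in range(len(X))]
    while len(nodes) > 1:
        best = None
        for p, q in combinations(range(len(nodes)), 2):
            ni, nj = nodes[p], nodes[q]
            idx = ni["idx"] + nj["idx"]
            log_prior = math.log(alpha) + math.lgamma(len(idx))  # alpha * Gamma(n_k)
            log_dd = ni["log_d"] + nj["log_d"]                   # d_i * d_j
            log_d = np.logaddexp(log_prior, log_dd)              # d_k
            log_pi = log_prior - log_d                           # pi_k
            log_1m_pi = log_dd - log_d                           # 1 - pi_k
            log_h1 = log_marginal(X[idx])                        # p(D_k | H1)
            log_pt = np.logaddexp(log_pi + log_h1,
                                  log_1m_pi + ni["log_pt"] + nj["log_pt"])
            r = math.exp(log_pi + log_h1 - log_pt)               # posterior of H1
            if best is None or r > best[0]:
                best = (r, p, q, {"idx": idx, "log_d": log_d, "log_pt": log_pt,
                                  "r": r, "kids": (ni, nj)})
        _, p, q, merged = best
        nodes = [nd for k, nd in enumerate(nodes) if k not in (p, q)] + [merged]

    clusters = []

    def cut(node):
        # Keep a subtree whole when H1 is favoured (r_k > 0.5); otherwise split it.
        if node["kids"] is None or node["r"] > 0.5:
            clusters.append(node["idx"])
        else:
            for kid in node["kids"]:
                cut(kid)

    cut(nodes[0])
    labels = np.empty(len(X), dtype=int)
    for label, idx in enumerate(clusters):
        labels[idx] = label
    return labels
```

For instance, calling `bhc` on six 2-D points that form two well-separated groups of three should return two distinct labels, one per group, without the number of clusters being specified. Real data such as spending amounts would likely need scaling or a different likelihood before this toy model is appropriate.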
Keywords: ARI, Bayesian hierarchical clustering, cluster purity, hierarchical clustering, Wholesale customers data.
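The keywords name ARI and cluster purity as evaluation criteria. Assuming their usual definitions (purity: the share of points whose true class is the majority class of their assigned cluster; ARI: the Hubert-Arabie adjusted Rand index, which rescales pair-counting agreement so that random labelings have expected value 0 and identical partitions score 1), a minimal standard-library sketch in Python follows. The function names are hypothetical; scikit-learn's `adjusted_rand_score` computes the same ARI.

```python
from collections import Counter
from math import comb


def purity(labels_true, labels_pred):
    """Fraction of points whose true class is the majority class of their cluster."""
    members = {}
    for t, p in zip(labels_true, labels_pred):
        members.setdefault(p, []).append(t)
    hits = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return hits / len(labels_true)


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie ARI: (index - expected) / (max_index - expected)."""
    n_pairs = comb(len(labels_true), 2)
    if n_pairs == 0:  # fewer than two points: nothing to compare
        return 1.0
    # Pairs placed together in both partitions (contingency-table cells).
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions trivial: one cluster, or all singletons
        return 1.0
    return (index - expected) / (max_index - expected)
```

Relabeling clusters does not affect either score: predicting [1, 1, 1, 0, 0, 0] for true classes [0, 0, 0, 1, 1, 1] gives purity 1 and ARI 1, while predicting [0, 0, 1, 1, 1, 1] gives purity 5/6 and an ARI of about 0.32.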