Parsimonious cluster-weighted models using multivariate skew normal distribution
Journal of the Korean Data & Information Science Society 2023;34:229-44
Published online March 31, 2023;  https://doi.org/10.7465/jkdi.2023.34.2.229
© 2023 Korean Data and Information Science Society.

Seunghoon Paik1 · Sangkon Oh2 · Byungtae Seo3

1,2,3 Department of Statistics, Sungkyunkwan University
Correspondence to: The research of Byungtae Seo is supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2022R1A2C1006462).
1 Master, Department of Statistics, Sungkyunkwan University, Seoul 03063, Korea.
2 Ph.D. candidate, Department of Statistics, Sungkyunkwan University, Seoul 03063, Korea. E-mail: ohsangkon@hanmail.net
3 Professor, Department of Statistics, Sungkyunkwan University, 25-2, Sungkyunkwan-Ro, Jongno-Gu, Seoul 03063, Korea.
Received February 8, 2023; Revised February 28, 2023; Accepted March 2, 2023.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
The cluster-weighted model (CWM) is a probability model for response variables, covariates, and latent variables, and it is a useful tool for clustering data according to differing functional relationships. In a CWM, the covariate distribution is usually assumed to be multivariate normal. However, if the true covariate distribution deviates from normality, the CWM can produce unreliable clustering results. To mitigate this problem, we propose a model that uses the multivariate skew normal distribution for the covariates. In addition, we propose reducing the number of model parameters by imposing constraints on the covariance matrix. A feasible expectation-maximization (EM) algorithm for parameter estimation is provided, and the performance of the proposed model is demonstrated through simulation studies and real data analysis.
Keywords: Cluster-weighted models, clustering analysis, parsimonious models, mixture models, multivariate skew normal distribution.
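
For orientation, the model structure summarized in the abstract can be sketched in generic notation. The symbols below (G, \pi_g, \theta_g, \xi_g, \Omega_g, \alpha_g) are illustrative assumptions and are not taken from the paper. The skew normal density is written in the common Azzalini form, which may differ from the parametrization the authors adopt.

```latex
% Generic CWM: the joint density of covariates x and response y is a
% G-component mixture in which each component factorizes into a
% conditional density for y given x and a marginal density for x.
\[
  p(x, y) \;=\; \sum_{g=1}^{G} \pi_g \, f(y \mid x;\, \theta_g)\, h(x;\, \psi_g),
  \qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1 .
\]

% Assumed covariate density: d-variate skew normal (Azzalini form),
% with location \xi_g, scale matrix \Omega_g, and skewness vector \alpha_g.
% \phi_d is the d-variate normal density, \Phi the standard normal CDF,
% and \omega_g = \mathrm{diag}(\Omega_g)^{1/2}.
\[
  h(x;\, \psi_g) \;=\; 2\, \phi_d(x;\, \xi_g, \Omega_g)\,
  \Phi\!\left( \alpha_g^{\top} \omega_g^{-1} (x - \xi_g) \right).
\]

% Clustering assigns each observation to the component with the largest
% posterior probability.
\[
  \tau_{g}(x, y) \;=\;
  \frac{\pi_g \, f(y \mid x;\, \theta_g)\, h(x;\, \psi_g)}
       {\sum_{k=1}^{G} \pi_k \, f(y \mid x;\, \theta_k)\, h(x;\, \psi_k)} .
\]
```

Setting \alpha_g = 0 recovers the usual multivariate normal covariate density, so in this sketch the normal-covariate CWM is a special case of the skew normal one.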