An investigation on sparse partial least squares algorithms for compositional data
Journal of the Korean Data & Information Science Society 2024;35:919-31
Published online November 30, 2024;  https://doi.org/10.7465/jkdi.2024.35.6.919
© 2024 Korean Data and Information Science Society.

Jinkyung Yoo1 · Young Min Kim2

1,2 Department of Statistics, Kyungpook National University
Correspondence to: 1 Doctoral program, Department of Statistics, Kyungpook National University, Daegu, 41566, Korea
2 Corresponding author: Associate professor, Department of Statistics, Kyungpook National University, Daegu, 41566, Korea. E-mail: kymmyself@knu.ac.kr
Received October 3, 2024; Revised October 31, 2024; Accepted November 4, 2024.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Interest in analyzing compositional data has grown recently. The microbiome abundance datasets from the TCMA study are both high-dimensional and compositional, so in this research we investigate a statistical method suited to data with these two properties: the sparse partial least squares (SPLS) method, implemented with the SIMPLS and NIPALS algorithms. The SPLS method is generally regarded as a promising approach to high dimensionality. To capture the compositional characteristics of the data within the SPLS method, we employ a log-ratio transformation, specifically the centered log-ratio (CLR) transformation. To evaluate the performance of the SPLS method with the CLR transformation, we conduct simulation studies and apply the SPLS method with the SIMPLS and NIPALS algorithms to real data.
Keywords : composition, high-dimensional, log-ratio transformation, microbiome, sparse partial least squares
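
As an illustration of the CLR transformation named in the abstract, the following is a minimal sketch in Python with NumPy. It maps each sample (row) of an abundance table to the log of its parts minus the mean of those logs, so every transformed row sums to zero. The function name `clr`, the `pseudocount` parameter used to avoid taking the log of zero counts, and the toy abundance values are assumptions made for illustration; they are not taken from the paper, and the authors' own handling of zeros may differ.

```python
import numpy as np


def clr(abundances, pseudocount=0.5):
    """Centered log-ratio transform, applied to each row (sample).

    `pseudocount` is an illustrative assumption: it is added to every
    entry so that zero counts do not produce log(0).
    """
    x = np.asarray(abundances, dtype=float) + pseudocount
    # Closure: rescale each row to proportions that sum to 1.
    x = x / x.sum(axis=1, keepdims=True)
    log_x = np.log(x)
    # Subtract the row-wise mean of the logs, i.e. the log of the
    # geometric mean of the parts.
    return log_x - log_x.mean(axis=1, keepdims=True)


# Hypothetical abundance table: 2 samples (rows) by 3 taxa (columns).
toy_counts = np.array([[10, 20, 70],
                       [0, 5, 95]])
toy_clr = clr(toy_counts)
```

Each row of `toy_clr` sums to zero (up to floating-point error), which is the constraint that characterizes CLR coordinates. In a workflow like the one the abstract describes, the transformed matrix would then be passed as the predictor matrix to an SPLS fit.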