A comparison of clustering algorithms for surface potential temperature distributions over the Korean peninsula
Journal of the Korean Data & Information Science Society 2022;33:727-54
Published online September 30, 2022;  https://doi.org/10.7465/jkdi.2022.33.5.727
© 2022 Korean Data and Information Science Society.

Jae-Heon Lee1 · Song-Lak Kang2

1,2 Department of Atmospheric & Environmental Sciences, Gangneung-Wonju National University
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1A6A1A03044326 and 2021R1I1A3044379).
1 Researcher, Multi-scale ABL Laboratory, Department of Atmospheric & Environmental Sciences, Gangneung-Wonju National University, Gangneung 25457, Korea.
2 Professor, Multi-scale ABL Laboratory, Department of Atmospheric & Environmental Sciences, Gangneung-Wonju National University, Gangneung 25457, Korea. E-mail: slkang@gwnu.ac.kr
Received July 25, 2022; Revised August 10, 2022; Accepted September 2, 2022.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Using the k-means, mean shift, and spectral clustering algorithms, we analyze the spatial distributions of annual and seasonal-mean surface potential temperatures collected over the 30 years from 1992 to 2021 across the Korean peninsula. Potential temperature is used to compensate for differences in terrain height. We employ the elbow method, the silhouette coefficient, and the Calinski-Harabasz (CH) and Davies-Bouldin (DB) indices to find the optimal hyperparameters for each clustering algorithm. The clustering results for surface potential temperatures over the Korean peninsula differ somewhat by algorithm, but the algorithms perform similarly in statistical terms. The k-means algorithm, which is sensitive to outliers, clusters temperatures collected at the high-elevation stations. The spectral algorithm clusters temperatures at the western coastal stations. The mean shift algorithm clusters temperatures at the southern coastal stations, which are not clustered by the other two algorithms. In the statistical performance evaluation, k-means clustering performs best, but the other two algorithms are comparable.
Keywords: k-means, mean shift, spectral clustering, surface potential temperature.
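
The abstract describes selecting clustering hyperparameters with internal validity measures such as the silhouette coefficient. The following is a minimal, dependency-free sketch of that idea for k-means only, not the authors' implementation. It assumes synthetic 2-D points in place of real station data, and the helper names (`farthest_point_init`, `kmeans`, `silhouette`) and the farthest-point seeding are illustrative choices. The CH and DB indices are not computed here; libraries such as scikit-learn provide all three measures.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an illustrative choice): start at points[0],
    then repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centers[j])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over points,
    where a is the mean distance to the point's own cluster and b is the
    smallest mean distance to any other cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in data: three well-separated blobs of 20 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(20)
]

# Score each candidate k and keep the one with the highest silhouette.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On this toy data the silhouette coefficient peaks at the number of generated blobs. Real station data is unlikely to separate this cleanly, which is one reason to compare several measures rather than rely on a single one.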