Density-based spatial clustering of applications with noise using Gower distance
Journal of the Korean Data & Information Science Society 2021;32:1121-33
Published online September 30, 2021;  https://doi.org/10.7465/jkdi.2021.32.5.1121
© 2021 Korean Data and Information Science Society.

Jinkyung Yoo¹ · Yujeong An² · Young Min Kim³

¹²³ Department of Statistics, Kyungpook National University
Correspondence to: ¹ Doctor Program, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
² Master Program, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
³ Corresponding author: Assistant professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea. E-mail: kymmyself@knu.ac.kr
This work was supported by the Human Resources Program in Energy Technology of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20204010600060).
Received July 29, 2021; Revised August 18, 2021; Accepted August 27, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Most clustering algorithms that account for the spatial characteristics of data have been developed based on the geographical location of observations. Density-based spatial clustering of applications with noise (DBSCAN) produces arbitrarily shaped clusters by grouping observations that are closely packed together, and it labels as noise the outliers that lie alone in low-density regions. The usual distance measure for DBSCAN is Euclidean distance, which is the standard measure of distance and is especially suitable for continuous variables. To handle categorical and continuous variables simultaneously, a measure that can compute distances across different variable types is required. We therefore propose a DBSCAN algorithm using Gower distance. We provide numerical results in spatial and non-spatial setups comparing DBSCAN with Euclidean and Gower distance, and we apply the method to land price data and migraine treatment data. DBSCAN using Gower distance is a reasonable method and gives comparably stable results.
Keywords: Clustering, DBSCAN, Gower distance, spatial characteristics.
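
The idea summarized in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard unweighted form of Gower distance (range-scaled absolute difference for continuous variables, simple mismatch for categorical variables, averaged over variables) and runs a basic DBSCAN on the resulting distance matrix. The toy data, the function names `gower_matrix` and `dbscan`, and the values of `eps` and `min_pts` are hypothetical choices made only for illustration.

```python
def gower_matrix(rows, is_categorical):
    """Pairwise Gower distances (unweighted form, assumed for this sketch)."""
    n, p = len(rows), len(is_categorical)
    # Range of each continuous variable; None for categorical variables.
    ranges = [
        None if is_categorical[k]
        else max(r[k] for r in rows) - min(r[k] for r in rows)
        for k in range(p)
    ]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                a, b = rows[i][k], rows[j][k]
                if is_categorical[k]:
                    total += 0.0 if a == b else 1.0   # simple mismatch
                elif ranges[k] > 0:
                    total += abs(a - b) / ranges[k]   # range-scaled difference
            dist[i][j] = dist[j][i] = total / p
    return dist


def dbscan(dist, eps, min_pts):
    """Basic DBSCAN on a precomputed distance matrix; noise is labeled -1."""
    n = len(dist)
    labels = [None] * n          # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:      # not a core point (count includes i)
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:           # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [m for m in range(n) if dist[j][m] <= eps]
            if len(nb) >= min_pts:        # expand only through core points
                queue.extend(nb)
    return labels


# Hypothetical mixed-type data: one continuous and one categorical variable.
rows = [
    (1.0, "a"), (1.2, "a"), (1.1, "a"),
    (5.0, "b"), (5.2, "b"), (5.1, "b"),
    (10.0, "c"),
]
dist = gower_matrix(rows, is_categorical=[False, True])
labels = dbscan(dist, eps=0.1, min_pts=3)
print(labels)
```

With these toy values the two tight groups form separate clusters and the isolated last observation is labeled as noise. Because Gower distance lies in [0, 1], `eps` has to be chosen on that scale rather than in the units of the original variables.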