Clustering analysis of torsion angles in the backbone of proteins by utilizing DBSCAN algorithm
Journal of the Korean Data & Information Science Society 2022;33:951-62
Published online November 30, 2022; https://doi.org/10.7465/jkdi.2022.33.6.951
© 2022 Korean Data and Information Science Society.

Byungwon Kim1
Correspondence to: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1G1A109535511-1-1).
1 Assistant professor, Department of Statistics, Kyungpook National University, Gyeongbuk 41566, Korea. E-mail: byungwonkim@knu.ac.kr
Received September 1, 2022; Revised September 24, 2022; Accepted September 26, 2022.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Using density-based spatial clustering of applications with noise (DBSCAN), we investigate the distributional characteristics of the torsion angles in the backbone of proteins, which are known to determine the kinds and functions of proteins. The DBSCAN algorithm was developed for spatial data, which are usually widespread and large. In particular, it is well suited to data whose clusters have irregular, arbitrary shapes. Because it is based on density, the algorithm can be readily applied to various types of data once an appropriate metric is defined on the data space. In this article, we adapt the DBSCAN algorithm by using the toroidal distance, which is defined for torus data. The modified DBSCAN is applied to the backbone torsion angles extracted from the spike glycoproteins of three strains of coronavirus, namely SARS-CoV-2, the delta mutant, and the omicron mutant, all of which are contagious in humans. The analysis reveals a few differences in the clusters of these three coronaviruses.
Keywords: cluster analysis, DBSCAN, torsion angles, torus data
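
To illustrate the idea summarized in the abstract, the following is a minimal sketch of DBSCAN run with a toroidal distance on pairs of torsion angles. It is not the article's implementation. It assumes a common definition of the toroidal distance (the Euclidean combination of the two shortest-arc angular differences, with angles in radians), and the function names, the toy angle pairs, and the values of `eps` and `min_pts` are all hypothetical choices made for illustration.

```python
import math


def circular_diff(a, b):
    """Shortest arc length between two angles given in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def toroidal_distance(p, q):
    """Distance between two (phi, psi) pairs on the torus.

    Assumed definition: Euclidean norm of the per-angle circular differences.
    """
    return math.hypot(circular_diff(p[0], q[0]), circular_diff(p[1], q[1]))


def dbscan(points, eps, min_pts, dist=toroidal_distance):
    """Textbook DBSCAN with a pluggable metric; label -1 marks noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


# Hypothetical (phi, psi) pairs in radians. The first four straddle the
# phi = +/- pi boundary, so they are close on the torus but far apart
# under the ordinary Euclidean distance.
points = [
    (3.10, 0.00), (-3.10, 0.05), (3.12, -0.05), (-3.13, 0.02),
    (0.00, 1.50), (0.05, 1.55), (0.02, 1.45),
    (1.50, -2.00),
]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)
```

With these toy values, the four points that wrap around the boundary fall into one cluster, which is the reason for replacing the Euclidean metric. The brute-force neighborhood search is quadratic in the number of points; a library implementation that accepts a precomputed distance matrix would be a more practical choice for real protein data.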