A study on the classification of importance variables in a digital divide data using machine learning
Journal of the Korean Data & Information Science Society 2022;33:177-93
Published online March 31, 2022;  https://doi.org/10.7465/jkdi.2022.33.2.177
© 2022 Korean Data and Information Science Society.

Kwang Yoon Song1 · Youn Su Kim2 · In Hong Chang3

1,3 Department of Computer Science and Statistics, Chosun University
2 Department of Computer Science and Statistics, Graduate School, Chosun University
Correspondence to: 1 Research Professor, Department of Computer Science and Statistics, Chosun University, Gwangju 61452, Korea.
2 Ph.D. Candidate, Department of Computer Science and Statistics, Graduate School, Chosun University, Gwangju 61452, Korea.
3 Professor, Department of Computer Science and Statistics, Chosun University, Gwangju 61452, Korea. E-mail: ihchang@chosun.ac.kr
This study was supported by a research fund from Chosun University, 2021.
Received February 28, 2022; Revised March 14, 2022; Accepted March 15, 2022.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
With the development of computers and the Internet, information can now be obtained faster and more easily than in the past. However, not everyone can obtain the same information or collect the information that suits them. Access differs greatly with a person's level of smart-device use, and the group that has difficulty using a PC or smart device is called the information underprivileged class. In this study, using survey data collected by the National Information Society Agency over the three years from 2018 to 2020, we propose a model that classifies respondents as belonging to the general public or to the information underprivileged class, using the machine learning methods Random Forest and Support Vector Machine. We also identify the variables that have a significant effect on the classification between the classes. The important variables were age, job, PC Competence, and PC & Smart Phone Competence. Based on these results, we suggest a plan to reduce the gap between the general public and the information underprivileged class.
Keywords: Digital divide, machine learning, random forest, support vector machine.