Building a correspondence database from Koreans living in China and content analysis using topic modeling and local surrogate
Journal of the Korean Data & Information Science Society 2021;32:123-34
Published online January 31, 2021;  https://doi.org/10.7465/jkdi.2021.32.1.123
© 2021 Korean Data and Information Science Society.

Hyon Hee Kim1 · Jinnam Jo2

1,2 Department of Statistics and Information Science, Dongduk Women’s University
Correspondence to: 1Associate professor, Department of Statistics and Information Science, Dongduk Women’s University, Seoul 02748, Korea.
2Corresponding author: Professor emeritus, Department of Statistics and Information Science, Dongduk Women’s University, Seoul, 02748, Korea. E-mail: jinnam@dongduk.ac.kr

This work was supported by Korean Studies Foundation Research through the Ministry of Education of the Republic of Korea and the Korean Studies Promotion Service of the Academy of Korean Studies (ASK-2017-KFR1230011).
Received December 1, 2020; Revised January 9, 2021; Accepted January 15, 2021.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
In this paper, we present a correspondence database built from correspondence from ethnic Koreans living in China, together with a content analysis using topic modeling and local surrogates. The letters were scanned into image files, and their contents were summarized using tags. The image files were then uploaded into the database, and sender information such as name, location, date, and subject was entered. Topic modeling was applied to specialized subjects such as politics, economy, society, and culture. Important keywords were also extracted using local surrogate analysis, an explainable artificial intelligence technique. In the subject of politics, the topics found were the relationship between South Korea and North Korea and requests to the Korean government to improve the status of Koreans living in China. In the subject of economics, requests for daily necessities, dictionaries, and similar items were found. These results show that big data analysis techniques can be applied productively to humanities research.
Keywords: Correspondence database, explainable AI, local surrogate analysis, topic modeling.
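
The database described in the abstract stores a scanned image per letter, summary tags, and sender information (name, location, date, subject). As a rough illustration only, the following Python sketch shows one way such a store could be laid out with the standard-library sqlite3 module. The table names, column names, helper functions, and sample values are all hypothetical and are not taken from the paper; the authors' actual schema and database system are not stated here.

```python
import sqlite3

# Hypothetical schema: every table and column name below is an assumption
# made for illustration, not the schema used by the authors.
SCHEMA = """
CREATE TABLE letter (
    id          INTEGER PRIMARY KEY,
    image_path  TEXT NOT NULL,   -- path to the scanned image file
    sender_name TEXT,
    location    TEXT,
    sent_date   TEXT,            -- date stored as an ISO 8601 string
    subject     TEXT             -- e.g. politics, economy, society, culture
);
CREATE TABLE letter_tag (
    letter_id INTEGER NOT NULL REFERENCES letter(id),
    tag       TEXT NOT NULL,     -- one summary tag per row
    PRIMARY KEY (letter_id, tag)
);
"""


def build_database(path=":memory:"):
    """Create the two tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_letter(conn, image_path, sender_name, location, sent_date, subject, tags):
    """Insert one letter with its sender information and summary tags."""
    cur = conn.execute(
        "INSERT INTO letter (image_path, sender_name, location, sent_date, subject) "
        "VALUES (?, ?, ?, ?, ?)",
        (image_path, sender_name, location, sent_date, subject),
    )
    letter_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO letter_tag (letter_id, tag) VALUES (?, ?)",
        [(letter_id, tag) for tag in tags],
    )
    conn.commit()
    return letter_id


def letters_by_subject(conn, subject):
    """Return (id, image_path) pairs for all letters filed under a subject."""
    rows = conn.execute(
        "SELECT id, image_path FROM letter WHERE subject = ? ORDER BY id",
        (subject,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_database()
    # Placeholder values only; these are not records from the real collection.
    add_letter(conn, "scans/0001.png", "Sender A", "Place A",
               "1990-01-01", "economy", ["dictionary", "daily necessities"])
    print(letters_by_subject(conn, "economy"))
```

Keeping tags in a separate table lets one letter carry any number of tags, and filtering by subject gives the per-subject subsets that a later topic-modeling step would take as input.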