A review and suggestions for synthetic data generation strategies using deep generative models
Journal of the Korean Data & Information Science Society 2023;34:791-810
Published online September 30, 2023;  https://doi.org/10.7465/jkdi.2023.34.5.791
© 2023 Korean Data and Information Science Society.

Jiwoo Kim1 · Sunghoon Kwon2 · Dongha Kim3

1,3 Department of Statistics, Sungshin Women’s University
2 Department of Applied Statistics, Konkuk University
This work was supported in part by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00937, Solving the problem of increasing the usability and usefulness synthetic data algorithm for statistical data) and in part by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00218231).
1 Post-master’s researcher, Department of Statistics, Sungshin Women’s University, Seoul 02844, Korea.
2 Professor, Department of Applied Statistics, Konkuk University, Seoul 05029, Korea.
3 Corresponding author: Assistant professor, Department of Statistics, Sungshin Women’s University, Seoul 02844, Korea. E-mail: dongha0718@sungshin.ac.kr
Received July 22, 2023; Revised August 22, 2023; Accepted August 22, 2023.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Many individuals and organizations now demand access to large public datasets so that they can extract meaningful information and improve their services. However, because releasing such data can lead to unexpected privacy leakage, its distribution must be handled carefully. Synthetic data generation is a popular technique for de-identifying data while preserving its usability. Deep learning-based generative models have been shown to perform well in generating high-dimensional data such as images, and a growing number of approaches therefore apply deep learning methods to synthetic data generation. In this paper, we review various synthesis techniques based on deep neural networks, organizing them by pre-processing, architecture, and objective function. We also cover widely used measures for evaluating the resulting synthetic data from two perspectives: usability and degree of identification. Finally, we suggest promising directions for future work in this field, based on an in-depth analysis of deep learning-based generative models and data synthesis. We hope that our suggestions will provide practical help to future researchers.
Keywords: Adversarial learning, deep generative model, synthetic data