Comparative study of artificial neural network models for semantic image segmentation
Journal of the Korean Data & Information Science Society 2024;35:769-89
Published online November 30, 2024; https://doi.org/10.7465/jkdi.2024.35.6.769
© 2024 Korean Data and Information Science Society.

Bohyun Cho¹ · Bogang Jun² · Jiho Lee³ · Seok Hwan Hong⁴ · Donghyeon Yu⁵

1,3,4,5 Department of Statistics, Inha University
2Department of Economics, Inha University
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2022R1A5A7033499).
1 Master's student, Department of Statistics and Data Science, Inha University, Incheon 22212, Korea.
2 Associate professor, Department of Economics, Inha University, Incheon 22212, Korea.
3 Master's student, Department of Statistics and Data Science, Inha University, Incheon 22212, Korea.
4 Master's student, Department of Statistics and Data Science, Inha University, Incheon 22212, Korea.
5 Corresponding author: Associate professor, Department of Statistics and Data Science, Inha University, Incheon 22212, Korea. E-mail: dyu@inha.ac.kr
Received September 6, 2024; Revised October 4, 2024; Accepted October 5, 2024.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Semantic image segmentation, an active research topic in computer vision, classifies each image pixel into one of a set of predetermined semantic classes such as roads, buildings, cars, and trees. In this paper, we compare four recently developed semantic image segmentation models built on different encoder structures: the Swin Transformer, BEiT, ConvNeXt, and InternImage. We train and test these models on the Cityscapes image dataset. We then apply the Cityscapes-trained models to Naver Streetview images. The results show that the InternImage model performs best on the Cityscapes dataset, while the InternImage and ConvNeXt models achieve similar performance on the Naver Streetview images and outperform the other two models.
Keywords: Cityscapes images, neural network model, semantic image segmentation, streetview images, transformer