Comparison analysis of big data integration models
Journal of the Korean Data & Information Science Society 2017;28:755-68
Published online July 31, 2017
© 2017 Korean Data & Information Science Society.

Byung Ho Jung¹ · Dong Hoon Lim²

¹ Gyeongsangnamdo Provincial Government
² Department of Information and Statistics, Gyeongsang National University
Correspondence to: Dong Hoon Lim
Professor and RINS, Department of Information Statistics, Gyeongsang National University, 52828, Korea. E-mail: dhlim@gnu.ac.kr
Received June 13, 2017; Revised July 13, 2017; Accepted July 13, 2017.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
As big data becomes central to the fourth industrial revolution, big data processing and analysis capabilities are expected to influence companies' future competitiveness. Although RHadoop and RHIPE, which integrate R with the Hadoop environment, have each been discussed separately, few studies have compared them. In this paper, we built the RHadoop and RHIPE big data platforms, which are applicable to large-scale data, and implemented machine learning algorithms such as multiple regression and logistic regression on the MapReduce framework. We studied the performance and scalability of these implementations for various sample sizes of real and simulated data. The experiments demonstrated that our RHadoop and RHIPE implementations scale well and efficiently process large data sets on commodity hardware. RHIPE was generally faster than RHadoop on almost all of the data sets.
Keywords: Big data, multiple regression, logistic regression, RHadoop, RHIPE
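
To illustrate what fitting multiple regression on a MapReduce framework can look like, the following is a minimal sketch in plain Python, not the authors' RHadoop or RHIPE code. It assumes the common formulation in which each map task computes the partial sums X'X and X'y for its chunk of rows, the reduce step adds those partial sums, and the normal equations are then solved once. All function names (map_chunk, reduce_sums, solve, fit) and the toy data are hypothetical.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial sums (X'X, X'y) for one chunk of (x, y) rows."""
    p = len(rows[0][0]) + 1  # number of coefficients, including the intercept
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        xi = [1.0] + list(x)  # prepend 1 for the intercept term
        for i in range(p):
            xty[i] += xi[i] * y
            for j in range(p):
                xtx[i][j] += xi[i] * xi[j]
    return xtx, xty


def reduce_sums(a, b):
    """Reduce step: element-wise sum of two (X'X, X'y) partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit(chunks):
    """Map over chunks, reduce the partial sums, solve the normal equations."""
    xtx, xty = reduce(reduce_sums, map(map_chunk, chunks))
    return solve(xtx, xty)


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2, split into two chunks
# to stand in for two input splits handled by separate map tasks.
data = [((x1, x2), 1 + 2 * x1 + 3 * x2) for x1 in range(4) for x2 in range(3)]
chunks = [data[:6], data[6:]]
beta = fit(chunks)
print(beta)
```

Because the partial sums have a fixed size that depends only on the number of predictors, the reduce step stays small however many rows each map task sees. Logistic regression has no closed-form solution, so an analogous sketch would need an iterative scheme with one such map and reduce pass per iteration.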



July 2017, 28 (4)