Divide-and-conquer random sketched kernel ridge regression for large-scale data analysis
Journal of the Korean Data & Information Science Society 2020;31:15-23
Published online January 31, 2020;  https://doi.org/10.7465/jkdi.2020.31.1.15
© 2020 Korean Data and Information Science Society.

Jongkyeong Kang¹ · Myoungshic Jhun²

¹Department of Statistics, Korea University
²Department of Applied Mathematics and Statistics, SUNY Korea
Correspondence to: Professor, Department of Applied Mathematics and Statistics, SUNY Korea, 119 Songdo Moonhwa-Ro Incheon, 21985, Korea. E-mail: myoungshic.jhun@stonybrook.edu
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07047654) for M. Jhun.
Received October 11, 2019; Revised November 4, 2019; Accepted November 7, 2019.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Kernel ridge regression is a standard technique for estimating a nonparametric regression function over a reproducing kernel Hilbert space and can be applied to a variety of statistical problems. Given a data set of size n, the computational and space complexities of typical kernel ridge regression implementations are O(n³) and O(n²), respectively, which severely limits its use for large-scale data. To tackle this issue, several methods, such as divide-and-conquer kernel ridge regression and random sketched kernel ridge regression, have been developed. In this paper, we propose a novel kernel ridge regression method for large-scale data that combines the divide-and-conquer and random sketching techniques. The proposed method has much smaller computational and space complexity than existing methods such as divide-and-conquer kernel ridge regression and random sketched kernel ridge regression. Simulation studies and a real data analysis are presented to demonstrate the performance of the proposed method.
Keywords: Divide-and-conquer, kernel ridge regression, random sketch.
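
The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything specific here is an assumption made for illustration: a Gaussian (RBF) kernel, a Gaussian random sketch matrix, a random equal-sized partition of the data, simple averaging of the partition-wise predictions, and the hypothetical function names `rbf_kernel`, `fit_sketched_krr`, and `dc_sketched_krr_predict`. It should not be read as the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_sketched_krr(X, y, s, lam, gamma, rng):
    # Sketched KRR on one partition (assumed form): restrict the weight
    # vector to w = S^T alpha with an s x n Gaussian sketch S, then solve
    # (S K^2 S^T + n*lam * S K S^T) alpha = S K y, an s x s system.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    KS = K @ S.T                                  # n x s
    A = KS.T @ KS + n * lam * (S @ KS)            # s x s
    b = KS.T @ y
    alpha = np.linalg.lstsq(A, b, rcond=None)[0]  # robust to ill-conditioning
    return S.T @ alpha                            # weights over the n points

def dc_sketched_krr_predict(X, y, X_new, m=4, s=20, lam=1e-3, gamma=10.0, seed=0):
    # Divide: split the data at random into m partitions.
    # Conquer: fit sketched KRR on each partition, then average predictions.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    preds = []
    for part in np.array_split(idx, m):
        w = fit_sketched_krr(X[part], y[part], s, lam, gamma, rng)
        preds.append(rbf_kernel(X_new, X[part], gamma) @ w)
    return np.mean(preds, axis=0)

# Toy usage on synthetic data (assumed setup, not from the paper).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(400)
X_new = np.linspace(0.0, 1.0, 50)[:, None]
pred = dc_sketched_krr_predict(X, y, X_new)
```

With m partitions of size n/m and sketch dimension s, each partition in this sketch forms an (n/m) × (n/m) kernel matrix and solves only an s × s linear system, which is where the savings over the O(n³) time and O(n²) memory of the full problem come from.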