Authors: Zhen Liu, Yan Fu
Abstract: When a data set grows very large, multiple linear regression becomes hard to run on a single machine. Compared with the traditional serial programming model, Hadoop provides a distributed computing framework, MapReduce, which partially overcomes the computation and storage limitations of a centralized system. In this paper, we modify the numerical calculation method for multiple linear regression and deploy it on the Hadoop platform. The test results show that the new method outperforms the single-machine version and can process very large data sets.
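
The abstract does not say how the numerical calculation was modified. The sketch below illustrates one common way to fit least squares into the MapReduce pattern, and it may not be the authors' method. It assumes the normal-equations formulation, in which X^T X and X^T y are sums over rows, so each data split can compute partial sums independently (map) and the partial sums can then be added together (reduce). All function names (`map_partition`, `reduce_partials`, `solve`, `fit`) are hypothetical, and the map and reduce steps are simulated in-process with plain Python rather than through any Hadoop API.

```python
from functools import reduce


def map_partition(rows):
    """Map step (illustrative): partial sums of X^T X and X^T y for one split.

    Each row is (features, target). A leading 1.0 is prepended to the
    features for the intercept. Assumes the split is non-empty.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_partials(a, b):
    """Reduce step (illustrative): element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(xtx, xty):
    """Solve (X^T X) beta = X^T y by Gaussian elimination with partial pivoting.

    Assumes X^T X is non-singular.
    """
    n = len(xty)
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit(partitions):
    """Combine the per-split sums and solve once on the small p-by-p system."""
    xtx, xty = reduce(reduce_partials, map(map_partition, partitions))
    return solve(xtx, xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2, split across two "nodes".
partitions = [
    [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0)],
    [((0.0, 1.0), 4.0), ((1.0, 1.0), 6.0), ((2.0, 1.0), 8.0)],
]
beta = fit(partitions)  # [intercept, coefficient of x1, coefficient of x2]
```

Under this formulation only the p-by-p matrix and the length-p vector cross the network, so the final solve stays cheap no matter how many rows there are. The trade-off is that forming X^T X explicitly can be numerically fragile for ill-conditioned data.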