Authors: Ahmed Abdulhakim Al-Absi and Dae-Ki Kang
MapReduce is one of the most popular frameworks for large-scale data processing. With the growth of data-intensive computation in cloud environments, parallel computing has gained popularity as a basis for more innovative computational frameworks. Commonly used MapReduce frameworks adopt a serial execution strategy for the map and reduce stages in virtualized computing environments. In addition, the map and reduce functions themselves are computed serially, which increases the execution makespan. In this paper, we introduce a Parallel Computation MapReduce framework (PCMR) that executes the map and reduce functions in parallel by exploiting the available computing cores of the worker nodes. The PCMR model also adopts a novel strategy for invoking the reduce stage. The performance of the PCMR model is compared with that of the Apache Hadoop and YARN frameworks considering imprecise application execution. The results presented in this paper show that the PCMR model achieves a lower makespan than Apache Hadoop and YARN in public and private cloud computing environments.
Keywords: Big Data applications; Parallel Computation; MapReduce; Azure; Cloud Computing.
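
To make the idea of running the map and reduce functions in parallel on one worker node more concrete, the following is a minimal sketch in Python. It is not the PCMR implementation: the word-count job and the names `map_word_count`, `reduce_sum`, and `run_job` are hypothetical, chosen only for illustration, and the sketch assumes a single node whose pool of workers stands in for the available computing cores.

```python
from collections import defaultdict
from concurrent.futures import Executor, ThreadPoolExecutor


def map_word_count(chunk):
    """Hypothetical map function: emit (word, 1) pairs for one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_sum(item):
    """Hypothetical reduce function: sum the values collected for one key."""
    key, values = item
    return key, sum(values)


def run_job(chunks, executor: Executor):
    """Run map and reduce on the given pool (illustrative, not PCMR itself)."""
    # Map stage: input chunks are handed to the pool and processed concurrently.
    mapped = executor.map(map_word_count, chunks)

    # Shuffle: group the intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce stage: each key group is reduced concurrently on the same pool.
    return dict(executor.map(reduce_sum, groups.items()))


if __name__ == "__main__":
    data = ["big data on the cloud", "the cloud runs big jobs"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(run_job(data, pool))
```

The sketch uses a thread pool so that it runs anywhere without extra setup. For CPU-bound map and reduce functions in CPython, passing a `concurrent.futures.ProcessPoolExecutor` to `run_job` instead would be the usual way to spread the work across cores.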