CAJ | Academic paper

详细介绍了MapReduce编程框架,具体分析了MapReduce中shuffle阶段流程。分别从Map端数据压缩、重构远程数据拷贝传输协议、Reduce端内存分配优化三方面来优化和重构Shuffle。最后通过搭建Hadoop集群,运用MapReduce分布式算法测试实验数据。实验结果证明优化重构后的shuffle能显著提高MapReduce计算性能。
We describe the MapReduce programming framework in detail and analyze the workflow of its shuffle stage. The shuffle is optimized and restructured in three ways: compressing the Map-side output, redesigning the protocol used to copy data from the Map side to the Reduce side, and optimizing memory allocation on the Reduce side. Finally, we build a Hadoop cluster and run a MapReduce distributed algorithm on experimental data. The results show that the optimized and restructured shuffle significantly improves MapReduce computing performance.
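The first of the three measures, compressing Map-side output before it is copied to the Reduce side, can be illustrated with a small in-memory word count. This is only a sketch of the general idea, not the paper's implementation and not Hadoop code: the function names, the byte-sum partitioner, and the use of `zlib` and `pickle` are all assumptions chosen for illustration.

```python
import pickle
import zlib
from collections import defaultdict


def map_phase(lines, num_reducers):
    """Emit (word, 1) pairs and return one compressed blob per reducer partition."""
    partitions = defaultdict(list)
    for line in lines:
        for word in line.split():
            # Hypothetical toy partitioner: byte sum of the word modulo reducer count.
            partitions[sum(word.encode()) % num_reducers].append((word, 1))
    # Sort each partition by key, serialize it, and compress it before "transfer".
    return {p: zlib.compress(pickle.dumps(sorted(pairs)))
            for p, pairs in partitions.items()}


def reduce_phase(blobs):
    """Decompress the blobs fetched from the mappers and sum the counts per word."""
    counts = defaultdict(int)
    for blob in blobs:
        for word, n in pickle.loads(zlib.decompress(blob)):
            counts[word] += n
    return dict(counts)


def word_count(splits, num_reducers=2):
    """Run map, shuffle, and reduce over a list of input splits (lists of lines)."""
    map_outputs = [map_phase(split, num_reducers) for split in splits]
    result = {}
    for r in range(num_reducers):
        # Shuffle: reducer r collects its partition from every map output that has one.
        blobs = [out[r] for out in map_outputs if r in out]
        result.update(reduce_phase(blobs))
    return result
```

Calling `word_count([["a b a"], ["b c"]])` counts the words across two input splits while moving only compressed blobs between the two phases. In a real cluster the trade-off is CPU time spent compressing against network and disk traffic saved during the copy.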