中国科技论文
Sciencepaper Online
2012
Issue 4
241-245
5 pages in total
彭辅权%金苍宏%吴明晖%应晶
cloud computing%Hadoop%MapReduce%shuffle
We describe the MapReduce programming framework in detail and analyze the workflow of its shuffle stage. The shuffle is then optimized and restructured in three ways: compressing the Map-side output, redesigning the transfer protocol used to copy data from the Map side to the Reduce side, and optimizing memory allocation on the Reduce side. Finally, we build a Hadoop cluster and run a distributed MapReduce algorithm on experimental data. The results show that the optimized and restructured shuffle significantly improves MapReduce computing performance.