计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014
Issue 11
pp. 1334-1344
11 pages in total
MapReduce; row-column hybrid storage; deferred materialization; multi-join optimization
MapReduce has become one of the key technologies for massive data processing. However, limitations of its computing framework lead to poor performance on multi-join queries. To address this problem, this paper proposes a heuristic multi-join optimization method for the MapReduce framework, called MHMO (MapReduce based heuristic multi-join optimization). For a given multi-join query, MHMO first constructs the join graph to identify the join pattern, then recommends the "optimal" execution strategy for that pattern. In particular, for a hybrid join, it first divides the join into a set of simple join patterns, then defines a cost model to choose the minimum-cost execution order among the resulting groups. Combined with row-column hybrid storage and deferred materialization, MHMO significantly improves multi-join performance in the MapReduce framework. Finally, experiments on the TPC-H data warehouse benchmark dataset verify the effectiveness of MHMO.
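The pipeline summarized in the abstract (build a join graph, identify the join pattern, then order the groups of a hybrid join by cost) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The "star" and "chain" labels are assumed examples of simple patterns, since the abstract only names the hybrid case. The functions build_join_graph, classify_join_pattern, and choose_group_order are hypothetical names, and the cost function is supplied by the caller rather than taken from the paper. Table names are borrowed from TPC-H purely for illustration, and self-joins are assumed not to occur.

```python
from collections import defaultdict, deque
from itertools import permutations


def build_join_graph(join_predicates):
    """Build an undirected join graph: tables are nodes, join predicates are edges."""
    graph = defaultdict(set)
    for left, right in join_predicates:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)


def is_connected(graph):
    """Return True if every table is reachable from an arbitrary start table (BFS)."""
    if not graph:
        return False
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(graph)


def classify_join_pattern(graph):
    """Classify a join graph as 'chain', 'star', or 'hybrid'.

    Assumption: 'chain' and 'star' stand in for the simple patterns;
    anything that is not a path or a single-hub tree is called 'hybrid'.
    """
    n = len(graph)
    edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
    # A simple pattern must be a tree: connected with exactly n - 1 edges.
    if n < 2 or not is_connected(graph) or edge_count != n - 1:
        return "hybrid"
    max_degree = max(len(nbrs) for nbrs in graph.values())
    if max_degree <= 2:
        return "chain"  # a tree whose nodes have at most two neighbours is a path
    if max_degree == n - 1:
        return "star"   # one hub table joined to every other table
    return "hybrid"


def choose_group_order(groups, cost_of_order):
    """Pick the cheapest execution order by brute force over all permutations.

    cost_of_order is a caller-supplied stand-in for a cost model; exhaustive
    search is only practical for a handful of groups.
    """
    return min(permutations(groups), key=cost_of_order)


if __name__ == "__main__":
    star = build_join_graph([("lineitem", "orders"),
                             ("lineitem", "part"),
                             ("lineitem", "supplier")])
    print(classify_join_pattern(star))
```

Exhaustive permutation search is chosen here only because it makes the role of the cost model easy to see; a real optimizer would likely prune the search space.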