计算机科学与探索
計算機科學與探索
계산궤과학여탐색
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2013年
11期
973-982
,共10页
吴志川%毛琛%韩蕾%陈立军
吳誌川%毛琛%韓蕾%陳立軍
오지천%모침%한뢰%진립군
稀疏矩阵乘法%分布式计算%Hadoop
稀疏矩陣乘法%分佈式計算%Hadoop
희소구진승법%분포식계산%Hadoop
sparse matrix multiplication%distributed computing%Hadoop
矩阵乘法是线性代数和图算法中非常重要的一个基本操作,而大规模数据处理中的矩阵往往是稀疏矩阵。MapReduce编程框架能够有效地支持海量数据的分布式计算。因此,对如何运用MapReduce编程框架实现超大规模稀疏矩阵的乘法进行了研究。传统矩阵乘法并行算法没有针对稀疏矩阵进行专门优化,导致计算过程中出现大量不必要的通信开销。提出了一种新的算法--CRM(column row multiplication)算法,并与传统的矩阵分块算法进行了比较。实验证明,CRM算法运行效率有很大的提高,并且具有高度的可伸缩性,适合在MapReduce平台上运行。
矩陣乘法是線性代數和圖算法中非常重要的一箇基本操作,而大規模數據處理中的矩陣往往是稀疏矩陣。MapReduce編程框架能夠有效地支持海量數據的分佈式計算。因此,對如何運用MapReduce編程框架實現超大規模稀疏矩陣的乘法進行瞭研究。傳統矩陣乘法併行算法沒有針對稀疏矩陣進行專門優化,導緻計算過程中齣現大量不必要的通信開銷。提齣瞭一種新的算法--CRM(column row multiplication)算法,併與傳統的矩陣分塊算法進行瞭比較。實驗證明,CRM算法運行效率有很大的提高,併且具有高度的可伸縮性,適閤在MapReduce平檯上運行。
구진승법시선성대수화도산법중비상중요적일개기본조작,이대규모수거처리중적구진왕왕시희소구진。MapReduce편정광가능구유효지지지해량수거적분포식계산。인차,대여하운용MapReduce편정광가실현초대규모희소구진적승법진행료연구。전통구진승법병행산법몰유침대희소구진진행전문우화,도치계산과정중출현대량불필요적통신개소。제출료일충신적산법--CRM(column row multiplication)산법,병여전통적구진분괴산법진행료비교。실험증명,CRM산법운행효솔유흔대적제고,병차구유고도적가신축성,괄합재MapReduce평태상운행。
Matrix multiplication is an important fundamental operation in algebra and graph algorithms. And matrixes are usually highly sparse when coming to massive data processing. MapReduce is a programming model which can process large data sets effectively. This paper focuses on how to deal with massive sparse matrix multiplication on top of MapReduce programming model. Block based matrix multiplication algorithms aren’t optimized for sparse matrix and produce large amount of redundant communication. This paper proposes a new algorithm named CRM (column row multiplication), and compares it with traditional block based matrix algorithms. The experimental results demonstrate that CRM has higher efficiency and scalability, is suitable for operating on MapReduce and out-performs traditional ways considerably.