计算机研究与发展
計算機研究與髮展
계산궤연구여발전
JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT
2015年
7期
1567-1579
,共13页
王春磊%张岩峰%鲍玉斌%赵长宽%于戈%高立新
王春磊%張巖峰%鮑玉斌%趙長寬%于戈%高立新
왕춘뢰%장암봉%포옥빈%조장관%우과%고립신
异步计算%迭代计算%Asyn-SimRank 算法%相似度%大数据%MapReduce 模型%Maiter 框架
異步計算%迭代計算%Asyn-SimRank 算法%相似度%大數據%MapReduce 模型%Maiter 框架
이보계산%질대계산%Asyn-SimRank 산법%상사도%대수거%MapReduce 모형%Maiter 광가
asynchronous computation%iterative computation%Asyn-SimRank%similarity%big data%MapReduce%Maiter
SimRank 算法利用网络结构来评估网络中任意2点的相似性,它被广泛应用于社交网络和链接预测等诸多领域中.近年来,随着大数据技术的发展,SimRank 算法处理的数据不断增大,人们利用MapReduce 等分布式计算模型设计实现分布式的大规模 SimRank 算法来适应大数据处理的需求.但是,由于 SimRank 算法包含开销较大的迭代过程,每次迭代之后都需要一个全局同步,且每次迭代的计算复杂度高、通信量大,SimRank 算法不能在分布式环境下高效地实现.1)提出 Asyn‐SimRank 算法,该算法采用迭代‐累积的方式完成迭代计算,异步执行 SimRank 的核心迭代过程,避免了大规模分布式计算中的大量同步开销,同时有效降低计算量并减少通信开销;2)提出关键点优先调度计算,提升了 Asyn‐SimRank 算法的全局收敛速度;3)证明了 Asyn‐SimRank 算法的正确性和收敛性以及关键点优先调度计算的有效性;4)支持异步迭代的分布式框架 Maiter 上实现了 Asyn‐SimRank 算法.实验结果显示,相比较于 Hadoop ,Spark 上实现的 SimRank 算法和 Delta‐SimRank 算法,Asyn‐SimRank 算法大大提升了算法的计算效率,加速了算法收敛.
SimRank 算法利用網絡結構來評估網絡中任意2點的相似性,它被廣汎應用于社交網絡和鏈接預測等諸多領域中.近年來,隨著大數據技術的髮展,SimRank 算法處理的數據不斷增大,人們利用MapReduce 等分佈式計算模型設計實現分佈式的大規模 SimRank 算法來適應大數據處理的需求.但是,由于 SimRank 算法包含開銷較大的迭代過程,每次迭代之後都需要一箇全跼同步,且每次迭代的計算複雜度高、通信量大,SimRank 算法不能在分佈式環境下高效地實現.1)提齣 Asyn‐SimRank 算法,該算法採用迭代‐纍積的方式完成迭代計算,異步執行 SimRank 的覈心迭代過程,避免瞭大規模分佈式計算中的大量同步開銷,同時有效降低計算量併減少通信開銷;2)提齣關鍵點優先調度計算,提升瞭 Asyn‐SimRank 算法的全跼收斂速度;3)證明瞭 Asyn‐SimRank 算法的正確性和收斂性以及關鍵點優先調度計算的有效性;4)支持異步迭代的分佈式框架 Maiter 上實現瞭 Asyn‐SimRank 算法.實驗結果顯示,相比較于 Hadoop ,Spark 上實現的 SimRank 算法和 Delta‐SimRank 算法,Asyn‐SimRank 算法大大提升瞭算法的計算效率,加速瞭算法收斂.
SimRank 산법이용망락결구래평고망락중임의2점적상사성,타피엄범응용우사교망락화련접예측등제다영역중.근년래,수착대수거기술적발전,SimRank 산법처리적수거불단증대,인문이용MapReduce 등분포식계산모형설계실현분포식적대규모 SimRank 산법래괄응대수거처리적수구.단시,유우 SimRank 산법포함개소교대적질대과정,매차질대지후도수요일개전국동보,차매차질대적계산복잡도고、통신량대,SimRank 산법불능재분포식배경하고효지실현.1)제출 Asyn‐SimRank 산법,해산법채용질대‐루적적방식완성질대계산,이보집행 SimRank 적핵심질대과정,피면료대규모분포식계산중적대량동보개소,동시유효강저계산량병감소통신개소;2)제출관건점우선조도계산,제승료 Asyn‐SimRank 산법적전국수렴속도;3)증명료 Asyn‐SimRank 산법적정학성화수렴성이급관건점우선조도계산적유효성;4)지지이보질대적분포식광가 Maiter 상실현료 Asyn‐SimRank 산법.실험결과현시,상비교우 Hadoop ,Spark 상실현적 SimRank 산법화 Delta‐SimRank 산법,Asyn‐SimRank 산법대대제승료산법적계산효솔,가속료산법수렴.
The SimRank algorithm , which exploits network structure to measure the similarity between node pairs ,has been widely used in many areas ,such as online social networks and link prediction .In recent years ,with the development of big data , the input data set of the SimRank algorithm is constantly increasing . People are utilizing distributed computing models , such as MapReduce ,to design large‐scale SimRank algorithm for solving the big data problems .However , since SimRank algorithm contains a high‐cost iterative process with synchronization barriers between iterations and the computational complexity is high in each iteration , the large‐scale SimRank computation does not result in the satisfactory performance . In this paper ,1 )We propose Asyn‐SimRank by employing the iterate‐cumulate approach ,which asynchronously executes the core iterative computation to avoid the high‐cost synchronization barriers in large‐scale distributed environments ,and effectively reduces the amount of computation and communication .2) We further propose the keypoint priority scheduling mechanism to accelerate convergence . 3 ) We prove the accuracy and convergence property of Asyn‐SimRank as well as the efficiency of the keypoint priority scheduling .4 ) We then implement Asyn‐SimRank on Maiter , which is a distributed framework supporting asynchronous iteration .Expremental results show that ,compared with the SimRank and Delta‐SimRank implementing on Hadoop and Spark , the large‐scale Asyn‐SimRank significantly promotes the computational efficiency and accelerates the convergence .