CAJ | 학술논문

SimRank 算法利用网络结构来评估网络中任意2点的相似性，它被广泛应用于社交网络和链接预测等诸多领域中．近年来，随着大数据技术的发展，SimRank 算法处理的数据不断增大，人们利用MapReduce 等分布式计算模型设计实现分布式的大规模 SimRank 算法来适应大数据处理的需求．但是，由于 SimRank 算法包含开销较大的迭代过程，每次迭代之后都需要一个全局同步，且每次迭代的计算复杂度高、通信量大，SimRank 算法不能在分布式环境下高效地实现．1）提出 Asyn‐SimRank 算法，该算法采用迭代‐累积的方式完成迭代计算，异步执行 SimRank 的核心迭代过程，避免了大规模分布式计算中的大量同步开销，同时有效降低计算量并减少通信开销；2）提出关键点优先调度计算，提升了 Asyn‐SimRank 算法的全局收敛速度；3）证明了 Asyn‐SimRank 算法的正确性和收敛性以及关键点优先调度计算的有效性；4）支持异步迭代的分布式框架 Maiter 上实现了 Asyn‐SimRank 算法．实验结果显示，相比较于 Hadoop ，Spark 上实现的 SimRank 算法和 Delta‐SimRank 算法，Asyn‐SimRank 算法大大提升了算法的计算效率，加速了算法收敛．
SimRank 산법이용망락결구래평고망락중임의2점적상사성，타피엄범응용우사교망락화련접예측등제다영역중．근년래，수착대수거기술적발전，SimRank 산법처리적수거불단증대，인문이용MapReduce 등분포식계산모형설계실현분포식적대규모 SimRank 산법래괄응대수거처리적수구．단시，유우 SimRank 산법포함개소교대적질대과정，매차질대지후도수요일개전국동보，차매차질대적계산복잡도고、통신량대，SimRank 산법불능재분포식배경하고효지실현．1）제출 Asyn‐SimRank 산법，해산법채용질대‐루적적방식완성질대계산，이보집행 SimRank 적핵심질대과정，피면료대규모분포식계산중적대량동보개소，동시유효강저계산량병감소통신개소；2）제출관건점우선조도계산，제승료 Asyn‐SimRank 산법적전국수렴속도；3）증명료 Asyn‐SimRank 산법적정학성화수렴성이급관건점우선조도계산적유효성；4）지지이보질대적분포식광가 Maiter 상실현료 Asyn‐SimRank 산법．실험결과현시，상비교우 Hadoop ，Spark 상실현적 SimRank 산법화 Delta‐SimRank 산법，Asyn‐SimRank 산법대대제승료산법적계산효솔，가속료산법수렴．
The SimRank algorithm , which exploits network structure to measure the similarity between node pairs ,has been widely used in many areas ,such as online social networks and link prediction .In recent years ,with the development of big data , the input data set of the SimRank algorithm is constantly increasing . People are utilizing distributed computing models , such as MapReduce ,to design large‐scale SimRank algorithm for solving the big data problems .However , since SimRank algorithm contains a high‐cost iterative process with synchronization barriers between iterations and the computational complexity is high in each iteration , the large‐scale SimRank computation does not result in the satisfactory performance . In this paper ,1 )We propose Asyn‐SimRank by employing the iterate‐cumulate approach ,which asynchronously executes the core iterative computation to avoid the high‐cost synchronization barriers in large‐scale distributed environments ,and effectively reduces the amount of computation and communication .2) We further propose the keypoint priority scheduling mechanism to accelerate convergence . 3 ) We prove the accuracy and convergence property of Asyn‐SimRank as well as the efficiency of the keypoint priority scheduling .4 ) We then implement Asyn‐SimRank on Maiter , which is a distributed framework supporting asynchronous iteration .Expremental results show that ,compared with the SimRank and Delta‐SimRank implementing on Hadoop and Spark , the large‐scale Asyn‐SimRank significantly promotes the computational efficiency and accelerates the convergence .