CAJ | Academic paper

Traditional clustering algorithms incur excessive time cost when applied to large-scale information networks. To address this problem, this paper draws on the statistical characteristics of large-scale information networks and proposes a "divide and conquer" strategy for the network topology, which substantially reduces the problem size and time cost of clustering while maintaining comparable clustering quality. The main contributions are threefold: (1) it proposes dividing the whole information network into layers according to a ranking of clustering influence and then clustering each layer separately; (2) based on structural properties that hold statistically in particular information networks, such as the rich-club phenomenon and hierarchical community structure, it designs a rough scheme for dividing an information network into layers, and experiments indicate that the scheme is reasonable; (3) it presents an iterative inter-layer cluster fusion algorithm that merges the clusterings of different layers. Experimental results show that the proposed algorithm achieves good clustering quality while markedly reducing computational cost.
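The three steps named in the abstract (rank and divide into layers, cluster each layer separately, then merge across layers) can be illustrated with a minimal sketch. The abstract does not specify the ranking measure, the per-layer clustering method, or the fusion rule, so everything below is an assumption made for illustration: node degree stands in for clustering influence, connected components of the induced subgraph stand in for the per-layer clustering, and a lower-layer cluster is folded into the already-labelled cluster it shares the most edges with. All function names are hypothetical and this is not the authors' algorithm.

```python
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def split_into_layers(adj, num_layers=2):
    """Rank nodes and cut the ranking into equal-sized layers, top layer first.

    Assumption: degree is used as a stand-in for "clustering influence".
    """
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    size = -(-len(ranked) // num_layers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def cluster_layer(adj, nodes):
    """Cluster one layer in isolation.

    Assumption: connected components of the induced subgraph stand in for
    whatever base clustering algorithm would be used on each layer.
    """
    nodes = set(nodes)
    label = {}
    for start in sorted(nodes):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in label:
                    label[v] = start
                    stack.append(v)
    return label


def merge_layers(adj, layers):
    """Fold each lower layer, in rank order, into the clustering built so far.

    Assumption: a lower-layer cluster joins the existing cluster that receives
    the most edges from its members; with no such edges it stays separate.
    """
    labels = cluster_layer(adj, layers[0])
    for layer in layers[1:]:
        groups = defaultdict(list)
        for node, lab in cluster_layer(adj, layer).items():
            groups[lab].append(node)
        for lab, members in groups.items():
            votes = Counter(
                labels[v] for u in members for v in adj[u] if v in labels
            )
            target = votes.most_common(1)[0][0] if votes else lab
            for u in members:
                labels[u] = target
    return labels


def layered_clustering(edges, num_layers=2):
    """Run the whole divide, cluster, merge pipeline on an edge list."""
    adj = build_adjacency(edges)
    return merge_layers(adj, split_into_layers(adj, num_layers))


# Toy network: two hub-centred groups joined by a single bridge edge (4, 14).
EDGES = [
    (0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4),
    (10, 11), (10, 12), (10, 13), (10, 14), (11, 12), (11, 13), (11, 14),
    (4, 14),
]
LABELS = layered_clustering(EDGES, num_layers=2)
```

In this sketch each layer is clustered on its own induced subgraph, so the base clustering never sees the whole network at once, which is the source of the time savings the abstract claims. The fusion rule here is deliberately simple, and ties between candidate clusters are broken arbitrarily.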