CAJ | Academic paper

To address the difficulty traditional clustering algorithms have in efficiently clustering massive data sets, this paper proposes a K-means clustering ensemble algorithm based on the MapReduce framework. The algorithm uses K-means to generate base clusterings with different numbers of clusters, improves the co-association matrix, and combines the base clusterings according to how many times each pair of data points is placed in the same cluster, producing the final clustering automatically. Experimental results show that the algorithm effectively improves clustering quality, scales well, and is suitable for cluster analysis of massive data.
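The abstract does not spell out the improved co-association matrix or the MapReduce job layout, so the following is only a minimal single-machine sketch of a generic co-association ensemble, not the paper's algorithm. It runs K-means with several cluster counts, counts pair co-occurrences in a map/reduce style, and merges pairs that co-occur in most runs. The function names (`kmeans`, `map_pairs`, `reduce_counts`, `consensus`, `ensemble`), the 0.5 majority threshold, and the connected-components merge rule are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels


def map_pairs(labels):
    """Map step: emit ((i, j), 1) for each pair placed in the same cluster."""
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            yield (i, j), 1


def reduce_counts(emitted):
    """Reduce step: sum the emitted counts per pair key."""
    counts = Counter()
    for pair, one in emitted:
        counts[pair] += one
    return counts


def consensus(runs, threshold=0.5):
    """Merge points that co-occur in more than threshold * len(runs) runs.

    Assumed rule (not from the paper): connected components over the
    majority pairs, so the final number of clusters is not fixed in advance.
    """
    counts = reduce_counts(kv for labels in runs for kv in map_pairs(labels))
    parent = list(range(len(runs[0])))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), n in counts.items():
        if n > threshold * len(runs):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(parent))]


def ensemble(points, ks, seed=0):
    """Build one base clustering per value of k, then combine them."""
    rng = random.Random(seed)
    return consensus([kmeans(points, k, rng) for k in ks])


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
    print(ensemble(blob_a + blob_b, ks=[2, 3, 4]))
```

In a real MapReduce deployment each base clustering would presumably be processed by separate mappers and the pair counts summed by reducers. Emitting every co-clustered pair is quadratic in cluster size, which is one reason a modified co-association matrix may be needed at scale.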