计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014
Issue 14
pp. 143-147
5 pages in total
massive data; clustering; MapReduce; K-means algorithm; Canopy algorithm
To address the difficulty of clustering massive data under a centralized system framework, an optimized K-means clustering algorithm based on MapReduce is proposed. Using the MapReduce parallel programming framework, the algorithm introduces Canopy clustering to optimize the selection of the initial K-means centers and improves the communication and computation patterns of the iterative process. Experimental results show that the algorithm effectively improves clustering quality, achieves high execution efficiency and good scalability, and is well suited to cluster analysis of massive data.
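The idea of seeding K-means with Canopy centers can be illustrated with a minimal single-machine sketch. This is not the authors' MapReduce implementation: the function names `canopy_centers` and `kmeans`, the threshold `t2`, and the sample data are all assumptions made for illustration, and the Canopy step is simplified to use only the tight threshold (the loose threshold T1 of full Canopy clustering is omitted because it does not affect which centers are chosen). In a MapReduce setting, the assignment step would correspond roughly to the map phase and the center update to the reduce phase.

```python
import math


def canopy_centers(points, t2):
    """Simplified Canopy pass: pick a point as a center, then drop every
    remaining point that lies within distance t2 of it. Repeat until empty."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain K-means (Lloyd's iteration) starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step (map-like): index of the nearest center per point.
        labels = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Update step (reduce-like): mean of the points assigned to each center.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical sample data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = canopy_centers(points, t2=3.0)  # Canopy decides both k and the seeds
centers, labels = kmeans(points, seeds)
```

Because the Canopy pass determines both the number of clusters and their starting positions, K-means here needs no user-supplied k and no random initialization; the quality of the result then depends on choosing a sensible `t2` for the data.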