计算机系统应用
APPLICATIONS OF THE COMPUTER SYSTEMS
2015
Issue 6
pp. 188-192 (5 pages)
MapReduce; k-means algorithm; canopy algorithm; parallel computing; clustering
The traditional k-means clustering algorithm runs into problems such as insufficient memory and slow computation when it processes massive datasets. To address these problems, this paper proposes a parallel k-means algorithm based on MapReduce. In addition, to reduce the blindness of k-means in choosing its initial values, the canopy algorithm is used to improve the initialization. The experimental results show that both the MapReduce-based parallel k-means algorithm and the improved algorithm produce good clustering results. Clustering quality is improved, and on large datasets the improved algorithm also achieves a near-linear speedup.
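The abstract does not give the implementation, so the following is only a minimal single-process sketch of the general idea: a canopy pass chooses the initial centers, and each k-means iteration is then written as a map step (assign a point to its nearest center) and a reduce step (average the points of each center). The function names, the thresholds `t1` and `t2`, and the toy data are all assumptions made for illustration. A real deployment would run the map and reduce steps as distributed jobs, for example on Hadoop.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points, t1, t2):
    """Simplified canopy pass. t1 (loose) > t2 (tight) are assumed thresholds.

    Returns a list of (center, members) pairs.
    """
    assert t1 > t2, "the loose threshold must exceed the tight one"
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        # Every point within t1 of the center belongs to this canopy.
        members = [p for p in points if dist(p, center) <= t1]
        canopies.append((center, members))
        # Points within t2 can no longer become canopy centers.
        remaining = [p for p in remaining if dist(p, center) > t2]
    return canopies


def kmeans_map(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return idx, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points that were assigned to one center."""
    total, count = None, 0
    for vec, n in values:
        total = list(vec) if total is None else [a + b for a, b in zip(total, vec)]
        count += n
    return tuple(s / count for s in total)


def run_kmeans(points, centers, max_iter=20, tol=1e-9):
    """Iterate map, shuffle, and reduce until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = defaultdict(list)  # stands in for the shuffle phase
        for p in points:
            key, value = kmeans_map(p, centers)
            groups[key].append(value)
        # A center with no assigned points keeps its old position.
        new_centers = [
            kmeans_reduce(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))
        ]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = [center for center, _ in canopy(points, t1=5.0, t2=3.0)]
final_centers = run_kmeans(points, initial)
```

In this sketch the number of canopies also fixes k, so neither k nor the starting centers are chosen blindly. Whether the paper seeds k-means with the canopy centers themselves or with the mean of each canopy is not stated in the abstract.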