Journal of Taiyuan University of Science and Technology (太原科技大学学报)
2015, Issue 2, pp. 92-96 (5 pages)
Keywords: big data; clustering; Hadoop; K-means
When the K-means clustering algorithm based on the max-min principle runs on the Hadoop platform, it must traverse the entire data set multiple times. To address this problem, an improved algorithm for selecting the initial cluster centers, called M+Kmeans, is proposed. The algorithm needs to traverse the global data only once, which greatly reduces the time consumed by parallel execution. Results from multiple sets of experiments show that M+Kmeans is suitable for running on a large-scale Hadoop cluster, and that its speedup and scale-up (expansion rate) are noticeably better than those of the original algorithm.
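The cost that motivates the work can be illustrated with a minimal single-machine sketch of the baseline max-min initialization. This is not the M+Kmeans algorithm, whose details the abstract does not give. The function name `maxmin_init`, the choice of the first point as the seed, and the `passes` counter are assumptions made for illustration only. Each new center requires one full scan of the data, so choosing k centers takes k - 1 scans; on Hadoop, each such scan would presumably correspond to a pass over the whole distributed data set.

```python
import math


def maxmin_init(points, k):
    """Pick k initial centers by the max-min distance principle.

    Returns (centers, passes), where `passes` counts full scans of `points`.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [points[0]]  # assumption: seed with the first point
    # min_dist[i] holds the distance from points[i] to its nearest chosen center
    min_dist = [math.inf] * len(points)
    passes = 0

    while len(centers) < k:
        newest = centers[-1]
        best_i = 0
        for i, p in enumerate(points):  # one full scan per new center
            d = math.dist(p, newest)
            if d < min_dist[i]:
                min_dist[i] = d
            # keep the point that is farthest from all chosen centers
            if min_dist[i] > min_dist[best_i]:
                best_i = i
        passes += 1
        centers.append(points[best_i])

    return centers, passes


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0), (10, 10), (0, 9)]
    centers, passes = maxmin_init(data, 3)
    print(centers, passes)
```

With k = 3 the sketch scans the data twice, which is the repeated-traversal behavior the abstract says M+Kmeans reduces to a single pass.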