齐齐哈尔大学学报(自然科学版)
齊齊哈爾大學學報(自然科學版)
제제합이대학학보(자연과학판)
JOURNAL OF QIQIHAR UNIVERSITY (NATURAL SCIENCE EDITION)
2014
Issue 5
pp. 5-9
5 pages in total
林长方%黄仲开%曾少俊
林長方%黃仲開%曾少俊
림장방%황중개%증소준
云计算%数据挖掘%并行k-means%MapReduce
雲計算%數據挖掘%併行k-means%MapReduce
운계산%수거알굴%병행k-means%MapReduce
cloud computing%data mining%parallel k-means%MapReduce
针对传统k-means聚类算法面对海量数据存在时间复杂度急剧增加的问题,结合云计算的优势,提出基于MapReduce编程框架来实现k-means聚类算法的并行化处理。Map函数完成每个样本记录到聚类中心的距离计算并标记其所属聚类类别,Reduce函数汇总中间结果并计算出新的聚类中心,供下一轮迭代使用。通过实验表明:基于MapReduce的并行化k-means聚类算法具有较好的加速比和良好的扩展性。
針對傳統k-means聚類算法麵對海量數據存在時間複雜度急劇增加的問題,結閤雲計算的優勢,提齣基于MapReduce編程框架來實現k-means聚類算法的併行化處理。Map函數完成每箇樣本記錄到聚類中心的距離計算併標記其所屬聚類類彆,Reduce函數彙總中間結果併計算齣新的聚類中心,供下一輪迭代使用。通過實驗錶明:基于MapReduce的併行化k-means聚類算法具有較好的加速比和良好的擴展性。
침대전통k-means취류산법면대해량수거존재시간복잡도급극증가적문제,결합운계산적우세,제출기우MapReduce편정광가래실현k-means취류산법적병행화처리。Map함수완성매개양본기록도취류중심적거리계산병표기기소속취류유별,Reduce함수회총중간결과병계산출신적취류중심,공하일륜질대사용。통과실험표명:기우MapReduce적병행화k-means취류산법구유교호적가속비화량호적확전성。
The time complexity of the traditional k-means clustering algorithm rises sharply on massive data sets. To address this, the paper proposes parallelizing k-means with the MapReduce programming model, drawing on the advantages of cloud computing. The Map function computes the distance from each sample record to every cluster center and labels the record with the cluster it belongs to. All records with the same key are sent to a single reducer, which aggregates the intermediate results and computes the new cluster centers for the next MapReduce job. Experimental results show that the MapReduce-based parallel k-means algorithm achieves essentially linear speedup as the number of compute nodes increases, and that it scales well.
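The Map and Reduce roles described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: it assumes Euclidean distance and simulates the shuffle phase with an in-memory dictionary rather than running on a MapReduce cluster, and the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) as well as the `max_iter` and `tol` parameters are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(record, centers):
    """Map step: emit (index of the nearest center, record)."""
    nearest = min(range(len(centers)), key=lambda i: dist(record, centers[i]))
    return nearest, record


def kmeans_reduce(records):
    """Reduce step: average all records that share one cluster key."""
    n = len(records)
    return tuple(sum(coords) / n for coords in zip(*records))


def kmeans_iteration(data, centers):
    """Simulate one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in data:
        key, value = kmeans_map(record, centers)
        groups[key].append(value)  # shuffle: group records by cluster key
    # Assumption: a cluster that received no records keeps its old center.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centers[i])
        for i in range(len(centers))
    ]


def kmeans(data, centers, max_iter=100, tol=1e-9):
    """Run one job per iteration until the centers stop moving."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(data, centers)
        if all(dist(a, b) <= tol for a, b in zip(centers, new_centers)):
            return new_centers
        centers = new_centers
    return centers
```

For example, `kmeans([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` groups the first two and the last two points and returns the mean of each group. In a real deployment each call to `kmeans_map` would run on the node that holds the record, which is where the parallel speedup reported in the abstract would come from.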