计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2015
Issue 16
pp. 47-54
8 pages in total
K-Medoids; distributed computing; Hadoop; parallel sampling
The traditional K-Medoids algorithm is sensitive to the initial cluster centers and converges slowly, and in big-data environments it runs into memory-capacity and CPU-speed bottlenecks. To address these problems, this paper improves the initial center selection scheme and the center replacement strategy, and combines the Hadoop distributed computing platform with a Top-K-based parallel random sampling strategy to implement an efficient and stable parallel K-Medoids algorithm. The algorithm is further optimized by tuning the Hadoop platform. Experiments show that the improved K-Medoids algorithm not only achieves a good speedup but also improves convergence and clustering accuracy.
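The abstract does not spell out how the Top-K-based parallel random sampling works. One common way to realize such a scheme, shown here only as a minimal sketch and not as the authors' implementation, is to tag every record with a random key, let each mapper keep the k records with the smallest keys, and let a single reducer keep the k smallest keys overall. The function names `map_sample`, `reduce_sample`, and `top_k_sample` are hypothetical, and the code runs in-process on Python lists rather than on Hadoop.

```python
import heapq
import random


def map_sample(partition, k, rng):
    """Mapper step (assumed): tag each record with a random key and
    keep only the k records with the smallest keys in this partition."""
    tagged = ((rng.random(), record) for record in partition)
    return heapq.nsmallest(k, tagged, key=lambda pair: pair[0])


def reduce_sample(mapper_outputs, k):
    """Reducer step (assumed): merge the per-partition candidates and
    keep the k records with the smallest keys overall."""
    merged = (pair for output in mapper_outputs for pair in output)
    top = heapq.nsmallest(k, merged, key=lambda pair: pair[0])
    return [record for _, record in top]


def top_k_sample(partitions, k, seed=0):
    """Draw up to k records uniformly without replacement from data
    that is split across several partitions."""
    rng = random.Random(seed)
    outputs = [map_sample(partition, k, rng) for partition in partitions]
    return reduce_sample(outputs, k)


if __name__ == "__main__":
    # Three partitions stand in for input splits on a cluster.
    partitions = [list(range(0, 100)), list(range(100, 250)), list(range(250, 300))]
    print(top_k_sample(partitions, k=10))
```

Because the k globally smallest keys must each be among the k smallest keys of their own partition, every mapper can discard all but k candidates, so the reducer receives at most k records per partition regardless of the input size. The resulting sample could then serve as the candidate set for choosing initial medoids, although the abstract does not confirm that this is how the paper uses it.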