计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2015, Issue 5, pp. 1317-1320 (4 pages)
Keywords: K-means algorithm; random sampling; max-min distance method; MapReduce; parallelization
To handle large-scale data clustering efficiently, a parallel K-means clustering method is proposed that first draws a random sample and then applies the max-min distance method to choose the initial cluster centers. Sampling keeps the clustering from becoming trapped in a local solution, and the max-min distance method drives the initial cluster centers toward an optimal choice. Extensive experiments show that, in both stand-alone and cluster environments, the proposed method is less sensitive to the initial cluster centers, improves clustering accuracy, and reduces both the number of iterations and the clustering time.
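The seeding step described in the abstract can be sketched as follows. This is a minimal single-machine illustration and not the paper's implementation: the function name `max_min_centers`, its parameters, and the use of a fixed number of centers `k` are assumptions made for the example, and the MapReduce parallelization is not shown.

```python
import math
import random


def max_min_centers(points, k, sample_size, seed=0):
    """Choose k initial centers from a random sample with the max-min distance rule.

    Hypothetical helper for illustration; points is a list of coordinate tuples.
    """
    rng = random.Random(seed)
    # Step 1: work on a random sample instead of the full data set.
    sample = rng.sample(points, min(sample_size, len(points)))
    if k > len(sample):
        raise ValueError("k must not exceed the sample size")

    # Step 2: take an arbitrary sampled point as the first center.
    centers = [sample[0]]
    # nearest[i] is the distance from sample[i] to its closest chosen center.
    nearest = [math.dist(p, centers[0]) for p in sample]

    # Step 3: repeatedly add the point whose distance to its closest
    # center is largest (the "max" of the "min" distances).
    while len(centers) < k:
        idx = max(range(len(sample)), key=nearest.__getitem__)
        new_center = sample[idx]
        centers.append(new_center)
        nearest = [min(d, math.dist(p, new_center)) for p, d in zip(sample, nearest)]
    return centers
```

The returned centers would then be passed to an ordinary K-means loop as its starting points. Because each new center is the sampled point farthest from all centers chosen so far, the seeds tend to be spread across well-separated regions of the data, which is the property the abstract credits for the reduced iteration count.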