智能系统学报
CAAI TRANSACTIONS ON INTELLIGENT SYSTEMS
2015
Issue 4
pp. 607-614 (8 pages)
k-means%clustering algorithm%CMP%massive data set%data mining%unsupervised learning%big data
Although multi-core CPUs are now widespread, the traditional K-means clustering algorithm was not designed with parallelization in mind, so it cannot fully exploit the multi-core computing capability of modern CPUs, and its clustering efficiency on massive data sets leaves room for improvement. This paper therefore redesigns K-means for parallel execution in a chip multiprocessor (CMP) environment and proposes a new algorithm, Multi-core K-means (MC-K-means). To use the multi-core capability of modern CPUs, MC-K-means decomposes the clustering task into independent, balanced subtasks and distributes them to threads that execute in parallel. Experimental results show that MC-K-means achieves a relatively high multi-core speedup over K-means, improving its ability to cluster massive data sets.
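The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how MC-K-means actually partitions the work, so everything below is an assumption: the data points are split into contiguous chunks whose sizes differ by at most one, each thread assigns its chunk to the nearest centroid and returns partial sums, and the main thread merges the partial sums into new centroids. The names `split_evenly`, `assign_chunk`, and `parallel_kmeans` are hypothetical. Note also that pure-Python threads in CPython do not run CPU-bound code on several cores at once, so this shows only the structure of the subtasks, not the speedup reported in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(n_items, n_parts):
    """Return (start, stop) index ranges whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def assign_chunk(points, centroids, start, stop):
    """Subtask: assign points[start:stop] to their nearest centroid.

    Returns per-cluster coordinate sums and counts for this chunk only,
    so subtasks share no mutable state and need no locking.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points[start:stop]:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def parallel_kmeans(points, centroids, n_threads=4, n_iters=10):
    """Run n_iters K-means iterations, one balanced chunk per thread."""
    centroids = [list(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    ranges = split_evenly(len(points), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(n_iters):
            futures = [
                pool.submit(assign_chunk, points, centroids, s, e)
                for s, e in ranges
            ]
            # Merge the partial results from all subtasks.
            total_sums = [[0.0] * dim for _ in range(k)]
            total_counts = [0] * k
            for f in futures:
                sums, counts = f.result()
                for c in range(k):
                    total_counts[c] += counts[c]
                    for d in range(dim):
                        total_sums[c][d] += sums[c][d]
            # Empty clusters keep their previous centroid.
            centroids = [
                [s / total_counts[c] for s in total_sums[c]]
                if total_counts[c] else centroids[c]
                for c in range(k)
            ]
    return centroids
```

Because each subtask touches only its own slice of the data and returns its own partial sums, the subtasks are independent, and the even split keeps them balanced when every point costs roughly the same to process. In a language with true shared-memory threads, the same structure would map each chunk to one core.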