CAJ | Academic paper

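The k-means algorithm is widely applied in data mining, machine learning, computer vision, and other fields because it is algorithmically simple and computationally efficient. However, its performance depends heavily on the choice of initial cluster centres; different initial cluster centres cause the clustering results of k-means to vary greatly. A reasonable approach is to select data samples located in relatively dense regions of the data as the initial cluster centres. In view of this, a k-means initial centre selection algorithm based on a data neighbourhood graph is proposed. The algorithm has three stages: 1) construct the local neighbourhood graph of the dataset; 2) select a candidate set of initial cluster centres; 3) determine the appropriate initial cluster centres. Experimental results show that the initial cluster centres selected by the algorithm are reasonable and, at the same time, can speed up the convergence of k-means.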
The k-means clustering algorithm is widely used in data mining, machine learning, and computer vision for its conceptual simplicity and high computational efficiency. However, its performance depends heavily on the selection of the initial cluster centres: different initial centres can produce sharply different clustering results. A reasonable solution is to choose data samples located in relatively dense regions of the data as the initial cluster centres. In view of this, we propose a data neighbourhood graph-based initial centre selection method for the k-means algorithm, which takes three steps. The first step is to construct the neighbourhood graph of the dataset. The second step is to choose a candidate set of initial cluster centres. The last step is to decide on the appropriate initial cluster centres. Experimental results show that the initial cluster centres chosen by the proposed method are reasonable and also speed up the convergence of k-means.
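
The abstract names the three steps but gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' method. Everything specific in it is an assumption made for illustration: the use of a k-nearest-neighbour graph, the density estimate (inverse mean neighbour distance), the candidate rule (local density peaks), the greedy final selection, and the names `knn_graph` and `select_initial_centres`.

```python
import numpy as np


def knn_graph(X, k):
    """Step 1 (assumed form): k-nearest-neighbour graph under Euclidean distance."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # full pairwise distances, O(n^2) memory
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)             # a point is not its own neighbour
    nbr_idx = np.argsort(masked, axis=1)[:, :k]  # indices of the k nearest neighbours
    nbr_dist = np.take_along_axis(masked, nbr_idx, axis=1)
    return dist, nbr_idx, nbr_dist


def select_initial_centres(X, n_clusters, k=10):
    """Return row indices of X to use as initial centres (hypothetical helper).

    Assumes X has more than k rows.
    """
    X = np.asarray(X, dtype=float)
    dist, nbr_idx, nbr_dist = knn_graph(X, k)

    # Assumed density estimate: inverse of the mean distance to the k nearest neighbours.
    density = 1.0 / (nbr_dist.mean(axis=1) + 1e-12)

    # Step 2 (assumed rule): candidates are local density peaks of the graph,
    # i.e. points at least as dense as every one of their neighbours.
    candidates = np.flatnonzero(density >= density[nbr_idx].max(axis=1))
    if len(candidates) < n_clusters:
        candidates = np.arange(len(X))           # fall back to all points

    # Step 3 (assumed rule): start from the densest candidate, then greedily add
    # the candidate maximising density * distance to the nearest chosen centre.
    chosen = [int(candidates[np.argmax(density[candidates])])]
    while len(chosen) < n_clusters:
        d_min = dist[np.ix_(candidates, chosen)].min(axis=1)
        score = density[candidates] * d_min      # already-chosen points score 0
        chosen.append(int(candidates[np.argmax(score)]))
    return np.array(chosen)
```

The rows `X[select_initial_centres(X, n_clusters)]` could then be passed to any k-means implementation that accepts explicit starting centres, for example as the `init` array of scikit-learn's `KMeans` with `n_init=1`. The dense pairwise distance matrix keeps the sketch short but limits it to small datasets; a spatial index would be the usual replacement at scale.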