计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2015
Issue 8
pp. 973-984
12 pages
方差%标准差%邻域%初始聚类中心%K-medoids聚类
variance%standard deviation%neighborhood%initial seeds%K-medoids clustering
针对快速K-medoids聚类算法存在密度计算复杂耗时和初始聚类中心可能位于同一类簇的缺陷,以及基于邻域的K-medoids算法的邻域半径需要人为给定一个调节系数的主观性缺陷,分别以样本间距离均值和相应样本的标准差为邻域半径,以方差作为样本分布密集程度的度量,选取方差值最小且其间距离不低于邻域半径的样本为K-medoids的初始聚类中心,提出了两种方差优化初始中心的K-medoids算法。在UCI数据集和人工模拟数据集上进行了实验测试,并对各种聚类指标进行了比较,结果表明该算法需要的聚类时间短,得到的聚类结果优,适用于较大规模数据集的聚类。
The fast K-medoids clustering algorithm has two deficiencies: computing the density of each point is computationally expensive, and its initial seeds may fall within the same cluster. The neighborhood-based K-medoids algorithm, in turn, requires a manually chosen coefficient to adjust its neighborhood radius, which makes that radius subjective. To overcome these problems, this paper proposes two variance-based K-medoids clustering algorithms. They take, respectively, the mean distance between instances and the standard deviation of the specific instance as the neighborhood radius, and use variance as the measure of how densely the samples are distributed. Initial seeds are selected one by one: each new seed is the instance with minimum variance whose distance to the previously selected seeds is at least the neighborhood radius, and selection continues until the required number of seeds has been obtained. The proposed algorithms are tested on real datasets from the UCI machine learning repository and on synthetically generated datasets, and their performance is compared in terms of several popular clustering criteria. The experimental results show that the proposed K-medoids algorithms obtain better clustering results in a short time and scale to relatively large datasets.
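The seed-selection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so several details here are assumptions rather than the authors' definitions: the function name `variance_seeds` is hypothetical, the "variance" of a sample is taken to be its mean squared Euclidean distance to all other samples, and the neighborhood radius is the mean pairwise distance (the first of the two variants mentioned). The sketch covers only the initialization step, not the subsequent K-medoids iterations.

```python
import math
from itertools import combinations


def variance_seeds(points, k):
    """Return up to k indices of initial medoids, lowest 'variance' first.

    Assumptions (not taken from the paper): Euclidean distance, variance of a
    sample = mean squared distance to the other samples, radius = mean
    pairwise distance. Requires at least two points.
    """
    n = len(points)
    # Full pairwise distance matrix; O(n^2) time and memory.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed neighborhood radius: mean distance over all distinct pairs.
    n_pairs = n * (n - 1) / 2
    radius = sum(dist[i][j] for i, j in combinations(range(n), 2)) / n_pairs

    # Assumed per-sample variance: small values indicate a dense region.
    var = [sum(d * d for d in row) / (n - 1) for row in dist]

    seeds = []
    # Visit samples from densest to sparsest and keep those that lie at
    # least one radius away from every seed chosen so far.
    for i in sorted(range(n), key=var.__getitem__):
        if all(dist[i][s] >= radius for s in seeds):
            seeds.append(i)
            if len(seeds) == k:
                break
    return seeds
```

Calling `variance_seeds(points, k)` on a list of coordinate tuples returns indices into `points`. If fewer than `k` samples satisfy the radius constraint, the sketch simply returns the shorter list; how the paper handles that case is not stated in the abstract.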