计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014, No. 8, pp. 933-944 (12 pages)
high-dimensional data; clustering; projected subspace; adaptability; feature weighting
Because of the curse of dimensionality, many traditional clustering algorithms perform poorly on high-dimensional data. Projective clustering has therefore attracted wide attention in recent years, and soft subspace clustering in particular has been widely studied and applied. However, most existing projected subspace clustering algorithms require the user to preset several important parameters and do not optimize the projected subspace of each cluster, which degrades their clustering performance. To address these problems, this paper defines a new objective function that minimizes the within-cluster dispersion while optimizing the projected subspace associated with each cluster. A new feature-weight formula is derived mathematically from this objective, and an adaptive k-means-type projective clustering algorithm is built on it. During clustering, the algorithm computes each optimization parameter dynamically from information contained in the dataset itself and from the derived formulas, instead of relying on user-supplied values. Experimental results show that, by optimizing the projected subspaces, the new algorithm improves clustering quality and clearly outperforms existing projective clustering algorithms.
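The abstract does not give the paper's derived weight formula or its rule for computing parameters adaptively, so the following is only a minimal sketch of the general k-means-type soft subspace framework it builds on: assign points using per-cluster feature weights, recompute cluster centers, then recompute each cluster's feature weights from its within-cluster dispersion. As a stand-in, the weight step uses the entropy-weighting k-means (EWKM) update with a fixed, user-chosen `gamma`, which is exactly the kind of preset parameter the paper aims to eliminate. The function names, the farthest-first seeding, and the toy data are hypothetical choices made for illustration, not the authors' method.

```python
import math


def sq_dist(a, b):
    """Plain squared Euclidean distance (used only for seeding)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_sq_dist(x, center, weights):
    """Feature-weighted squared distance: sum_j w_j * (x_j - c_j)^2."""
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights))


def farthest_first(data, k):
    """Deterministic seeding (hypothetical choice): start at data[0], then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    return centers


def entropy_weighted_kmeans(data, k, gamma=1.0, max_iter=100):
    """EWKM-style soft subspace k-means with a fixed gamma (NOT the paper's
    adaptive scheme). Returns labels, centers and per-cluster feature weights."""
    n_feat = len(data[0])
    centers = farthest_first(data, k)
    weights = [[1.0 / n_feat] * n_feat for _ in range(k)]
    labels = None
    for _ in range(max_iter):
        # 1) Assign each point to the cluster with the smallest weighted distance.
        new_labels = [
            min(range(k), key=lambda l: weighted_sq_dist(x, centers[l], weights[l]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for l in range(k):
            members = [x for x, lab in zip(data, labels) if lab == l]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            # 2) Center update: per-feature mean of the cluster's members.
            centers[l] = [sum(col) / len(members) for col in zip(*members)]
            # 3) Weight update: D_lj is the within-cluster dispersion on feature j,
            #    w_lj = exp(-D_lj / gamma) / sum_t exp(-D_lt / gamma).
            disp = [sum((x[j] - centers[l][j]) ** 2 for x in members)
                    for j in range(n_feat)]
            shift = min(disp)  # shifting keeps the largest term at exp(0) = 1
            expo = [math.exp(-(d - shift) / gamma) for d in disp]
            total = sum(expo)
            weights[l] = [e / total for e in expo]
    return labels, centers, weights


# Toy data: two clusters; feature 0 is tighter than feature 1 inside each cluster.
data = [[0.0, 0.0], [0.2, 1.0], [-0.2, -1.0],
        [10.0, 10.0], [10.2, 11.0], [9.8, 9.0]]
labels, centers, weights = entropy_weighted_kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy run each cluster ends up giving feature 0 the larger weight, because that feature has the smaller within-cluster dispersion. How sharply the weights concentrate depends on `gamma`, which is why a method that derives such parameters from the data itself, as the abstract describes, avoids a manual tuning step.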