运筹与管理
OPERATIONS RESEARCH AND MANAGEMENT SCIENCE
2014
Issue 6
pp. 37-43
7 pages in total
武森%张桂琼%潘静%全敏
clustering algorithm%generalized centroid%categorical attribute%K-modes
Classic partition-based clustering algorithms represent each cluster by a single point. To address this limitation, a clustering algorithm for categorical data based on generalized centroids is proposed. The algorithm represents a cluster by a generalized centroid comprising multiple points, which reflects the data distribution of the cluster. It further proposes new measures of the distance between generalized centroids and of the distance between clusters, and gives a method for determining the generalized centroids together with a clustering strategy that assigns objects to clusters based on them. As a result, the algorithm normally obtains the final clustering result after a single partition iteration. The generalized centroid algorithm is applied to four benchmark data sets and compared with the well-known partition clustering algorithm K-modes and two of its improved variants. Experimental results show that the generalized centroid algorithm achieves higher clustering accuracy with fewer iterations, and that it is effective and feasible.