CAJ | Academic paper

Graph clustering is an important family of clustering algorithms that can be applied effectively to protein interaction networks and microarray data. To overcome the shortcomings of existing graph clustering methods for gene microarray data, this paper proposes a global graph clustering method based on modularity and subgraph smoothness. Subgraph smoothness is introduced to keep the algorithm from becoming trapped in a local optimum: after each clustering pass, subgraphs with low smoothness are split into singletons, and these singletons are clustered again in the next pass. After several iterations, the global optimal clustering result can be obtained. The proposed method is compared with four commonly used clustering methods (classic graph clustering, k-means, SOM, and the Fuzzy algorithm) on a set of genome expression data. The results show that (1) the proposed method is generally superior to the other four methods in terms of average non-overlap proportion and FOM′ value; (2) when the dataset is divided into 39 clusters, the optimal cluster number, its FOM′ value is 28.41%, 19.21%, 9.84%, and 24.67% lower than those of the other four methods, respectively; and (3) its classification accuracy is higher than that of the Fuzzy and SOM algorithms, while its execution efficiency is similar to that of the SOM algorithm and 5.94% higher than that of the Fuzzy algorithm.
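The two ingredients named in the abstract can be illustrated with a minimal Python sketch. The modularity function follows the standard Newman definition for an unweighted, undirected graph. The abstract does not define subgraph smoothness, so `edge_density` below is only a hypothetical stand-in score, and `dissolve_low_smoothness` and its `threshold` parameter are illustrative names rather than the authors' implementation. The re-clustering pass that would follow the dissolve step is not shown.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    internal = defaultdict(int)  # edges with both endpoints in community i
    degree = defaultdict(int)    # sum of node degrees in community i
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(communities)))


def edge_density(edges, community):
    """Stand-in 'smoothness' score (an assumption, not the paper's definition):
    the fraction of possible internal edges that are present."""
    n = len(community)
    if n < 2:
        return 1.0
    inside = sum(1 for u, v in edges if u in community and v in community)
    return inside / (n * (n - 1) / 2)


def dissolve_low_smoothness(edges, communities, threshold):
    """Split every multi-node community scoring below threshold into singletons."""
    kept, singles = [], []
    for c in communities:
        if len(c) > 1 and edge_density(edges, c) < threshold:
            singles.extend({v} for v in sorted(c))
        else:
            kept.append(set(c))
    return kept + singles


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = [{0, 1, 2}, {3, 4, 5}]
poor = [{0, 1, 2, 3}, {4, 5}]
print(modularity(edges, good), modularity(edges, poor))
print(dissolve_low_smoothness(edges, poor, threshold=0.8))
```

In this toy graph the partition that cuts through a triangle has lower modularity, and its loosely connected four-node cluster is the one that gets dissolved. The resulting singletons would then be fed back into the next clustering pass, as the abstract describes.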