华南理工大学学报(自然科学版)
Journal of South China University of Technology (Natural Science Edition)
2013
Issue 12
pp. 101-106
6 pages
基因芯片%图聚类%模块性%平滑度%算法
gene microarray%graph clustering%modularity%smoothness%algorithm
图聚类是一种重要的聚类算法,可有效应用于蛋白质作用网络和芯片数据聚类等领域.文中针对现有基因芯片数据图聚类方法的不足,提出了一种基于模块性指标和子图平滑度的全局图聚类方法.为防止算法陷入局部最优解,引入子图平滑度的定义,打散每次聚类结果中产生的平滑度较低的子图,再对得到的单节点进行下一次聚类,经多次迭代后得到全局最优的聚类结果.采用一组基因组表达数据,将该方法和其他4种常用聚类方法(经典图聚类、k均值、SOM及Fuzzy算法)进行比较,结果表明:该方法在聚类过程中的平均类间重叠度和FOM′值总体上优于其他4种算法,在将数据集分类到最佳聚类数39时,其FOM′值分别比上述4种方法低28.41%、19.21%、9.84%和24.67%;其分类准确度高于Fuzzy法和SOM算法,算法执行效率则与SOM算法相近,比Fuzzy法高5.94%.
As an important class of clustering algorithms, graph clustering can be effectively applied to protein interaction networks and microarray data clustering. To overcome the shortcomings of existing graph clustering methods for gene microarray data, this paper proposes a global graph clustering method based on modularity and subgraph smoothness. To keep the algorithm from becoming trapped in a local optimum, subgraph smoothness is introduced: subgraphs with low smoothness in each clustering result are split into singletons, and the resulting singletons are clustered again in the next step. After several iterations, a globally optimal clustering result is obtained. The proposed method is compared with four commonly used clustering methods (classic graph clustering, the k-means algorithm, the SOM algorithm, and the Fuzzy algorithm) on a set of genome expression data. The results show that (1) the proposed method is generally superior to the other four methods in terms of average non-overlap proportion and FOM′ value; (2) when the dataset is divided into the optimal number of 39 clusters, the FOM′ value of the proposed method is 28.41%, 19.21%, 9.84% and 24.67% lower than those of the other four methods, respectively; and (3) the classification accuracy of the proposed method is higher than that of the Fuzzy and SOM algorithms, and its execution efficiency is similar to that of the SOM algorithm and 5.94% higher than that of the Fuzzy algorithm.
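The two ingredients named in the abstract, a modularity score and a step that splits low-smoothness subgraphs into singletons, can be illustrated with a minimal Python sketch. The modularity function follows the standard Newman definition. The abstract does not define subgraph smoothness, so `edge_density` is only a hypothetical stand-in, and the function names, the toy graph, and the threshold are likewise assumptions made for illustration. The re-clustering of the released singletons is not shown.

```python
from itertools import combinations


def build_graph(edges):
    """Build a symmetric dict-of-dicts adjacency map with unit edge weights."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj


def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected weighted graph.

    adj: symmetric dict-of-dicts, adj[u][v] = edge weight.
    labels: dict mapping node -> cluster id.
    """
    two_m = sum(w for u in adj for w in adj[u].values())  # every edge counted twice
    if two_m == 0:
        return 0.0
    intra = {}  # intra-cluster weight, counted in both directions
    total = {}  # summed node degree per cluster
    for u in adj:
        c = labels[u]
        total[c] = total.get(c, 0.0) + sum(adj[u].values())
        for v, w in adj[u].items():
            if labels[v] == c:
                intra[c] = intra.get(c, 0.0) + w
    return sum(intra.get(c, 0.0) / two_m - (total[c] / two_m) ** 2 for c in total)


def edge_density(adj, members):
    """Hypothetical smoothness stand-in (NOT the paper's definition):
    fraction of node pairs inside the cluster that are directly linked."""
    members = list(members)
    if len(members) < 2:
        return 1.0  # a singleton is treated as perfectly smooth
    pairs = list(combinations(members, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)


def dissolve_low_smoothness(adj, labels, threshold, score=edge_density):
    """Split every cluster scoring below `threshold` into singletons.

    Returns a new node -> cluster id mapping; other clusters are kept whole.
    """
    clusters = {}
    for node, c in labels.items():
        clusters.setdefault(c, []).append(node)
    new_labels = {}
    next_id = 0
    for members in clusters.values():
        if score(adj, members) < threshold:
            for node in members:  # one fresh cluster id per released node
                new_labels[node] = next_id
                next_id += 1
        else:
            for node in members:  # cluster survives under a single id
                new_labels[node] = next_id
            next_id += 1
    return new_labels


# Toy graph (an assumption): two triangles joined by the bridge edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
```

On this toy graph, the partition into the two triangles scores higher modularity than a single all-inclusive cluster. A cluster that mixes nodes from both triangles falls below a density threshold of 0.8 and is released as singletons for the next clustering pass.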