系统工程理论与实践
Systems Engineering—Theory & Practice
2014
Issue 2
pp. 437-443
systems biology; protein-protein interaction network; protein complexes; gene expression data
Identifying protein complexes in large-scale protein interaction networks is crucial for understanding the principles of cellular organization and for predicting protein functions, and it is one of the most important problems of the post-genomic era. Traditional protein complex discovery algorithms rely only on the protein-protein interaction (PPI) network and are not very accurate. This paper proposes a novel algorithm, IPCIPG, that integrates the PPI network with gene expression data. Unlike previous methods, which use gene expression data to evaluate the reliability of PPIs, IPCIPG incorporates the gene expression data into the PPI network during the identification of protein complexes. IPCIPG first uses the edge clustering coefficient (ECC) and the co-expression correlation between proteins (PCC) to compute a weight for each node in the PPI network. The node with the highest weight is then selected as a seed, and a dense subgraph is grown by extending from that seed. Experimental results on Saccharomyces cerevisiae data show that IPCIPG identifies protein complexes with specific biological meaning more effectively, precisely, and comprehensively than the other algorithms tested: HUNTER, HC-PIN, CMC, SPICI, MCODE, and MCL.
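
The abstract names the ingredients of the method (ECC, PCC, seed selection, extension to a dense subgraph) but not the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that ECC is the number of triangles through an edge divided by the maximum possible number, that PCC is the Pearson correlation of two expression profiles, that a node's weight is the sum over its edges of ECC times PCC rescaled to [0, 1], and that extension stops when subgraph density would fall below a threshold. The function names, the `min_density` parameter, and the toy data are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def ecc(adj, u, v):
    """Edge clustering coefficient: triangles through (u, v) over the maximum possible."""
    common = len(adj[u] & adj[v])
    limit = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / limit if limit > 0 else 0.0

def node_weights(adj, expr):
    """ASSUMPTION: weight = sum over incident edges of ECC * (PCC + 1) / 2."""
    return {
        u: sum(ecc(adj, u, v) * (pearson(expr[u], expr[v]) + 1) / 2 for v in adj[u])
        for u in adj
    }

def density(adj, nodes):
    """Fraction of possible edges present among the given nodes."""
    n = len(nodes)
    if n < 2:
        return 1.0
    edges = sum(len(adj[u] & nodes) for u in nodes) / 2
    return 2 * edges / (n * (n - 1))

def grow_from_seed(adj, weights, min_density=0.7):
    """Start at the heaviest node; greedily add neighbors while density stays high.

    ASSUMPTION: the density threshold and greedy order are illustrative choices.
    """
    seed = max(weights, key=weights.get)
    cluster = {seed}
    while True:
        frontier = set().union(*(adj[u] for u in cluster)) - cluster
        # Try candidates from heaviest to lightest; accept the first that fits.
        for cand in sorted(frontier, key=lambda c: (-weights[c], c)):
            if density(adj, cluster | {cand}) >= min_density:
                cluster.add(cand)
                break
        else:
            return cluster  # no candidate keeps the subgraph dense enough

# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
# Hypothetical expression profiles (four time points per protein).
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [1, 3, 2, 5],
    "D": [4, 3, 2, 1],
    "E": [1, 1, 2, 1],
}
weights = node_weights(adj, expr)
cluster = grow_from_seed(adj, weights)
print(sorted(cluster))
```

On this toy input the triangle is recovered as the candidate complex: adding D would drop the density to 4 edges out of 6 possible, below the assumed 0.7 threshold. A fuller version would remove the found cluster and repeat from the next heaviest unassigned node, but the abstract does not say how IPCIPG handles that step.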