北京生物医学工程
BEIJING BIOMEDICAL ENGINEERING
2014
Issue 3
pp. 264-268
5 pages
邹亮%李骜%韩燕%冯焕清%王明会
蛋白激酶%磷酸化%贝叶斯决策理论%生物信息学
protein kinase%phosphorylation%Bayesian decision theory%bioinformatics
通过提出一种新颖的生物信息学算法,以准确识别已知磷酸化位点的蛋白激酶信息,进而解决蛋白激酶的信息缺乏问题。方法根据人类激酶的聚类规则,首先从最新版本的磷酸化数据库Phospho.ELM(9.0)中提取出激酶特异性的磷酸化数据,构建用于激酶识别的数据集。然后基于贝叶斯决策理论,分析阳性数据和阴性数据中磷酸化位点附近的氨基酸分布规律,进而给出相应的统计模型并使用留一法对模型进行评估。结果对 MAPK、PKA 和 RSK 3个激酶家族的测试表明,在假阳性率不超过1%的高置信度水平下,激酶识别的准确率分别达到了23%、24%和33%。同时,该算法的识别结果明显优于 KinasePhos、Netphosk 等蛋白质磷酸化位点预测方法。结论本文提出的基于贝叶斯决策理论的磷酸化位点激酶信息识别算法可有效提高对已知磷酸化位点的蛋白激酶识别性能,有助于理解蛋白质磷酸化的生物机制。
Objective: A novel machine learning method is proposed to identify the protein kinases responsible for known phosphorylation sites, addressing the lack of kinase information for these sites. Methods: Following the hierarchical classification of human kinases, we first constructed a dataset for each kinase or kinase cluster from the kinase-specific phosphorylation instances in the latest version of Phospho.ELM (9.0). Based on Bayesian decision theory, we then analyzed the amino acid distribution at each residue position around the phosphorylation sites in the positive and negative datasets separately and constructed the corresponding statistical models. The models were evaluated on each dataset by leave-one-out cross-validation. Results: For the MAPK, PKA and RSK kinase families, sensitivity reached 23%, 24% and 33%, respectively, at a false positive rate of 1%. The method also clearly outperformed phosphorylation site prediction tools such as KinasePhos and NetPhosK. Conclusions: The proposed algorithm, based on Bayesian decision theory, improves kinase identification for known phosphorylation sites and contributes to a better understanding of the biological mechanisms of protein phosphorylation.
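The abstract does not give the exact form of the statistical model, so the following is only a minimal sketch of one common way to realize the idea. It assumes the residue positions are treated as independent, scores a peptide window by the summed log-likelihood ratio between positive and negative position-specific amino acid frequencies, and picks a cutoff from the negative scores so that the false positive rate stays within a chosen limit. The function names, the pseudocount, and the window handling are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def position_frequencies(peptides, pseudocount=1.0):
    """Per-position amino acid frequencies with additive smoothing.

    Assumes all peptides are equal-length windows centred on the site.
    """
    tables = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        tables.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return tables


def log_likelihood_ratio(peptide, pos_tables, neg_tables):
    """Sum over positions of log P(aa | positive) / P(aa | negative)."""
    return sum(
        math.log(pos_tables[i][aa] / neg_tables[i][aa])
        for i, aa in enumerate(peptide)
        if aa in pos_tables[i]  # skip padding or non-standard residues
    )


def leave_one_out_scores(positives, negatives, pseudocount=1.0):
    """Score each positive with a model trained on the remaining positives."""
    neg_tables = position_frequencies(negatives, pseudocount)
    scores = []
    for i, held_out in enumerate(positives):
        training = positives[:i] + positives[i + 1:]
        pos_tables = position_frequencies(training, pseudocount)
        scores.append(log_likelihood_ratio(held_out, pos_tables, neg_tables))
    return scores


def threshold_at_fpr(negative_scores, max_fpr=0.01):
    """Cutoff such that the fraction of negatives scoring above it is <= max_fpr."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives permitted above the cutoff
    return ranked[allowed]  # call a site positive only when score > cutoff


# Toy windows (hypothetical data, for illustration only).
positives = ["RRASV", "RKASL", "RRGSL", "KRASI"]
negatives = ["AGLSP", "DEESD", "PLTSP", "GGESE"]
pos_tables = position_frequencies(positives)
neg_tables = position_frequencies(negatives)
```

Under these assumptions, sensitivity at a fixed false positive rate is the fraction of leave-one-out positive scores that exceed the cutoff returned by `threshold_at_fpr`.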