COMPUTERS AND APPLIED CHEMISTRY
2010
Issue 2
155-158
4 pages in total
Yuan Youlang (袁友浪)%Liu Liang (刘亮)%Niu Bing (钮冰)%Lu Wencong (陆文聪)%Cai Yudong (蔡煜东)
protein%nucleic acid%support vector machine (SVM)%10-fold cross-validation%prediction model
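Nucleic-acid-binding proteins play extremely important roles in many aspects of gene function, and predicting whether a protein binds nucleic acids has attracted wide attention in bioinformatics. In this work, amino acid composition, amino acid physicochemical properties, and protein structure information were used as feature parameters, and nucleic-acid-binding proteins were predicted with the support vector machine (SVM) method. Three protein data sets, covering proteins that bind rRNA, RNA, and DNA respectively, were used to train SVMs; the optimal kernel function was selected, its parameters were optimized, and classification models were built and used to predict whether a protein binds nucleic acids. The results show that, even for proteins with sequence identity below 40%, 10-fold cross-validation on the three data sets gave prediction accuracies of 93.75%, 83.41%, and 81.85%, respectively. On an external test set, the resulting models achieved prediction accuracies of 93.8%, 84.2%, and 81.9%, respectively. On this basis, we built an online software system for predicting whether a protein binds nucleic acids, available at http://chemdata.shu.edu.cn/protein_na.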
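As an illustration of the amino acid composition features mentioned in the abstract, the following is a minimal sketch in Python. It assumes the common definition of composition as the fraction of each of the 20 standard residues in a sequence; the function name `amino_acid_composition` and the toy sequence are hypothetical, and the paper's actual feature encoding (which also includes physicochemical and structural descriptors) may differ.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue fractions for one sequence.

    Characters outside the 20 standard one-letter codes are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from the paper's data sets.
features = amino_acid_composition("MKKRAAKK")
```

Each protein then becomes a fixed-length numeric vector regardless of sequence length, which is what a kernel classifier such as an SVM requires as input.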
In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties into the prediction of nucleic-acid-binding proteins. We developed binary classifiers for rRNA-, RNA-, and DNA-binding proteins, which play an important role in the control of many cell processes. Each SVM model can be used to predict whether a protein belongs to the rRNA-, RNA-, or DNA-binding protein class. 10-fold cross-validation was performed on protein data sets in which the sequence identity was below 40%. The results show that the accuracies of the SVM models for rRNA-, RNA-, and DNA-binding proteins are 93.75%, 83.41%, and 81.85%, respectively. Predictions were also performed on an external test data set. The results agree well with prior knowledge of those proteins and show the effectiveness of sequence physicochemical properties in protein function prediction. On the basis of this work, an online server for predicting nucleic-acid-binding proteins using SVM was made available at http://chemdata.shu.edu.cn/protein_na.
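The 10-fold cross-validation protocol named in the abstract can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the helper names are hypothetical, the data are synthetic, and a nearest-centroid classifier stands in for the SVM so that the example needs no third-party library.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is nearest (squared distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic, well-separated one-dimensional data (not from the paper).
X = [[0.1 + 0.01 * i] for i in range(20)] + [[0.9 - 0.01 * i] for i in range(20)]
y = [0] * 20 + [1] * 20
accuracy = cross_validated_accuracy(X, y, fit_centroids, predict_centroid)
```

Because every sample is tested exactly once by a model that never saw it during training, the resulting accuracy is a less optimistic estimate than accuracy measured on the training data itself. In a real setting the `fit` and `predict` arguments would wrap an SVM implementation, with kernel choice and parameters tuned inside the training folds.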