后勤工程学院学报
JOURNAL OF LOGISTICAL ENGINEERING UNIVERSITY
2013
Issue 5
64-70
7 pages in total
方海洋 (Fang Haiyang)%赵静 (Zhao Jing)%汪益川 (Wang Yichuan)%宗福兴 (Zong Fuxing)
membrane protein types%neural network ensemble%Bagging%voting strategy
As part of the continuing effort, initiated by Chou and Elrod, to develop machine learning algorithms for predicting membrane protein types, this study addresses the problem of imbalanced training sets by using a neural network ensemble. Prediction accuracy for membrane protein types has improved steadily in recent years, but because the classes are unevenly distributed, accuracy on the minority classes remains very low. The Bagging algorithm used in the ensemble handles this imbalance by over-sampling the minority classes and under-sampling the majority classes. In both resubstitution and independent dataset tests, the neural network ensemble outperforms the single best neural network. The approach offers a new strategy for protein attribute prediction, especially when the training set is imbalanced, and is therefore promising for proteomics.
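The two mechanisms named in the abstract, class-balanced resampling for each bag and voting across ensemble members, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `balanced_bootstrap` and `majority_vote`, the fixed per-class sample size, and the tie-breaking rule are all assumptions made for illustration, and training of the neural networks themselves is left out.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, per_class, rng):
    """Return sample indices for one bag with `per_class` draws of every class.

    Hypothetical helper: drawing with replacement means classes larger than
    `per_class` are under-sampled and smaller classes are over-sampled.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    bag = []
    for label in sorted(by_class):
        bag.extend(rng.choices(by_class[label], k=per_class))
    return bag


def majority_vote(member_predictions):
    """Combine one label list per ensemble member into one label per sample.

    Assumed rule: plurality voting; on a tie, the label that was voted
    first among the tied labels wins.
    """
    combined = []
    for votes in zip(*member_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

Under these assumptions, each ensemble member would be trained on the indices returned by its own call to `balanced_bootstrap`, and the members' predicted labels would then be passed together to `majority_vote`.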