上海第二工业大学学报
Journal of Shanghai Second Polytechnic University
2015
Issue 3
pp. 260-265
(6 pages)
protein subcellular localization%prediction%semi-supervised%dimension reduction
A fused feature extraction method combining pseudo amino acid composition (PseAA) and the position-specific scoring matrix (PSSM) is first adopted to represent protein sequences. This method converts each protein sequence into a feature vector that retains most of the original sequence information, but the resulting vectors are high-dimensional, which complicates the prediction of protein subcellular localization. In addition, large sets of labeled protein subcellular localization samples are currently difficult to obtain. To address both problems, a semi-supervised dimensionality reduction algorithm, Semi-Supervised Maximum Variance Projections (SS-MVP), is used to reduce the dimensionality of the feature vectors while extracting information useful for classification from both labeled and unlabeled samples. A support vector machine (SVM) is then applied to the reduced samples to predict the protein subcellular localization type. Experimental results show that this approach both simplifies the prediction system and improves its classification performance.
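The PseAA half of the feature extraction step can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this follows the commonly cited form of pseudo amino acid composition (20 normalized residue frequencies plus `lam` sequence-order correlation terms) and makes several assumptions: a single property scale (Kyte-Doolittle hydropathy) stands in for whatever physicochemical properties the authors used, and the function name `pse_aa`, the rank `lam`, and the weight `weight` are hypothetical choices for illustration. The PSSM features, SS-MVP, and the SVM classifier are not covered here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used only as a stand-in property scale
# (assumption: the abstract does not say which properties were used).
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def pse_aa(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    # Drop any character that is not one of the 20 standard residues.
    seq = [r for r in sequence.upper() if r in HYDROPATHY]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    prop = standardize(HYDROPATHY)
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(prop[seq[i]] - prop[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(sum(diffs) / len(diffs))
    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Calling `pse_aa("MKTAYIAKQRQISFVKSHFSRQ")` yields a 23-dimensional vector. In a pipeline like the one described above, such a vector would be concatenated with PSSM-derived features before dimensionality reduction.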