生命科学研究
LIFE SCIENCE RESEARCH
2009
Issue 5
pp. 403-407
5 pages in total
promoter%increment of diversity%position weight matrix (PWM)%support vector machine (SVM)
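Based on known human Pol Ⅱ promoter sequence data, sequence-content and sequence-signal features of promoters were jointly selected to build a support vector machine classifier for promoters. The 6-mer frequencies of the promoter sequences were used as diversity-source parameters to construct the sequence-content features, while the 3-mer frequencies at 24 sites were selected as sequence-signal features to construct the PWM; the two resulting sets of parameters were input to a support vector machine to predict human promoters. The predictive ability of the algorithm was assessed with 10-fold cross-validation and an independent data set, and the correlation coefficient exceeded 95%. The results show that the increment of diversity algorithm combined with a support vector machine effectively improves the prediction success rate and is a very effective method for eukaryotic promoter prediction.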
Content vectors and signal vectors were extracted from the promoter sequences using six minimum increments of diversity, three kinds of position weight matrix, and the GC percentage of the sequences. The resulting vectors were input into a support vector machine (SVM) algorithm to build a promoter classification model. Human Pol Ⅱ promoter sequences were predicted with the SVM, and 10-fold cross-validation and independent test data were used to validate the model. The results showed that the overall prediction accuracy (sensitivity) and the specificity both exceeded 95%. These results indicate that combining the increment of diversity with a support vector machine is an effective method for predicting eukaryotic promoter sequences.
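The abstract does not spell out how the increment of diversity is computed, so the following is only a minimal sketch using the commonly cited definition: the diversity measure D(X) = N log N - sum(n_i log n_i) over k-mer counts n_i with N = sum(n_i), and ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names (`kmer_counts`, `diversity`, `increment_of_diversity`), the natural logarithm, and the toy sequences are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts):
    """Diversity measure D = N*log(N) - sum(n_i*log(n_i)); natural log assumed."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts)


def increment_of_diversity(cx, cy):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two k-mer count tables."""
    keys = set(cx) | set(cy)
    merged = [cx[key] + cy[key] for key in keys]
    return diversity(merged) - diversity(cx.values()) - diversity(cy.values())


# Toy example: compare a query sequence with a reference by 2-mer content.
reference = kmer_counts("ACGTACGTAC", 2)
query = kmer_counts("ACGTAC", 2)
score = increment_of_diversity(reference, query)
```

A smaller ID means the two k-mer compositions are more similar; it is zero when both count tables are identical. In a pipeline like the one described, such scores (computed with 6-mers against reference sets) would form part of the feature vector passed to the SVM, but the exact feature layout here is not specified in the abstract.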