生物化学与生物物理进展
PROGRESS IN BIOCHEMISTRY AND BIOPHYSICS
2002
Issue 1
pp. 56-59
4 pages in total
gene identification%hidden Markov model%Viterbi algorithm
gene finding%hidden Markov model%Viterbi
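With genome sequencing completed, computational gene identification and gene structure prediction have received growing attention. Many programs have been developed, such as GenScan, Genemark, and GRAIL, and they provide important clues for finding new genes. However, the problem of gene identification and prediction is not fully solved: when the coding sequence of the target gene contains deletions or insertions, the predicted result differs greatly from the gene's actual structure. To eliminate the effect of sequencing errors on the prediction, it is desirable to locate the sequencing errors in coding regions. Based on this idea, we attempt to use a hidden Markov model (HMM), built on statistical properties of DNA sequences and extended with deletion and insertion states, together with the Viterbi algorithm, to identify exon segments that contain deletions and insertions. Tested on the widely used Burset/Guigo test set, the method reaches a sensitivity (Sn) and a specificity (Sp) of 84% or higher at the exon level.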
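The abstract above names Viterbi decoding over a hidden Markov model as the core step. The following is a minimal sketch of log-space Viterbi decoding, not the authors' model: the two states ("main" and "insertion"), all probabilities, and the example sequence are invented for illustration, and the deletion states the abstract mentions (which would normally be silent states) are omitted for brevity. It assumes a non-empty input sequence.

```python
import math

# Hypothetical toy model; every number below is an assumption for illustration.
STATES = ("main", "insertion")
START = {"main": 0.9, "insertion": 0.1}
TRANS = {
    "main": {"main": 0.9, "insertion": 0.1},
    "insertion": {"main": 0.5, "insertion": 0.5},
}
EMIT = {
    "main": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "insertion": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a non-empty seq."""
    # score[s]: best log-probability of any path that ends in state s.
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states
    }
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor of s at this position.
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (
                score[prev]
                + math.log(trans_p[prev][s])
                + math.log(emit_p[s][symbol])
            )
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


path = viterbi("AATCGGTA", STATES, START, TRANS, EMIT)
print(path)
```

With these toy parameters the C/G-rich middle of the sequence is assigned to the "insertion" state and the flanking bases to "main". Working in log space avoids numerical underflow on long sequences.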
With the completion of more and more genome sequencing projects, such as the Human Genome Project, the prediction of genes, including their coding and regulatory regions, has received a lot of attention. Programs such as GENSCAN and GeneMark are powerful but still do not meet the requirements of practical application. GENSCAN predicts exons accurately if the input sequence has no insertions or deletions in its coding regions. If the sequence contains even one such error, however, the prediction can be seriously disturbed and satisfactory results cannot be obtained. A hidden Markov model with deletion states, insertion states, and a main state is introduced to find deletion and insertion errors. The results show that sensitivity and specificity at the exon level are both higher than 84% on the Burset/Guigó test data set.
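For readers unfamiliar with exon-level accuracy, the sketch below shows how sensitivity and specificity are conventionally computed at that level: a predicted exon counts as correct only if both of its boundaries match an annotated exon exactly; Sn is the fraction of annotated exons predicted correctly, and Sp is the fraction of predicted exons that are correct. The function name and the coordinates are hypothetical, and the abstract does not state that the authors used exactly this procedure.

```python
def exon_level_accuracy(actual, predicted):
    """Return (Sn, Sp) for exons given as (start, end) tuples."""
    actual_set, predicted_set = set(actual), set(predicted)
    # An exon is correct only when both boundaries match exactly.
    correct = len(actual_set & predicted_set)
    sn = correct / len(actual_set) if actual_set else 0.0
    sp = correct / len(predicted_set) if predicted_set else 0.0
    return sn, sp


# Invented coordinates: one predicted exon has a wrong end, one is spurious.
actual = [(10, 50), (100, 180), (300, 420), (500, 560)]
predicted = [(10, 50), (100, 175), (300, 420), (500, 560), (700, 750)]
sn, sp = exon_level_accuracy(actual, predicted)
print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")  # Sn = 0.75, Sp = 0.60
```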