软件学报
JOURNAL OF SOFTWARE
2007
Issue 9
pp. 2090-2099
10 pages
张强锋%徐云%陈国良%车皓阳
基因型%单体型%SNP%单体分型%单体型频率估计%三元家庭%EM算法
genotype%haplotype%SNP%haplotyping%haplotype frequency estimation%trio%EM algorithm
研究了在门德尔遗传定理和哈代-维恩伯格平衡假设下,三元家庭基因型数据的单体分型和单体型频率估计问题.过去的研究仅仅关注个体间没有联系或者含有一般家系信息的基因型数据,而对这种特殊的三元家庭关注得不够.考虑到HAPMAP数据库中有一部分数据就基于这种三元家庭,现在有越来越多的需求要求直接分析这种特殊的家系结构.提出一个两段式的三元家庭中单体型频率的估计方法:i) 分型阶段,找出每一个三元家庭零重组单体构型;ii) 频率估计阶段,在前一阶段得到的单体构型基础上,应用EM算法来估计单体型频率.在程序包TRIOHAP中用C语言实现了单体分型算法和EM算法,并且使用模拟和实际数据测试了TRIOHAP的有效性和效率.实验结果表明,TRIOHAP要比其他那些忽略了三元家庭信息的常见单体型频率估计软件运行快很多.进一步地,由于TRIOHAP利用了这些信息,其估计结果更加可靠.
This paper studies haplotyping and haplotype frequency estimation on trio genotype data under the Mendelian law of inheritance and the assumption of Hardy-Weinberg equilibrium. Past work has focused on genotype data from unrelated individuals or from general pedigrees and has paid insufficient attention to the special case of trios. Because part of the HAPMAP database consists of trio data, there is a growing demand for methods that analyze this family structure directly. The paper presents a two-stage method for estimating haplotype frequencies in trios: i) in the haplotyping stage, the zero-recombinant haplotype configurations of each trio are found; ii) in the frequency estimation stage, the expectation-maximization (EM) algorithm is applied to these inferred configurations to estimate haplotype frequencies. Both the haplotyping algorithm and the EM algorithm are implemented in C in the software package TRIOHAP, whose effectiveness and efficiency are tested on simulated and real data sets. The experimental results show that TRIOHAP runs much faster than popular haplotype frequency estimation software that discards trio information. Moreover, because TRIOHAP uses this information, its estimates are more reliable.
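The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the TRIOHAP implementation (which is written in C and whose algorithms are not given in this record): stage i) is replaced here by brute-force enumeration, which is only practical for a handful of SNPs, and stage ii) is a textbook EM update under Hardy-Weinberg equilibrium. The function names, the 0/1/2 genotype coding (copies of allele 1 per SNP), and the absence of missing data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def haplotype_pairs(genotype):
    """Unordered haplotype pairs consistent with a genotype.

    genotype: tuple with 0, 1 or 2 per SNP (copies of allele 1). Assumed coding.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for i, b in zip(het, bits):      # split each heterozygous site
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def trio_configurations(father, mother, child):
    """Stage i) by brute force: parental haplotype quadruples for which the
    child carries one intact (non-recombined) haplotype from each parent."""
    configs = []
    for fp in haplotype_pairs(father):
        for mp in haplotype_pairs(mother):
            if any(tuple(a + b for a, b in zip(hf, hm)) == tuple(child)
                   for hf in fp for hm in mp):
                configs.append(fp + mp)  # four parental haplotypes
    return configs


def em_frequencies(trios, iterations=100, tol=1e-9):
    """Stage ii): EM over the configurations of each trio, assuming HWE."""
    config_lists = [trio_configurations(*t) for t in trios]
    if any(not cl for cl in config_lists):
        raise ValueError("a trio has no zero-recombinant configuration")
    haps = sorted({h for cl in config_lists for c in cl for h in c})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(iterations):
        counts = defaultdict(float)
        for cl in config_lists:
            # E-step: configuration weight is the product of the four
            # parental haplotype frequencies. The factor 2 for a
            # heterozygous pair is the same for every pair of a given
            # genotype, so it cancels in the normalization below.
            weights = []
            for c in cl:
                w = 1.0
                for h in c:
                    w *= freq[h]
                weights.append(w)
            total = sum(weights)
            for c, w in zip(cl, weights):
                for h in c:
                    counts[h] += w / total
        # M-step: expected haplotype counts over 4 parental haplotypes per trio.
        new = {h: counts[h] / (4 * len(config_lists)) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq
```

For example, a father with genotype `(1, 1)` is ambiguous on his own (two possible haplotype pairs), but with a mother `(0, 0)` and a child `(0, 0)` only one configuration survives, which is the kind of information the abstract says trio-unaware software discards.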