新型工业化
New Industrialization Strategy
2012
Issue 9
pp. 18-30
13 pages in total
霍红卫%林帅%于强%张懿璞
模体发现%数据划分%可扩展性
motif finding%data partitioning%scalability
模体发现对于基因发现和理解基因调控关系有着重要的意义,它是生物信息学中最具挑战性的问题之一。提出了针对PMSP算法的三种数据划分方法,并在此基础上提出了基于MapReduce的模体发现算法(PMSPMR)。针对不同难度的问题,在Hadoop集群上的实验结果表明,PMSPMR算法具有良好的可扩展性。特别地,对于难度较大的模体发现问题实例, PMSPMR 算法的加速比接近于 Hadoop 集群中节点的数目。此外,对于真实数据的实验, PMSPMR 算法能够识别出真核细胞和酿酒酵母中已知的转录调控模体,表明了算法的有效性。
Motif search plays an important role in gene finding and in understanding gene regulation, and it is one of the most challenging problems in bioinformatics. This paper presents three data partitioning methods for the PMSP algorithm and, building on them, proposes a MapReduce-based motif search algorithm (PMSPMR). Experiments on a Hadoop cluster with problems of varying difficulty show that PMSPMR scales well. In particular, for difficult motif search instances, the speedup of PMSPMR is close to the number of nodes in the Hadoop cluster. Experiments on real biological data show that PMSPMR identifies known transcriptional regulatory motifs in eukaryotes and in promoter sequences extracted from Saccharomyces cerevisiae, which demonstrates the effectiveness of the algorithm.