南京大学学报(自然科学版)
南京大學學報(自然科學版)
남경대학학보(자연과학판)
JOURNAL OF NANJING UNIVERSITY(NATURAL SCIENCES)
2009年
5期
576-584
,共9页
业宁%朱大铭%张倩倩%沈丽容
業寧%硃大銘%張倩倩%瀋麗容
업저%주대명%장천천%침려용
带约束最长公共子序列%快速算法%对偶算法
帶約束最長公共子序列%快速算法%對偶算法
대약속최장공공자서렬%쾌속산법%대우산법
constrained longest common subsequence%fast algorithm%primal-dual
带约束最长公共子序列(CLCS)问题有很深的生物学应用背景,常被用来表示同源基因序列相似性的度量,但计算CLCS时间代价很高,最早的CLCS算法的时间复杂度为O(rn~4),目前,最快的CLCS算法的时间复杂性为O(rn~2).运用对偶原理将带约束最长公共子序列问题转换为带约束最小覆盖集问题,并建立带权的ref树结构,构造包含约束序列的约束覆盖子集,约简带约束覆盖子集并从中搜索关键路径,再通过关键路径构造CLCS,该算法将算法时间复杂度提升到O(nlogn+(q+r)L),r是约束序列的长度,q是两序列序偶的个数,L是两序列的最长公共子序列(LCS)长度.
帶約束最長公共子序列(CLCS)問題有很深的生物學應用揹景,常被用來錶示同源基因序列相似性的度量,但計算CLCS時間代價很高,最早的CLCS算法的時間複雜度為O(rn~4),目前,最快的CLCS算法的時間複雜性為O(rn~2).運用對偶原理將帶約束最長公共子序列問題轉換為帶約束最小覆蓋集問題,併建立帶權的ref樹結構,構造包含約束序列的約束覆蓋子集,約簡帶約束覆蓋子集併從中搜索關鍵路徑,再通過關鍵路徑構造CLCS,該算法將算法時間複雜度提升到O(nlogn+(q+r)L),r是約束序列的長度,q是兩序列序偶的箇數,L是兩序列的最長公共子序列(LCS)長度.
대약속최장공공자서렬(CLCS)문제유흔심적생물학응용배경,상피용래표시동원기인서렬상사성적도량,단계산CLCS시간대개흔고,최조적CLCS산법적시간복잡도위O(rn~4),목전,최쾌적CLCS산법적시간복잡성위O(rn~2).운용대우원리장대약속최장공공자서렬문제전환위대약속최소복개집문제,병건립대권적ref수결구,구조포함약속서렬적약속복개자집,약간대약속복개자집병종중수색관건로경,재통과관건로경구조CLCS,해산법장산법시간복잡도제승도O(nlogn+(q+r)L),r시약속서렬적장도,q시량서렬서우적개수,L시량서렬적최장공공자서렬(LCS)장도.
The constrained longest common subsequence problem has deep background applications in biology. It is often used to express the measurement of similarity in homologous gene sequences, but the time complexity on computation of constrained longest common subsequence(CLCS) is high. The time complexity of the original CLCS algorithm is O(rn~4 ), while presently the time complexity of the fastest CLCS algorithm is O(rn~2). We use the principle of primal-dual which will convert CLCS to the constrained minimal covering set problem, and then establish ref tree structure with weight, structure constrained covering subset which contains the constrained sequence. We also reduce constrained covering subset and search critical paths from it,and finally structure CLCS through critical paths. The time complexity of this algorithm will be upgraded to O(nlogn+(q + r)L), where the r is length of the constrained sequence, q is the number of ordered pairs of the two given sequences and L is the longest common subsequence(LCS) length of the two given sequences.