TY - JOUR
T1 - Constrained pairwise and center-star sequences alignment problems
AU - Zhang, Yong
AU - Chan, Joseph Wun Tat
AU - Chin, Francis Y.L.
AU - Ting, Hing Fung
AU - Ye, Deshi
AU - Zhang, Feng
AU - Shi, Jianyu
N1 - Publisher Copyright:
© 2015, Springer Science+Business Media New York.
PY - 2016/7/1
Y1 - 2016/7/1
N2 - Sequence alignment is a fundamental problem in computational biology, which is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences A= a1a2… an and B= b1b2… bn with a given distance function and a constrained sequence C= c1c2… ck, our goal is to find the optimal sequence alignment of A and B w.r.t. the constraint C. We investigate several variants of this problem. If C= ck, i.e., all characters in C are same, the optimal constrained pairwise sequence alignment can be solved in O(min { kn2, (t- k) n2}) time, where t is the minimum number of occurrences of character c in A and B. If in the final alignment, the alignment score between any two consecutive constrained characters is upper bounded by some value, which is called GB-CPSA, we give a dynamic programming with the time complexity O(kn4/ log n). For the constrained center-star sequence alignment (CCSA), we prove that it is NP-hard to achieve the optimal alignment even over the binary alphabet. Furthermore, we show a negative result for CCSA, i.e., there is no polynomial-time algorithm to approximate the CCSA within any constant ratio.
AB - Sequence alignment is a fundamental problem in computational biology, which is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences A= a1a2… an and B= b1b2… bn with a given distance function and a constrained sequence C= c1c2… ck, our goal is to find the optimal sequence alignment of A and B w.r.t. the constraint C. We investigate several variants of this problem. If C= ck, i.e., all characters in C are same, the optimal constrained pairwise sequence alignment can be solved in O(min { kn2, (t- k) n2}) time, where t is the minimum number of occurrences of character c in A and B. If in the final alignment, the alignment score between any two consecutive constrained characters is upper bounded by some value, which is called GB-CPSA, we give a dynamic programming with the time complexity O(kn4/ log n). For the constrained center-star sequence alignment (CCSA), we prove that it is NP-hard to achieve the optimal alignment even over the binary alphabet. Furthermore, we show a negative result for CCSA, i.e., there is no polynomial-time algorithm to approximate the CCSA within any constant ratio.
KW - Complexity
KW - Dynamic programming
KW - Sequence alignment
UR - http://www.scopus.com/inward/record.url?scp=84930532706&partnerID=8YFLogxK
U2 - 10.1007/s10878-015-9914-6
DO - 10.1007/s10878-015-9914-6
M3 - Journal article
AN - SCOPUS:84930532706
SN - 1382-6905
VL - 32
SP - 79
EP - 94
JO - Journal of Combinatorial Optimization
JF - Journal of Combinatorial Optimization
IS - 1
ER -