TY - JOUR
T1 - A Branch Elimination-based Efficient Algorithm for Large-scale Multiple Longest Common Subsequence Problem
AU - Wei, Shiwei
AU - Wang, Yuping
AU - Cheung, Yiu Ming
N1 - Funding information:
This work was supported by the National Natural Science Foundation of China under Grants 61872281 and 61672444.
Publisher copyright:
© 2021 IEEE.
PY - 2021/9/30
Y1 - 2021/9/30
N2 - It is a key issue to find out all longest common subsequences of multiple sequences over a set of finite alphabets, namely MLCS problem, in computational biology, pattern recognition and information retrieval, to name a few. However, it is very challenging to tackle the large-scale MLCS problem effectively and efficiently due to the high complexity of time and space. To this end, this paper will therefore propose a Branch Elimination based Space and Time efficient algorithm called BEST-MLCS, which includes the following four key strategies: 1) Estimation scheme for the lower bound of the length of MLCS. 2) Estimation scheme for the upper bound of the length of the paths through the current node. 3) Branch elimination strategy by finding all useless match points and removing the branches not on the longest paths. 4) A new Directed Acyclic Graph (DAG) construction method for constructing the smallest DAG among the existing ones. As a result, the proposed algorithm BEST-MLCS can save a lot of space and time and can handle much larger scale MLCS problems than the existing algorithms. Extensive experiments conducted on biological DNA sequences show that the performance of the proposed algorithm BEST-MLCS outperforms three state-of-the-art algorithms in terms of run-time and memory consumption.
AB - It is a key issue to find out all longest common subsequences of multiple sequences over a set of finite alphabets, namely MLCS problem, in computational biology, pattern recognition and information retrieval, to name a few. However, it is very challenging to tackle the large-scale MLCS problem effectively and efficiently due to the high complexity of time and space. To this end, this paper will therefore propose a Branch Elimination based Space and Time efficient algorithm called BEST-MLCS, which includes the following four key strategies: 1) Estimation scheme for the lower bound of the length of MLCS. 2) Estimation scheme for the upper bound of the length of the paths through the current node. 3) Branch elimination strategy by finding all useless match points and removing the branches not on the longest paths. 4) A new Directed Acyclic Graph (DAG) construction method for constructing the smallest DAG among the existing ones. As a result, the proposed algorithm BEST-MLCS can save a lot of space and time and can handle much larger scale MLCS problems than the existing algorithms. Extensive experiments conducted on biological DNA sequences show that the performance of the proposed algorithm BEST-MLCS outperforms three state-of-the-art algorithms in terms of run-time and memory consumption.
KW - multiple longest common subsequences(MLCS)
KW - dominant point-based approach
KW - useless match point detection
KW - branch elimination
KW - smaller DAG
UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85116933059&doi=10.1109%2fTKDE.2021.3115057&partnerID=40&md5=0dc164e116221e7720e49d192197290f
U2 - 10.1109/TKDE.2021.3115057
DO - 10.1109/TKDE.2021.3115057
M3 - Article
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
ER -