TY - JOUR
T1 - Dynamic-MLCS
T2 - Fast searching for dynamic multiple longest common subsequences in sequence stream data
AU - Fu, Yuanyuan
AU - Wang, Chunyang
AU - Zhu, Jixin
AU - Zhang, Qun
AU - Cheung, Yiuming
AU - Wang, Yuping
N1 - Publisher Copyright:
© 2024 Elsevier B.V. All rights reserved.
PY - 2024/6/7
Y1 - 2024/6/7
N2 - In recent years, data stream mining has become a popular research topic, wherein the dynamic multiple longest common subsequence problem for sequence streams (i.e., D-MLCS) is an important issue with a vast range of applications. However, the existing methods are only for static multiple longest common subsequence problem (i.e., S-MLCS) and there are no algorithms for the D-MLCS problem. Using the existing algorithms for the S-MLCS problem to solve the D-MLCS problem often leads to an overload of the main memory and is extremely time consuming as all the sequence segments should be stored into the main memory when the new segments flow in, and all updated problems should be resolved. To overcome this shortcoming, we designed a scheme for constructing dynamic Directed Acyclic Graphs (DAGs). It only needs to update the current DAG post each sequence segment flows in, without constructing the DAG from the beginning. We propose a novel algorithm for the D-MLCS problem called Dynamic-MLCS, which only needs to store a part of the key information of each streaming sequence segments, instead of storing all streaming sequence segments and scanning them repeatedly. Theoretical analyses demonstrate that the time and space complexities of the proposed algorithm are only relevant and linear to the number of the incomplete points and newly generated nodes, and the experiments conducted on the datasets of sequence streams indicate that the proposed Dynamic-MLCS is efficient and performs better than the compared state-of-the-art algorithms.
AB - In recent years, data stream mining has become a popular research topic, wherein the dynamic multiple longest common subsequence problem for sequence streams (i.e., D-MLCS) is an important issue with a vast range of applications. However, the existing methods are only for static multiple longest common subsequence problem (i.e., S-MLCS) and there are no algorithms for the D-MLCS problem. Using the existing algorithms for the S-MLCS problem to solve the D-MLCS problem often leads to an overload of the main memory and is extremely time consuming as all the sequence segments should be stored into the main memory when the new segments flow in, and all updated problems should be resolved. To overcome this shortcoming, we designed a scheme for constructing dynamic Directed Acyclic Graphs (DAGs). It only needs to update the current DAG post each sequence segment flows in, without constructing the DAG from the beginning. We propose a novel algorithm for the D-MLCS problem called Dynamic-MLCS, which only needs to store a part of the key information of each streaming sequence segments, instead of storing all streaming sequence segments and scanning them repeatedly. Theoretical analyses demonstrate that the time and space complexities of the proposed algorithm are only relevant and linear to the number of the incomplete points and newly generated nodes, and the experiments conducted on the datasets of sequence streams indicate that the proposed Dynamic-MLCS is efficient and performs better than the compared state-of-the-art algorithms.
KW - Complete point
KW - Directed Acyclic Graphs (DAG)
KW - Incomplete point
KW - Multiple Longest Common Subsequences (MLCS)
KW - Sequence stream
UR - http://www.scopus.com/inward/record.url?scp=85189701294&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.111654
DO - 10.1016/j.knosys.2024.111654
M3 - Journal article
AN - SCOPUS:85189701294
SN - 0950-7051
VL - 293
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 111654
ER -