TY - JOUR
T1 - DARKER
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
AU - Zuo, Rundong
AU - Li, Guozhong
AU - Cao, Rui
AU - Choi, Byron
AU - Xu, Jianliang
AU - Bhowmick, Sourav S.
N1 - This work is supported by the Hong Kong Research Grant Council (RIF R2002-20F, R1015-23, and C2003-23Y) and the Guangdong Basic and Applied Basic Research Foundation (2023B1515130002).
Guozhong Li and Byron Choi are the corresponding authors.
Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.
PY - 2024/7
Y1 - 2024/7
N2 - Transformer-based models have facilitated numerous applications with superior performance. A key challenge in transformers is the quadratic dependency of their training time complexity on the length of the input sequence. A recent popular solution is using random feature attention (RFA) to approximate the costly vanilla attention mechanism. However, RFA relies on only a single, fixed projection for approximation, which does not capture the input distribution and can lead to low efficiency and accuracy, especially on time series data. In this paper, we propose DARKER, an efficient transformer with a novel DAta-dRiven KERnel-based attention mechanism. To precisely present the technical details, this paper discusses them with a fundamental time series task, namely, time series classification (TSC). First, the main novelty of DARKER lies in approximating the softmax kernel by learning multiple machine learning models with trainable weights as multiple projections offline, moving beyond the limitation of a fixed projection. Second, we propose a projection index (called pIndex) to efficiently search for the most suitable projection for the input for training the transformer. As a result, the overall time complexity of DARKER is linear with the input length. Third, we propose an indexing technique for efficiently computing the inputs required for transformer training. Finally, we evaluate our method on 14 real-world and 2 synthetic time series datasets. The experiments show that DARKER is 3×-4× faster than the vanilla transformer and 1.5×-3× faster than other SOTAs for long sequences. In addition, the accuracy of DARKER is comparable to or higher than that of all compared transformers.
AB - Transformer-based models have facilitated numerous applications with superior performance. A key challenge in transformers is the quadratic dependency of their training time complexity on the length of the input sequence. A recent popular solution is using random feature attention (RFA) to approximate the costly vanilla attention mechanism. However, RFA relies on only a single, fixed projection for approximation, which does not capture the input distribution and can lead to low efficiency and accuracy, especially on time series data. In this paper, we propose DARKER, an efficient transformer with a novel DAta-dRiven KERnel-based attention mechanism. To precisely present the technical details, this paper discusses them with a fundamental time series task, namely, time series classification (TSC). First, the main novelty of DARKER lies in approximating the softmax kernel by learning multiple machine learning models with trainable weights as multiple projections offline, moving beyond the limitation of a fixed projection. Second, we propose a projection index (called pIndex) to efficiently search for the most suitable projection for the input for training the transformer. As a result, the overall time complexity of DARKER is linear with the input length. Third, we propose an indexing technique for efficiently computing the inputs required for transformer training. Finally, we evaluate our method on 14 real-world and 2 synthetic time series datasets. The experiments show that DARKER is 3×-4× faster than the vanilla transformer and 1.5×-3× faster than other SOTAs for long sequences. In addition, the accuracy of DARKER is comparable to or higher than that of all compared transformers.
UR - http://www.scopus.com/inward/record.url?scp=85205379885&partnerID=8YFLogxK
U2 - 10.14778/3681954.3681996
DO - 10.14778/3681954.3681996
M3 - Conference article
AN - SCOPUS:85205379885
SN - 2150-8097
VL - 17
SP - 3229
EP - 3242
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
Y2 - 26 August 2024 through 30 August 2024
ER -