Abstract
Transformer-based models are inefficient at processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand this connection we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer, which accelerates self-attention and further improves the accuracy of the matrix approximation to self-attention via column sampling, adaptive row normalization, and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
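To give a concrete sense of the general idea of sketching the attention matrix by column sampling, here is a minimal, illustrative NumPy sketch. It is not the Skeinformer implementation: the sampling distribution (squared key-row norms), the plain row renormalization, and all function names below are assumptions made for illustration only; the paper's adaptive row normalization and pilot sampling reutilization are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def exact_attention(Q, K, V):
    """Standard softmax attention: O(n^2) in the sequence length n."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def sampled_column_attention(Q, K, V, m, rng=None):
    """Approximate softmax(QK^T / sqrt(d)) @ V by sampling m columns of the
    attention matrix (i.e. m key/value rows), importance-weighting them,
    and renormalizing each query's weights (illustrative sketch only)."""
    rng = np.random.default_rng(rng)
    n, d = K.shape
    # Sampling probabilities proportional to squared key-row norms
    # (one common choice for column-sampling sketches; an assumption here).
    probs = np.linalg.norm(K, axis=1) ** 2
    probs = probs / probs.sum()
    idx = rng.choice(n, size=m, replace=True, p=probs)
    # Unnormalized scores against the sampled keys only: shape (n, m).
    logits = Q @ K[idx].T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # per-row shift cancels after normalization
    scores = np.exp(logits)
    # Reweight each sampled column by its inverse sampling probability.
    scores = scores / (m * probs[idx])
    # Row-normalize so each query's weights still sum to 1.
    scores = scores / scores.sum(axis=1, keepdims=True)
    return scores @ V[idx]  # shape (n, d_v)

# Tiny usage example.
n, d, m = 512, 64, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
approx = sampled_column_attention(Q, K, V, m, rng=1)
exact = exact_attention(Q, K, V)
print(np.abs(approx - exact).mean())
```

Sampling m << n columns reduces the per-layer cost from O(n^2 d) to O(n m d), which is the linear-in-n scaling the abstract refers to.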
Original language | English |
---|---|
Title of host publication | Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics |
Subtitle of host publication | Human Language Technologies |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 5187-5199 |
Number of pages | 13 |
ISBN (Electronic) | 9781955917711 |
Publication status | Published - Jul 2022 |
Event | 2022 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2022 - Virtual, Seattle, United States. Duration: 10 Jul 2022 → 15 Jul 2022. https://2022.naacl.org/ (conference website) |
Conference
Conference | 2022 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2022 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 10/07/22 → 15/07/22 |
Internet address | https://2022.naacl.org/ |
Scopus Subject Areas
- Computer Networks and Communications
- Hardware and Architecture
- Information Systems
- Software