Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences

Yifan Chen, Qi Zeng, Dilek Hakkani-Tur, Di Jin, Heng Ji, Yun Yang

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

1 Citation (Scopus)

Abstract

Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention with column sampling, adaptive row normalization and pilot sampling reutilization. Experiments on the Long Range Arena benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
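The column-sampling idea behind the approximation can be illustrated with a minimal NumPy sketch. This is not the paper's Skeinformer implementation: here the sampling probabilities are simply proportional to key norms, and each output row is renormalized via self-normalized importance sampling — both are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Standard softmax attention: O(n^2) in the sequence length."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(scores)
    return (W / W.sum(axis=1, keepdims=True)) @ V

def sampled_attention(Q, K, V, s, rng):
    """Approximate attention by sampling s key columns of the score
    matrix (probabilities proportional to key norms, an assumed choice),
    reweighting by 1/(s * p_j), and renormalizing each output row."""
    n, d = K.shape
    p = np.linalg.norm(K, axis=1) + 1e-12
    p /= p.sum()
    idx = rng.choice(n, size=s, replace=True, p=p)
    scores = Q @ K[idx].T / np.sqrt(d)          # only s columns: O(n*s)
    scores -= scores.max(axis=1, keepdims=True)
    W = np.exp(scores) / (s * p[idx])           # importance weights
    return (W / W.sum(axis=1, keepdims=True)) @ V[idx]
```

With enough samples the sketch converges to exact attention, while touching only `s` of the `n` columns per query; the self-normalization step mirrors the role of row normalization in the attention matrix.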

Original language: English
Title of host publication: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies
Publisher: Association for Computational Linguistics (ACL)
Pages: 5187-5199
Number of pages: 13
ISBN (Electronic): 9781955917711
Publication status: Published - Jul 2022
Event: 2022 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2022 - Virtual, Seattle, United States
Duration: 10 Jul 2022 - 15 Jul 2022
Conference website: https://2022.naacl.org/

Scopus Subject Areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software
