Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method

Yifan Chen, Qi Zeng, Heng Ji, Yun Yang

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

29 Citations (Scopus)

Abstract

Transformers are expensive to train due to the quadratic time and space complexity of the self-attention mechanism. Kernel machines suffer from the same computational bottleneck in pairwise dot products, but several approximation schemes have been successfully incorporated to considerably reduce their computational cost without sacrificing too much accuracy. In this work, we leverage the computation methods for kernel machines to alleviate the high computational cost and introduce Skyformer, which replaces the softmax structure with a Gaussian kernel to stabilize the model training and adapts the Nyström method to a non-positive semidefinite matrix to accelerate the computation. We further conduct theoretical analysis by showing that the matrix approximation error of our proposed method is small in the spectral norm. Experiments on the Long Range Arena benchmark show that the proposed method achieves performance comparable to, or even better than, full self-attention while requiring fewer computational resources.
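The two ingredients named in the abstract, Gaussian-kernel scores in place of softmax and a Nyström-style low-rank approximation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `bandwidth` parameter, the uniform landmark sampling, the lack of score normalization, and the choice to stack queries and keys so that the Nyström step works on a positive semidefinite Gram matrix are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def nystrom_gaussian_attention(Q, K, V, num_landmarks, rng, bandwidth=1.0):
    """Approximate gaussian_kernel(Q, K) @ V without forming the n x n matrix.

    Assumption of this sketch: landmarks are sampled uniformly from the
    stacked rows of Q and K, whose joint Gram matrix is positive
    semidefinite, and the query-key block of its Nystrom approximation
    is used as the score matrix.
    """
    n = Q.shape[0]
    Z = np.vstack([Q, K])                                # (n_q + n_k, d)
    idx = rng.choice(Z.shape[0], size=num_landmarks, replace=False)
    landmarks = Z[idx]                                   # (m, d)

    C_q = gaussian_kernel(Z[:n], landmarks, bandwidth)   # (n_q, m)
    C_k = gaussian_kernel(Z[n:], landmarks, bandwidth)   # (n_k, m)
    W = gaussian_kernel(landmarks, landmarks, bandwidth) # (m, m)
    W_pinv = np.linalg.pinv(W)

    # Multiply right to left so every intermediate has at most m rows
    # or m columns; the cost is linear in sequence length for fixed m.
    return C_q @ (W_pinv @ (C_k.T @ V))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(6, 8))
    K = rng.normal(size=(6, 8))
    V = rng.normal(size=(6, 3))
    exact = gaussian_kernel(Q, K) @ V
    approx = nystrom_gaussian_attention(Q, K, V, num_landmarks=4, rng=rng)
    print("max abs error with 4 landmarks:", np.abs(exact - approx).max())
```

When every stacked row is used as a landmark, the approximation reproduces the exact product up to floating-point error. With fewer landmarks it trades accuracy for lower cost.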

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Editors: Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
Publisher: Neural Information Processing Systems Foundation
Pages: 2122-2135
Number of pages: 14
ISBN (Electronic): 9781713845393
Publication status: Published - Dec 2021
Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual
Duration: 6 Dec 2021 to 14 Dec 2021
https://nips.cc/Conferences/2021 (Conference website)
https://neurips.cc/Conferences/2021 (Conference website)
https://papers.nips.cc/paper_files/paper/2021 (Conference proceedings)
https://proceedings.neurips.cc/paper/2021 (Conference proceedings)

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Conference

Conference: 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Period: 6/12/21 to 14/12/21
