VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

3 Citations (Scopus)

Abstract

Combining Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) with Recurrent Neural Networks (RNNs) for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge; CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. The emergence of recent Mamba-based architectures has been met with enthusiasm for their exceptional long-sequence modeling capabilities, surpassing established vision models in efficiency and accuracy, which motivates us to develop an innovative architecture tailored for spatiotemporal forecasting. In this paper, we propose the VMRNN cell, a new recurrent unit that integrates the strengths of Vision Mamba blocks with LSTM. We construct a network centered on VMRNN cells to tackle spatiotemporal prediction tasks effectively. Our extensive evaluations show that our proposed approach secures competitive results on a variety of tasks while maintaining a smaller model size. Our code is available at https://github.com/yyyujintang/VMRNN-PyTorch.

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Publisher: IEEE
Pages: 5663-5673
Number of pages: 11
ISBN (Electronic): 9798350365474
ISBN (Print): 9798350365481
Publication status: Published - Jun 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 - Seattle Convention Center, Seattle, United States
Duration: 16 Jun 2024 to 22 Jun 2024
https://ieeexplore.ieee.org/xpl/conhome/10677511/proceeding (Conference proceeding)
https://cvpr.thecvf.com/virtual/2024/index.html (Conference website)
https://cvpr.thecvf.com/virtual/2024/calendar (Conference schedule)

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 to 22/06/24

Scopus Subject Areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
