PIMFuse: Kernel Fusion for Processing-in-Memory

  • Fan Yang
  • Zhe Zhou
  • Yusen Li
  • Amelie Chi Zhou
  • Guangyu Sun
  • Gang Wang
  • Xiaoguang Liu
  • Ming Chang Yang

Research output: Contribution to journal, Journal article, peer-review

Abstract

Recent studies have demonstrated that processing-in-memory (PIM) can significantly accelerate memory-intensive applications across various domains. However, our experiments show that the PIM kernels in these applications inherently have diverse resource requirements. To meet these requirements and fully utilize PIM resources, we make a first attempt to apply kernel fusion to PIM, fusing kernels with complementary resource requirements when multiple kernels are executed. Our detailed study of PIM kernel fusion also reveals that both the kernel fusion combination and the thread allocation scheme used during fusion significantly impact overall performance. To determine the fusion combination and thread allocation scheme that approach optimal performance, we propose PIMFuse, a framework that leverages kernel fusion to execute multiple PIM kernels. PIMFuse relies on accurate PIM kernel duration prediction models and fusion models to execute PIM kernels efficiently. Our experiments on real PIM devices show that PIMFuse reduces kernel execution time by up to 26.63%, and by 15.51% on average, compared to the baseline, and that PIMFuse can be integrated with existing complex PIM applications to improve their performance. PIMFuse is publicly available at https://github.com/FanYang98/PIMFuse
Original language: English
Number of pages: 25
Journal: ACM Transactions on Architecture and Code Optimization
Publication status: E-pub ahead of print - 27 Nov 2025

User-Defined Keywords

  • Processing-in-memory
  • kernel fusion
