Abstract
Recent studies have demonstrated that processing-in-memory (PIM) can significantly accelerate memory-intensive applications across various domains. However, our experiments show that the PIM kernels within applications have inherently diverse resource requirements. To meet these requirements and fully utilize PIM resources, we make a first attempt to introduce kernel fusion, fusing PIM kernels with complementary resource requirements when multiple kernels are executed. Our detailed study of PIM kernel fusion further reveals that both the kernel fusion combination and the thread allocation scheme used during fusion significantly impact overall performance. To determine the kernel fusion combinations and thread allocation schemes that approach optimal performance, we propose PIMFuse, a framework that leverages kernel fusion to execute multiple PIM kernels. PIMFuse relies on accurate PIM kernel duration prediction models and fusion models to execute PIM kernels efficiently. Our experiments on real PIM devices show that PIMFuse reduces kernel execution time by up to 26.63%, and by 15.51% on average, compared to the baseline, and that it can be integrated with existing complex PIM applications to improve their performance. PIMFuse is publicly available at https://github.com/FanYang98/PIMFuse
| Original language | English |
|---|---|
| Number of pages | 25 |
| Journal | ACM Transactions on Architecture and Code Optimization |
| DOIs | |
| Publication status | E-pub ahead of print - 27 Nov 2025 |
User-Defined Keywords
- Processing-in-memory
- kernel fusion