Abstract
Deep learning has proven effective across a wide range of tasks. However, the dense, over-parameterized nature of these models leads to significant resource consumption during deployment. Weight pruning, particularly $N:M$ sparse matrix multiplication, offers an efficient remedy by transforming dense operations into semi-sparse ones. $N:M$ sparsity provides a way to balance performance and model accuracy, but it introduces more complex programming and optimization challenges. To address these challenges, we design a systematic top-down performance analysis model for $N:M$ sparsity and propose NM-SpMM, an efficient general $N:M$ sparsity implementation. Guided by this analysis, NM-SpMM employs a hierarchical blocking mechanism as a general optimization to enhance data locality, and introduces memory access optimization and pipeline design as sparsity-aware optimizations, allowing it to achieve close-to-theoretical peak performance across different sparsity levels. Experimental results show that NM-SpMM is 2.1× faster than nmSPARSE (the state of the art for general $N:M$ sparsity) and 1.4× to 6.3× faster than cuBLAS's dense GEMM operations, closely approaching the theoretical maximum speedup resulting from the reduction in computation due to sparsity. NM-SpMM is open source and publicly available at https://github.com/M-H482/NM-SpMM.
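To make the $N:M$ pattern concrete, the following is a minimal pure-Python sketch, not the NM-SpMM kernel itself. It assumes the common convention that $N:M$ means keeping the $N$ largest-magnitude values in every group of $M$ consecutive weights; the function names `prune_n_m`, `nm_dot`, and `ideal_speedup` are hypothetical and chosen only for illustration.

```python
def prune_n_m(row, n, m):
    """Keep the n largest-magnitude entries in each group of m consecutive values.

    Returns the kept values and their positions inside each group
    (a compressed form; assumed layout, not the NM-SpMM storage format).
    """
    assert len(row) % m == 0, "row length must be a multiple of m"
    values, indices = [], []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Positions of the n largest magnitudes, restored to ascending order.
        by_magnitude = sorted(range(m), key=lambda i: -abs(group[i]))
        keep = sorted(by_magnitude[:n])
        values.extend(group[i] for i in keep)
        indices.extend(keep)
    return values, indices


def nm_dot(values, indices, dense_vec, n, m):
    """Dot product of a compressed N:M row with a dense vector.

    Only n multiply-adds are performed per group of m, which is where
    the reduction in computation comes from.
    """
    total = 0.0
    for j, (value, idx) in enumerate(zip(values, indices)):
        group = j // n  # which group of m this kept value belongs to
        total += value * dense_vec[group * m + idx]
    return total


def ideal_speedup(n, m):
    """Upper bound on speedup from skipped multiply-adds alone."""
    return m / n


row = [0.1, -3.0, 2.0, 0.5, 4.0, 0.0, -0.2, 1.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
values, indices = prune_n_m(row, n=2, m=4)
result = nm_dot(values, indices, x, n=2, m=4)
```

Under this convention a 2:4 pattern halves the multiply-adds, so the computation-only bound on speedup is $M/N = 2$; actual kernels also pay for index metadata and memory traffic, which is the gap the abstract's memory access and pipeline optimizations target.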
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |
| Editors | Lisa O'Conner |
| Place of Publication | Milano |
| Publisher | IEEE |
| Pages | 926-937 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798331532376 |
| ISBN (Print) | 9798331532383 |
| DOIs | |
| Publication status | Published - 7 Jun 2025 |
| Event | 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Politecnico di Milano, Milano, Italy. Duration: 3 Jun 2025 → 7 Jun 2025. Conference website: https://www.ipdps.org/ipdps2025/index.html; conference proceeding: https://ieeexplore.ieee.org/xpl/conhome/11078457/proceeding; conference program: https://www.ipdps.org/ipdps2025/2025-advance-program.html |
Publication series
| Name | International Symposium on Parallel and Distributed Processing |
|---|---|
| Publisher | IEEE |
Conference
| Conference | 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |
|---|---|
| Abbreviated title | IPDPS 2025 |
| Country/Territory | Italy |
| City | Milano |
| Period | 3/06/25 → 7/06/25 |
| Internet address | |
User-Defined Keywords
- N:M sparsity
- GPU
- Performance Optimization