Efficient sparse-dense matrix-matrix multiplication on GPUs using the customized sparse storage format

Shaohuai Shi, Qiang Wang, Xiaowen Chu

Research output: Chapter in book/report/conference proceeding > Conference proceeding > peer-review

12 Citations (Scopus)

Abstract

Multiplication of a sparse matrix by a dense matrix (SpDM) is widely used in areas such as scientific computing and machine learning. However, existing work overlooks the performance optimization of SpDM on modern many-core architectures such as GPUs. Sparse storage formats save memory, but they make SpDM difficult to optimize on modern GPUs because the sparse structure causes irregular data access, which results in lower resource utilization and poorer performance. In this paper, we use the roofline performance model of GPUs to design an efficient SpDM algorithm called GCOOSpDM, in which we exploit coalesced global memory access, fast shared memory reuse, and more operations per byte of global memory traffic. Experiments are conducted on three Nvidia GPUs (GTX 980, GTX Titan X Pascal, and Tesla P100) using a large number of matrices, including a public dataset and randomly generated matrices. Experimental results show that GCOOSpDM achieves a 1.5-8x speedup over Nvidia's cuSPARSE library on many matrices.
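To make the SpDM operation concrete, the following is a minimal CPU reference sketch in plain Python. It assumes the standard coordinate (COO) layout of three parallel arrays (row index, column index, value). It is not the GCOO format or the GCOOSpDM kernel from the paper, whose details are not given in this abstract. The function name `coo_spdm` and its parameters are hypothetical, chosen only for illustration.

```python
def coo_spdm(rows, cols, vals, dense, n_rows):
    """Return C = A @ B, where A is sparse in plain COO form.

    rows, cols, vals: parallel lists describing the non-zeros of A
                      (A[rows[k]][cols[k]] == vals[k]).
    dense:            B as a list of rows (list of lists of floats).
    n_rows:           number of rows of A (and therefore of C).

    Illustrative only: this is a sequential reference, not a GPU kernel.
    """
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    # Each non-zero A[r][c] scales row c of B and accumulates into row r of C.
    # The row of B that is read depends on c, which is the irregular access
    # pattern the abstract refers to.
    for r, c, v in zip(rows, cols, vals):
        b_row = dense[c]
        out_row = out[r]
        for j in range(n_cols):
            out_row[j] += v * b_row[j]
    return out


# A = [[1, 0, 2],
#      [0, 0, 3]]  stored as three parallel COO arrays
rows = [0, 0, 1]
cols = [0, 2, 2]
vals = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

C = coo_spdm(rows, cols, vals, B, n_rows=2)
print(C)
```

Each stored non-zero is reused across a full row of B, which hints at why data reuse and memory access order matter for this operation on a GPU.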

Original language: English
Title of host publication: Proceedings - 2020 IEEE 26th International Conference on Parallel and Distributed Systems, ICPADS 2020
Publisher: IEEE Computer Society
Pages: 19-26
Number of pages: 8
ISBN (Electronic): 9781728190747
Publication status: Published - Dec 2020
Event: 26th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2020 - Virtual, Hong Kong
Duration: 2 Dec 2020 to 4 Dec 2020
https://ieeexplore.ieee.org/xpl/conhome/9359105/proceeding (Conference proceedings)

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 2020-December
ISSN (Print): 1521-9097

Conference

Conference: 26th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2020
Country/Territory: Hong Kong
Period: 2/12/20 to 4/12/20

Scopus Subject Areas

  • Hardware and Architecture

User-Defined Keywords

  • COO
  • GCOO
  • GPU
  • Sparse Matrix Multiplication
