Globally Homogeneous, Locally Adaptive Sparse Matrix-vector Multiplication on the GPU

Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review


The rising popularity of the graphics processing unit (GPU) across various numerical computing applications has triggered a breakneck race to optimize key numerical kernels, in particular the sparse matrix-vector product (SpMV). Despite great strides, most existing GPU-SpMV approaches trade off one aspect of performance against another. They either require preprocessing, exhibit inconsistent behavior, lead to execution divergence, suffer from load imbalance, or induce detrimental memory access patterns. In this paper, we present an uncompromising approach for SpMV on the GPU. Our approach requires no separate preprocessing or knowledge of the matrix structure and works directly on the standard compressed sparse rows (CSR) data format. From a global perspective, it exhibits homogeneous behavior, reflected in efficient memory access patterns and a steady per-thread workload. From a local perspective, it avoids heterogeneous execution paths by adapting its behavior to the workload at hand, uses an efficient encoding to keep temporary data requirements for on-chip memory low, and leads to divergence-free execution. We evaluate our approach on more than 2500 matrices, comparing it to vendor-provided and state-of-the-art SpMV implementations. Our approach not only significantly outperforms approaches that operate directly on the CSR format (20% average performance increase), but also outperforms approaches that preprocess the matrix, even when preprocessing time is discarded. Additionally, the same strategies lead to a significant performance increase when adapted for transpose SpMV.
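
For readers unfamiliar with the CSR format mentioned in the abstract, the following is a minimal sequential sketch of SpMV and transpose SpMV over CSR arrays. It is a plain reference loop for illustration only, not the GPU method presented in the paper; the function names `csr_spmv` and `csr_spmv_transpose` and the array names `row_ptr`, `col_idx`, and `values` are assumptions chosen here, not identifiers from the paper.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A x for a matrix A stored in CSR (illustrative reference loop)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # The non-zeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_transpose(row_ptr: List[int], col_idx: List[int],
                       values: List[float], x: List[float],
                       n_cols: int) -> List[float]:
    """Compute y = A^T x from the same CSR arrays, without building A^T."""
    y = [0.0] * n_cols
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Each non-zero A[row, col] contributes to output entry col.
            y[col_idx[k]] += values[k] * x[row]
    return y


# Hypothetical 3x3 example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
yt = csr_spmv_transpose(row_ptr, col_idx, values, x, n_cols=3)
print(y)   # [7.0, 6.0, 19.0]
print(yt)  # [13.0, 6.0, 17.0]
```

The inner loop length varies with the number of non-zeros per row, which hints at why a naive one-thread-per-row mapping on a GPU can run into the load imbalance and divergence issues the abstract describes.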
Original language: English
Title of host publication: ICS '17: Proceedings of the International Conference on Supercomputing
Place of Publication: New York, NY, USA
ISBN (Print): 978-1-4503-5020-4
Publication status: Published - 2017
Externally published: Yes
Event: International Conference on Supercomputing: ICS 2017 - Chicago, United States
Duration: 14 Jun 2017 to 16 Jun 2017


Conference: International Conference on Supercomputing
Abbreviated title: ICS '17
Country/Territory: United States


  • GPU, SpMV, linear algebra, sparse matrix

