MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions

Johanna Sommer; Matthias Boehm; Alexandre V. Evfimievski; Berthold Reinwald; Peter J. Haas

doi:10.1145/3299869.3319854

Institute of Interactive Systems and Data Science (7060)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Efficiently computing linear algebra expressions is central to machine learning (ML) systems. Most systems support sparse formats and operations because sparse matrices are ubiquitous and their dense representation can cause prohibitive overheads. Estimating the sparsity of intermediates, however, remains a key challenge when generating execution plans or performing sparse operations. These sparsity estimates are used for cost and memory estimates, format decisions, and result allocation. Existing estimators tend to focus on matrix products only, and struggle to attain good accuracy with low estimation overhead. However, a key observation is that real-world sparse matrices commonly exhibit structural properties such as a single non-zero per row, or columns with varying sparsity. In this paper, we introduce MNC (Matrix Non-zero Count), a remarkably simple, count-based matrix synopsis that exploits these structural properties for efficient, accurate, and general sparsity estimation. We describe estimators and sketch propagation for realistic linear algebra expressions. Our experiments - on a new estimation benchmark called SparsEst - show that the MNC estimator yields good accuracy with very low overhead. This behavior makes MNC practical and broadly applicable in ML systems.

Original language	English
Title of host publication	SIGMOD
Pages	1607-1623
ISBN (Electronic)	978-1-4503-5643-5
DOIs	https://doi.org/10.1145/3299869.3319854
Publication status	Published - 2019
Event	SIGMOD '19, Amsterdam, Netherlands (30 Jun 2019 → 5 Jul 2019)

Conference

Conference	SIGMOD '19
Country/Territory	Netherlands
City	Amsterdam
Period	30 Jun 2019 → 5 Jul 2019


Cite this

@inproceedings{848314d0e4d34887803ad0ef996e3f48,

title = "MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions",

abstract = "Efficiently computing linear algebra expressions is central to machine learning (ML) systems. Most systems support sparse formats and operations because sparse matrices are ubiquitous and their dense representation can cause prohibitive overheads. Estimating the sparsity of intermediates, however, remains a key challenge when generating execution plans or performing sparse operations. These sparsity estimates are used for cost and memory estimates, format decisions, and result allocation. Existing estimators tend to focus on matrix products only, and struggle to attain good accuracy with low estimation overhead. However, a key observation is that real-world sparse matrices commonly exhibit structural properties such as a single non-zero per row, or columns with varying sparsity. In this paper, we introduce MNC (Matrix Non-zero Count), a remarkably simple, count-based matrix synopsis that exploits these structural properties for efficient, accurate, and general sparsity estimation. We describe estimators and sketch propagation for realistic linear algebra expressions. Our experiments - on a new estimation benchmark called SparsEst - show that the MNC estimator yields good accuracy with very low overhead. This behavior makes MNC practical and broadly applicable in ML systems.",

author = "Sommer, Johanna and Boehm, Matthias and Evfimievski, Alexandre V. and Reinwald, Berthold and Haas, Peter J.",

year = "2019",

doi = "10.1145/3299869.3319854",

language = "English",

pages = "1607--1623",

booktitle = "SIGMOD",

note = "SIGMOD '19 ; Conference date: 30-06-2019 Through 05-07-2019",

}

TY - GEN

T1 - MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions

AU - Sommer, Johanna

AU - Boehm, Matthias

AU - Evfimievski, Alexandre V.

AU - Reinwald, Berthold

AU - Haas, Peter J.

PY - 2019

Y1 - 2019

AB - Efficiently computing linear algebra expressions is central to machine learning (ML) systems. Most systems support sparse formats and operations because sparse matrices are ubiquitous and their dense representation can cause prohibitive overheads. Estimating the sparsity of intermediates, however, remains a key challenge when generating execution plans or performing sparse operations. These sparsity estimates are used for cost and memory estimates, format decisions, and result allocation. Existing estimators tend to focus on matrix products only, and struggle to attain good accuracy with low estimation overhead. However, a key observation is that real-world sparse matrices commonly exhibit structural properties such as a single non-zero per row, or columns with varying sparsity. In this paper, we introduce MNC (Matrix Non-zero Count), a remarkably simple, count-based matrix synopsis that exploits these structural properties for efficient, accurate, and general sparsity estimation. We describe estimators and sketch propagation for realistic linear algebra expressions. Our experiments - on a new estimation benchmark called SparsEst - show that the MNC estimator yields good accuracy with very low overhead. This behavior makes MNC practical and broadly applicable in ML systems.

U2 - 10.1145/3299869.3319854

DO - 10.1145/3299869.3319854

M3 - Conference paper

SP - 1607

EP - 1623

BT - SIGMOD

T2 - SIGMOD '19

Y2 - 30 June 2019 through 5 July 2019

ER -
