Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory

Maximilian Toller; Bernhard Geiger; Roman Kern

doi:10.1109/tkde.2021.3103571

Publication: Contribution to journal › Article › Peer-reviewed

Abstract

Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that are best represented by individual unique clusters. We propose two efficient algorithms for performing Cluster Purging, one being parameter-free, while the other algorithm has a parameter that controls representivity estimations, allowing it to be tuned in supervised setups. In an experimental evaluation, we show that Cluster Purging improves upon outliers detected from raw clusterings, and that Cluster Purging competes strongly against state-of-the-art alternatives.

Original language	English
Journal	IEEE Transactions on Knowledge and Data Engineering
DOIs	https://doi.org/10.1109/tkde.2021.3103571
Publication status	Electronic publication ahead of print - 10 Aug 2021

Access to document

10.1109/tkde.2021.3103571

Other files and links

http://dx.doi.org/10.1109/tkde.2021.3103571

Cite this

@article{b60793bb27ea422eab8d4ae85b3d96c1,

title = "Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory",

abstract = "Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that are best represented by individual unique clusters. We propose two efficient algorithms for performing Cluster Purging, one being parameter-free, while the other algorithm has a parameter that controls representivity estimations, allowing it to be tuned in supervised setups. In an experimental evaluation, we show that Cluster Purging improves upon outliers detected from raw clusterings, and that Cluster Purging competes strongly against state-of-the-art alternatives.",

author = "Maximilian Toller and Bernhard Geiger and Roman Kern",

year = "2021",

month = aug,

day = "10",

doi = "10.1109/tkde.2021.3103571",

language = "English",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1558-2191",

publisher = "Institute of Electrical and Electronics Engineers",

}

TY - JOUR

T1 - Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory

AU - Toller, Maximilian

AU - Geiger, Bernhard

AU - Kern, Roman

PY - 2021/8/10

Y1 - 2021/8/10

N2 - Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that are best represented by individual unique clusters. We propose two efficient algorithms for performing Cluster Purging, one being parameter-free, while the other algorithm has a parameter that controls representivity estimations, allowing it to be tuned in supervised setups. In an experimental evaluation, we show that Cluster Purging improves upon outliers detected from raw clusterings, and that Cluster Purging competes strongly against state-of-the-art alternatives.

AB - Rate-distortion theory-based outlier detection builds upon the rationale that a good data compression will encode outliers with unique symbols. Based on this rationale, we propose Cluster Purging, which is an extension of clustering-based outlier detection. This extension allows one to assess the representivity of clusterings, and to find data that are best represented by individual unique clusters. We propose two efficient algorithms for performing Cluster Purging, one being parameter-free, while the other algorithm has a parameter that controls representivity estimations, allowing it to be tuned in supervised setups. In an experimental evaluation, we show that Cluster Purging improves upon outliers detected from raw clusterings, and that Cluster Purging competes strongly against state-of-the-art alternatives.

UR - http://dx.doi.org/10.1109/tkde.2021.3103571

U2 - 10.1109/tkde.2021.3103571

DO - 10.1109/tkde.2021.3103571

M3 - Article

SN - 1558-2191

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

ER -
