An Approach to Automatically Extract Predictive Properties from Nominal Attributes in Relational Databases

Valentin Kassarnig, Franz Wotawa

Publication: Contribution to book/report/conference proceedings; Contribution in conference proceedings; Peer-reviewed

Abstract

Feature engineering is a fundamental step in data mining, yet it is both difficult and expensive. Hand-crafting features is not only a time-consuming task that requires specific domain knowledge; it may also prevent new information from emerging. Extracting meaningful features from relational data is particularly difficult because of the complex relationships between tables. Over the last decade, there has been an emerging trend towards automating the construction of propositional features from relational data, and such approaches have been used successfully to solve numerous real-world problems. Despite their success, most of them lack adequate support for nominal attributes. We present a new approach that helps propositionalization methods extract meaningful features from nominal attributes and improve their predictive performance. In an experimental evaluation on three datasets, we demonstrate that the proposed technique can produce novel features that are highly correlated with the target attribute. Furthermore, these features can reveal relationships among the distinct categorical values, allowing them to be compared and ordered. Finally, experimental results show that the new features can significantly improve predictive performance in classification tasks.
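The abstract does not spell out the authors' algorithm. Purely as a generic illustration, and not as the method proposed in the paper, the following Python sketch shows what propositionalizing a nominal attribute across a one-to-many relationship can look like. The hypothetical `count_features` builds the common one-count-column-per-value features, while the hypothetical `target_rate_per_value` gives each categorical value a numeric score (the mean target of the linked parent rows, a standard target-encoding idea), which makes the values comparable and orderable. The table contents, column meanings, and function names are all made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): a parent table of customers with
# a binary target, and a child table of purchases whose category is nominal.
customers = {1: 1, 2: 0, 3: 1, 4: 0}  # customer_id -> target label
purchases = [  # (customer_id, category); one customer has many purchases
    (1, "books"), (1, "games"),
    (2, "food"), (2, "games"),
    (3, "books"),
    (4, "food"),
]


def count_features(rows, parent_ids):
    """Baseline propositionalization: one count column per nominal value."""
    values = sorted({value for _, value in rows})
    counts = defaultdict(Counter)
    for pid, value in rows:
        counts[pid][value] += 1
    return values, {pid: [counts[pid][v] for v in values] for pid in parent_ids}


def target_rate_per_value(rows, target):
    """Score each nominal value by the mean target of the parents linked to it.

    The numeric score makes the categories comparable and orderable.
    Compute it on training data only; otherwise the target leaks.
    """
    linked = defaultdict(set)
    for pid, value in rows:
        linked[value].add(pid)
    return {v: sum(target[p] for p in pids) / len(pids) for v, pids in linked.items()}


def mean_score_feature(rows, scores, parent_ids):
    """Aggregate the per-value scores back onto each parent (mean per parent)."""
    per_parent = defaultdict(list)
    for pid, value in rows:
        per_parent[pid].append(scores[value])
    return {pid: sum(per_parent[pid]) / len(per_parent[pid])
            for pid in parent_ids if per_parent[pid]}


values, table = count_features(purchases, customers)
scores = target_rate_per_value(purchases, customers)
print(values, table)
print(sorted(scores, key=scores.get))  # -> ['food', 'games', 'books']
print(mean_score_feature(purchases, scores, customers))
```

The count columns alone treat every category as unrelated, whereas the per-value score places the categories on one numeric scale; aggregating that score per parent yields a single feature that can correlate with the target, which is the kind of property the abstract describes.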

Original language: English
Title: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers
Pages: 4932-4939
Number of pages: 8
ISBN (electronic): 9781538650356
Publication status: Published, 22 Jan 2019
Event: 2018 IEEE International Conference on Big Data, Big Data 2018, Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 10/12/18 to 13/12/18

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
