An Approach to Automatically Extract Predictive Properties from Nominal Attributes in Relational Databases

Valentin Kassarnig, Franz Wotawa

Research output: Chapter in Book/Report/Conference proceeding, Conference paper, peer-review

Abstract

Feature engineering is a fundamental step in data mining, yet it is both difficult and expensive. Hand-crafting features is not only a time-consuming task that requires specific domain knowledge; it may also prevent new information from emerging. Extracting meaningful features from relational data is particularly difficult because of the complex relationships between tables. Over the last decade there has been an emerging trend towards automating the construction of propositional features from relational data, and such approaches have been used successfully to solve numerous real-world problems. Despite their success, most of them lack adequate support for nominal attributes. We present a new approach that helps propositionalization methods extract meaningful features from nominal attributes and improve their predictive performance. In an experimental evaluation on three datasets, we demonstrate that the proposed technique is capable of producing novel features that are highly correlated with the target attribute. Furthermore, these features can reveal relationships among the distinct categorical values, allowing them to be compared and ordered. Finally, the experimental results show that the new features can significantly improve predictive performance in classification tasks.
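To make the idea of propositionalizing a nominal attribute concrete, here is a minimal sketch under loudly stated assumptions. It is not the method proposed in this paper, whose details the abstract does not give. It swaps in a generic technique: each category of a nominal attribute in a one-to-many child table is mapped to the mean target value of the parent rows that reference it, and those per-row scores are then aggregated (mean, min, max, count) into one flat feature vector per parent row. The toy tables `customers` and `purchases` and the functions `category_scores` and `propositionalize` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy schema (assumption): a parent table with a binary target
# and a one-to-many child table holding a nominal attribute.
customers = {1: 1, 2: 0, 3: 1, 4: 0}  # customer_id -> target label
purchases = [  # (customer_id, category) rows of the child table
    (1, "books"), (1, "games"), (2, "garden"), (2, "books"),
    (3, "games"), (3, "games"), (4, "garden"),
]


def category_scores(parent_target, child_rows):
    """Map each category to the mean target of the child rows that reference it."""
    targets = defaultdict(list)
    for parent_id, category in child_rows:
        targets[category].append(parent_target[parent_id])
    return {cat: mean(vals) for cat, vals in targets.items()}


def propositionalize(child_rows, scores):
    """Aggregate per-row category scores into one feature dict per parent row.

    Parents without any child rows receive no entry in this sketch.
    """
    per_parent = defaultdict(list)
    for parent_id, category in child_rows:
        per_parent[parent_id].append(scores[category])
    return {
        pid: {"mean": mean(v), "min": min(v), "max": max(v), "count": len(v)}
        for pid, v in per_parent.items()
    }


if __name__ == "__main__":
    scores = category_scores(customers, purchases)
    print(scores)
    print(propositionalize(purchases, scores))
```

Because every category receives a numeric score, the categories become comparable: `sorted(scores, key=scores.get)` orders them as garden, books, games on this toy data. The scores are derived from the target, so in a real evaluation they would have to be computed on training data only to avoid leaking label information into the features.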

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers
Pages: 4932-4939
Number of pages: 8
ISBN (Electronic): 9781538650356
Publication status: Published - 22 Jan 2019
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 10/12/18 to 13/12/18

Keywords

  • Aggregation
  • Automated feature engineering
  • Nominal data
  • Propositionalization
  • Relational data mining

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
