Instance selection aims to choose model training data such that the performance of the trained models is maximized. When chemical processes are modeled with artificial neural networks, it can serve as an essential preprocessing step, since measurement data from such processes are commonly highly clustered and thus far from an ideal normal distribution. In this paper, four filter methods from the literature and one newly proposed data selection method are tested, each on its own and in combination with a convex hull data selection algorithm, which yields ten selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks on the resulting data splits. The mean deviation of the final models quantifies each algorithm’s performance, and the standard deviation of that deviation indicates its reproducibility. The best performers on real-world chemical engineering data are two convex-hull-extended algorithms: self-organizing-map-based stratified sampling with a proportional allocation rule, and the newly proposed self-information-based subset selection.
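The abstract does not spell out how the convex hull step is combined with a filter method, so the following is only a minimal sketch of one plausible reading: the hull vertices of the data always go into the training set, so that the remaining points lie inside the region the network has seen, and a base selector fills the rest of the training budget. Everything here is an assumption for illustration: the function names, the restriction to 2-D points, and the use of plain random sampling as a stand-in for the filter methods studied in the paper.

```python
import random


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points):
    """Indices of the 2-D convex hull vertices (Andrew's monotone chain)."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) < 3:
        return order

    def half_hull(seq):
        chain = []
        for i in seq:
            # Drop the last vertex while it does not make a left turn.
            while len(chain) >= 2 and cross(
                points[chain[-2]], points[chain[-1]], points[i]
            ) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # The last vertex of each half is the first vertex of the other half.
    return lower[:-1] + upper[:-1]


def random_selector(candidates, k, rng):
    """Stand-in base selector (assumption): uniform random sampling."""
    return rng.sample(candidates, min(k, len(candidates)))


def select_training_indices(points, n_train, base_selector, seed=0):
    """Hull vertices first, then the base selector fills the remaining budget."""
    hull = set(convex_hull_indices(points))
    interior = [i for i in range(len(points)) if i not in hull]
    n_extra = max(0, n_train - len(hull))
    extra = base_selector(interior, n_extra, random.Random(seed))
    return sorted(hull | set(extra))


# Toy data: four corners of a square plus four interior points.
points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 1), (3, 1), (2, 3)]
train = select_training_indices(points, n_train=6, base_selector=random_selector)
test = [i for i in range(len(points)) if i not in train]
```

In this toy case the four corners are always part of `train`, and the two held-out points are interior ones. Swapping `random_selector` for another function with the same signature is how a different filter method would be plugged into this sketch.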
|Journal||Applied Soft Computing|
|Publication status||Published - Dec. 2021|