Comparison of data selection methods for modeling chemical processes with artificial neural networks

Fabian Zapf; Thomas Wallek

doi:10.1016/j.asoc.2021.107938

Fabian Zapf, Thomas Wallek^*

^*Corresponding author for this work

Institut für Chemische Verfahrenstechnik und Umwelttechnik (6670)

Publication: Contribution to journal › Article › Peer-reviewed

Abstract

Instance selection aims at selecting model training data in a way that the performance of the trained models is maximized. In the context of modeling chemical processes by artificial neural networks, it can serve as an essential preprocessing step since measurement data of such processes are commonly highly clustered and thus far away from being ideally normally distributed. In this paper, four filter methods from literature and a newly proposed method for data selection are tested and combined with a convex hull data selection algorithm, which results in ten different selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks with the produced split datasets. The final mean model deviation is used to quantify the algorithms’ performance and their standard deviation to provide information about their reproducibility. It is found that the convex hull extended algorithms self-organizing maps based stratified sampling with a proportional allocation rule and the newly proposed self-information-based subset selection perform best for real-world chemical engineering data.
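The idea of extending a selection method with a convex hull step can be illustrated with a minimal sketch. It is not the authors' implementation (the keywords indicate the paper uses Wolfram Mathematica); it assumes two-dimensional input data, uses plain random sampling as a stand-in for the filter methods, and the function names `convex_hull_indices` and `hull_extended_split` are hypothetical. The hull vertices are always placed in the training set so that the model is less likely to extrapolate on the remaining points.

```python
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(points):
    """Indices of the 2-D convex hull vertices (Andrew's monotone chain)."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) < 3:
        return order

    def half_hull(indices):
        chain = []
        for i in indices:
            # Drop the last vertex while it does not make a strict left turn.
            while len(chain) >= 2 and cross(points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    # The last index of each half is the first index of the other half.
    return lower[:-1] + upper[:-1]


def hull_extended_split(points, n_train, seed=0):
    """Split point indices into (train, test).

    Assumption for illustration: hull vertices always go to the training set,
    and the remaining slots are filled by random sampling, which stands in for
    any of the filter methods. If the hull has more than n_train vertices,
    the training set is larger than n_train.
    """
    hull = set(convex_hull_indices(points))
    rest = [i for i in range(len(points)) if i not in hull]
    rng = random.Random(seed)
    n_fill = max(0, n_train - len(hull))
    fill = rng.sample(rest, min(n_fill, len(rest)))
    train = sorted(hull | set(fill))
    train_set = set(train)
    test = [i for i in range(len(points)) if i not in train_set]
    return train, test


if __name__ == "__main__":
    data = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 2), (3, 1), (2, 3)]
    train_idx, test_idx = hull_extended_split(data, n_train=6, seed=1)
    print("train:", train_idx)
    print("test:", test_idx)
```

For higher-dimensional data the same split logic would apply, but the hull computation would need a general-purpose routine rather than the 2-D monotone chain used here.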

Original language	English
Article number	107938
Journal	Applied Soft Computing
Volume	113
Issue number	B
DOIs	https://doi.org/10.1016/j.asoc.2021.107938
Publication status	Published - Dec 2021

ASJC Scopus subject areas

Software

Access to document

10.1016/j.asoc.2021.107938

Other files and links

Link to publication in Scopus

Cite this

@article{c3c2f24d06564dafb32e1bbab6a6e8a3,

title = "Comparison of data selection methods for modeling chemical processes with artificial neural networks",

abstract = "Instance selection aims at selecting model training data in a way that the performance of the trained models is maximized. In the context of modeling chemical processes by artificial neural networks, it can serve as an essential preprocessing step since measurement data of such processes are commonly highly clustered and thus far away from being ideally normally distributed. In this paper, four filter methods from literature and a newly proposed method for data selection are tested and combined with a convex hull data selection algorithm, which results in ten different selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks with the produced split datasets. The final mean model deviation is used to quantify the algorithms{\textquoteright} performance and their standard deviation to provide information about their reproducibility. It is found that the convex hull extended algorithms self-organizing maps based stratified sampling with a proportional allocation rule and the newly proposed self-information-based subset selection perform best for real-world chemical engineering data.",

keywords = "Artificial neural networks, Chemical processes, Data selection, Instance selection, Regression, Subset selection, Wolfram Mathematica",

author = "Fabian Zapf and Thomas Wallek",

note = "Funding Information: The authors gratefully acknowledge support from NAWI Graz for this work. Publisher Copyright: {\textcopyright} 2021 Elsevier B.V.",

year = "2021",

month = dec,

doi = "10.1016/j.asoc.2021.107938",

language = "English",

volume = "113",

journal = "Applied Soft Computing",

issn = "1568-4946",

publisher = "Elsevier B.V.",

number = "B",

}

TY - JOUR

T1 - Comparison of data selection methods for modeling chemical processes with artificial neural networks

AU - Zapf, Fabian

AU - Wallek, Thomas

PY - 2021/12

Y1 - 2021/12

N2 - Instance selection aims at selecting model training data in a way that the performance of the trained models is maximized. In the context of modeling chemical processes by artificial neural networks, it can serve as an essential preprocessing step since measurement data of such processes are commonly highly clustered and thus far away from being ideally normally distributed. In this paper, four filter methods from literature and a newly proposed method for data selection are tested and combined with a convex hull data selection algorithm, which results in ten different selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks with the produced split datasets. The final mean model deviation is used to quantify the algorithms’ performance and their standard deviation to provide information about their reproducibility. It is found that the convex hull extended algorithms self-organizing maps based stratified sampling with a proportional allocation rule and the newly proposed self-information-based subset selection perform best for real-world chemical engineering data.

AB - Instance selection aims at selecting model training data in a way that the performance of the trained models is maximized. In the context of modeling chemical processes by artificial neural networks, it can serve as an essential preprocessing step since measurement data of such processes are commonly highly clustered and thus far away from being ideally normally distributed. In this paper, four filter methods from literature and a newly proposed method for data selection are tested and combined with a convex hull data selection algorithm, which results in ten different selection approaches. These approaches are applied to five selected datasets by training feed-forward artificial neural networks with the produced split datasets. The final mean model deviation is used to quantify the algorithms’ performance and their standard deviation to provide information about their reproducibility. It is found that the convex hull extended algorithms self-organizing maps based stratified sampling with a proportional allocation rule and the newly proposed self-information-based subset selection perform best for real-world chemical engineering data.

KW - Artificial neural networks

KW - Chemical processes

KW - Data selection

KW - Instance selection

KW - Regression

KW - Subset selection

KW - Wolfram Mathematica

UR - http://www.scopus.com/inward/record.url?scp=85117091142&partnerID=8YFLogxK

U2 - 10.1016/j.asoc.2021.107938

DO - 10.1016/j.asoc.2021.107938

M3 - Article

AN - SCOPUS:85117091142

SN - 1568-4946

VL - 113

JO - Applied Soft Computing

JF - Applied Soft Computing

IS - B

M1 - 107938

ER -
