Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome

Andreas Philipp Hassler, Ernestina Menasalvas, Francisco José García-García, Leocadio Rodríguez-Mañas, Andreas Holzinger

Research output: Contribution to journal > Article > Research > peer-review

Abstract

Background: Increasing life expectancy means that more elderly people struggle with age-related diseases and functional conditions. This poses major challenges for establishing new approaches to maintaining health at a higher age. An important aspect of age-related deterioration of a patient's general condition is frailty. The frailty syndrome is associated with a high risk of falls, hospitalization, disability, and ultimately increased mortality. Predictive data mining enables the discovery of potential risk factors and can serve as a clinical decision support system that provides the medical doctor with information on the probable clinical outcome for the patient. This enables the professional to react promptly and to avert likely adverse events in advance.

Methods: Medical data from 474 participants in the Toledo Study for Healthy Aging (TSHA) were used, comprising 284 health-related parameters, including questionnaire answers, blood parameters and vital parameters. Binary classification models were built to distinguish between frail and non-frail study subjects.

Results: Using the available TSHA data and the discovered potential predictors, it was possible to design, develop and evaluate a variety of predictive models for the frailty syndrome. The best-performing model was the support vector machine (SVM, 78.31%). Moreover, a methodology was developed that makes it possible to explore and use incomplete medical data, to identify potential predictors, and to enable interpretability.

Conclusions: This work demonstrates that it is feasible to use incomplete, imbalanced medical data to develop a predictive model for the frailty syndrome. Moreover, potential predictive factors were discovered, which were approved by the clinicians. Future work will aim to improve prediction accuracy, especially with regard to separating the group of frail patients into frail and pre-frail ones, and to analyze the differences between them.
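The abstract stresses two properties of the data, incompleteness and class imbalance, without stating which imputation method or evaluation metric was used. The following is only an illustrative sketch under stated assumptions, not the authors' pipeline: it swaps in simple per-column median imputation and shows why plain accuracy can look good on imbalanced labels while balanced accuracy (mean per-class recall) does not. The feature names, toy values and helper functions `impute_median` and `balanced_accuracy` are hypothetical.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [[medians[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = frail, 0 = non-frail).

    Assumes both classes occur at least once in y_true.
    """
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / 2


# Hypothetical toy records: [gait_speed_m_per_s, grip_strength_kg]; None = missing.
rows = [[0.8, 30.0], [None, 22.0], [1.2, None], [0.6, 18.0]]
filled = impute_median(rows)

# Hypothetical imbalanced labels: 8 non-frail, 2 frail.
# A trivial "model" that predicts non-frail for everyone:
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

print(filled)
print(f"accuracy={plain_acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

In this toy case the trivial classifier never identifies a frail subject, yet its plain accuracy is high simply because non-frail subjects dominate; the balanced figure exposes that. Median imputation is chosen here only because it is easy to follow; more elaborate imputation strategies exist and the paper may well have used a different one.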

Original language: English
Article number: 33
Journal: BMC Medical Informatics and Decision Making
Volume: 19
Issue number: 1
DOI: 10.1186/s12911-019-0747-6
Publication status: Published - 18 Feb 2019

Fingerprint

Clinical Decision Support Systems
Data Mining
Health
Life Expectancy
Hospitalization
Mortality
Support Vector Machine
Surveys and Questionnaires

Keywords

  • Data mining
  • Data preprocessing
  • Frailty syndrome
  • Health data analytics
  • Machine learning
  • Missing value imputation
  • Predictive modeling
  • Risk factor discovery

ASJC Scopus subject areas

  • Health Policy
  • Health Informatics

Cite this

Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome. / Hassler, Andreas Philipp; Menasalvas, Ernestina; García-García, Francisco José; Rodríguez-Mañas, Leocadio; Holzinger, Andreas.

In: BMC Medical Informatics and Decision Making, Vol. 19, No. 1, 33, 18.02.2019.

@article{1dd6a57c74354faeb3996c8044f65826,
title = "Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome",
keywords = "Data mining, Data preprocessing, Frailty syndrome, Health data analytics, Machine learning, Missing value imputation, Predictive modeling, Risk factor discovery",
author = "Hassler, {Andreas Philipp} and Ernestina Menasalvas and Garc{\'i}a-Garc{\'i}a, {Francisco Jos{\'e}} and Leocadio Rodr{\'i}guez-Ma{\~n}as and Andreas Holzinger",
year = "2019",
month = "2",
day = "18",
doi = "10.1186/s12911-019-0747-6",
language = "English",
volume = "19",
journal = "BMC Medical Informatics and Decision Making",
issn = "1472-6947",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome

AU - Hassler, Andreas Philipp

AU - Menasalvas, Ernestina

AU - García-García, Francisco José

AU - Rodríguez-Mañas, Leocadio

AU - Holzinger, Andreas

PY - 2019/2/18

Y1 - 2019/2/18

KW - Data mining

KW - Data preprocessing

KW - Frailty syndrome

KW - Health data analytics

KW - Machine learning

KW - Missing value imputation

KW - Predictive modeling

KW - Risk factor discovery

UR - http://www.scopus.com/inward/record.url?scp=85061854188&partnerID=8YFLogxK

U2 - 10.1186/s12911-019-0747-6

DO - 10.1186/s12911-019-0747-6

M3 - Article

VL - 19

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

SN - 1472-6947

IS - 1

M1 - 33

ER -