Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data

Bettina M Pucher; Oana A Zeleznik; Gerhard G Thallinger

doi:10.1093/bib/bby027

Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data

Bettina M Pucher, Oana A Zeleznik, Gerhard G Thallinger

Publikation: Beitrag in einer Fachzeitschrift › Artikel › Begutachtung

Abstract

Integrative analysis aims to identify the driving factors of a biological process by the joint exploration of data from multiple cellular levels. The volume of omics data produced is constantly increasing, and so too does the collection of tools for its analysis. Comparative studies assessing performance and the biological value of results, however, are rare but in great demand. We present a comprehensive comparison of three integrative analysis approaches, sparse canonical correlation analysis (sCCA), non-negative matrix factorization (NMF) and logic data mining MicroArray Logic Analyzer (MALA), by applying them to simulated and experimental omics data. We find that sCCA and NMF are able to identify differential features in simulated data, while the Logic Data Mining method, MALA, falls short. Applied to experimental data, we show that MALA performs best in terms of sample classification accuracy, and in general, the classification power of prioritized feature sets is high (97.1-99.5% accuracy). The proportion of features identified by at least one of the other methods, however, is approximately 60% for sCCA and NMF and nearly 30% for MALA, and the proportion of features jointly identified by all methods is only around 16%. Similarly, the congruence on functional levels (Gene Ontology, Reactome) is low. Furthermore, the agreement of identified feature sets with curated gene signatures relevant to the investigated disease is modest. We discuss possible reasons for the moderate overlap of identified feature sets with each other and with curated cancer signatures. The R code to create simulated data, results and figures is provided at https://github.com/ThallingerLab/IamComparison.

Originalsprache	englisch
Seiten (von - bis)	671-681
Fachzeitschrift	Briefings in Bioinformatics
Jahrgang	20
Ausgabenummer	2
Frühes Online-Datum	23 Apr. 2018
DOIs	https://doi.org/10.1093/bib/bby027
Publikationsstatus	Veröffentlicht - 25 März 2019

Fields of Expertise

Human- & Biotechnology
Information, Communication & Computing

Treatment code (Nähere Zuordnung)

Basic - Fundamental (Grundlagenforschung)

UN SDGs

Dieser Output leistet einen Beitrag zu folgendem(n) Ziel(en) für nachhaltige Entwicklung

Zugriff auf Dokument

10.1093/bib/bby027

Dieses zitieren

@article{9342e23b6b194467b29feb9c62633e50,

title = "Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data",

abstract = "Integrative analysis aims to identify the driving factors of a biological process by the joint exploration of data from multiple cellular levels. The volume of omics data produced is constantly increasing, and so too does the collection of tools for its analysis. Comparative studies assessing performance and the biological value of results, however, are rare but in great demand. We present a comprehensive comparison of three integrative analysis approaches, sparse canonical correlation analysis (sCCA), non-negative matrix factorization (NMF) and logic data mining MicroArray Logic Analyzer (MALA), by applying them to simulated and experimental omics data. We find that sCCA and NMF are able to identify differential features in simulated data, while the Logic Data Mining method, MALA, falls short. Applied to experimental data, we show that MALA performs best in terms of sample classification accuracy, and in general, the classification power of prioritized feature sets is high (97.1-99.5% accuracy). The proportion of features identified by at least one of the other methods, however, is approximately 60% for sCCA and NMF and nearly 30% for MALA, and the proportion of features jointly identified by all methods is only around 16%. Similarly, the congruence on functional levels (Gene Ontology, Reactome) is low. Furthermore, the agreement of identified feature sets with curated gene signatures relevant to the investigated disease is modest. We discuss possible reasons for the moderate overlap of identified feature sets with each other and with curated cancer signatures. The R code to create simulated data, results and figures is provided at https://github.com/ThallingerLab/IamComparison.",

author = "Pucher, {Bettina M} and Zeleznik, {Oana A} and Thallinger, {Gerhard G}",

year = "2019",

month = mar,

day = "25",

doi = "10.1093/bib/bby027",

language = "English",

volume = "20",

pages = "671--681",

journal = "Briefings in Bioinformatics",

issn = "1477-4054",

publisher = "Oxford University Press",

number = "2",

}

TY - JOUR

T1 - Comparison and evaluation of integrative methods for the analysis of multilevel omics data

T2 - a study based on simulated and experimental cancer data

AU - Pucher, Bettina M

AU - Zeleznik, Oana A

AU - Thallinger, Gerhard G

PY - 2019/3/25

Y1 - 2019/3/25

N2 - Integrative analysis aims to identify the driving factors of a biological process by the joint exploration of data from multiple cellular levels. The volume of omics data produced is constantly increasing, and so too does the collection of tools for its analysis. Comparative studies assessing performance and the biological value of results, however, are rare but in great demand. We present a comprehensive comparison of three integrative analysis approaches, sparse canonical correlation analysis (sCCA), non-negative matrix factorization (NMF) and logic data mining MicroArray Logic Analyzer (MALA), by applying them to simulated and experimental omics data. We find that sCCA and NMF are able to identify differential features in simulated data, while the Logic Data Mining method, MALA, falls short. Applied to experimental data, we show that MALA performs best in terms of sample classification accuracy, and in general, the classification power of prioritized feature sets is high (97.1-99.5% accuracy). The proportion of features identified by at least one of the other methods, however, is approximately 60% for sCCA and NMF and nearly 30% for MALA, and the proportion of features jointly identified by all methods is only around 16%. Similarly, the congruence on functional levels (Gene Ontology, Reactome) is low. Furthermore, the agreement of identified feature sets with curated gene signatures relevant to the investigated disease is modest. We discuss possible reasons for the moderate overlap of identified feature sets with each other and with curated cancer signatures. The R code to create simulated data, results and figures is provided at https://github.com/ThallingerLab/IamComparison.

AB - Integrative analysis aims to identify the driving factors of a biological process by the joint exploration of data from multiple cellular levels. The volume of omics data produced is constantly increasing, and so too does the collection of tools for its analysis. Comparative studies assessing performance and the biological value of results, however, are rare but in great demand. We present a comprehensive comparison of three integrative analysis approaches, sparse canonical correlation analysis (sCCA), non-negative matrix factorization (NMF) and logic data mining MicroArray Logic Analyzer (MALA), by applying them to simulated and experimental omics data. We find that sCCA and NMF are able to identify differential features in simulated data, while the Logic Data Mining method, MALA, falls short. Applied to experimental data, we show that MALA performs best in terms of sample classification accuracy, and in general, the classification power of prioritized feature sets is high (97.1-99.5% accuracy). The proportion of features identified by at least one of the other methods, however, is approximately 60% for sCCA and NMF and nearly 30% for MALA, and the proportion of features jointly identified by all methods is only around 16%. Similarly, the congruence on functional levels (Gene Ontology, Reactome) is low. Furthermore, the agreement of identified feature sets with curated gene signatures relevant to the investigated disease is modest. We discuss possible reasons for the moderate overlap of identified feature sets with each other and with curated cancer signatures. The R code to create simulated data, results and figures is provided at https://github.com/ThallingerLab/IamComparison.

U2 - 10.1093/bib/bby027

DO - 10.1093/bib/bby027

M3 - Article

C2 - 29688321

SN - 1477-4054

VL - 20

SP - 671

EP - 681

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

IS - 2

ER -

Comparison and evaluation of integrative methods for the analysis of multilevel omics data: a study based on simulated and experimental cancer data

Abstract

Fields of Expertise

Treatment code (Nähere Zuordnung)

UN SDGs

Zugriff auf Dokument

Fingerprint

Dieses zitieren