High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies

Dmitry Suplatov, Yana Sharapova, Maxim Shegay, Nina Popova, Kateryna Fesko, Vladimir Voevodin, Vytas Švedas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Abstract

Construction of a multiple alignment of proteins that implement different functions within a common structural fold of a superfamily is a valuable tool in bioinformatics, but represents a challenge. The process can be seen as a pipeline of independent sequential steps of equivalent computational complexity, each performed by a different set of algorithms. In this work, the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. The resulting HPC installation was used to collect and superimpose, within 12 h, a representative set of 299,976 sequences and structures of the fold-type I PLP-dependent enzymes, which appears to be the largest alignment of a protein superfamily ever constructed. The use of hybrid acceleration strategies provided routine access to sequence/structure comparison of evolutionarily related proteins at a scale that would previously have been intractable, making it possible to study structure-function relationships and solve practically relevant problems, and thus promoting the value of bioinformatics and HPC in protein engineering and drug discovery.
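The observation that the pipeline steps have comparable computational cost explains why each step needed its own optimization strategy: when steps run strictly one after another, accelerating a single step leaves the untouched steps dominating the total run time. The minimal Python sketch below makes this Amdahl-style argument concrete. The function name `pipeline_speedup`, the four-step example and the 100× acceleration factor are illustrative assumptions, not figures taken from the paper.

```python
def pipeline_speedup(step_costs, step_speedups):
    """Overall speedup of a strictly sequential pipeline (Amdahl-style estimate).

    step_costs:    baseline run time of each step (any consistent unit).
    step_speedups: acceleration factor achieved on each step (1 = not optimized).
    Note: this is an illustrative model, not the Mustguseal implementation.
    """
    if len(step_costs) != len(step_speedups):
        raise ValueError("one speedup factor is needed per step")
    baseline = sum(step_costs)
    # Steps run one after another, so the accelerated step times simply add up.
    accelerated = sum(cost / factor for cost, factor in zip(step_costs, step_speedups))
    return baseline / accelerated


if __name__ == "__main__":
    costs = [1, 1, 1, 1]  # hypothetical: four steps of equal cost
    # Only one step accelerated 100x: the other three still dominate.
    print(round(pipeline_speedup(costs, [100, 1, 1, 1]), 2))  # 1.33
    # Every step accelerated 100x with a step-specific strategy.
    print(round(pipeline_speedup(costs, [100, 100, 100, 100]), 2))  # 100.0
```

With four equal steps, a 100× acceleration of one step yields only about 1.33× overall, and no single-step optimization can exceed 4/3; only accelerating every step lets the whole pipeline approach the per-step speedup.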

Original language: English
Title of host publication: Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers
Editors: Vladimir Voevodin, Sergey Sobolev
Publisher: Springer
Pages: 249-264
Number of pages: 16
ISBN (Print): 9783030365912
DOIs: https://doi.org/10.1007/978-3-030-36592-9_21
Publication status: Published - 1 Jan 2019
Event: 5th Russian Supercomputing Days Conference, RuSCDays 2019 - Moscow, Russian Federation
Duration: 23 Sep 2019 to 24 Sep 2019

Publication series

Name: Communications in Computer and Information Science
Volume: 1129 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 5th Russian Supercomputing Days Conference, RuSCDays 2019
Country: Russian Federation
City: Moscow
Period: 23/09/19 to 24/09/19

Fingerprint

Bioinformatics
High Performance
Proteins
Protein
Computing
Alignment
Fold
Pipelines
Drug Discovery
Structure-function
Productivity
Computational complexity
Enzymes
Engineering
Network protocols
Optimization
Dependent
Strategy

Keywords

  • Bioinformatics
  • High-performance computing
  • Hybrid computing
  • Multiple alignment
  • Mustguseal
  • Protein superfamilies

ASJC Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

Fields of Expertise

  • Human- & Biotechnology

Cite this

Suplatov, D., Sharapova, Y., Shegay, M., Popova, N., Fesko, K., Voevodin, V., & Švedas, V. (2019). High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. In V. Voevodin, & S. Sobolev (Eds.), Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers (pp. 249-264). (Communications in Computer and Information Science; Vol. 1129 CCIS). Springer. https://doi.org/10.1007/978-3-030-36592-9_21

High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. / Suplatov, Dmitry; Sharapova, Yana; Shegay, Maxim; Popova, Nina; Fesko, Kateryna; Voevodin, Vladimir; Švedas, Vytas.

Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers. ed. / Vladimir Voevodin; Sergey Sobolev. Springer, 2019. p. 249-264 (Communications in Computer and Information Science; Vol. 1129 CCIS).


Suplatov, D, Sharapova, Y, Shegay, M, Popova, N, Fesko, K, Voevodin, V & Švedas, V 2019, High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. in V Voevodin & S Sobolev (eds), Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers. Communications in Computer and Information Science, vol. 1129 CCIS, Springer, pp. 249-264, 5th Russian Supercomputing Days Conference, RuSCDays 2019, Moscow, Russian Federation, 23/09/19. https://doi.org/10.1007/978-3-030-36592-9_21
Suplatov D, Sharapova Y, Shegay M, Popova N, Fesko K, Voevodin V et al. High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. In Voevodin V, Sobolev S, editors, Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers. Springer. 2019. p. 249-264. (Communications in Computer and Information Science). https://doi.org/10.1007/978-3-030-36592-9_21
Suplatov, Dmitry ; Sharapova, Yana ; Shegay, Maxim ; Popova, Nina ; Fesko, Kateryna ; Voevodin, Vladimir ; Švedas, Vytas. / High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies. Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers. editor / Vladimir Voevodin ; Sergey Sobolev. Springer, 2019. pp. 249-264 (Communications in Computer and Information Science).
@inproceedings{7ba111f0143a4260ab20916bd6fc0adf,
title = "High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies",
abstract = "Construction of a multiple alignment of proteins that implement different functions within a common structural fold of a superfamily is a valuable tool in bioinformatics, but represents a challenge. The process can be seen as a pipeline of independent sequential steps of an equivalent computational complexity each performed by a different set of algorithms. In this work the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. This HPC-installation was used to collect and superimpose within 12 h a representative set of 299’976 sequences and structures of the fold-type I PLP-dependent enzymes what appears to be the largest alignment of a protein superfamily ever constructed. The use of hybrid acceleration strategies provided a routine access to a sequence/structure comparison of evolutionarily related proteins at a scale that would previously have been intractable to study the structure-function relationship and solve practically relevant problems, thus promoting the value of bioinformatics and HPC in protein engineering and drug discovery.",
keywords = "Bioinformatics, High-performance computing, Hybrid computing, Multiple alignment, Mustguseal, Protein superfamilies",
author = "Dmitry Suplatov and Yana Sharapova and Maxim Shegay and Nina Popova and Kateryna Fesko and Vladimir Voevodin and Vytas Švedas",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-36592-9_21",
language = "English",
isbn = "9783030365912",
series = "Communications in Computer and Information Science",
publisher = "Springer",
pages = "249--264",
editor = "Vladimir Voevodin and Sergey Sobolev",
booktitle = "Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers",

}

TY - GEN

T1 - High-Performance Hybrid Computing for Bioinformatic Analysis of Protein Superfamilies

AU - Suplatov, Dmitry

AU - Sharapova, Yana

AU - Shegay, Maxim

AU - Popova, Nina

AU - Fesko, Kateryna

AU - Voevodin, Vladimir

AU - Švedas, Vytas

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Construction of a multiple alignment of proteins that implement different functions within a common structural fold of a superfamily is a valuable tool in bioinformatics, but represents a challenge. The process can be seen as a pipeline of independent sequential steps of an equivalent computational complexity each performed by a different set of algorithms. In this work the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. This HPC-installation was used to collect and superimpose within 12 h a representative set of 299’976 sequences and structures of the fold-type I PLP-dependent enzymes what appears to be the largest alignment of a protein superfamily ever constructed. The use of hybrid acceleration strategies provided a routine access to a sequence/structure comparison of evolutionarily related proteins at a scale that would previously have been intractable to study the structure-function relationship and solve practically relevant problems, thus promoting the value of bioinformatics and HPC in protein engineering and drug discovery.

AB - Construction of a multiple alignment of proteins that implement different functions within a common structural fold of a superfamily is a valuable tool in bioinformatics, but represents a challenge. The process can be seen as a pipeline of independent sequential steps of an equivalent computational complexity each performed by a different set of algorithms. In this work the overall productivity of the corresponding Mustguseal protocol was significantly improved by selecting an appropriate optimization strategy for each step of the pipeline. This HPC-installation was used to collect and superimpose within 12 h a representative set of 299’976 sequences and structures of the fold-type I PLP-dependent enzymes what appears to be the largest alignment of a protein superfamily ever constructed. The use of hybrid acceleration strategies provided a routine access to a sequence/structure comparison of evolutionarily related proteins at a scale that would previously have been intractable to study the structure-function relationship and solve practically relevant problems, thus promoting the value of bioinformatics and HPC in protein engineering and drug discovery.

KW - Bioinformatics

KW - High-performance computing

KW - Hybrid computing

KW - Multiple alignment

KW - Mustguseal

KW - Protein superfamilies

UR - http://www.scopus.com/inward/record.url?scp=85076843356&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-36592-9_21

DO - 10.1007/978-3-030-36592-9_21

M3 - Conference contribution

SN - 9783030365912

T3 - Communications in Computer and Information Science

SP - 249

EP - 264

BT - Supercomputing - 5th Russian Supercomputing Days, RuSCDays 2019, Revised Selected Papers

A2 - Voevodin, Vladimir

A2 - Sobolev, Sergey

PB - Springer

ER -