Linking genomics and population genetics with R

Emmanuel Paradis, Thierry Gosselin, Jérôme Goudet, Thibaut Jombart, Klaus Schliep

Publication: Contribution to journal › Article › Research › Peer-reviewed

Abstract

Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.

Original language: English
Pages (from-to): 54-66
Number of pages: 13
Journal: Molecular Ecology Resources
Volume: 17
Issue number: 1
DOI: 10.1111/1755-0998.12577
Publication status: Published - Jan 2017
Externally published: Yes

Fingerprint

Metagenomics
Population Genetics
Genomics
Software
Language Development
Linkage Disequilibrium
Haplotypes
Single Nucleotide Polymorphism
Reading
Multivariate Analysis
Technology
Handling (Psychology)
Datasets

Keywords

    Cite this

    Linking genomics and population genetics with R. / Paradis, Emmanuel; Gosselin, Thierry; Goudet, Jérôme; Jombart, Thibaut; Schliep, Klaus.

    In: Molecular Ecology Resources, Vol. 17, No. 1, 01.2017, p. 54-66.

    Paradis, Emmanuel ; Gosselin, Thierry ; Goudet, Jérôme ; Jombart, Thibaut ; Schliep, Klaus. / Linking genomics and population genetics with R. In: Molecular Ecology Resources. 2017 ; Vol. 17, No. 1. pp. 54-66.
    @article{12350047295d46809b9848e9a0bd910e,
    title = "Linking genomics and population genetics with R",
    abstract = "Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.",
    keywords = "Biostatistics/methods, Computational Biology/methods, Genetics, Population/methods, Genomics/methods, Haplotypes, Linkage Disequilibrium, Polymorphism, Single Nucleotide, Software",
    author = "Emmanuel Paradis and Thierry Gosselin and J{\'e}r{\^o}me Goudet and Thibaut Jombart and Klaus Schliep",
    year = "2017",
    month = "1",
    doi = "10.1111/1755-0998.12577",
    language = "English",
    volume = "17",
    pages = "54--66",
    journal = "Molecular Ecology Resources",
    issn = "1755-098X",
    publisher = "Wiley-Blackwell",
    number = "1",

    }

    TY - JOUR

    T1 - Linking genomics and population genetics with R

    AU - Paradis, Emmanuel

    AU - Gosselin, Thierry

    AU - Goudet, Jérôme

    AU - Jombart, Thibaut

    AU - Schliep, Klaus

    PY - 2017/1

    Y1 - 2017/1

    N2 - Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.

    AB - Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is contributing to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very big genetic data sets collected on many individuals. We then expose how R, as a computing language and development environment, proposes some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.

    KW - Biostatistics/methods

    KW - Computational Biology/methods

    KW - Genetics, Population/methods

    KW - Genomics/methods

    KW - Haplotypes

    KW - Linkage Disequilibrium

    KW - Polymorphism, Single Nucleotide

    KW - Software

    U2 - 10.1111/1755-0998.12577

    DO - 10.1111/1755-0998.12577

    M3 - Article

    VL - 17

    SP - 54

    EP - 66

    JO - Molecular Ecology Resources

    JF - Molecular Ecology Resources

    SN - 1755-098X

    IS - 1

    ER -