Understanding the dynamics of gene expression is a major challenge in biology and is central to understanding human evolution and the development and progression of disease. Gene expression is primarily controlled at the level of transcription, which is mediated by transcription factor activity and changes in chromatin structure. Chromatin structure is largely determined by histone modifications and DNA methylation, collectively known as epigenetic marks. Large international efforts such as the ENCODE and FANTOM projects are now generating a wealth of transcriptomic and epigenomic data that should enable us to elucidate the mechanisms underlying health and disease. However, the analysis of these data is not trivial. In particular, repetitive regions in the genome pose an enormous challenge to current bioinformatics tools. It has been estimated that approximately two-thirds of the human genome consists of repeats or repeat-derived sequences, most of which are transposable elements (TEs). The proliferation of TEs has had multiple impacts on the mammalian genome, and their regulatory role was already recognized in the mid-1900s. Nevertheless, TEs have long been dismissed as “junk” DNA, and their contribution to transcriptional regulation, for example by facilitating the expansion of transcription factor binding sites, is only now beginning to be explored. Because reads derived from TEs usually align to multiple genomic regions and are therefore ambiguous to interpret, next-generation sequencing (NGS) pipelines most commonly exclude them. This project aims to develop computational tools to aid the genome-wide characterization of sequences of transposon origin. Specifically, we propose to analyse the epigenetic profile of transposon groups and subgroups, defined by sequence similarity and structural relationships, rather than that of individual TE copies.
For this purpose, we have designed a strategy to quantify the alignment of ChIP-seq (chromatin immunoprecipitation followed by sequencing), DNase-seq (DNase I hypersensitive sites sequencing) and WGBS (whole-genome bisulfite sequencing) reads to clusters of TE copies with indistinguishable sequences. On this basis, we will i) characterize the epigenetic profile of human and mouse transposon groups and subgroups across multiple tissues and cell lines, and identify those with gene regulatory functions; ii) examine the relationship between genetic and epigenetic variation; and iii) assess the contribution of TE-associated epigenetic dysregulation to human disease. Ultimately, we aim to uncover the repertoire of gene regulatory functions re-wired by TEs during mammalian evolution.
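The idea of quantifying reads at the level of TE groups rather than individual copies can be illustrated with a minimal sketch. This is not the project's actual method; it only shows one simple way such counting could work. The function name `count_reads_per_group`, the copy identifiers, the group labels and the equal-split weighting rule are all assumptions made for illustration. A read that aligns to several copies of the same group is ambiguous at the copy level but contributes one full count to that group; a read whose alignments span several groups is split equally among them.

```python
from collections import defaultdict


def count_reads_per_group(read_hits, copy_to_group):
    """Aggregate multi-mapped reads at the TE group level.

    read_hits: dict mapping a read ID to the list of TE copy IDs it aligns to.
    copy_to_group: dict mapping a TE copy ID to its group (hypothetical labels).
    Each read contributes a total weight of 1, split equally among the
    distinct groups its alignments fall into (an assumed weighting rule).
    """
    counts = defaultdict(float)
    for read_id, copies in read_hits.items():
        # Collapse copy-level alignments to the set of groups they belong to.
        groups = {copy_to_group[c] for c in copies if c in copy_to_group}
        if not groups:
            continue  # read does not overlap any annotated TE copy
        weight = 1.0 / len(groups)
        for group in groups:
            counts[group] += weight
    return dict(counts)


# Toy input: copy IDs and group assignments are invented for this example.
copy_to_group = {
    "L1_chr1_a": "L1HS",
    "L1_chr5_b": "L1HS",
    "Alu_chr2_c": "AluY",
}
read_hits = {
    "read1": ["L1_chr1_a", "L1_chr5_b"],   # two copies, same group
    "read2": ["Alu_chr2_c"],               # uniquely mapped
    "read3": ["L1_chr1_a", "Alu_chr2_c"],  # ambiguous between two groups
}
group_counts = count_reads_per_group(read_hits, copy_to_group)
```

In this toy case `read1` would be discarded by a pipeline that keeps only uniquely mapped reads, yet it is unambiguous once the unit of analysis is the group. Other weighting schemes, such as discarding reads that span more than one group, would be equally plausible under the same assumptions.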
Effective start/end date: 1/04/20 → 31/03/23