Sequence signatures extracted from proximal promoters can be used to predict distal enhancers

Leila Taher, Robin P Smith, Mee J Kim, Nadav Ahituv, Ivan Ovcharenko

Research output: Contribution to journal > Article > Research > peer-review

Abstract

BACKGROUND: Gene expression is controlled by proximal promoters and distal regulatory elements such as enhancers. While the activity of some promoters can be invariant across tissues, enhancers tend to be highly tissue-specific.

RESULTS: We compiled sets of tissue-specific promoters based on gene expression profiles of 79 human tissues and cell types. Putative transcription factor binding sites within each set of sequences were used to train a support vector machine classifier capable of distinguishing tissue-specific promoters from control sequences. We obtained reliable classifiers for 92% of the tissues, with an area under the receiver operating characteristic curve between 60% (for subthalamic nucleus promoters) and 98% (for heart promoters). We next used these classifiers to identify tissue-specific enhancers, scanning distal non-coding sequences in the loci of the 200 most highly and lowly expressed genes. Thirty percent of reliable classifiers produced consistent enhancer predictions, with significantly higher densities in the loci of the most highly expressed compared to lowly expressed genes. Liver enhancer predictions were assessed in vivo using the hydrodynamic tail vein injection assay. Fifty-eight percent of the predictions yielded significant enhancer activity in the mouse liver, whereas a control set of five sequences was completely negative.

CONCLUSIONS: We conclude that promoters of tissue-specific genes often contain unambiguous tissue-specific signatures that can be learned and used for the de novo prediction of enhancers.
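
The classification and scanning steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the motif strings, weights, window size, threshold, and function names are all hypothetical, and a fixed linear scoring function stands in for the trained support vector machine. The sketch only shows how motif counts can yield a score per sequence, how the area under the ROC curve measures separation of promoters from controls, and how a sliding window applies the same scorer to a distal sequence.

```python
from typing import Dict, List, Tuple

# Hypothetical motif weights standing in for a trained linear SVM.
# The study used putative transcription factor binding sites as features;
# these three motifs and their weights are illustrative assumptions only.
WEIGHTS: Dict[str, float] = {"TGACTCA": 1.5, "CCAAT": 0.8, "TATAAA": -0.5}


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    return sum(
        1
        for i in range(len(seq) - len(motif) + 1)
        if seq[i:i + len(motif)] == motif
    )


def score(seq: str, weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear decision value: weighted sum of motif counts in seq."""
    seq = seq.upper()
    return sum(w * count_overlapping(seq, m) for m, w in weights.items())


def roc_auc(pos: List[float], neg: List[float]) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties contribute 0.5. This pairwise form is quadratic in the number of
    sequences, which is acceptable for a small illustration.
    """
    if not pos or not neg:
        raise ValueError("both score lists must be non-empty")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def scan(seq: str, window: int, step: int,
         threshold: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every window scoring above threshold."""
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        s = score(seq[start:start + window])
        if s > threshold:
            hits.append((start, start + window, s))
    return hits
```

For example, `roc_auc([3, 2], [1, 2])` returns 0.875, which corresponds to 87.5% in the percentage notation used in the abstract. In a real analysis the weights would be learned from labeled promoter and control sequences rather than fixed by hand.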

Original language: English
Pages (from-to): R117
Journal: Genome biology
Volume: 14
Issue number: 10
DOI: 10.1186/gb-2013-14-10-r117
Publication status: Published - 2013

Keywords

  • Animals
  • Binding Sites
  • Enhancer Elements, Genetic
  • Gene Expression Regulation
  • Genome-Wide Association Study
  • Genomics/methods
  • Humans
  • Mice
  • Nucleotide Motifs
  • Organ Specificity/genetics
  • Promoter Regions, Genetic
  • Regulatory Sequences, Nucleic Acid
  • Reproducibility of Results
  • Support Vector Machine
  • Transcription Factors

Cite this

Sequence signatures extracted from proximal promoters can be used to predict distal enhancers. / Taher, Leila; Smith, Robin P; Kim, Mee J; Ahituv, Nadav; Ovcharenko, Ivan.

In: Genome biology, Vol. 14, No. 10, 2013, p. R117.

@article{68f2c833e6de4d0ea7f2eff9549757ca,
title = "Sequence signatures extracted from proximal promoters can be used to predict distal enhancers",
abstract = "BACKGROUND: Gene expression is controlled by proximal promoters and distal regulatory elements such as enhancers. While the activity of some promoters can be invariant across tissues, enhancers tend to be highly tissue-specific. RESULTS: We compiled sets of tissue-specific promoters based on gene expression profiles of 79 human tissues and cell types. Putative transcription factor binding sites within each set of sequences were used to train a support vector machine classifier capable of distinguishing tissue-specific promoters from control sequences. We obtained reliable classifiers for 92{\%} of the tissues, with an area under the receiver operating characteristic curve between 60{\%} (for subthalamic nucleus promoters) and 98{\%} (for heart promoters). We next used these classifiers to identify tissue-specific enhancers, scanning distal non-coding sequences in the loci of the 200 most highly and lowly expressed genes. Thirty percent of reliable classifiers produced consistent enhancer predictions, with significantly higher densities in the loci of the most highly expressed compared to lowly expressed genes. Liver enhancer predictions were assessed in vivo using the hydrodynamic tail vein injection assay. Fifty-eight percent of the predictions yielded significant enhancer activity in the mouse liver, whereas a control set of five sequences was completely negative. CONCLUSIONS: We conclude that promoters of tissue-specific genes often contain unambiguous tissue-specific signatures that can be learned and used for the de novo prediction of enhancers.",
keywords = "Animals, Binding Sites, Enhancer Elements, Genetic, Gene Expression Regulation, Genome-Wide Association Study, Genomics/methods, Humans, Mice, Nucleotide Motifs, Organ Specificity/genetics, Promoter Regions, Genetic, Regulatory Sequences, Nucleic Acid, Reproducibility of Results, Support Vector Machine, Transcription Factors",
author = "Leila Taher and Smith, {Robin P} and Kim, {Mee J} and Nadav Ahituv and Ivan Ovcharenko",
year = "2013",
doi = "10.1186/gb-2013-14-10-r117",
language = "English",
volume = "14",
pages = "R117",
journal = "Genome biology",
issn = "1474-7596",
publisher = "BioMed Central",
number = "10",
}

TY  - JOUR
T1  - Sequence signatures extracted from proximal promoters can be used to predict distal enhancers
AU  - Taher, Leila
AU  - Smith, Robin P
AU  - Kim, Mee J
AU  - Ahituv, Nadav
AU  - Ovcharenko, Ivan
PY  - 2013
Y1  - 2013
AB  - BACKGROUND: Gene expression is controlled by proximal promoters and distal regulatory elements such as enhancers. While the activity of some promoters can be invariant across tissues, enhancers tend to be highly tissue-specific. RESULTS: We compiled sets of tissue-specific promoters based on gene expression profiles of 79 human tissues and cell types. Putative transcription factor binding sites within each set of sequences were used to train a support vector machine classifier capable of distinguishing tissue-specific promoters from control sequences. We obtained reliable classifiers for 92% of the tissues, with an area under the receiver operating characteristic curve between 60% (for subthalamic nucleus promoters) and 98% (for heart promoters). We next used these classifiers to identify tissue-specific enhancers, scanning distal non-coding sequences in the loci of the 200 most highly and lowly expressed genes. Thirty percent of reliable classifiers produced consistent enhancer predictions, with significantly higher densities in the loci of the most highly expressed compared to lowly expressed genes. Liver enhancer predictions were assessed in vivo using the hydrodynamic tail vein injection assay. Fifty-eight percent of the predictions yielded significant enhancer activity in the mouse liver, whereas a control set of five sequences was completely negative. CONCLUSIONS: We conclude that promoters of tissue-specific genes often contain unambiguous tissue-specific signatures that can be learned and used for the de novo prediction of enhancers.
KW  - Animals
KW  - Binding Sites
KW  - Enhancer Elements, Genetic
KW  - Gene Expression Regulation
KW  - Genome-Wide Association Study
KW  - Genomics/methods
KW  - Humans
KW  - Mice
KW  - Nucleotide Motifs
KW  - Organ Specificity/genetics
KW  - Promoter Regions, Genetic
KW  - Regulatory Sequences, Nucleic Acid
KW  - Reproducibility of Results
KW  - Support Vector Machine
KW  - Transcription Factors
U2  - 10.1186/gb-2013-14-10-r117
DO  - 10.1186/gb-2013-14-10-r117
M3  - Article
VL  - 14
SP  - R117
JO  - Genome biology
JF  - Genome biology
SN  - 1474-7596
IS  - 10
ER  - 