On splice site prediction using weight array models: A comparison of smoothing techniques

Leila Taher; Peter Meinicke; Burkhard Morgenstern

doi:10.1088/1742-6596/90/1/012004

Leila Taher*, Peter Meinicke, Burkhard Morgenstern

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
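To make the role of smoothing concrete, here is a minimal Python sketch of estimation options (a) and (b) for a first-order weight array model, where the m-tuples are dinucleotides (m = 2). It is an illustration under stated assumptions, not the authors' implementation: the function names `train_wam` and `log_prob`, the toy donor-site strings, and the pseudocount value are all hypothetical, and the Gaussian smoothing procedure (c) is not reproduced because the abstract does not specify it.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def train_wam(sites, pseudocount=0.0):
    """Estimate a first-order weight array model from aligned, equal-length sites.

    pseudocount=0.0 gives the non-smoothed (relative frequency) estimate;
    a positive value adds that count to every dinucleotide at every position.
    Returns (first, cond) with first[b] = P(x_0 = b) and
    cond[i - 1][(a, b)] = P(x_i = b | x_{i-1} = a) for i >= 1.
    """
    k = len(ALPHABET)
    length = len(sites[0])

    first_counts = Counter(s[0] for s in sites)
    first_total = len(sites) + pseudocount * k
    first = {b: (first_counts[b] + pseudocount) / first_total for b in ALPHABET}

    cond = []
    for i in range(1, length):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        prev_counts = Counter(s[i - 1] for s in sites)
        table = {}
        for a, b in product(ALPHABET, repeat=2):
            denom = prev_counts[a] + pseudocount * k
            # Without smoothing, an unseen context leaves the estimate undefined;
            # this sketch treats it as probability zero.
            table[(a, b)] = (pair_counts[(a, b)] + pseudocount) / denom if denom > 0 else 0.0
        cond.append(table)
    return first, cond


def log_prob(seq, model):
    """Log-probability of seq under the model; -inf if any factor is zero."""
    first, cond = model
    factors = [first[seq[0]]]
    factors += [cond[i - 1][(seq[i - 1], seq[i])] for i in range(1, len(seq))]
    if any(p == 0.0 for p in factors):
        return float("-inf")
    return sum(math.log(p) for p in factors)


# Hypothetical toy training set of aligned donor-site windows.
sites = ["GTAAG", "GTGAG", "GTAAG", "GTAGG"]
raw_model = train_wam(sites, pseudocount=0.0)
smoothed_model = train_wam(sites, pseudocount=1.0)
```

With the non-smoothed model, a candidate containing a dinucleotide never observed at some position (for example `GTCAG` here) is assigned probability zero, whereas the pseudocount model still gives it a finite log-probability. This is the zero-probability problem that the smoothing techniques compared in the study are meant to address.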

Original language	English
Article number	012004
Journal	Journal of Physics: Conference Series
Volume	90
Issue number	1
DOIs	https://doi.org/10.1088/1742-6596/90/1/012004
Publication status	Published - 1 Nov 2007
Externally published	Yes

ASJC Scopus subject areas

Physics and Astronomy(all)

Cite this

@article{045d476df748471cb10b9aa01028c9f2,
  title = "On splice site prediction using weight array models: A comparison of smoothing techniques",
  abstract = "In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called {"}splicing{"}. The positions where introns are cut and exons are spliced together are called {"}splice sites{"}. Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.",
  author = "Leila Taher and Peter Meinicke and Burkhard Morgenstern",
  year = "2007",
  month = nov,
  day = "1",
  doi = "10.1088/1742-6596/90/1/012004",
  language = "English",
  volume = "90",
  journal = "Journal of Physics: Conference Series",
  issn = "1742-6588",
  publisher = "IOP Publishing Ltd.",
  number = "1",
}

TY - JOUR
T1 - On splice site prediction using weight array models
T2 - A comparison of smoothing techniques
AU - Taher, Leila
AU - Meinicke, Peter
AU - Morgenstern, Burkhard
PY - 2007/11/1
Y1 - 2007/11/1
N2 - In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
AB - In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
UR - http://www.scopus.com/inward/record.url?scp=37449009457&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/90/1/012004
DO - 10.1088/1742-6596/90/1/012004
M3 - Article
AN - SCOPUS:37449009457
SN - 1742-6588
VL - 90
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012004
ER -