TY - JOUR
T1 - On splice site prediction using weight array models
T2 - A comparison of smoothing techniques
AU - Taher, Leila
AU - Meinicke, Peter
AU - Morgenstern, Burkhard
PY - 2007/11/1
Y1 - 2007/11/1
N2 - In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
UR - http://www.scopus.com/inward/record.url?scp=37449009457&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/90/1/012004
DO - 10.1088/1742-6596/90/1/012004
M3 - Article
AN - SCOPUS:37449009457
SN - 1742-6588
VL - 90
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012004
ER -