On splice site prediction using weight array models: A comparison of smoothing techniques

Leila Taher*, Peter Meinicke, Burkhard Morgenstern

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In most eukaryotic genes, protein-coding exons are separated by non-coding introns which are removed from the primary transcript by a process called "splicing". The positions where introns are cut and exons are spliced together are called "splice sites". Thus, computational prediction of splice sites is crucial for gene finding in eukaryotes. Weight array models are a powerful probabilistic approach to splice site detection. Parameters for these models are usually derived from m-tuple frequencies in trusted training data and subsequently smoothed to avoid zero probabilities. In this study we compare three different ways of parameter estimation for m-tuple frequencies, namely (a) non-smoothed probability estimation, (b) standard pseudo counts and (c) a Gaussian smoothing procedure that we recently developed.
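The difference between estimation schemes (a) and (b) can be illustrated with a minimal sketch. It assumes a first-order weight array model (m = 2), in which each position i of an aligned splice site window carries its own conditional probabilities P(x_i | x_{i-1}). The function name `estimate_wam`, the `pseudocount` parameter and the toy training sequences are hypothetical and are not taken from the paper. The Gaussian smoothing procedure (c) is not shown, because the abstract does not specify it.

```python
from collections import Counter

ALPHABET = "ACGT"

def estimate_wam(seqs, pseudocount=0.0):
    """Estimate position-specific P(x_i | x_{i-1}) from aligned sequences.

    pseudocount=0.0 gives the non-smoothed (relative frequency) estimate;
    a positive value is added to every dinucleotide count before normalising.
    Returns one dict per position i = 1 .. L-1, keyed by (previous, current).
    """
    length = len(seqs[0])
    model = []
    for i in range(1, length):
        # Count dinucleotides ending at position i across all training sequences.
        counts = Counter((s[i - 1], s[i]) for s in seqs)
        table = {}
        for prev in ALPHABET:
            total = sum(counts[(prev, cur)] for cur in ALPHABET)
            total += pseudocount * len(ALPHABET)
            for cur in ALPHABET:
                if total == 0:
                    # Context never observed and no smoothing: no estimate available,
                    # so this sketch falls back to zero.
                    table[(prev, cur)] = 0.0
                else:
                    table[(prev, cur)] = (counts[(prev, cur)] + pseudocount) / total
        model.append(table)
    return model

# Toy aligned windows (invented for illustration only).
seqs = ["AGGT", "AGGT", "CGGT", "AGGA"]

unsmoothed = estimate_wam(seqs)                 # scheme (a)
smoothed = estimate_wam(seqs, pseudocount=1.0)  # scheme (b)

# Dinucleotide "AT" at the first transition never occurs in the toy data.
print(unsmoothed[0][("A", "T")])
print(smoothed[0][("A", "T")])
```

In the non-smoothed model any candidate site containing an unseen dinucleotide receives probability zero regardless of the rest of the window, which is the problem that smoothing is meant to avoid.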

Original language: English
Article number: 012004
Journal: Journal of Physics: Conference Series
Volume: 90
Issue number: 1
Publication status: Published - 1 Nov 2007
Externally published: Yes

ASJC Scopus subject areas

  • Physics and Astronomy(all)
