SPARC FIT-IT - Semantic Phonetic Automatic ReConstruction of dictations

  • Petrik, Stefan (Co-Investigator (CoI))
  • Kubin, Gernot (Co-Investigator (CoI))

Project: Research project

Project Details

Description

The SPARC (Semantic Phonetic Automatic ReConstruction of dictations) project aims at laying the foundation for a radical improvement in the quality of automated dictation by making explicit use of semantic knowledge.

Current systems aim at a literal transcription of the dictation. However, even trained speakers do not necessarily dictate in the exact form required for the written document, because spoken and written language differ inherently. This applies even more to people with less experience. As a result, utterances must be expanded, restructured or reformulated to conform to the required form. Trained typists routinely perform this task.

To fully employ the potential of language technology, dictation systems must perform in a similar way, i.e. systems have to move away from simply producing written drafts towards producing documents that conform to the formal and informal requirements of texts in their respective class. To reach this aim, reliable corpora are needed so that systems can be trained to perform in such a way.

Currently, three types of corpora are available:

(1) audio files of the original utterances;
(2) the draft transcriptions of the dictation system;
(3) final documents produced by the typist.

What is not available are error-free literal transcriptions of the original dictations. Yet exactly such literal transcriptions are needed to automatically learn the recurrent reformulations that must be made to provide a draft close to the intended final document. Moreover, large corpora of literal transcriptions can be used as training data to decrease the word error rate of speech recognition itself.

Therefore, we will develop methods for the automatic reconstruction of literal transcriptions on the basis of an automatic semantic annotation of draft transcriptions and final documents. Document pairs will be aligned to identify chunks of text that display differences. Using the semantic annotation, these chunks will be measured for semantic "similarity" as well as for phonetic/acoustic similarity. This will make it possible to categorize differences as corrections of speech recognition errors, stylistic reformulations, or a combination of both. On the basis of this analysis, a reconstruction of the original wording can be achieved.
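
The alignment and categorization steps described above can be illustrated with a minimal sketch. It is not the project's method: it assumes word-level alignment with Python's standard difflib module, and it uses a character-level similarity ratio as a crude stand-in for phonetic/acoustic similarity. Semantic similarity is not modelled at all. The function names, the threshold value and the rule that a high surface similarity points to a recognition error are hypothetical choices made only for illustration.

```python
from difflib import SequenceMatcher


def differing_chunks(draft: str, final: str) -> list[tuple[str, str]]:
    """Align two texts word by word and return the (draft, final) chunks that differ."""
    a, b = draft.split(), final.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    chunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            chunks.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return chunks


def surface_similarity(x: str, y: str) -> float:
    """Character-level ratio in [0, 1]; an assumed proxy for phonetic similarity."""
    return SequenceMatcher(a=x.lower(), b=y.lower(), autojunk=False).ratio()


def categorize(draft_chunk: str, final_chunk: str, threshold: float = 0.6) -> str:
    """Hypothetical rule: chunks that look alike are treated as recognition errors,
    all other differences as stylistic reformulations."""
    if surface_similarity(draft_chunk, final_chunk) >= threshold:
        return "recognition_error"
    return "reformulation"


if __name__ == "__main__":
    draft = "patient has new moan ya"
    final = "patient has pneumonia"
    for d, f in differing_chunks(draft, final):
        print(repr(d), "->", repr(f), ":", categorize(d, f))
```

A realistic system would replace the character ratio with a comparison of phoneme sequences or acoustic scores, and would add a semantic similarity measure derived from the annotation, so that a chunk could also be labelled as a combination of both categories.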

Status: Finished
Effective start/end date: 1/01/05 to 31/12/06