Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. Most real-world applications of ASR systems (e.g., dialogue systems, voice input aids for physically disabled people, and medical dictation systems), however, require the recognition of spontaneous, conversational speech. Compared to prepared speech, conversational speech contains utterances that might be considered 'ungrammatical' and that contain disfluencies such as “...oh, well, I think ahm exactly …”. Moreover, in spontaneous conversation, a word like “yesterday” may sound like “yeshay” and the German word “haben” (“to have”) may sound like “ham”. The pronunciation of words depends on well-known factors, for instance the regional background of the speakers and the formality of the situation. Highly influential but less well-studied factors are those reflecting the prosodic characteristics of the word in the utterance. These prosodic characteristics describe the rhythm and melody of a sentence, for instance whether a word is accented or not. The proposed project aims to investigate, from a linguistic point of view, what role prosody plays in pronunciation variation and to incorporate the knowledge gained into an ASR system. In our investigations, we will use speech material from German and Austrian speakers.
In contrast to most research in the field of prosody, which has used read sentences or prepared speech, we will annotate and analyze speech from free conversations between speakers who know each other well. Such speech material is not only more naturalistic but also richer in pronunciation variation. In sum, our project will deliver the first prosodically annotated database for conversational Austrian German, automatic tools for the creation of prosodic annotations, and a prosody-dependent ASR system for conversational speech from German and Austrian German speakers.