CLCS - Cross-layer pronunciation modeling for conversational speech/Cross-layer Aussprachemodelle für Spontansprache (FWF Hertha Firnberg Program T572)

Schuppler, Barbara (Principal Investigator (PI))

Institute of Signal Processing and Speech Communication (4420)

Project: Research project

Description

Automatic speech recognition (ASR) systems were originally designed to cope with carefully pronounced speech. As a consequence, they do not deal well with spontaneous, conversational speech. Read and conversational speech differ in many respects. On the linguistic level, conversational speech contains disfluencies and many utterances that might be considered 'ungrammatical'. On the phonetic level, a much higher degree of pronunciation variation is observed in spontaneous than in read speech. Words are more often acoustically reduced compared to their full pronunciations, such that a word like yesterday may sound like yeshay, or a German word like haben may sound like ham. Since most real-world applications of ASR systems require the recognition of spontaneous speech (e.g., dialogue systems, voice input aids for people with physical disabilities, medical dictation systems), the investigation of new methods to model everyday speech has received a lot of attention among speech technologists. In linguistics and psycholinguistics, too, casual conversations are studied in search of an answer to how everyday speech production and comprehension work. These studies have indicated that certain higher-level linguistic functions and structures of utterances condition the details of their pronunciation. It is likely that the kind of analysis that is becoming feasible with the growing availability of large speech corpora will bring to light as yet unknown factors that affect pronunciation variation.
The research envisioned in this proposal is designed to increase our knowledge about spontaneous, conversational speech and to use this knowledge to improve ASR systems. The first objective is to identify, by means of quantitative phonetic analyses, which higher-level linguistic structures and functions condition pronunciation variation.
Studies will be carried out on Dutch and on Austrian German material, which will make it possible to determine which findings are language-specific and which are characteristic of conversational speech in general. The second objective is to improve ASR technology by incorporating the knowledge gained about the conditions for pronunciation variation. Most ASR systems still deal with acoustic and linguistic information independently of each other. In contrast, I propose a cross-layer pronunciation modeling technique, which (1) makes use of the knowledge gained about the effects of several layers of linguistic structures and functions on pronunciation variation, and (2) uses lexicons in more than one layer of the recognizer's architecture. Additional deliverables of this project are the collected speech material and the tools created for its automatic annotation, both of which would be of great value for future studies by linguists and engineers.

Status	Finished
Effective start/end date	1/09/12 → 30/04/17

5 Conference paper
4 Article
1 Book
1 (Old data) Lecture or Presentation
1 Poster

On the use of acoustic features for automatic disambiguation of homophones in spontaneous German
Schuppler, B. & Schrank, T., 2018, In: Computer Speech and Language. 52, p. 209-224.
Research output: Contribution to journal › Article › peer-review

Rethinking Reduction: Interdisciplinary Perspectives on Conditions, Mechanisms, and Domains for Phonetic Variation
Cangemi, F. (ed.), Clayards, M. (ed.), Niebuhr, O. (ed.), Schuppler, B. (ed.) & Zellers, M. (ed.), 2018, Berlin: de Gruyter Mouton. 306 p. (Phonetics and Phonology; vol. 25)
Research output: Book/Report › Book › peer-review
A corpus of read and conversational Austrian German
Schuppler, B., Hagmüller, M. & Zahrer, A., 1 Nov 2017, In: Speech Communication. 94, p. 62-74 13 p.
Research output: Contribution to journal › Article › peer-review

2 Editorial activity
1 Hosting a researcher (Inland)
1 Talk at conference or symposium

de Gruyter Mouton (Publisher)
Barbara Schuppler (Peer reviewer)
Jun 2022
Activity: Publication peer-review or editorial work › Editorial activity
de Gruyter Mouton (Publisher)
Francesco Cangemi (Editor), Meghan Clayards (Editor), Oliver Niebuhr (Editor), Barbara Schuppler (Editor) & Margaret Zellers (Editor)
2018
Activity: Publication peer-review or editorial work › Editorial activity
International Conference on Statistical Language and Speech Processing (SLSP)
Barbara Schuppler (Speaker)
11 Oct 2016 → 12 Oct 2016
Activity: Talk or presentation › Talk at conference or symposium › Science to science
