Abstract
Given the high degree of segmental reduction in conversational speech, many words that are distinct in read speech become homophonous. For instance, the tokens considered in this study, "ah", "ach", "auch", "eine" and "er", may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size and allows us to gain insight into how these homophones are distinct with respect to phonetic detail. Since the extraction of these features requires annotations, this approach would not be practical for integration into an ASR system. We thus explored a second approach based on a convolutional neural network (CNN). The performance of this approach is on par with the one based on Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.
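The abstract's closing idea, combining an acoustic homophone classifier with a stochastic language model, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple log-linear combination of two probability distributions over the five candidate words, and the function names `combine_scores` and `disambiguate`, the `lm_weight` parameter, and all probability values are hypothetical, chosen only for illustration.

```python
import math

# The five word types named in the abstract.
CANDIDATES = ["ah", "ach", "auch", "eine", "er"]

# Floor that avoids log(0) for candidates with zero probability (assumption).
EPSILON = 1e-12


def combine_scores(acoustic_probs, lm_probs, lm_weight=1.0):
    """Log-linearly combine acoustic and language-model probabilities.

    Both inputs map each candidate word to a probability. The result is a
    normalised distribution over the candidates.
    """
    log_scores = {}
    for word in CANDIDATES:
        acoustic = max(acoustic_probs[word], EPSILON)
        lexical = max(lm_probs[word], EPSILON)
        log_scores[word] = math.log(acoustic) + lm_weight * math.log(lexical)

    # Normalise in a numerically stable way (log-sum-exp).
    highest = max(log_scores.values())
    total = sum(math.exp(score - highest) for score in log_scores.values())
    return {w: math.exp(s - highest) / total for w, s in log_scores.items()}


def disambiguate(acoustic_probs, lm_probs, lm_weight=1.0):
    """Return the candidate with the highest combined probability."""
    posterior = combine_scores(acoustic_probs, lm_probs, lm_weight)
    return max(posterior, key=posterior.get)


# Invented example values: the acoustic model slightly prefers "ah",
# while the lexical context strongly prefers "auch".
acoustic_example = {"ah": 0.4, "ach": 0.1, "auch": 0.3, "eine": 0.1, "er": 0.1}
lm_example = {"ah": 0.05, "ach": 0.05, "auch": 0.6, "eine": 0.2, "er": 0.1}

if __name__ == "__main__":
    print(disambiguate(acoustic_example, lm_example))
    print(disambiguate(acoustic_example, lm_example, lm_weight=0.0))
```

Setting `lm_weight` to 0 ignores the lexical context and falls back to the acoustic decision alone; larger values give the language model more influence. In a real system this weight would typically be tuned on held-out data.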
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Pages | 3198-3202 |
Number of pages | 5 |
Volume | 2022-September |
Publication status | Published - 2022 |
Event | 23rd Annual Conference of the International Speech Communication Association: INTERSPEECH 2022, Incheon, Korea, Republic of. Duration: 18 Sept 2022 → 22 Sept 2022. https://interspeech2022.org |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
---|---|
ISSN (Print) | 2308-457X |
Conference
Conference | 23rd Annual Conference of the International Speech Communication Association |
---|---|
Abbreviated title | INTERSPEECH 2022 |
Country/Territory | Korea, Republic of |
City | Incheon |
Period | 18/09/22 → 22/09/22 |
Keywords
- Austrian German
- CNN
- conversational speech
- homophone disambiguation
- prosodic features
- Random Forest
ASJC Scopus subject areas
- Software
- Signal Processing
- Language and Linguistics
- Human-Computer Interaction
- Modelling and Simulation
Fingerprint
Dive into the research topics of 'Homophone Disambiguation Profits from Durational Information'. Together they form a unique fingerprint.
Projects
- FWF - CLCS_2 - Cross-layer prosodic models for conversational speech
  1/10/18 → 30/11/21
  Project: Research project
  Status: Finished