Homophone Disambiguation Profits from Durational Information

Barbara Schuppler; Emil Berger; Xenia Kogler; Franz Pernkopf

doi:10.21437/Interspeech.2022-10109

Barbara Schuppler^*, Emil Berger, Xenia Kogler, Franz Pernkopf

^*Corresponding author for this work

Institute of Signal Processing and Speech Communication (4420)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that are not homophonous in read speech. For instance, the tokens considered in this study, "ah", "ach", "auch", "eine" and "er", may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size and allows us to gain insight into how these homophones differ in phonetic detail. Since extracting these features requires annotations, this approach would not be practical for integration into an ASR system. We thus explored a second approach based on a convolutional neural network (CNN). Its performance is on par with that of the Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.
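The closing sentence of the abstract mentions combining the acoustic approach with a stochastic language model. The abstract does not say how that combination would be done, so the following is only a minimal sketch of one common option, a log-linear mix of two probability sources. The function name `combine_scores`, the `lm_weight` parameter, and all probability values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def combine_scores(acoustic_post, lm_prob, lm_weight=1.0):
    """Rank candidate words by a log-linear mix of two probability sources.

    acoustic_post: dict word -> P(word | acoustic features), e.g. classifier output
    lm_prob:       dict word -> P(word | lexical context), e.g. language model output
    lm_weight:     assumed tuning parameter that scales the language model term
    All probabilities are assumed to be strictly positive.
    """
    scores = {
        word: math.log(acoustic_post[word]) + lm_weight * math.log(lm_prob[word])
        for word in acoustic_post
    }
    # Renormalise so the combined scores sum to one again.
    total = sum(math.exp(s) for s in scores.values())
    posterior = {word: math.exp(s) / total for word, s in scores.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers for the five tokens named in the abstract.
# They are illustrative only and are NOT results from the paper.
acoustic = {"ah": 0.10, "ach": 0.10, "auch": 0.55, "eine": 0.15, "er": 0.10}
language_model = {"ah": 0.10, "ach": 0.05, "auch": 0.30, "eine": 0.35, "er": 0.20}

ranking = combine_scores(acoustic, language_model)
best_word, best_prob = ranking[0]
print(best_word, round(best_prob, 3))
```

With these invented values the language model alone would prefer "eine", while the combined score prefers "auch", which illustrates how acoustic evidence could change a decision that lexical context leaves open.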

Original language	English
Title of host publication	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages	3198-3202
Number of pages	5
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-10109
Publication status	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association: INTERSPEECH 2022 - Incheon, Korea, Republic of. Duration: 18 Sept 2022 → 22 Sept 2022. https://interspeech2022.org

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print)	2308-457X

Conference

Conference	23rd Annual Conference of the International Speech Communication Association
Abbreviated title	INTERSPEECH 2022
Country/Territory	Korea, Republic of
City	Incheon
Period	18/09/22 → 22/09/22
Internet address	https://interspeech2022.org

Keywords

Austrian German
CNN
conversational speech
homophone disambiguation
prosodic features
Random Forest

ASJC Scopus subject areas

Software
Signal Processing
Language and Linguistics
Human-Computer Interaction
Modelling and Simulation

Access to Document

10.21437/Interspeech.2022-10109
Licence: Other

Cite this

Schuppler, B., Berger, E., Kogler, X., & Pernkopf, F. (2022). Homophone Disambiguation Profits from Durational Information. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2022-September, pp. 3198-3202). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). https://doi.org/10.21437/Interspeech.2022-10109

Homophone Disambiguation Profits from Durational Information. / Schuppler, Barbara; Berger, Emil; Kogler, Xenia et al.
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Vol. 2022-September 2022. p. 3198-3202 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

Schuppler, B, Berger, E, Kogler, X & Pernkopf, F 2022, Homophone Disambiguation Profits from Durational Information. in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. vol. 2022-September, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3198-3202, 23rd Annual Conference of the International Speech Communication Association, Incheon, Korea, Republic of, 18/09/22. https://doi.org/10.21437/Interspeech.2022-10109

Schuppler B, Berger E, Kogler X, Pernkopf F. Homophone Disambiguation Profits from Durational Information. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Vol. 2022-September. 2022. p. 3198-3202. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/Interspeech.2022-10109

Schuppler, Barbara; Berger, Emil; Kogler, Xenia et al. / Homophone Disambiguation Profits from Durational Information. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Vol. 2022-September 2022. pp. 3198-3202 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

@inproceedings{622ed1dd9a4445e192b8b83edcd991a9,
title = "Homophone Disambiguation Profits from Durational Information",
abstract = "Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that are not homophonous in read speech. For instance, the tokens considered in this study, {"}ah{"}, {"}ach{"}, {"}auch{"}, {"}eine{"} and {"}er{"}, may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size and allows us to gain insight into how these homophones differ in phonetic detail. Since extracting these features requires annotations, this approach would not be practical for integration into an ASR system. We thus explored a second approach based on a convolutional neural network (CNN). Its performance is on par with that of the Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.",
keywords = "Austrian German, CNN, conversational speech, homophone disambiguation, prosodic features, Random Forest",
author = "Barbara Schuppler and Emil Berger and Xenia Kogler and Franz Pernkopf",
year = "2022",
doi = "10.21437/Interspeech.2022-10109",
language = "English",
volume = "2022-September",
series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
pages = "3198--3202",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
note = "23rd Annual Conference of the International Speech Communication Association: INTERSPEECH 2022; Conference date: 18-09-2022 through 22-09-2022",
url = "https://interspeech2022.org",
}

TY - GEN
T1 - Homophone Disambiguation Profits from Durational Information
AU - Schuppler, Barbara
AU - Berger, Emil
AU - Kogler, Xenia
AU - Pernkopf, Franz
PY - 2022
Y1 - 2022
AB - Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that are not homophonous in read speech. For instance, the tokens considered in this study, "ah", "ach", "auch", "eine" and "er", may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size and allows us to gain insight into how these homophones differ in phonetic detail. Since extracting these features requires annotations, this approach would not be practical for integration into an ASR system. We thus explored a second approach based on a convolutional neural network (CNN). Its performance is on par with that of the Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.
KW - Austrian German
KW - CNN
KW - conversational speech
KW - homophone disambiguation
KW - prosodic features
KW - Random Forest
UR - http://www.scopus.com/inward/record.url?scp=85140075041&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-10109
DO - 10.21437/Interspeech.2022-10109
M3 - Conference paper
VL - 2022-September
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 3198
EP - 3202
BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association
Y2 - 18 September 2022 through 22 September 2022
ER -

Projects

FWF - CLCS_2 - Cross-layer prosodic models for conversational speech

Prizes

Elise Richter
