Homophone Disambiguation Profits from Durational Information

Barbara Schuppler; Emil Berger; Xenia Kogler; Franz Pernkopf

doi:10.21437/Interspeech.2022-10109

Barbara Schuppler*, Emil Berger, Xenia Kogler, Franz Pernkopf

*Corresponding author for this work

Institute of Signal Processing and Speech Communication (4420)

Publication: Chapter in book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that in read speech are not. For instance, the tokens considered in this study, "ah", "ach", "auch", "eine" and "er", may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size, and allows us to gain insight into how these homophones are distinct with respect to phonetic detail. Since for the extraction of the features annotations are required, this approach would not be practical for the integration into an ASR system. We thus explored a second, convolutional neural network (CNN) based approach. The performance of this approach is on par with the one based on Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.
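
The abstract states that the acoustic approach could facilitate disambiguation when combined with a stochastic language model, but it does not say how the two would be combined. The following is a minimal sketch of one common option, a log-linear combination of per-word probabilities, and is not taken from the paper. The function names, the `lm_weight` parameter, and all probability values are hypothetical and chosen only for illustration; the candidate words are the example tokens named in the abstract.

```python
import math


def combine_scores(acoustic_probs, lm_probs, lm_weight=1.0, floor=1e-12):
    """Log-linear combination of acoustic and language-model probabilities.

    Both inputs map candidate words to probabilities. Returns a normalised
    posterior over the candidates in acoustic_probs. (Hypothetical helper.)
    """
    log_scores = {}
    for word, p_acoustic in acoustic_probs.items():
        # Floor the probabilities so that log() never receives zero.
        p_ac = max(p_acoustic, floor)
        p_lm = max(lm_probs.get(word, 0.0), floor)
        log_scores[word] = math.log(p_ac) + lm_weight * math.log(p_lm)

    # Normalise with the log-sum-exp trick for numerical stability.
    max_log = max(log_scores.values())
    total = sum(math.exp(s - max_log) for s in log_scores.values())
    return {w: math.exp(s - max_log) / total for w, s in log_scores.items()}


def disambiguate(acoustic_probs, lm_probs, lm_weight=1.0):
    """Return the most probable candidate and the full posterior."""
    posterior = combine_scores(acoustic_probs, lm_probs, lm_weight)
    best = max(posterior, key=posterior.get)
    return best, posterior


# Made-up classifier output for one token realised as [a].
acoustic = {"ah": 0.30, "ach": 0.10, "auch": 0.35, "eine": 0.15, "er": 0.10}
# Made-up language-model probabilities given the lexical context.
lm = {"ah": 0.05, "ach": 0.05, "auch": 0.20, "eine": 0.60, "er": 0.10}

best, posterior = disambiguate(acoustic, lm)
acoustic_only, _ = disambiguate(acoustic, lm, lm_weight=0.0)
print(best, acoustic_only)
```

With these illustrative numbers the acoustic evidence alone favours "auch", while the combined score favours "eine". Setting `lm_weight` to 0 ignores the language model entirely, and larger values give the lexical context more influence.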

Original language	English
Title of host publication	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages	3198-3202
Number of pages	5
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-10109
Publication status	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association: INTERSPEECH 2022, Incheon, South Korea; duration: 18 Sep 2022 → 22 Sep 2022; https://interspeech2022.org

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print)	2308-457X

Conference

Conference	23rd Annual Conference of the International Speech Communication Association
Abbreviated title	INTERSPEECH 2022
Country/Territory	South Korea
City	Incheon
Period	18/09/22 → 22/09/22
Internet address	https://interspeech2022.org

ASJC Scopus subject areas

Software
Signal Processing
Language and Linguistics
Human-Computer Interaction
Modeling and Simulation

Access to document

10.21437/Interspeech.2022-10109 (License: Other)

Other files and links

Link to publication in Scopus

Projects

FWF - CLCS_2 - Cross-layer Prosodie Modelle für Spontansprache
Schuppler, B.
1/10/18 → 30/11/21
Project: Research project

Awards

Elise Richter
Schuppler, Barbara (Recipient), Nov 2017
Award: Research fellowship

Cite this

Schuppler, B., Berger, E., Kogler, X., & Pernkopf, F. (2022). Homophone Disambiguation Profits from Durational Information. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2022-September, pp. 3198-3202). https://doi.org/10.21437/Interspeech.2022-10109

@inproceedings{622ed1dd9a4445e192b8b83edcd991a9,
title = "Homophone Disambiguation Profits from Durational Information",
abstract = "Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that in read speech are not. For instance, the tokens considered in this study, {"}ah{"}, {"}ach{"}, {"}auch{"}, {"}eine{"} and {"}er{"}, may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size, and allows us to gain insight into how these homophones are distinct with respect to phonetic detail. Since for the extraction of the features annotations are required, this approach would not be practical for the integration into an ASR system. We thus explored a second, convolutional neural network (CNN) based approach. The performance of this approach is on par with the one based on Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.",
keywords = "Austrian German, CNN, conversational speech, homophone disambiguation, prosodic features, Random Forest",
author = "Barbara Schuppler and Emil Berger and Xenia Kogler and Franz Pernkopf",
year = "2022",
doi = "10.21437/Interspeech.2022-10109",
language = "English",
volume = "2022-September",
series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
pages = "3198--3202",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
note = "23rd Annual Conference of the International Speech Communication Association: INTERSPEECH 2022; Conference date: 18-09-2022 through 22-09-2022",
url = "https://interspeech2022.org",
}

TY  - GEN
T1  - Homophone Disambiguation Profits from Durational Information
AU  - Schuppler, Barbara
AU  - Berger, Emil
AU  - Kogler, Xenia
AU  - Pernkopf, Franz
PY  - 2022
Y1  - 2022
AB  - Given the high degree of segmental reduction in conversational speech, a large number of words become homophonous that in read speech are not. For instance, the tokens considered in this study, "ah", "ach", "auch", "eine" and "er", may all be reduced to [a] in conversational Austrian German. Homophones pose a serious problem for automatic speech recognition (ASR), where homophone disambiguation is typically solved using lexical context. In contrast, we propose two approaches to disambiguate homophones on the basis of prosodic and spectral features. First, we build a Random Forest classifier with a large set of acoustic features, which reaches good performance given the small data size, and allows us to gain insight into how these homophones are distinct with respect to phonetic detail. Since for the extraction of the features annotations are required, this approach would not be practical for the integration into an ASR system. We thus explored a second, convolutional neural network (CNN) based approach. The performance of this approach is on par with the one based on Random Forest, and the results indicate a high potential of this approach to facilitate homophone disambiguation when combined with a stochastic language model as part of an ASR system.
KW  - Austrian German
KW  - CNN
KW  - conversational speech
KW  - homophone disambiguation
KW  - prosodic features
KW  - Random Forest
UR  - http://www.scopus.com/inward/record.url?scp=85140075041&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2022-10109
DO  - 10.21437/Interspeech.2022-10109
M3  - Conference paper
VL  - 2022-September
T3  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP  - 3198
EP  - 3202
BT  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2  - 23rd Annual Conference of the International Speech Communication Association
Y2  - 18 September 2022 through 22 September 2022
ER  -
