Towards Building a Cross-Lingual Speech Recognition System for Slovenian and Austrian German

Andrej Žgank; Barbara Schuppler

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

Methods of cross-lingual speech recognition have high potential to overcome
the limited availability of spoken-language resources for under-resourced
languages. They can be applied not only to build automatic speech recognition
(ASR) systems for such languages but also to generate further spoken-language
resources. This paper presents a cross-lingual ASR system based on data from
two languages, Slovenian and Austrian German. Each language was used both as
the source and as the target language for cross-lingual transfer (i.e., the
acoustic models were trained on material from the source language, and
recognition was tested on material from the target language). The
cross-lingual mapping between the Slovenian phone set (40 phones) and the
Austrian German phone set (33 phones) was carried out using expert knowledge
about the acoustic-phonetic properties of the phones. For the experiments, we
used data from two speech corpora: the Slovenian BNSI Broadcast News speech
database and the Austrian German GRASS corpus. We trained HMM and DNN
acoustic models for monolingual and cross-lingual speech recognition. The
evaluation showed that the DNN acoustic models outperformed the HMM models.
Recognition results with Austrian German as the target language were clearly
better than those with Slovenian as the target language. Possible explanations
for this difference in performance are: 1) the higher number of phones in
Slovenian, 2) the difference in speaking style between the databases (i.e., a
mix of read and spontaneous speech in the Slovenian data vs. read speech only
in the Austrian data), and 3) the mismatch in recording quality (i.e., GRASS
was recorded under better conditions than BNSI).

Original language	English
Pages (from-to)	19-33
Journal	The Phonetician
Volume	117
Issue number	Spec. Iss.
Publication status	Published - 2020

Access to Document

http://isphs.org/Phonetician/ThePhonetician117.pdf

FWF - CLCS_2 - Cross-layer prosodic models for conversational speech
Schuppler, B.
1/10/18 → 30/11/21
Project: Research project

Cite this

@article{8d4389bab0064094aa52fafc52e1f49a,

title = "Towards Building a Cross-Lingual Speech Recognition System for Slovenian and Austrian German",

abstract = "Methods of cross-lingual speech recognition have high potential to overcome the limited availability of spoken-language resources for under-resourced languages. They can be applied not only to build automatic speech recognition (ASR) systems for such languages but also to generate further spoken-language resources. This paper presents a cross-lingual ASR system based on data from two languages, Slovenian and Austrian German. Each language was used both as the source and as the target language for cross-lingual transfer (i.e., the acoustic models were trained on material from the source language, and recognition was tested on material from the target language). The cross-lingual mapping between the Slovenian phone set (40 phones) and the Austrian German phone set (33 phones) was carried out using expert knowledge about the acoustic-phonetic properties of the phones. For the experiments, we used data from two speech corpora: the Slovenian BNSI Broadcast News speech database and the Austrian German GRASS corpus. We trained HMM and DNN acoustic models for monolingual and cross-lingual speech recognition. The evaluation showed that the DNN acoustic models outperformed the HMM models. Recognition results with Austrian German as the target language were clearly better than those with Slovenian as the target language. Possible explanations for this difference in performance are: 1) the higher number of phones in Slovenian, 2) the difference in speaking style between the databases (i.e., a mix of read and spontaneous speech in the Slovenian data vs. read speech only in the Austrian data), and 3) the mismatch in recording quality (i.e., GRASS was recorded under better conditions than BNSI).",

author = "Andrej {\v Z}gank and Barbara Schuppler",

year = "2020",

language = "English",

volume = "117",

pages = "19--33",

journal = "The Phonetician",

issn = "0741-6164",

publisher = "International Society of Phonetic Sciences, ISPhS",

number = "Spec. Iss.",

}

TY - JOUR

T1 - Towards Building a Cross-Lingual Speech Recognition System for Slovenian and Austrian German

AU - Žgank, Andrej

AU - Schuppler, Barbara

PY - 2020

Y1 - 2020

N2 - Methods of cross-lingual speech recognition have high potential to overcome the limited availability of spoken-language resources for under-resourced languages. They can be applied not only to build automatic speech recognition (ASR) systems for such languages but also to generate further spoken-language resources. This paper presents a cross-lingual ASR system based on data from two languages, Slovenian and Austrian German. Each language was used both as the source and as the target language for cross-lingual transfer (i.e., the acoustic models were trained on material from the source language, and recognition was tested on material from the target language). The cross-lingual mapping between the Slovenian phone set (40 phones) and the Austrian German phone set (33 phones) was carried out using expert knowledge about the acoustic-phonetic properties of the phones. For the experiments, we used data from two speech corpora: the Slovenian BNSI Broadcast News speech database and the Austrian German GRASS corpus. We trained HMM and DNN acoustic models for monolingual and cross-lingual speech recognition. The evaluation showed that the DNN acoustic models outperformed the HMM models. Recognition results with Austrian German as the target language were clearly better than those with Slovenian as the target language. Possible explanations for this difference in performance are: 1) the higher number of phones in Slovenian, 2) the difference in speaking style between the databases (i.e., a mix of read and spontaneous speech in the Slovenian data vs. read speech only in the Austrian data), and 3) the mismatch in recording quality (i.e., GRASS was recorded under better conditions than BNSI).

AB - Methods of cross-lingual speech recognition have high potential to overcome the limited availability of spoken-language resources for under-resourced languages. They can be applied not only to build automatic speech recognition (ASR) systems for such languages but also to generate further spoken-language resources. This paper presents a cross-lingual ASR system based on data from two languages, Slovenian and Austrian German. Each language was used both as the source and as the target language for cross-lingual transfer (i.e., the acoustic models were trained on material from the source language, and recognition was tested on material from the target language). The cross-lingual mapping between the Slovenian phone set (40 phones) and the Austrian German phone set (33 phones) was carried out using expert knowledge about the acoustic-phonetic properties of the phones. For the experiments, we used data from two speech corpora: the Slovenian BNSI Broadcast News speech database and the Austrian German GRASS corpus. We trained HMM and DNN acoustic models for monolingual and cross-lingual speech recognition. The evaluation showed that the DNN acoustic models outperformed the HMM models. Recognition results with Austrian German as the target language were clearly better than those with Slovenian as the target language. Possible explanations for this difference in performance are: 1) the higher number of phones in Slovenian, 2) the difference in speaking style between the databases (i.e., a mix of read and spontaneous speech in the Slovenian data vs. read speech only in the Austrian data), and 3) the mismatch in recording quality (i.e., GRASS was recorded under better conditions than BNSI).

M3 - Article

SN - 0741-6164

VL - 117

SP - 19

EP - 33

JO - The Phonetician

JF - The Phonetician

IS - Spec. Iss.

ER -