Towards Building a Cross-Lingual Speech Recognition System for Slovenian and Austrian German

Andrej Žgank, Barbara Schuppler

Research output: Contribution to journal › Article › Peer-review


Methods of cross-lingual speech recognition have high potential to overcome
the scarcity of spoken-language resources in under-resourced languages. Not
only can they be applied to build automatic speech recognition (ASR) systems
for such languages, but they can also be used to generate further
spoken-language resources. This paper presents a cross-lingual ASR system based
on data from two languages, Slovenian and Austrian German. Each language served
both as source and as target language for cross-lingual transfer (i.e., the
acoustic models were trained on material from the source language, and
recognition was tested on material from the target language). The cross-lingual
mapping between the Slovenian phone set (40 phones) and the Austrian German
phone set (33 phones) was carried out using expert knowledge about the
acoustic-phonetic properties of the phones. For the experiments, we used data
from two speech corpora: the Slovenian BNSI Broadcast News speech database and
the Austrian German GRASS corpus. We trained HMM and DNN acoustic models for
monolingual and cross-lingual speech recognition. The evaluation showed that
the DNN acoustic models outperformed the HMM models. Recognition results with
Austrian German as the target language were clearly better than those with
Slovenian as the target language. Possible explanations for this difference in
performance are: (1) the higher number of phones in Slovenian, (2) the
discrepancy in speaking style between the databases (i.e., a mix of read and
spontaneous speech in the Slovenian data vs. read speech only in the Austrian
data), and (3) the mismatch in recording quality (i.e., GRASS was recorded under
better conditions than BNSI).
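The phone-mapping step described above can be sketched in a few lines. This is a minimal sketch, assuming the mapping is stored as a lookup table from each target-language phone to the source-language phone whose acoustic model stands in for it. The phone symbols, the table `TARGET_TO_SOURCE`, and the helpers `merged_phones`, `map_pronunciation` and `min_forced_merges` are hypothetical illustrations. They are not the expert mapping used by the authors, and they make no claim about the real phone inventories. The sketch also makes one arithmetic point explicit. With Slovenian (40 phones) as target and Austrian German (33 phones) as source, at least 40 − 33 = 7 Slovenian phones must share an acoustic model with another phone. The reverse direction forces no such merges.

```python
from collections import defaultdict

# Hypothetical toy mapping: target-language phone -> source-language phone
# whose acoustic model is used in its place. NOT the mapping from the paper;
# the merges below exist only to illustrate the idea.
TARGET_TO_SOURCE = {
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "s": "s", "z": "s",  # two target phones share one source model
    "S": "S", "Z": "S",  # another illustrative merge
    "r": "r", "l": "l", "m": "m", "n": "n",
}


def merged_phones(mapping):
    """Return {source_phone: sorted target phones} for every source phone
    that stands in for more than one target phone."""
    groups = defaultdict(list)
    for target_phone, source_phone in mapping.items():
        groups[source_phone].append(target_phone)
    return {src: sorted(tgts) for src, tgts in groups.items() if len(tgts) > 1}


def map_pronunciation(phones, mapping):
    """Rewrite a target-language pronunciation in source-language phones.
    Raises KeyError if a phone has no entry in the mapping."""
    return [mapping[p] for p in phones]


def min_forced_merges(n_target_phones, n_source_phones):
    """Pigeonhole lower bound on how many target phones must share a source
    model, given that every target phone needs some source model."""
    return max(0, n_target_phones - n_source_phones)


if __name__ == "__main__":
    print(merged_phones(TARGET_TO_SOURCE))
    print(map_pronunciation(["z", "a", "r", "e"], TARGET_TO_SOURCE))
    print(min_forced_merges(40, 33), min_forced_merges(33, 40))
```

In the toy table, `merged_phones` reports the two merged groups, and the pronunciation `z a r e` is rewritten as `s a r e`. The contrast between 40 and 33 is a count only. It does not show that the phone-set size caused the performance gap, which the abstract lists as one possible explanation among three.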
Original language: English
Pages (from-to): 19-33
Journal: The Phonetician
Issue number: Spec. Iss.
Publication status: Published, 2020


