Tracking of Multiple Fundamental Frequencies in Diplophonic Voices

Philipp Aichinger; Martin Hagmüller; Berit Schneider-Stickler; Jean Schoentgen; Franz Pernkopf

doi:10.1109/TASLP.2017.2761233

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

Diplophonia is a type of pathological voice, in which two fundamental frequencies ($f_o$) are present simultaneously. Specialized audio analyzers that can handle up to two $f_o$s in diplophonic voices are in their infancy. We propose the tracking of up to two $f_o$s in diplophonic voices by audio waveform modeling (AWM), which involves obtaining candidates by repetitive execution of the Viterbi algorithm, followed by waveform Fourier synthesis, and heuristic candidate selection with majority voting. Our approach is evaluated with reference $f_o$-tracks obtained from laryngeal high-speed videos of 29 sustained phonations and compared to state-of-the-art tracking algorithms for multiple $f_o$s. An accurate and a fast variant of our algorithm are tested. The median error rate of the accurate variant is 6.52%, while the most accurate benchmark achieves 11.11%. The fast variant is more than twice as fast as the fastest relevant benchmark, and the median error rate is 9.52%. Furthermore, illustrative results of connected speech analysis are reported. Our approach may help to improve detection and analysis of diplophonia in clinical research and practice, as well as to advance synthesis of disordered voices.
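
To make the notion of two simultaneous $f_o$s and of waveform Fourier synthesis more concrete, the following is a minimal illustrative sketch, not the authors' AWM implementation. It only shows how a waveform with two fundamental frequencies could be synthesized as the sum of two harmonic (Fourier) series. The function name `synthesize_two_oscillators`, the cosine-only harmonics, and all numeric values (100 Hz, 150 Hz, amplitudes, sampling rate) are assumptions chosen for illustration and do not come from the paper.

```python
import math


def synthesize_two_oscillators(f1, f2, amps1, amps2, fs, duration):
    """Sum two harmonic series, one per fundamental frequency.

    Hypothetical helper for illustration only.
    f1, f2    : fundamental frequencies in Hz
    amps1/2   : harmonic amplitudes; index 0 is the first harmonic
    fs        : sampling rate in Hz
    duration  : signal length in seconds
    """
    n_samples = int(round(fs * duration))
    signal = []
    for n in range(n_samples):
        t = n / fs
        sample = 0.0
        # Harmonics of the first oscillator at k * f1.
        for k, a in enumerate(amps1, start=1):
            sample += a * math.cos(2.0 * math.pi * k * f1 * t)
        # Harmonics of the second oscillator at k * f2.
        for k, a in enumerate(amps2, start=1):
            sample += a * math.cos(2.0 * math.pi * k * f2 * t)
        signal.append(sample)
    return signal


# Assumed example values: two oscillators at 100 Hz and 150 Hz, 8 kHz sampling.
waveform = synthesize_two_oscillators(
    100.0, 150.0, [1.0, 0.5], [0.8], fs=8000, duration=0.04
)
```

With these assumed values the two oscillators share a common period of 0.02 s (160 samples), so the summed waveform repeats only every 160 samples even though neither oscillator alone has that period. In a modeling setting, one would compare such a synthesized waveform against the recorded audio for each candidate pair of $f_o$s; how the paper does this in detail is not stated in the abstract.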

Original language	English
Pages (from-to)	330-341
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	26
Issue number	2
DOIs	https://doi.org/10.1109/TASLP.2017.2761233
Publication status	Published - 2018

Keywords

audio waveform modeling
Benchmark testing
Diplophonia
Error analysis
Hidden Markov models
laryngeal highspeed imaging
multiple fundamental frequencies
Oscillators
pathological voice
Speech
Speech processing
Videos

ASJC Scopus subject areas

Signal Processing
Media Technology
Instrumentation
Acoustics and Ultrasonics
Linguistics and Language
Electrical and Electronic Engineering
Speech and Hearing

Cite this

@article{6c205c53c9c440898eceaf543a90697e,

title = "Tracking of Multiple Fundamental Frequencies in Diplophonic Voices",

abstract = "Diplophonia is a type of pathological voice, in which two fundamental frequencies ($f_o$) are present simultaneously. Specialized audio analyzers that can handle up to two $f_o$s in diplophonic voices are in their infancy. We propose the tracking of up to two $f_o$s in diplophonic voices by audio waveform modeling (AWM), which involves obtaining candidates by repetitive execution of the Viterbi algorithm, followed by waveform Fourier synthesis, and heuristic candidate selection with majority voting. Our approach is evaluated with reference $f_o$-tracks obtained from laryngeal high-speed videos of 29 sustained phonations and compared to state-of-the-art tracking algorithms for multiple $f_o$s. An accurate and a fast variant of our algorithm are tested. The median error rate of the accurate variant is 6.52%, while the most accurate benchmark achieves 11.11%. The fast variant is more than twice as fast as the fastest relevant benchmark, and the median error rate is 9.52%. Furthermore, illustrative results of connected speech analysis are reported. Our approach may help to improve detection and analysis of diplophonia in clinical research and practice, as well as to advance synthesis of disordered voices.",

keywords = "audio waveform modeling, Benchmark testing, Diplophonia, Error analysis, Hidden Markov models, laryngeal highspeed imaging, multiple fundamental frequencies, Oscillators, pathological voice, Speech, Speech processing, Videos",

author = "Philipp Aichinger and Martin Hagm{\"u}ller and Berit Schneider-Stickler and Jean Schoentgen and Franz Pernkopf",

year = "2018",

doi = "10.1109/TASLP.2017.2761233",

language = "English",

volume = "26",

pages = "330--341",

journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",

issn = "2329-9290",

publisher = "Institute of Electrical and Electronics Engineers",

number = "2",

}

TY - JOUR

T1 - Tracking of Multiple Fundamental Frequencies in Diplophonic Voices

AU - Aichinger, Philipp

AU - Hagmüller, Martin

AU - Schneider-Stickler, Berit

AU - Schoentgen, Jean

AU - Pernkopf, Franz

PY - 2018

Y1 - 2018

N2 - Diplophonia is a type of pathological voice, in which two fundamental frequencies ($f_o$) are present simultaneously. Specialized audio analyzers that can handle up to two $f_o$s in diplophonic voices are in their infancy. We propose the tracking of up to two $f_o$s in diplophonic voices by audio waveform modeling (AWM), which involves obtaining candidates by repetitive execution of the Viterbi algorithm, followed by waveform Fourier synthesis, and heuristic candidate selection with majority voting. Our approach is evaluated with reference $f_o$-tracks obtained from laryngeal high-speed videos of 29 sustained phonations and compared to state-of-the-art tracking algorithms for multiple $f_o$s. An accurate and a fast variant of our algorithm are tested. The median error rate of the accurate variant is 6.52%, while the most accurate benchmark achieves 11.11%. The fast variant is more than twice as fast as the fastest relevant benchmark, and the median error rate is 9.52%. Furthermore, illustrative results of connected speech analysis are reported. Our approach may help to improve detection and analysis of diplophonia in clinical research and practice, as well as to advance synthesis of disordered voices.

AB - Diplophonia is a type of pathological voice, in which two fundamental frequencies ($f_o$) are present simultaneously. Specialized audio analyzers that can handle up to two $f_o$s in diplophonic voices are in their infancy. We propose the tracking of up to two $f_o$s in diplophonic voices by audio waveform modeling (AWM), which involves obtaining candidates by repetitive execution of the Viterbi algorithm, followed by waveform Fourier synthesis, and heuristic candidate selection with majority voting. Our approach is evaluated with reference $f_o$-tracks obtained from laryngeal high-speed videos of 29 sustained phonations and compared to state-of-the-art tracking algorithms for multiple $f_o$s. An accurate and a fast variant of our algorithm are tested. The median error rate of the accurate variant is 6.52%, while the most accurate benchmark achieves 11.11%. The fast variant is more than twice as fast as the fastest relevant benchmark, and the median error rate is 9.52%. Furthermore, illustrative results of connected speech analysis are reported. Our approach may help to improve detection and analysis of diplophonia in clinical research and practice, as well as to advance synthesis of disordered voices.

KW - audio waveform modeling

KW - Benchmark testing

KW - Diplophonia

KW - Error analysis

KW - Hidden Markov models

KW - laryngeal highspeed imaging

KW - multiple fundamental frequencies

KW - Oscillators

KW - pathological voice

KW - Speech

KW - Speech processing

KW - Videos

UR - http://www.scopus.com/inward/record.url?scp=85031790491&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2017.2761233

DO - 10.1109/TASLP.2017.2761233

M3 - Article

AN - SCOPUS:85031790491

SN - 2329-9290

VL - 26

SP - 330

EP - 341

JO - IEEE/ACM Transactions on Audio Speech and Language Processing

JF - IEEE/ACM Transactions on Audio Speech and Language Processing

IS - 2

ER -
