Tracking of Multiple Fundamental Frequencies in Diplophonic Voices

Philipp Aichinger, Martin Hagmüller, Berit Schneider-Stickler, Jean Schoentgen, Franz Pernkopf

Research output: Contribution to journal (Article)

Abstract

Diplophonia is a type of pathological voice, in which two fundamental frequencies ($f_o$) are present simultaneously. Specialized audio analyzers that can handle up to two $f_o$s in diplophonic voices are in their infancy. We propose the tracking of up to two $f_o$s in diplophonic voices by audio waveform modeling (AWM), which involves obtaining candidates by repetitive execution of the Viterbi algorithm, followed by waveform Fourier synthesis, and heuristic candidate selection with majority voting. Our approach is evaluated with reference $f_o$-tracks obtained from laryngeal high-speed videos of 29 sustained phonations and compared to state-of-the-art tracking algorithms for multiple $f_o$s. An accurate and a fast variant of our algorithm are tested. The median error rate of the accurate variant is 6.52%, while the most accurate benchmark achieves 11.11%. The fast variant is more than twice as fast as the fastest relevant benchmark, and the median error rate is 9.52%. Furthermore, illustrative results of connected speech analysis are reported. Our approach may help to improve detection and analysis of diplophonia in clinical research and practice, as well as to advance synthesis of disordered voices.
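The candidate-generation step named in the abstract ("repetitive execution of the Viterbi algorithm") can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame salience scores, the linear jump penalty, the suppression of an already-found track before the next pass, and the names `viterbi_track`, `track_candidates`, and `jump_penalty` are all assumptions made for illustration. The Fourier synthesis and majority-voting stages are not covered.

```python
import math


def viterbi_track(salience, freqs, jump_penalty=0.05):
    """Return the state path maximizing summed salience minus a jump cost.

    salience: list of frames, each a list of scores (one per entry in freqs).
    freqs: candidate fundamental frequencies in Hz.
    jump_penalty: hypothetical cost per Hz of frame-to-frame frequency change.
    """
    n_states = len(freqs)
    score = [list(salience[0])]
    back = []
    for t in range(1, len(salience)):
        row, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for candidate j at frame t.
            best_i = max(
                range(n_states),
                key=lambda i: score[-1][i] - jump_penalty * abs(freqs[i] - freqs[j]),
            )
            best = score[-1][best_i] - jump_penalty * abs(freqs[best_i] - freqs[j])
            row.append(best + salience[t][j])
            ptr.append(best_i)
        score.append(row)
        back.append(ptr)
    # Backtrace from the best final state.
    state = max(range(n_states), key=lambda j: score[-1][j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def track_candidates(salience, freqs, n_tracks=2, jump_penalty=0.05):
    """Run Viterbi repeatedly, suppressing each found track before the next pass."""
    work = [list(frame) for frame in salience]
    tracks = []
    for _ in range(n_tracks):
        path = viterbi_track(work, freqs, jump_penalty)
        tracks.append([freqs[i] for i in path])
        for t, i in enumerate(path):
            work[t][i] = -math.inf  # assumption: exclude this state in later passes
    return tracks


# Toy input: three candidate frequencies, four frames, made-up salience values.
freqs = [100.0, 150.0, 200.0]
salience = [
    [1.0, 0.8, 0.1],
    [1.0, 0.8, 0.1],
    [0.6, 0.8, 0.7],
    [1.0, 0.8, 0.1],
]
tracks = track_candidates(salience, freqs)
```

In this toy input the jump penalty keeps each track smooth through the third frame, where the salience of the strongest candidate dips; the second pass then recovers a distinct second track.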

Original language: English
Pages (from-to): 330-341
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 26
Issue number: 2
Publication status: E-pub ahead of print - 6 Oct 2017

Keywords

  • audio waveform modeling
  • Benchmark testing
  • Diplophonia
  • Error analysis
  • Hidden Markov models
  • laryngeal high-speed imaging
  • multiple fundamental frequencies
  • Oscillators
  • pathological voice
  • Speech
  • Speech processing
  • Videos

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Instrumentation
  • Acoustics and Ultrasonics
  • Linguistics and Language
  • Electrical and Electronic Engineering
  • Speech and Hearing
