Fundamental Frequency Tracking in Diplophonic Voices

Philipp Aichinger; Martin Hagmüller; Imme Roesner; Berit Schneider-Stickler; J. Schoentgen; Franz Pernkopf

doi:10.1016/j.bspc.2016.10.002


Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Background and objectives
Fundamental frequency (fo) extraction in disordered voices is a prerequisite for many types of clinical analyses. Special attention must be paid if multiple oscillators with different fos are active simultaneously. Two independent approaches to fo tracking in diplophonic voices are proposed and compared with a benchmark from the literature.
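The following editorial sketch shows why simultaneous oscillators need special treatment. It is not a method from the paper. It applies a plain full-band autocorrelation estimator, which assumes a single fo, to a synthetic sum of two sinusoids at 200 Hz and 300 Hz. The function name autocorr_fo, the 8 kHz sampling rate, the 60 to 500 Hz search range, and the sinusoidal test signal are all assumptions chosen for illustration.

```python
import numpy as np


def autocorr_fo(x, fs, fmin=60.0, fmax=500.0):
    """Single-fo estimate from the peak of the full-band autocorrelation.

    Illustrative helper only: it assumes ONE periodic source, which is
    exactly the assumption that diplophonic voices violate.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # remove DC offset
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. N-1
    lag_min = int(np.ceil(fs / fmax))                 # shortest period searched
    lag_max = int(np.floor(fs / fmin))                # longest period searched
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag


fs = 8000                                   # sampling rate in Hz (assumed)
t = np.arange(int(0.5 * fs)) / fs           # 0.5 s of signal
# Two simultaneous oscillators at 200 Hz and 300 Hz (illustrative values)
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 300 * t)
print(autocorr_fo(x, fs))                   # 100.0
```

The two oscillators share a common period of 10 ms. The autocorrelation therefore peaks at that lag, and the estimator reports 100 Hz, a frequency at which neither oscillator vibrates. Real diplophonic voices are not pure sinusoids, so this shows the failure mode only in its simplest form.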

Material and methods
Six samples of sustained phonations were analyzed. High-speed videos were obtained in addition to audio recordings. Video-based fo tracks were obtained from cycle marks that report maximal vocal fold deflection in digital kymograms. Audio waveform modeling based extraction involved candidate tracking, oscillator waveform synthesis and track selection. Audio subband auto-correlation based extraction served as a benchmark.
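The abstract states that video-based fo tracks were derived from cycle marks of maximal vocal fold deflection. The sketch below shows one straightforward way to turn such marks into a track. It inverts the interval between consecutive marks of the same oscillator and time-stamps each value at the cycle midpoint. This procedure is an assumption, not taken from the paper. The function name fo_track_from_cycle_marks, the frame indices, and the 4000 frames/s video rate in the usage lines are hypothetical.

```python
import numpy as np


def fo_track_from_cycle_marks(mark_times_s):
    """Turn cycle-mark times (in seconds) of ONE oscillator into an fo track.

    Hypothetical helper: each fo value is the inverse of one cycle length,
    time-stamped at the midpoint of that cycle.
    """
    t = np.sort(np.asarray(mark_times_s, dtype=float))
    periods = np.diff(t)                # cycle lengths in seconds
    if np.any(periods <= 0):
        raise ValueError("cycle marks must be distinct")
    midpoints = t[:-1] + periods / 2.0  # one time stamp per cycle
    return midpoints, 1.0 / periods     # fo in Hz


# Usage with made-up kymogram frame indices; 4000 frames/s is an assumed rate.
frame_rate = 4000.0
marks_a = np.array([0, 20, 40, 60]) / frame_rate  # oscillator A: every 5 ms
marks_b = np.array([0, 30, 60]) / frame_rate      # oscillator B: every 7.5 ms
times_a, fo_a = fo_track_from_cycle_marks(marks_a)  # about 200 Hz per cycle
times_b, fo_b = fo_track_from_cycle_marks(marks_b)  # about 133.3 Hz per cycle
print(times_a, fo_a)
print(times_b, fo_b)
```

In a diplophonic voice, the two oscillators are marked separately, so each one yields its own track. Before comparing these tracks with audio-based estimates, one would likely resample both onto a common time grid, for example with np.interp. That step is equally hypothetical.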

Results and discussion
Promising qualitative and quantitative agreement of audio waveform modeling based estimates with kymogram-based tracks was observed. With reference to the kymogram-based tracks, audio waveform modeling based extraction had a median total error rate of 1.9%, which is an improvement over the benchmark method (17.7%).
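The abstract reports a median total error rate but does not define the metric. The sketch below assumes a common frame-wise definition from pitch-tracking evaluation, which may differ from the one used in the paper. Under this definition, a frame counts as an error in two cases:

- Voicing error: only one of the reference and the estimate has an fo.
- Gross error: both have an fo, but they differ by more than 20% of the reference.

The function name total_error_rate, the NaN convention for frames without fo, and the 20% tolerance are assumptions.

```python
import numpy as np


def total_error_rate(ref_fo, est_fo, rel_tol=0.2):
    """Fraction of erroneous frames (assumed definition, see lead-in).

    ref_fo, est_fo: frame-aligned fo values in Hz; NaN means "no fo".
    """
    ref = np.asarray(ref_fo, dtype=float)
    est = np.asarray(est_fo, dtype=float)
    if ref.shape != est.shape:
        raise ValueError("tracks must be frame-aligned")
    ref_has = ~np.isnan(ref)
    est_has = ~np.isnan(est)
    voicing_err = ref_has != est_has      # fo present in only one track
    both = ref_has & est_has
    gross_err = np.zeros(ref.shape, dtype=bool)
    gross_err[both] = np.abs(est[both] - ref[both]) > rel_tol * ref[both]
    return float(np.mean(voicing_err | gross_err))


# Toy frames (not data from the paper): the second frame is a gross error
# and the third a voicing error, so 2 of 5 frames are wrong.
ref = [200.0, 200.0, 200.0, np.nan, 150.0]
est = [202.0, 260.0, np.nan, np.nan, 151.0]
print(total_error_rate(ref, est))  # 0.4
```

With two simultaneous tracks, one would apply the function to each oscillator's track after deciding which estimated track corresponds to which reference. The median across recordings would then be taken with np.median. Both steps are assumptions about the evaluation, not a description of it.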

Conclusion
The results illustrate that fos of diplophonic voices may be validly obtained from kymogram cycle marks, as well as via audio waveform modeling. The acquisition of two simultaneous fo tracks in diplophonic voices may increase the validity of clinical voice analysis procedures in the future.

Original language	English
Pages (from-to)	69-81
Journal	Biomedical Signal Processing and Control
Volume	37
DOIs	https://doi.org/10.1016/j.bspc.2016.10.002
Publication status	Published - Aug 2017


Cite this

@article{452f3d9273b2434e912f26dcc85342f2,
  title = "{Fundamental Frequency Tracking in Diplophonic Voices}",
  abstract = "Background and objectives: Fundamental frequency (fo) extraction in disordered voices is a prerequisite for many types of clinical analyses. Special attention must be paid if multiple oscillators with different fos are active simultaneously. Two independent approaches to fo tracking in diplophonic voices are proposed and compared with a benchmark from the literature. Material and methods: Six samples of sustained phonations were analyzed. High-speed videos were obtained in addition to audio recordings. Video-based fo tracks were obtained from cycle marks that report maximal vocal fold deflection in digital kymograms. Audio waveform modeling based extraction involved candidate tracking, oscillator waveform synthesis and track selection. Audio subband auto-correlation based extraction served as a benchmark. Results and discussion: Promising qualitative and quantitative agreement of audio waveform modeling based estimates with kymogram-based tracks was observed. With reference to the kymogram-based tracks, audio waveform modeling based extraction had a median total error rate of 1.9\%, which is an improvement over the benchmark method (17.7\%). Conclusion: The results illustrate that fos of diplophonic voices may be validly obtained from kymogram cycle marks, as well as via audio waveform modeling. The acquisition of two simultaneous fo tracks in diplophonic voices may increase the validity of clinical voice analysis procedures in the future.",
  author = "Philipp Aichinger and Martin Hagm{\"u}ller and Imme Roesner and Berit Schneider-Stickler and J. Schoentgen and Franz Pernkopf",
  year = "2017",
  month = aug,
  doi = "10.1016/j.bspc.2016.10.002",
  language = "English",
  volume = "37",
  pages = "69--81",
  journal = "Biomedical Signal Processing and Control",
  issn = "1746-8108",
  publisher = "Elsevier B.V.",
}

TY  - JOUR
T1  - Fundamental Frequency Tracking in Diplophonic Voices
AU  - Aichinger, Philipp
AU  - Hagmüller, Martin
AU  - Roesner, Imme
AU  - Schneider-Stickler, Berit
AU  - Schoentgen, J.
AU  - Pernkopf, Franz
PY  - 2017/8
Y1  - 2017/8
N2  - Background and objectives: Fundamental frequency (fo) extraction in disordered voices is a prerequisite for many types of clinical analyses. Special attention must be paid if multiple oscillators with different fos are active simultaneously. Two independent approaches to fo tracking in diplophonic voices are proposed and compared with a benchmark from the literature. Material and methods: Six samples of sustained phonations were analyzed. High-speed videos were obtained in addition to audio recordings. Video-based fo tracks were obtained from cycle marks that report maximal vocal fold deflection in digital kymograms. Audio waveform modeling based extraction involved candidate tracking, oscillator waveform synthesis and track selection. Audio subband auto-correlation based extraction served as a benchmark. Results and discussion: Promising qualitative and quantitative agreement of audio waveform modeling based estimates with kymogram-based tracks was observed. With reference to the kymogram-based tracks, audio waveform modeling based extraction had a median total error rate of 1.9%, which is an improvement over the benchmark method (17.7%). Conclusion: The results illustrate that fos of diplophonic voices may be validly obtained from kymogram cycle marks, as well as via audio waveform modeling. The acquisition of two simultaneous fo tracks in diplophonic voices may increase the validity of clinical voice analysis procedures in the future.
AB  - Background and objectives: Fundamental frequency (fo) extraction in disordered voices is a prerequisite for many types of clinical analyses. Special attention must be paid if multiple oscillators with different fos are active simultaneously. Two independent approaches to fo tracking in diplophonic voices are proposed and compared with a benchmark from the literature. Material and methods: Six samples of sustained phonations were analyzed. High-speed videos were obtained in addition to audio recordings. Video-based fo tracks were obtained from cycle marks that report maximal vocal fold deflection in digital kymograms. Audio waveform modeling based extraction involved candidate tracking, oscillator waveform synthesis and track selection. Audio subband auto-correlation based extraction served as a benchmark. Results and discussion: Promising qualitative and quantitative agreement of audio waveform modeling based estimates with kymogram-based tracks was observed. With reference to the kymogram-based tracks, audio waveform modeling based extraction had a median total error rate of 1.9%, which is an improvement over the benchmark method (17.7%). Conclusion: The results illustrate that fos of diplophonic voices may be validly obtained from kymogram cycle marks, as well as via audio waveform modeling. The acquisition of two simultaneous fo tracks in diplophonic voices may increase the validity of clinical voice analysis procedures in the future.
U2  - 10.1016/j.bspc.2016.10.002
DO  - 10.1016/j.bspc.2016.10.002
M3  - Article
SN  - 1746-8108
VL  - 37
SP  - 69
EP  - 81
JO  - Biomedical Signal Processing and Control
JF  - Biomedical Signal Processing and Control
ER  - 
