Localization and characterization of multiple harmonic sources

Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin

Research output: Contribution to journal › Article › Research › peer-review

Abstract

We introduce a new and intuitive algorithm to characterize and localize multiple harmonic sources intersecting in the spatial and frequency domains. It jointly estimates their fundamental frequencies, their respective amplitudes, and their directions of arrival based on an intelligent non-parametric signal representation. To obtain these parameters, we first apply variable-scale sampling on unbiased cross-correlation functions between pairs of microphone signals to generate a joint parameter space. Then, we employ a multidimensional maxima detector to represent the parameters in a sparse joint parameter space. In comparison to others, our algorithm solves the issue of pitch-period doubling when using cross-correlation functions, it estimates multiple harmonic sources with a signal power smaller than the signal power of the dominant harmonic source, and it associates the estimated parameters to their corresponding sources in a multidimensional sparse joint parameter space, which can be directly fed into a tracker. We tested our algorithm and three others on synthetic data and speech data recorded in a real reverberant environment and evaluated their performance by employing the joint recall measure, the root-mean-square error, and the cumulative distribution function of fundamental frequencies and directions of arrival. The evaluations show promising results: Our algorithm outperforms the others in terms of the joint recall measure, and it can achieve root-mean-square errors of 1 Hz or 1° and smaller, which facilitates, e.g., distant-speech enhancement or source separation.
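
The abstract names unbiased cross-correlation functions between pairs of microphone signals as the starting point. The sketch below is not the authors' algorithm: it omits the variable-scale sampling, the joint parameter space, and the multidimensional maxima detector, and only illustrates the textbook unbiased estimate for one microphone pair. The function names, the far-field two-microphone geometry, and the speed of sound of 343 m/s are assumptions introduced here for illustration.

```python
import math


def unbiased_xcorr(x, y, max_lag):
    """Return {lag: r[lag]} with r[lag] = sum_n x[n + lag] * y[n] / (N - |lag|)."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    r = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            acc = sum(x[i + lag] * y[i] for i in range(n - lag))
        else:
            acc = sum(x[i] * y[i - lag] for i in range(n + lag))
        # Dividing by the number of overlapping samples (not by N) removes
        # the triangular bias of the plain cross-correlation estimate.
        r[lag] = acc / (n - abs(lag))
    return r


def peak_lag(r):
    """Lag (in samples) at which the cross-correlation estimate is largest."""
    return max(r, key=r.get)


def lag_to_doa_deg(lag, fs, spacing_m, c=343.0):
    """Hypothetical helper: far-field angle for two microphones spacing_m apart.

    Assumes a plane wave and a speed of sound c in m/s; both are
    illustrative assumptions, not taken from the paper.
    """
    s = c * lag / (fs * spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp against rounding outside asin's domain
    return math.degrees(math.asin(s))
```

For a harmonic source, the cross-correlation repeats at multiples of the pitch period, so a single-pair peak search like this one is only unambiguous when max_lag stays below the period; handling that ambiguity is one of the issues the abstract says the proposed method addresses.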

Original language: English
Article number: 7457364
Pages (from-to): 1348-1363
Number of pages: 16
Journal: IEEE ACM Transactions on Audio Speech and Language Processing
Volume: 24
Issue number: 8
DOI: 10.1109/TASLP.2016.2556282
Publication status: Published - 1 Aug 2016

Fingerprint

Joints
harmonics
Direction of arrival
Mean square error
root-mean-square errors
Source separation
Speech enhancement
cross correlation
arrivals
Microphones
Distribution functions
period doubling
estimates
Sampling
Detectors
augmentation
evaluation

Keywords

  • data association
  • direction of arrival
  • fundamental frequency
  • Joint estimation
  • microphone array
  • pitch estimation
  • pitch-period doubling
  • sparse joint parameter space

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Instrumentation
  • Acoustics and Ultrasonics
  • Linguistics and Language
  • Speech and Hearing
  • Electrical and Electronic Engineering

Fields of Expertise

  • Information, Communication & Computing

Cite this

Localization and characterization of multiple harmonic sources. / Pessentheiner, Hannes; Hagmüller, Martin; Kubin, Gernot.

In: IEEE ACM Transactions on Audio Speech and Language Processing, Vol. 24, No. 8, 7457364, 01.08.2016, p. 1348-1363.


@article{de4084bc1fa442df8b64da13a647d3fe,
title = "Localization and characterization of multiple harmonic sources",
abstract = "We introduce a new and intuitive algorithm to characterize and localize multiple harmonic sources intersecting in the spatial and frequency domains. It jointly estimates their fundamental frequencies, their respective amplitudes, and their directions of arrival based on an intelligent non-parametric signal representation. To obtain these parameters, we first apply variable-scale sampling on unbiased cross-correlation functions between pairs of microphone signals to generate a joint parameter space. Then, we employ a multidimensional maxima detector to represent the parameters in a sparse joint parameter space. In comparison to others, our algorithm solves the issue of pitch-period doubling when using cross-correlation functions, it estimates multiple harmonic sources with a signal power smaller than the signal power of the dominant harmonic source, and it associates the estimated parameters to their corresponding sources in a multidimensional sparse joint parameter space, which can be directly fed into a tracker. We tested our algorithm and three others on synthetic data and speech data recorded in a real reverberant environment and evaluated their performance by employing the joint recall measure, the root-mean-square error, and the cumulative distribution function of fundamental frequencies and directions of arrival. The evaluations show promising results: Our algorithm outperforms the others in terms of the joint recall measure, and it can achieve root-mean-square errors of 1 Hz or $1^{\circ}$ and smaller, which facilitates, e.g., distant-speech enhancement or source separation.",
keywords = "data association, direction of arrival, fundamental frequency, Joint estimation, microphone array, pitch estimation, pitch-period doubling, sparse joint parameter space",
author = "Hannes Pessentheiner and Martin Hagm{\"u}ller and Gernot Kubin",
year = "2016",
month = "8",
day = "1",
doi = "10.1109/TASLP.2016.2556282",
language = "English",
volume = "24",
pages = "1348--1363",
journal = "IEEE ACM Transactions on Audio Speech and Language Processing",
issn = "2329-9290",
publisher = "Institute of Electrical and Electronics Engineers",
number = "8",

}

TY - JOUR

T1 - Localization and characterization of multiple harmonic sources

AU - Pessentheiner, Hannes

AU - Hagmüller, Martin

AU - Kubin, Gernot

PY - 2016/8/1

Y1 - 2016/8/1

N2 - We introduce a new and intuitive algorithm to characterize and localize multiple harmonic sources intersecting in the spatial and frequency domains. It jointly estimates their fundamental frequencies, their respective amplitudes, and their directions of arrival based on an intelligent non-parametric signal representation. To obtain these parameters, we first apply variable-scale sampling on unbiased cross-correlation functions between pairs of microphone signals to generate a joint parameter space. Then, we employ a multidimensional maxima detector to represent the parameters in a sparse joint parameter space. In comparison to others, our algorithm solves the issue of pitch-period doubling when using cross-correlation functions, it estimates multiple harmonic sources with a signal power smaller than the signal power of the dominant harmonic source, and it associates the estimated parameters to their corresponding sources in a multidimensional sparse joint parameter space, which can be directly fed into a tracker. We tested our algorithm and three others on synthetic data and speech data recorded in a real reverberant environment and evaluated their performance by employing the joint recall measure, the root-mean-square error, and the cumulative distribution function of fundamental frequencies and directions of arrival. The evaluations show promising results: Our algorithm outperforms the others in terms of the joint recall measure, and it can achieve root-mean-square errors of 1 Hz or 1° and smaller, which facilitates, e.g., distant-speech enhancement or source separation.

AB - We introduce a new and intuitive algorithm to characterize and localize multiple harmonic sources intersecting in the spatial and frequency domains. It jointly estimates their fundamental frequencies, their respective amplitudes, and their directions of arrival based on an intelligent non-parametric signal representation. To obtain these parameters, we first apply variable-scale sampling on unbiased cross-correlation functions between pairs of microphone signals to generate a joint parameter space. Then, we employ a multidimensional maxima detector to represent the parameters in a sparse joint parameter space. In comparison to others, our algorithm solves the issue of pitch-period doubling when using cross-correlation functions, it estimates multiple harmonic sources with a signal power smaller than the signal power of the dominant harmonic source, and it associates the estimated parameters to their corresponding sources in a multidimensional sparse joint parameter space, which can be directly fed into a tracker. We tested our algorithm and three others on synthetic data and speech data recorded in a real reverberant environment and evaluated their performance by employing the joint recall measure, the root-mean-square error, and the cumulative distribution function of fundamental frequencies and directions of arrival. The evaluations show promising results: Our algorithm outperforms the others in terms of the joint recall measure, and it can achieve root-mean-square errors of 1 Hz or 1° and smaller, which facilitates, e.g., distant-speech enhancement or source separation.

KW - data association

KW - direction of arrival

KW - fundamental frequency

KW - Joint estimation

KW - microphone array

KW - pitch estimation

KW - pitch-period doubling

KW - sparse joint parameter space

UR - http://www.scopus.com/inward/record.url?scp=84976347097&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2016.2556282

DO - 10.1109/TASLP.2016.2556282

M3 - Article

VL - 24

SP - 1348

EP - 1363

JO - IEEE ACM Transactions on Audio Speech and Language Processing

JF - IEEE ACM Transactions on Audio Speech and Language Processing

SN - 2329-9290

IS - 8

M1 - 7457364

ER -