Phase-Processing For Voice Activity Detection: A Statistical Approach

Johannes Stahl, Pejman Mowlaee Beikzadehmahaleh, Josef Kulmer

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Research; peer-review

Abstract

Conventional voice activity detectors (VAD) mostly rely on the magnitude of the complex-valued DFT spectral coefficients. In this paper, the circular variance of the Discrete Fourier transform (DFT) coefficients is investigated in terms of its ability to represent speech activity in noise. To this end, we model the circular variance as a random variable with different underlying distributions for the speech and the noise class. Based on this, we derive a binary hypothesis test relying only on the circular variance estimated from the noisy speech. The experimental results show a reasonable VAD performance, justifying that amplitude-independent information can characterize speech in a convenient way.
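As an illustration of the quantity the abstract refers to, the following is a minimal sketch of the standard circular variance, 1 minus the mean resultant length of a set of phase angles. It is not the authors' estimator: which phases are pooled (over frames, bins, or after any phase preprocessing) is not stated in this record. The helper `below_threshold` and its fixed threshold of 0.5 are hypothetical stand-ins for the paper's binary hypothesis test, and which side of the threshold corresponds to speech is an assumption left open here.

```python
import cmath
import math
import random


def circular_variance(phases):
    """Return 1 - R, where R is the mean resultant length of the phases (radians)."""
    if not phases:
        raise ValueError("need at least one phase value")
    # Average the unit phasors; the magnitude of that average is R in [0, 1].
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return 1.0 - abs(resultant)


def below_threshold(phases, threshold=0.5):
    """Hypothetical decision rule: compare the circular variance to a fixed threshold."""
    return circular_variance(phases) < threshold


# Synthetic phases only (no speech data): one spread-out set, one concentrated set.
rng = random.Random(0)
spread = [rng.uniform(-math.pi, math.pi) for _ in range(2000)]
concentrated = [rng.gauss(0.3, 0.1) for _ in range(2000)]

print(circular_variance(spread))        # close to 1: phases cover the whole circle
print(circular_variance(concentrated))  # close to 0: phases cluster around one angle
```

Because the circular variance depends only on the phase angles, scaling the amplitudes of the underlying coefficients leaves it unchanged, which is the amplitude-independence the abstract points to.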
Original language: English
Title of host publication: EUSIPCO 2016
DOI: 10.1109/EUSIPCO.2016.7760439
Publication status: Published - Aug 2016

Fingerprint

Processing
Discrete Fourier transforms
Detectors
Random variables

Cite this

Phase-Processing For Voice Activity Detection: A Statistical Approach. / Stahl, Johannes; Mowlaee Beikzadehmahaleh, Pejman; Kulmer, Josef.

EUSIPCO 2016. 2016.


@inproceedings{4d9c0a65227d4d279001a61a95d4c5e1,
title = "Phase-Processing For Voice Activity Detection: A Statistical Approach",
abstract = "Conventional voice activity detectors (VAD) mostly rely on the magnitude of the complex-valued DFT spectral coefficients. In this paper, the circular variance of the Discrete Fourier transform (DFT) coefficients is investigated in terms of its ability to represent speech activity in noise. To this end, we model the circular variance as a random variable with different underlying distributions for the speech and the noise class. Based on this, we derive a binary hypothesis test relying only on the circular variance estimated from the noisy speech. The experimental results show a reasonable VAD performance, justifying that amplitude-independent information can characterize speech in a convenient way.",
author = "Johannes Stahl and {Mowlaee Beikzadehmahaleh}, Pejman and Josef Kulmer",
year = "2016",
month = "8",
doi = "10.1109/EUSIPCO.2016.7760439",
language = "English",
booktitle = "EUSIPCO 2016",

}

TY - GEN

T1 - Phase-Processing For Voice Activity Detection: A Statistical Approach

AU - Stahl, Johannes

AU - Mowlaee Beikzadehmahaleh, Pejman

AU - Kulmer, Josef

PY - 2016/8

Y1 - 2016/8

N2 - Conventional voice activity detectors (VAD) mostly rely on the magnitude of the complex-valued DFT spectral coefficients. In this paper, the circular variance of the Discrete Fourier transform (DFT) coefficients is investigated in terms of its ability to represent speech activity in noise. To this end, we model the circular variance as a random variable with different underlying distributions for the speech and the noise class. Based on this, we derive a binary hypothesis test relying only on the circular variance estimated from the noisy speech. The experimental results show a reasonable VAD performance, justifying that amplitude-independent information can characterize speech in a convenient way.

AB - Conventional voice activity detectors (VAD) mostly rely on the magnitude of the complex-valued DFT spectral coefficients. In this paper, the circular variance of the Discrete Fourier transform (DFT) coefficients is investigated in terms of its ability to represent speech activity in noise. To this end, we model the circular variance as a random variable with different underlying distributions for the speech and the noise class. Based on this, we derive a binary hypothesis test relying only on the circular variance estimated from the noisy speech. The experimental results show a reasonable VAD performance, justifying that amplitude-independent information can characterize speech in a convenient way.

U2 - 10.1109/EUSIPCO.2016.7760439

DO - 10.1109/EUSIPCO.2016.7760439

M3 - Conference contribution

BT - EUSIPCO 2016

ER -