Stochastic mutual information gradient estimation for dimensionality reduction networks

Ozan Özdenizci; Deniz Erdogmus

doi:10.1016/j.ins.2021.04.066

Institute of Theoretical Computer Science (7080)

Research output: Contribution to journal › Article › peer-review

Abstract

Feature ranking and selection is a widely used approach to supervised dimensionality reduction in discriminative machine learning. Nevertheless, there is significant evidence that feature ranking and selection algorithms, whatever criterion they are based on, can lead to sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a dimensionality reduction network (MMINet) training procedure based on the stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto an output feature space where the lower-dimensional representations of features carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective so that it is estimated non-parametrically, with no distributional assumptions. We experimentally evaluate our method with applications to high-dimensional biological data sets, and relate it to conventional feature selection algorithms, which form a special case of our approach.
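To make the objective in the abstract concrete, the following is a minimal sketch of a non-parametric estimate of the mutual information between a low-dimensional projection and class labels. It is not the authors' implementation: the Gaussian kernel density estimator, the fixed bandwidth, the fixed linear projections, the toy data, and the function names `kde_log_density` and `mutual_information_estimate` are all assumptions made for illustration.

```python
import numpy as np


def kde_log_density(queries, samples, bandwidth):
    """Log of a Gaussian kernel density estimate built on `samples`,
    evaluated at each row of `queries` (assumed estimator, fixed bandwidth)."""
    d = samples.shape[1]
    # Squared Euclidean distance between every query and every sample.
    sq = ((queries[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = (-0.5 * sq / bandwidth**2
                  - 0.5 * d * np.log(2.0 * np.pi * bandwidth**2))
    # Log-mean-exp over the samples, for numerical stability.
    peak = log_kernel.max(axis=1, keepdims=True)
    mean_exp = np.exp(log_kernel - peak).mean(axis=1, keepdims=True)
    return (peak + np.log(mean_exp)).ravel()


def mutual_information_estimate(z, labels, bandwidth=0.5):
    """Plug-in estimate of I(Z; C) = E[log p(z | c) - log p(z)], in nats."""
    log_p_z = kde_log_density(z, z, bandwidth)
    log_p_z_given_c = np.empty(len(z))
    for c in np.unique(labels):
        mask = labels == c
        # Class-conditional density uses only the samples of class c.
        log_p_z_given_c[mask] = kde_log_density(z[mask], z[mask], bandwidth)
    return float(np.mean(log_p_z_given_c - log_p_z))


# Toy data (hypothetical): 5 features, only feature 0 depends on the class.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
x = rng.normal(size=(200, 5))
x[:, 0] += 4.0 * labels

# Two fixed 5 -> 1 linear projections standing in for a trained network.
w_informative = np.eye(5)[:, :1]   # keeps the class-dependent feature
w_noise = np.eye(5)[:, 1:2]        # keeps a pure-noise feature

mi_informative = mutual_information_estimate(x @ w_informative, labels)
mi_noise = mutual_information_estimate(x @ w_noise, labels)
```

In this toy setup the projection onto the class-dependent feature should score higher than the projection onto a noise feature. A trainable network would replace the fixed projections and ascend a mini-batch estimate of the gradient of such an objective, which is the role the abstract assigns to the stochastic mutual information gradient.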

Original language	English
Pages (from-to)	298-305
Number of pages	8
Journal	Information Sciences
Volume	570
DOIs	https://doi.org/10.1016/j.ins.2021.04.066
Publication status	Published - Sept 2021

Keywords

Dimensionality reduction
Feature projection
Information theoretic learning
MMINet
Mutual information
Neural networks
Stochastic gradient estimation

ASJC Scopus subject areas

Software
Information Systems and Management
Artificial Intelligence
Theoretical Computer Science
Control and Systems Engineering
Computer Science Applications

Fields of Expertise

Information, Communication & Computing

Access to Document

10.1016/j.ins.2021.04.066

https://arxiv.org/abs/2105.00191
Licence: CC BY-NC-ND 4.0

Cite this

@article{fd53a8fd2d1c4abf9b5e794142a21b3e,

title = "Stochastic mutual information gradient estimation for dimensionality reduction networks",

abstract = "Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless there exists significant evidence on feature ranking and selection algorithms based on any criterion leading to potentially sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a dimensionality reduction network (MMINet) training procedure based on the stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto an output feature space where lower dimensional representations of features carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective to be estimated non-parametrically with no distributional assumptions. We experimentally evaluate our method with applications to high-dimensional biological data sets, and relate it to conventional feature selection algorithms to form a special case of our approach.",

keywords = "Dimensionality reduction, Feature projection, Information theoretic learning, MMINet, Mutual information, Neural networks, Stochastic gradient estimation",

author = "Ozan {\"O}zdenizci and Deniz Erdogmus",

year = "2021",

month = sep,

doi = "10.1016/j.ins.2021.04.066",

language = "English",

volume = "570",

pages = "298--305",

journal = "Information Sciences",

issn = "0020-0255",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Stochastic mutual information gradient estimation for dimensionality reduction networks

AU - Özdenizci, Ozan

AU - Erdogmus, Deniz

PY - 2021/9

Y1 - 2021/9

N2 - Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless there exists significant evidence on feature ranking and selection algorithms based on any criterion leading to potentially sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a dimensionality reduction network (MMINet) training procedure based on the stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto an output feature space where lower dimensional representations of features carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective to be estimated non-parametrically with no distributional assumptions. We experimentally evaluate our method with applications to high-dimensional biological data sets, and relate it to conventional feature selection algorithms to form a special case of our approach.

AB - Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless there exists significant evidence on feature ranking and selection algorithms based on any criterion leading to potentially sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a dimensionality reduction network (MMINet) training procedure based on the stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto an output feature space where lower dimensional representations of features carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective to be estimated non-parametrically with no distributional assumptions. We experimentally evaluate our method with applications to high-dimensional biological data sets, and relate it to conventional feature selection algorithms to form a special case of our approach.

KW - Dimensionality reduction

KW - Feature projection

KW - Information theoretic learning

KW - MMINet

KW - Mutual information

KW - Neural networks

KW - Stochastic gradient estimation

UR - http://www.scopus.com/inward/record.url?scp=85107650168&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2021.04.066

DO - 10.1016/j.ins.2021.04.066

M3 - Article

SN - 0020-0255

VL - 570

SP - 298

EP - 305

JO - Information Sciences

JF - Information Sciences

ER -
