Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle

Rana Ali Amjad, Bernhard Geiger

Research output: Contribution to journal › Article › peer-review

Abstract

In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, making the optimization problem ill-posed, or it is piecewise constant, hence not admitting gradient-based optimization methods. Second, the invariance of the IB functional under bijections prevents it from capturing properties of the learned representation that are desirable for classification, such as robustness and simplicity. We argue that these issues are partly resolved for stochastic DNNs, DNNs that include a (hard or soft) decision rule, or by replacing the IB functional with related, but more well-behaved cost functions. We conclude that recent successes reported about training DNNs using the IB framework must be attributed to such solutions. As a side effect, our results indicate limitations of the IB framework for the analysis of DNNs. We also note that rather than trying to repair the inherent problems in the IB functional, a better approach may be to design regularizers on latent representation enforcing the desired properties directly.
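The bijection-invariance issue named in the abstract can be illustrated with a small numerical sketch. It assumes the commonly used form of the IB functional, I(X;T) - beta * I(T;Y), and a toy discrete setting; the data, the names `mutual_information` and `ib_functional`, and the value of `beta` are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def ib_functional(x, t, y, beta):
    """Assumed IB functional I(X;T) - beta * I(T;Y) for discrete samples."""
    return mutual_information(x, t) - beta * mutual_information(t, y)

# Toy data (hypothetical): eight distinct inputs, binary class labels,
# and a deterministic discrete representation t of the input.
x = list(range(8))
y = [0, 0, 0, 0, 1, 1, 1, 1]
t = [0, 0, 1, 1, 2, 2, 3, 3]

# An arbitrary bijection on the representation values. It scrambles any
# geometric structure of t but leaves both mutual information terms unchanged.
relabel = {0: 17, 1: -3, 2: 42, 3: 5}
t_scrambled = [relabel[v] for v in t]

beta = 2.0
ib_original = ib_functional(x, t, y, beta)
ib_scrambled = ib_functional(x, t_scrambled, y, beta)
print(ib_original, ib_scrambled)
```

In this sketch both representations receive the same value of the functional, even though the relabeled one places the two classes in interleaved, arbitrary positions. This is the sense in which the functional alone cannot express a preference for simple or robust representations.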

Original language: English
Article number: 8680020
Pages (from-to): 2225-2239
Number of pages: 15
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 42
Issue number: 9
Publication status: Published - 1 Sept 2020

Keywords

  • classification
  • Deep learning
  • information bottleneck
  • neural networks
  • regularization
  • representation learning
  • stochastic neural networks

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Applied Mathematics
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
