Reward-based stochastic self-configuration of neural circuits

David Kappel*, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass

*Corresponding author for this work

Institute of Theoretical Computer Science (7080)

Research output: Working paper › Preprint

Abstract

Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode, for example, previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.

Original language	English
Number of pages	32
Volume	arXiv preprint arXiv:1704.04238
Publication status	Published - 2017

Publication series

Name	arXiv.org e-Print archive
Publisher	Cornell University Library

Keywords

spine dynamics, rewiring, stochastic synaptic plasticity, reward-modulated STDP, reinforcement learning, policy gradient, sampling

Fields of Expertise

Information, Communication & Computing

Access to Document

https://arxiv.org/abs/1704.04238

Cite this

@techreport{8e878c41560845dba50f4cc82764fb80,

title = "Reward-based stochastic self-configuration of neural circuits",

abstract = "Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode for example previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.",

keywords = "spine dynamics, rewiring, stochastic synaptic plasticity, reward-modulated STDP, reinforcement learning, policy gradient, sampling",

author = "David Kappel and Robert Legenstein and Stefan Habenschuss and Michael Hsieh and Wolfgang Maass",

year = "2017",

language = "English",

number = "arXiv:1704.04238",

url = "https://arxiv.org/abs/1704.04238",

series = "arXiv.org e-Print archive",

publisher = "Cornell University Library",

type = "WorkingPaper",

institution = "Cornell University Library",

}

TY - UNPB

T1 - Reward-based stochastic self-configuration of neural circuits

AU - Kappel, David

AU - Legenstein, Robert

AU - Habenschuss, Stefan

AU - Hsieh, Michael

AU - Maass, Wolfgang

PY - 2017

Y1 - 2017

N2 - Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode for example previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.

AB - Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode for example previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.

KW - spine dynamics

KW - rewiring

KW - stochastic synaptic plasticity

KW - reward-modulated STDP

KW - reinforcement learning

KW - policy gradient

KW - sampling

M3 - Preprint

VL - arXiv preprint arXiv:1704.04238

T3 - arXiv.org e-Print archive

BT - Reward-based stochastic self-configuration of neural circuits

ER -
