Reward-based stochastic self-configuration of neural circuits

David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode for example previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.
Original language: English
Number of pages: 32
Journal: arXiv.org e-Print archive
Volume: arXiv preprint arXiv:1704.04238
Publication status: Published - 2017

Fingerprint

Plasticity
Networks (circuits)
Fokker Planck equation
Random processes
Sampling
Dopamine

Keywords

  • spine dynamics, rewiring, stochastic synaptic plasticity, reward-modulated STDP, reinforcement learning, policy gradient, sampling

Fields of Expertise

  • Information, Communication & Computing

Cite this

Kappel, D., Legenstein, R., Habenschuss, S., Hsieh, M., & Maass, W. (2017). Reward-based stochastic self-configuration of neural circuits. arXiv.org e-Print archive, arXiv preprint arXiv:1704.04238.

Reward-based stochastic self-configuration of neural circuits. / Kappel, David; Legenstein, Robert; Habenschuss, Stefan; Hsieh, Michael; Maass, Wolfgang.

In: arXiv.org e-Print archive, Vol. arXiv preprint arXiv:1704.04238, 2017.

Kappel, D, Legenstein, R, Habenschuss, S, Hsieh, M & Maass, W 2017, 'Reward-based stochastic self-configuration of neural circuits', arXiv.org e-Print archive, vol. arXiv preprint arXiv:1704.04238.
Kappel D, Legenstein R, Habenschuss S, Hsieh M, Maass W. Reward-based stochastic self-configuration of neural circuits. arXiv.org e-Print archive. 2017;arXiv preprint arXiv:1704.04238.
Kappel, David ; Legenstein, Robert ; Habenschuss, Stefan ; Hsieh, Michael ; Maass, Wolfgang. / Reward-based stochastic self-configuration of neural circuits. In: arXiv.org e-Print archive. 2017 ; Vol. arXiv preprint arXiv:1704.04238.
@article{8e878c41560845dba50f4cc82764fb80,
title = "Reward-based stochastic self-configuration of neural circuits",
abstract = "Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode for example previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.",
keywords = "spine dynamics, rewiring, stochastic synaptic plasticity, reward-modulated STDP, reinforcement learning, policy gradient, sampling",
author = "David Kappel and Robert Legenstein and Stefan Habenschuss and Michael Hsieh and Wolfgang Maass",
year = "2017",
language = "English",
volume = "arXiv preprint arXiv:1704.04238",
journal = "arXiv.org e-Print archive",
publisher = "Cornell University Library",
}

TY - JOUR
T1 - Reward-based stochastic self-configuration of neural circuits
AU - Kappel, David
AU - Legenstein, Robert
AU - Habenschuss, Stefan
AU - Hsieh, Michael
AU - Maass, Wolfgang
PY - 2017
Y1 - 2017
AB - Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm by continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode for example previously attained knowledge and structural constraints. This model can explain the experimentally found capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.
KW - spine dynamics, rewiring, stochastic synaptic plasticity, reward-modulated STDP, reinforcement learning, policy gradient, sampling
M3 - Article
VL - arXiv preprint arXiv:1704.04238
JO - arXiv.org e-Print archive
JF - arXiv.org e-Print archive
ER -