Reward-based stochastic self-configuration of neural circuits

David Kappel*, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, Wolfgang Maass

*Corresponding author for this work

Research output: Working paper, Preprint


Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and they show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to computational properties of the network. This framework suggests a new model for reward-gated network plasticity, where one replaces the common policy gradient paradigm with continuously ongoing stochastic policy search (sampling) from a posterior distribution of network configurations. This posterior integrates priors that encode, for example, previously attained knowledge and structural constraints. This model can explain the experimentally observed capability of neural circuits to configure themselves for a given task, and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.
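The link the abstract draws between a local stochastic update and a stationary distribution can be illustrated with a toy example. The sketch below is not the authors' network model: it simulates overdamped Langevin dynamics for a single scalar parameter, for which the Fokker-Planck equation gives a stationary density proportional to p*(theta)^(1/T). The Gaussian target standing in for the posterior, the function names, and all parameter values (mu, sigma, dt, beta, temperature) are assumptions chosen for illustration.

```python
import math
import random


def grad_log_target(theta, mu=1.0, sigma=0.5):
    """Gradient of log p*(theta) for an assumed Gaussian target N(mu, sigma^2)."""
    return -(theta - mu) / sigma**2


def sample_parameter(n_steps=200_000, dt=0.01, beta=1.0, temperature=1.0, seed=0):
    """Euler-Maruyama simulation of
        d theta = beta * d/d theta log p*(theta) dt + sqrt(2 * beta * T) dW.
    The drift follows the gradient of the log target; the noise term keeps
    the parameter moving, so it samples the target instead of converging.
    """
    rng = random.Random(seed)
    theta = 0.0  # arbitrary starting value (assumption)
    noise_scale = math.sqrt(2.0 * beta * temperature * dt)
    samples = []
    for _ in range(n_steps):
        drift = beta * grad_log_target(theta) * dt
        theta += drift + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def mean_and_variance(samples, burn_in=10_000):
    """Empirical mean and variance after discarding an initial transient."""
    kept = samples[burn_in:]
    mean = sum(kept) / len(kept)
    var = sum((x - mean) ** 2 for x in kept) / len(kept)
    return mean, var


if __name__ == "__main__":
    mean, var = mean_and_variance(sample_parameter())
    print(f"empirical mean {mean:.3f}, empirical variance {var:.3f}")
```

With temperature 1, the empirical mean and variance should lie close to the target's mu and sigma^2, up to sampling error and a small bias from the finite step size. A higher temperature flattens the sampled distribution, and a lower one concentrates it around the mode.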
Original language: English
Number of pages: 32
Volume: arXiv preprint arXiv:1704.04238
Publication status: Published - 2017

Publication series: e-Print archive
Publisher: Cornell University Library


  • spine dynamics, rewiring, stochastic synaptic plasticity, reward-modulated STDP, reinforcement learning, policy gradient, sampling

Fields of Expertise

  • Information, Communication & Computing

