
Reinforcement learning links spontaneous dopamine transients to reward


Foo, Conrad (2021), Reinforcement learning links spontaneous dopamine transients to reward, Dryad, Dataset,


In their pioneering study on dopamine release, Romo and Schultz speculated "... that the amount of dopamine released by unmodulated spontaneous impulse activity exerts a tonic, permissive influence on neuronal processes more actively engaged in preparation of self-initiated movements, ...". Motivated by the suggestion of "spontaneous impulses", we asked two questions. First, are there spontaneous impulses of dopamine that are released in cortex? This possibility is further motivated by the "ramp up" of dopaminergic neuronal activity that occurs when rodents navigate to a reward. Using cell-based optical sensors of extrasynaptic dopamine, [DA]ex, we found that spontaneous dopamine impulses in cortex of naive mice occur at a rate of ~ 0.01 per second. Next, can mice be trained to change the amplitude and/or timing of dopamine events triggered by internal brain dynamics, much as they can change the amplitude and timing of dopamine impulses based on an external cue? Using a reinforcement learning paradigm based solely on rewards that were gated by feedback from real-time measurements of [DA]ex, we found that mice can volitionally modulate their spontaneous [DA]ex. In particular, by only the second session of daily, hour-long training, mice increased the rate of impulses of [DA]ex, increased the amplitude of the impulses, and increased their tonic level of [DA]ex for a reward. Critically, mice learned to reliably elicit [DA]ex impulses prior to receiving a reward. These effects reversed when the reward was removed. We posit that spontaneous dopamine impulses may serve as a salient cognitive event in behavioral planning.


Imaging data were collected using two-photon laser scanning microscopy (2PLSM). The CNiFER FRET response was calculated by drawing a region of interest (ROI) over the CNiFER injection site in each frame and averaging the intensity within this ROI. Behavioral data were collected from analog inputs to an ADInstruments PowerLab device. Running speed was calculated from the rotary encoder counts by taking a moving average of the counts over 1 s windows centered on each collected data point. Licking rate was calculated from the optical lickometer signal by thresholding the intensity to generate a lick count, which was then converted to a licking rate using a moving average over 1 s windows, as for the running data.
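The smoothing and lick-detection steps described above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the dataset: the function names, the sampling rate `fs`, the threshold value, the truncation of windows at the ends of a recording, and the choice to count a lick at each upward threshold crossing are all assumptions made for the example.

```python
def centered_moving_average(values, fs, window_s=1.0):
    """Mean of `values` over a window of `window_s` seconds centered on each sample.

    `fs` is the sampling rate in Hz (assumed; not specified in the description).
    Windows are truncated at the start and end of the recording (assumption).
    """
    half = int(round(fs * window_s / 2))  # samples on each side of the center
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def detect_licks(signal, threshold):
    """Mark a lick (1) at every upward crossing of `threshold`, else 0.

    Counting rising edges is an assumption; the description only says
    that the lickometer intensity was thresholded.
    """
    events = [0] * len(signal)
    for i in range(1, len(signal)):
        if signal[i - 1] < threshold <= signal[i]:
            events[i] = 1
    return events


def lick_rate(signal, threshold, fs, window_s=1.0):
    """Licks per second: smoothed lick events scaled by the sampling rate."""
    events = detect_licks(signal, threshold)
    return [fs * m for m in centered_moving_average(events, fs, window_s)]
```

With hypothetical rotary encoder counts in a list `counts` sampled at `fs` Hz, `centered_moving_average(counts, fs)` gives the smoothed counts per sample; converting these to a physical running speed would additionally need the encoder resolution and wheel geometry, which are not given here.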

Usage Notes

The dataset is stored as a MATLAB structure array and requires MATLAB to load. An accompanying readme file ("README.txt") documents the fields of the structure array and how they correspond to the collected data.