Reward-based option competition in human dorsal stream and transition from stochastic exploration to exploitation in continuous space
Data files
Feb 06, 2024 version files, 98.83 MB:
- hallquist_etal_supplemental_data.zip
- README.md
Abstract
Primates exploring and exploiting a continuous sensorimotor space rely on dynamic maps in the dorsal stream. Two complementary perspectives exist on how these maps encode rewards. Reinforcement learning models integrate rewards incrementally over time, efficiently resolving the exploration/exploitation dilemma. Working memory buffer models explain rapid plasticity of parietal maps but lack a plausible exploration/exploitation policy. The reinforcement learning model presented here unifies both accounts, enabling rapid, information-compressing map updates and efficient transition from exploration to exploitation. As predicted by our model, activity in human fronto-parietal dorsal stream regions, but not in MT+, tracks the number of competing options, as preferred options are selectively maintained on the map while spatiotemporally distant alternatives are compressed out. When valuable new options are uncovered, posterior beta1/alpha oscillations desynchronize within 0.4-0.7 s, consistent with option encoding by competing beta1-stabilized subpopulations. Altogether, outcomes matching locally cached reward representations rapidly update parietal maps, biasing choices toward often-sampled, rewarded options.
README: Reward-based option competition in human dorsal stream and transition from stochastic exploration to exploitation in continuous space
https://doi.org/10.5061/dryad.hmgqnk9qc
Behavioral, fMRI and MEG data.
Description of the data and file structure
- Directories and files within hallquist_etal_supplemental_data.zip:
########################################################
fig_1: behavioral data from the fMRI study
trial_data_compact.RData - RData file with the following variables:
$ dataset: study name
$ id: participant's numeric id
$ run: sequential number of the current 50-trial block, 1-8
$ trial: trial number
$ rewFunc: Contingency, "DEV", "CEV", "CEVR", "IEV"
$ rt_csv: response time in seconds
$ magnitude: expected reward magnitude
$ probability: expected reward probability
$ ev: expected reward value
$ rt_vmax: response time with the highest learned value, as predicted by the SCEPTIC model
$ score_csv: reward received
########################################################
fig_2: DAN parcellation, whole-brain statistical parametric maps (BOLD signal)
entropy_change_wb_unthresholded_1mm.nii.gz: .nii file of the unthresholded parametric entropy change map
entropy_wb_unthresholded_1mm.nii.gz: .nii file of the unthresholded parametric entropy map
Schaefer_444_final_2009c_1.0mm.nii.gz: .nii file of the Schaefer et al. 400-region parcellation in MNI 2009c space
Schaefer2018_DAN_2009c_FINAL47.nii.gz: same, but only dorsal attention stream regions
########################################################
fig_3: deconvolved DAN BOLD signal, same parcellation as in fig_2
rt_aligned_deconvolved_bold.RData: RData file with the following variables:
$ id: participant's numeric id
$ run: sequential number of the current 50-trial block, 1-8
$ run_trial: trial within run (1:50), note the difference from the behavioral data file "trial" variable
$ feedback_onset: onset of feedback, in seconds
$ rewFunc: Contingency, "DEV", "CEV", "CEVR", "IEV"
$ atlas_value: numeric index of the dorsal stream node as in Table S2
$ label: label of dorsal stream node as in Table S2 and Figure S2
$ decon_interp: deconvolved BOLD signal
$ side: right ("R") or left ("L")
########################################################
fig_4: BOLD regional regression coefficients corresponding to entropy change maps in fig_2
##
entropy_change_betas.csv.gz: text file with the following variables:
$ id: participant's numeric id
$ atlas_value: numeric index of the dorsal stream node as in Table S2
$ x, y, z: MNI coordinates
$ value: mean regional regression coefficient for entropy change
########################################################
fig_5: MEG time-frequency domain statistics for entropy change
meg_time_frequency_entropy_change_ri.rds: .rds (R Data Serialization) file with the following variables:
$ Time: time in seconds relative to feedback
$ Freq: frequency, Hz
$ estimate: regression coefficient, estimate
$ std.error: regression coefficient, standard error
$ statistic: test statistic
$ df: degrees of freedom
$ p.value: uncorrected p-value
$ p_fdr: FDR-corrected p-value
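The p_fdr column contains FDR-corrected p-values. The exact correction routine is not specified in this repository; a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, is:

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure).
    Assumption: this mirrors what p_fdr contains; the repository does
    not state which implementation was actually used."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending raw p-values
    ranked = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # enforce monotonicity from the largest rank downward
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

For example, `fdr_bh([0.005, 0.03, 0.5])` yields adjusted values `[0.015, 0.045, 0.5]`.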
Code/Software
SCEPTIC computational model: 10.5281/zenodo.1336285
Methods
fMRI acquisition
Neuroimaging data during the clock task were acquired in a Siemens Tim Trio 3T scanner for the original study and a Siemens Prisma 3T scanner for the replication study at the Magnetic Resonance Research Center, University of Pittsburgh. Due to participant-dependent variation in response times on the task, each fMRI run varied in length from 3.15 to 5.87 minutes (M = 4.57 minutes, SD = 0.52). Functional imaging data for the original/replication study were acquired using a simultaneous multislice sequence sensitive to BOLD contrast, TR = 1.0/0.6s, TE = 30/27ms, flip angle = 55/45°, multiband acceleration factor = 5/5, voxel size = 2.3/3.1mm3. We also obtained a sagittal MPRAGE T1-weighted scan, voxel size = 1/1mm3, TR = 2.2/2.3s, TE = 3.58/3.35ms, GRAPPA 2/2x acceleration. The anatomical scan was used for coregistration and nonlinear transformation to functional and stereotaxic templates. We also acquired gradient echo fieldmap images (TEs = 4.93/4.47ms and 7.39/6.93ms) for each subject to mitigate inhomogeneity-related distortions in the functional MRI data.
Preprocessing of fMRI data
Anatomical scans were registered to the MNI152 template (82) using both affine and nonlinear transformations (ANTS SyN, FSL FNIRT). Functional images were preprocessed using tools from NiPy (83), AFNI (version 19.0.26) (84), and the FMRIB software library (FSL version 6.0.1) (85). First, slice timing and motion coregistration were performed simultaneously using a four-dimensional registration algorithm implemented in NiPy (86). Non-brain voxels were removed from functional images by masking voxels with low intensity and by the ROBEX brain extraction algorithm (87). We reduced distortion due to susceptibility artifacts using fieldmap correction implemented in FSL FUGUE.
Participants’ functional images were aligned to their anatomical scan using the white matter segmentation of each image and a boundary-based registration algorithm (88), augmented by fieldmap unwarping coefficients. Given the low contrast between gray and white matter in echoplanar scans with fast repetition times, we first aligned functional scans to a single-band fMRI reference image with better contrast. The reference image was acquired using the same scanning parameters, but without multiband acceleration. Functional scans were then warped into MNI152 template space (2.3mm output resolution) in one step using the concatenation of functional-reference, fieldmap unwarping, reference-structural, and structural-MNI152 transforms. Images were spatially smoothed using a 5mm full-width at half maximum (FWHM) kernel using a nonlinear smoother implemented in FSL SUSAN. To reduce head motion artifacts, we then conducted an independent component analysis for each run using FSL MELODIC. The spatiotemporal components were then passed to a classification algorithm, ICA-AROMA, validated to identify and remove motion-related artifacts (89). Components identified as noise were regressed out of the data using FSL regfilt (non-aggressive regression approach). ICA-AROMA has performed very well in head-to-head comparisons of alternative strategies for reducing head motion artifacts (90). We then applied a .008 Hz temporal high-pass filter to remove slow-frequency signal changes (91); the same filter was applied to all regressors in GLM analyses. Finally, we renormalized each voxel time series to have a mean of 100 to provide similar scaling of voxelwise regression coefficients across runs and participants.
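The final two steps (the 0.008 Hz temporal high-pass and the rescaling to a voxelwise mean of 100) can be illustrated with a simple FFT-based sketch. This is a stand-in, not the pipeline's implementation: the study used FSL's filtering tools, and the cutoff handling here is idealized.

```python
import numpy as np

def highpass_and_rescale(ts, tr, cutoff_hz=0.008, target_mean=100.0):
    """Illustrative sketch of the final preprocessing steps: remove
    frequency components below cutoff_hz via FFT, then rescale the
    time series to a mean of target_mean. The actual pipeline used
    FSL's high-pass filter, not this hard spectral cutoff."""
    ts = np.asarray(ts, dtype=float)
    freqs = np.fft.rfftfreq(ts.size, d=tr)   # frequency of each FFT bin (Hz)
    spec = np.fft.rfft(ts - ts.mean())
    spec[freqs < cutoff_hz] = 0.0            # zero out slow drift components
    filtered = np.fft.irfft(spec, n=ts.size)
    return filtered - filtered.mean() + target_mean
```

Applied to a run with a slow scanner drift, this removes most of the drift variance while leaving the mean at 100, so voxelwise regression coefficients are on a comparable scale across runs.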
Treatment of head motion
In addition to mitigating head motion-related artifacts using ICA-AROMA, we excluded runs in which more than 10% of volumes had a framewise displacement (FD) of 0.9mm or greater, as well as runs in which head movement exceeded 5mm at any point in the acquisition. This led to the exclusion of 11 runs total, yielding 549 total usable runs across participants. Furthermore, in voxelwise GLMs, we included the mean time series from deep cerebral white matter and the ventricles, as well as first derivatives of these signals, as confound regressors (90).
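Framewise displacement is not defined in this excerpt; the sketch below assumes the widely used Power et al. formulation (sum of absolute backward differences of the six realignment parameters, with rotations converted to arc length on a 50 mm sphere), together with the 10%-of-volumes exclusion rule stated above.

```python
import numpy as np

def framewise_displacement(motion, radius=50.0):
    """FD per the Power et al. convention (an assumption; the README does
    not define FD). `motion` is an (n_volumes, 6) array of realignment
    parameters: 3 translations (mm) then 3 rotations (radians)."""
    motion = np.asarray(motion, dtype=float)
    deltas = np.abs(np.diff(motion, axis=0))   # volume-to-volume changes
    deltas[:, 3:] *= radius                    # radians -> mm of arc on a sphere
    return np.concatenate([[0.0], deltas.sum(axis=1)])

def run_excluded(fd, threshold=0.9, max_frac=0.10):
    """True if more than max_frac of volumes have FD >= threshold (mm).
    The separate 5 mm absolute-position criterion is not modeled here."""
    return bool(np.mean(np.asarray(fd) >= threshold) > max_frac)
```

For example, a run whose motion trace contains a single 1 mm translation jump already fails the 0.9 mm / 10% rule if the run is short enough for those volumes to exceed 10% of the total.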
MEG Data acquisition
MEG data were acquired using an Elekta Neuromag VectorView MEG system (Elekta Oy, Helsinki, Finland) in a three-layer magnetically shielded room. The system comprised 306 sensors: 204 planar gradiometers and 102 magnetometers. In this project we only included data from the gradiometers, as data from magnetometers added noise and had a different amplitude scale. MEG data were recorded continuously with a sampling rate of 1000 Hz. We measured head position relative to the MEG sensors throughout the recording period using four continuous head position indicator (cHPI) coils that emit sinusoidal signals, and head movements were corrected offline during preprocessing. To monitor saccades and eye blinks, we used two bipolar electrode pairs to record the vertical and horizontal electrooculogram (EOG).
Preprocessing of MEG data
Flat or noisy channels were identified by manual inspection, and all data were preprocessed using the temporal signal space separation (TSSS) method (92, 93). TSSS suppresses environmental artifacts from outside the MEG helmet and performs head movement correction by aligning sensor-level data to a common reference (94). This realignment allowed sensor-level data to be pooled across subjects for group analyses of sensor-space data. Cardiac and ocular artifacts were then removed using independent component analysis, decomposing the MEG sensor data into independent components (ICs) with the infomax algorithm (95). Each IC was then correlated with the ECG and EOG recordings, and an IC was designated as an artifact if the absolute value of the correlation was at least three standard deviations higher than the mean of all correlations. The non-artifact ICs were projected back to sensor space to reconstruct the signals for analysis. After preprocessing, data were epoched to the onset of feedback, with a window from -0.7 to 1.0 seconds. Trials with gradiometer peak-to-peak amplitudes exceeding 3000 fT/cm were excluded.
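The three-standard-deviation selection rule for artifact ICs can be sketched directly. The function below operates on precomputed IC-to-EOG/ECG correlations; the ICA decomposition itself (infomax) is not reimplemented here.

```python
import numpy as np

def artifact_ics(ic_corrs, n_sd=3.0):
    """Indices of ICs whose |correlation| with an ECG/EOG channel is at
    least n_sd standard deviations above the mean of all |correlations|,
    matching the selection rule described in the preprocessing text."""
    r = np.abs(np.asarray(ic_corrs, dtype=float))
    return np.where(r >= r.mean() + n_sd * r.std())[0]
```

In practice this flags the handful of components dominated by blinks or heartbeats while leaving the bulk of near-zero correlations untouched.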
Please note that the following processing step has NOT been applied to MEG data: "For each sensor, we computed the time-frequency decomposition of activity on each trial by convolving time-domain signals with Morlet wavelet, stepping from 2 to 40 Hz in logarithmic scale using 6 wavelet cycles".
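For users who wish to apply that decomposition themselves, the quoted description corresponds roughly to the sketch below. The number of logarithmic frequency steps is an assumption (the quote does not specify it), and this is not the paper's implementation.

```python
import numpy as np

def morlet_power(signal, sfreq, freqs=None, n_cycles=6):
    """Time-frequency power via Morlet wavelet convolution, following the
    quoted description: 2-40 Hz on a log scale, 6 cycles per wavelet.
    The 25-step frequency grid is an assumption. The signal must be
    longer than the wavelet at the lowest frequency.
    Returns an array of shape (n_freqs, n_times)."""
    if freqs is None:
        freqs = np.logspace(np.log10(2), np.log10(40), 25)
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)      # temporal width of wavelet
        t = np.arange(-5 * sigma_t, 5 * sigma_t, 1 / sfreq)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit energy
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power
```

A pure 10 Hz oscillation, for instance, should show its power concentrated in the frequency row nearest 10 Hz, away from the epoch edges.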