# Dataset from "Attention recruits frontal cortex in human infants"

Ellis, C. T., Skalaban, L. J., Yates, T. S., & Turk-Browne, N. B. (2021) Attention recruits frontal cortex in human infants. *Proceedings of the National Academy of Sciences, 118*, e2021474118. doi: 10.1073/pnas.2021474118

This directory contains the data used for the manuscript's analyses. To analyze these data, refer to https://github.com/ntblab/infant\_neuropipe/tree/PosnerCuing/. In particular, https://github.com/ntblab/infant\_neuropipe/tree/PosnerCuing/scripts/PosnerCuing/PosnerCuing.ipynb is a notebook that uses these files to recreate the figures reported in the paper. The notebook also demonstrates how the files should be used. For instance, the contents of 'RT\_data' are numpy pickle files that are hard to interpret on their own, but the notebook provides context. To integrate the data with 'infant\_neuropipe' and the notebook mentioned above, this directory's contents should be placed in 'data/PosnerCuing'. The notebook can be adapted to explore other analyses.

The scan sequences are as follows:

> PETRA: TR1 = 3.32 ms, TR2 = 2250 ms, TE = 0.07 ms, flip angle = 6 degrees, matrix = 320 x 320, slices = 320, resolution = 0.94 mm isotropic, radial slices = 30,000

> T2* gradient-echo EPI: TR = 2 s, TE = 28/30 ms, flip angle = 71 degrees, matrix = 64 x 64, slices = 36/34, resolution = 3 mm isotropic, interleaved slice acquisition.

## File/directory descriptions:

- anatomicals: Anatomical images used for alignment. Facial information was stripped for anonymity. These were collected using the PETRA sequence defined above. In some cases, we averaged more than one scan to improve quality.

- contrast\_maps: Statistical (z-stat) maps for each participant, used as the input to the main ROI analysis. These data are in MNI 152 1 mm space and thus aligned across participants. Use the matrices in 'transformation\_mats' (see below) to align different participants.
Included in this folder are 9 contrasts for each participant, run as part of the GLM:

>> 1 - valid\_invalid: valid > invalid

>> 2 - valid\_neutral: valid > neutral

>> 3 - invalid\_neutral: invalid > neutral

>> 4 - valid\_neutral-invalid: valid > (neutral + invalid)

>> 5 - invalid\_neutral-valid: invalid > (neutral + valid)

>> 6 - valid: valid

>> 7 - neutral: neutral

>> 8 - invalid: invalid

>> 9 - valid-invalid\_neutral: (valid + invalid) > neutral

- LOO\_ROIs: Contains the data for the leave-one-participant-out analysis. Each participant has a folder for when **their** data is held out; the remaining 23 participants are used to create the relevant files. This process generates multiple large files that we deleted for sharing, but they can be recreated using the `merge_ppts_LOO` function in the notebook. File names in each folder combine a contrast name, as used in 'contrast\_maps', with one of the following suffixes:

>> \_tstat1.nii.gz: The t statistic map for the relevant contrast, used as the input into FSL's cluster algorithm to find ROIs. By default, this is not the t statistic map used.

>> \_tstat2.nii.gz: The t statistic map just as above, but for the opposite direction of the relevant contrast. This is the map that is used, since it is the direction of the effect observed (i.e., invalid shows greater activity across the brain than valid).

>> \_ROIs.txt: The output of FSL cluster, listing the clusters found along with information about their size and location.

>> \_ROI\_names.txt: Lists the clusters formed, their index number, and the name of the region in the Harvard-Oxford atlas.

>> \_ROIs.nii.gz: A volume with the clusters that formed, labeled by their index in ROI\_names.

- masked\_data: Data used directly for the analyses, stored as nifti files. In particular, these are the standard-space contrast maps for each participant, concatenated in time and masked to include only voxels that are present in all participants.
The order of the volumes is the order of the data in participant\_information.csv.

- participant\_information.csv: Summary table of the relevant participant information. The column labels are described below:

>> ID: Unique identifier for the session. These names (e.g., sXXXX\_Y\_Z) comprise three parts: sXXXX is the unique family ID, the \_Y that follows is the sibling ID (counting up from the first child in the family to participate), and the final \_Z is the session number. Hence, s0001\_2\_4 would be the 4th session for the 2nd sibling in family s0001.

>> Age: Age in months of the participant at the time of scanning.

>> Sex: Sex assigned to the child by the parent.

>> Location: Site where we collected data. Either the Magnetic Resonance Research Center (MRRC) or the Brain Imaging Center (BIC).

>> Total blocks: How many blocks (each containing 8 trials) were attempted in this session?

>> Valid events: How many usable valid trials (cue was congruent with the target location) were there for this session?

>> Neutral events: How many usable neutral trials (two cues were presented) were there for this session?

>> Invalid events: How many usable invalid trials (cue was incongruent with the target location) were there for this session?

>> prop\_TR\_included: What proportion of TRs in the included blocks of this task were retained after motion exclusion (i.e., after excluding TRs exceeding 3 mm of framewise translational motion)?

>> Intraframe reliability: On what proportion of frames did coders report the same code for the participant's looking behavior?

>> Coder number: How many coders coded the gaze of the participant in this session?

- plots: Where the notebook (PosnerCuing.ipynb) stores the plots created in the analysis.

- raw\_nifti: Raw functional data for each run in which we collected Posner Cuing task data.
If another task, not reported here, was completed in the same run, then a pseudo-run was created in which the TRs corresponding to this task were sliced out and separated.

- raw\_timing: Timing information for the start of each block and event for each participant. In each file, the first column is the onset of the event or block, the second column is its duration, and the third column is the weight. Different file types are provided to facilitate analysis:

>> \_PosnerCuing-Exogenous: Onset and duration information for each block. Weight refers to whether the block is included.

>> \_PosnerCuing-Exogenous\_Event: Onset and duration information for each event, pooling across blocks. Weight refers to whether the event is included.

>> \_PosnerCuing-Condition\_Exogenous\_Valid: Same as '\_PosnerCuing-Exogenous\_Event' but only for Valid events.

>> \_PosnerCuing-Condition\_Exogenous\_Neutral: Same as '\_PosnerCuing-Exogenous\_Event' but only for Neutral events.

>> \_PosnerCuing-Condition\_Exogenous\_Invalid: Same as '\_PosnerCuing-Exogenous\_Event' but only for Invalid events.

>> \_PosnerCuing-Condition\_Exogenous\_Left: Same as '\_PosnerCuing-Exogenous\_Event' but only for events where the target appeared on the left.

>> \_PosnerCuing-Condition\_Exogenous\_Right: Same as '\_PosnerCuing-Exogenous\_Event' but only for events where the target appeared on the right.

>> \_PosnerCuing-Condition\_Exogenous\_RT: Same as '\_PosnerCuing-Exogenous\_Event' but each event has its RT (in seconds) as its weight.

- ROIs: Contains the ROIs generated from Neurosynth that were used in the analysis. The 'ns\_raw.nii.gz' volume is the main one considered in the manuscript. The ROIs in this file were based on clustering the map 'ns\_tstat.nii.gz' and manually splitting two of these clusters into ROIs. 'ns\_dilated.nii.gz' is a dilated version of 'ns\_raw.nii.gz'. 'ns\_spheres.nii.gz' has 10 mm radius spheres around the peak voxels ('ns\_peaks.nii.gz').
- RT\_data: Summary data of participant behavior needed for the reaction time analyses in the notebook. Each file is a Python pickle of a dictionary. The dictionary's keys specify the trial type ('valid', 'neutral' or 'invalid'), and each key contains all of the response time values for the usable trials of that type.

- transformation\_mats: The 4x4 affine transformation matrices (in .mat format) used to align the data. One type of file aligns each functional in raw\_nifti to highres (files with \_highres, with one for each run). The other type aligns from highres to standard (files with \_highres2standard).

- transformation\_ANTs: Contains ANTs folders for each participant. These were created by run\_ANTs\_highres2standard.sh and were used to create the ANTs files found in 'wholebrain\_tstat' and 'masked\_data'.

- Trial\_timecourse: Stores images visualizing the looking data for each block. The blue lines correspond to the coded location (breaks in the line mean the participant was looking off-screen or their eyes were closed). Red lines demarcate the cue presence. Green lines demarcate the target presence. This folder also contains MATLAB files with the raw timecourse data of each participant. Each file contains several cells that define the coded looking data of the participant. The cells are 3D such that the first dimension is the block type (always 1), the second is the block repetition (ranges from 4 to 7), and the third is the trial counter (1 to 8).

>> timecourse\_all: The coded gaze location of the participant on each frame. 1 means left, 2 means right, 3 means center, 0 means off-screen, and 6 means the eye was occluded.

>> timestamps\_all: The time, relative to the onset of the experiment code, at which each frame was collected.

>> CueOns\_all: The time of cue onset. NaNs mean that the cue wasn't presented before we quit out of the block.

>> TargetOns\_all: The time of target onset.
>> CueSide\_all: The side on which the cue appeared. 1 is left, 2 is right, 0 means it appeared on both sides.

>> TargetSide\_all: The side on which the target appeared. 1 is left, 2 is right.

- wholebrain\_tstat: Contains the tstat maps across participants for each relevant condition, calculated using FSL's randomise. In other words, these analyses use merged 'contrast\_maps' as inputs.

Run numbers indicate the nth run that was retained in that participant's session. If the number has a letter after it (e.g., functional03a), that indicates it is a pseudo-run and that other data from this run have been removed (because they pertained to another task, not reported here). All data within a run are continuous: no interleaved time points were removed.

## Replicating analyses

The scripts in the infant\_neuropipe repository can be used to run the analyses reported in the paper: the PosnerCuing.ipynb notebook regenerates the figures, and other scripts in the repository can rerun the analyses starting from the raw data. Refer to the infant\_neuropipe README for direction.
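
As a rough illustration of the naming and timing conventions described in this README, the Python sketch below splits a session ID into its family, sibling, and session parts and reads a three-column timing file (onset, duration, weight). It is not taken from infant\_neuropipe. The function names are hypothetical, and the sketch assumes that the family ID is an 's' followed by digits, that timing columns are whitespace-separated, and that a weight of 0 marks an excluded block or event. The notebook remains the authoritative guide to using these files.

```python
import re
from typing import List, NamedTuple, Tuple


class SessionID(NamedTuple):
    family: str   # e.g. "s0001"
    sibling: int  # counts up from the first child in the family to participate
    session: int  # session number for that child


def parse_session_id(session_id: str) -> SessionID:
    """Split an ID of the form sXXXX_Y_Z (assumed: 's' plus digits) into parts."""
    match = re.fullmatch(r"(s\d+)_(\d+)_(\d+)", session_id)
    if match is None:
        raise ValueError(f"Unexpected session ID format: {session_id!r}")
    family, sibling, session = match.groups()
    return SessionID(family, int(sibling), int(session))


def read_timing(path: str) -> List[Tuple[float, float, float]]:
    """Read (onset, duration, weight) rows from a three-column timing file.

    Assumes whitespace-separated numeric columns; blank lines are skipped.
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue
            onset, duration, weight = (float(x) for x in fields[:3])
            rows.append((onset, duration, weight))
    return rows


def included_onsets(rows: List[Tuple[float, float, float]]) -> List[float]:
    """Return onsets of rows with non-zero weight (assumed to mean 'included')."""
    return [onset for onset, _, weight in rows if weight != 0]
```

For example, `parse_session_id("s0001_2_4")` yields family `s0001`, sibling 2, session 4, matching the example in the ID column description. Note that in the '\_RT' timing files the weight column holds the reaction time rather than an inclusion flag, so `included_onsets` would not be meaningful there.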