# Dataset from "Retinotopic organization of visual cortex in human infants"

Ellis, C. T., Yates, T. S., Skalaban, L. J., Bejjanki, V. R., Arcaro, M. J., & Turk-Browne, N. B. (2021). Retinotopic organization of visual cortex in human infants. *Neuron*.

This directory contains the data used for the analyses in this manuscript. To analyze these data, refer to [https://github.com/ntblab/infant_neuropipe/tree/Retinotopy/](https://github.com/ntblab/infant_neuropipe/tree/Retinotopy/). In particular, [https://github.com/ntblab/infant_neuropipe/tree/Retinotopy/scripts/Retinotopy/Retinotopy.ipynb](https://github.com/ntblab/infant_neuropipe/tree/Retinotopy/scripts/Retinotopy/Retinotopy.ipynb) is a notebook that uses these files to recreate the figures reported in the paper. This notebook can be adapted to explore other analyses, which in some cases will create new files. The contents of this folder are expected to be placed in `data/Retinotopy` of the infant\_neuropipe repository.

We include traces from two mappers (CE and MA). Their similarities and differences are described in the manuscript, but our general recommendation is that the MA traces are likely preferable, especially for V4.

Note that the terms *phase* and *event* are used interchangeably and refer to the 20 s period of constant stimulation within a block.

The scan sequences are as follows:

> PETRA: TR1 = 3.32 ms, TR2 = 2250 ms, TE = 0.07 ms, flip angle = 6 degrees, matrix = 320 x 320, slices = 320, resolution = 0.94 mm isotropic, radial slices = 30,000

> SPACE: TR = 3200 ms, TE = 563 ms, flip angle = 120 degrees, matrix = 192 x 192, slices = 176, resolution = 1 mm isotropic

> T2* gradient-echo EPI: TR = 2 s, TE = 30 ms, flip angle = 71 degrees, matrix = 64 x 64, slices = 34, resolution = 3 mm isotropic, interleaved slice acquisition

## File/directory descriptions:

**contrast\_maps**: Statistical (z-statistic) maps for each participant, used as the input to the main analysis.
These data are in anatomical space and thus are not aligned across participants. Use the matrices in `transformation_mats` (see below) to align the participants to standard space. This folder includes 3 contrasts for the spatial frequency conditions and 3 contrasts for the meridian mapping conditions. It also includes partial results of a GLM in which gaze exclusions are factored into the model. Finally, it includes a partial set of the COPE files that were created in these analyses. The contrast numbers for the files in this folder correspond to the following conditions:

> Spatial frequency (sf): 1 - high, 2 - low, 3 - high>low

> Meridian mapping (meridian): 1 - horizontal, 2 - vertical, 3 - horizontal>vertical

**DICE\_btwn\_ppts\_?h.npy**: NumPy file containing the Dice similarity of each pairwise comparison between participants, separately for each hemisphere. This uses the traces from the mapper called CE. Note that this file can be recreated in the notebook but is provided here to save compute time.

**DICE\_btwn\_ppts\_?h\_other.npy**: Same as DICE\_btwn\_ppts\_?h.npy, but using the traces from the mapper called MA.

**gaze\_confounds**: This folder includes a text file for each participant that specifies which time points are excluded (1) vs. included (0) using our conservative exclusion method. A time point is excluded if at least 25% of its gaze data (i.e., 500 ms of the 2 s TR) is excluded. This file is used in our analysis to create a confound matrix with a unique one-hot column for each excluded time point. These data could be recreated from the files in the `Trial_timecourse` folder. Note that the number of excluded time points will not correspond to the proportion of excluded gaze data reported in the notebook: the number in `participant_summary.csv` is the proportion of frames excluded in total.

**iBEAT**: A critical folder for each participant, containing the necessary anatomical and surface files.
In terms of surface files, this includes the anatomical surfaces, the statistical contrast maps in surface space, and the manual traces for each participant. This folder is based on a modified version of the FreeSurfer pipeline (created by `scripts/iBEAT/scaffold_iBEAT.sh` in our infant\_neuropipe) and borrows some files from FreeSurfer, although those files will be inaccurate. For instance, we ran FreeSurfer on the `mri/T1.nii.gz` file to get `aseg.nii.gz`, but the aseg file was realigned by FreeSurfer and so is not appropriate for actual use. It is included because the aseg is a necessary, but (hopefully) unimportant, input to some of the FreeSurfer functions that are used. Some folders output by our pipeline (like `raw` and `scratch`) were removed to save space, but their contents are duplicated elsewhere (e.g., the segmentation volume). Moreover, some files in the SUMA subfolder were compressed (e.g., .nii.gz) to save further space. Each participant's folder contains the following:

- iBEAT\_QC\_summary.html: Compiles screenshots to evaluate the quality of the surface reconstruction and can be viewed in any browser. It uses images from `./screenshots/`. The images shown are intended to be sufficient for the ENIGMA QC.
- label: Dummy files from FreeSurfer that allow other scripts to run. DO NOT EXPECT THEM TO BE ACCURATE.
- mri: Volumes used to create the iBEAT outputs and to reference for further computations. A critical file here is T1.nii.gz: a face-stripped anatomical file that was used as input to the iBEAT pipeline. For some participants there will also be a T2.nii.gz file, which was provided to iBEAT for segmentation when available. Files like wm.nii.gz were created from the iBEAT volume segmentation (`iBEAT_segmentation.nii.gz`). The main FreeSurfer file used here is the aseg file (which may not be accurate).
- screenshots: Screenshots taken to allow for QC of the surface reconstruction.
- stats: Empty, to be compliant with FreeSurfer commands.
- SUMA: Necessary directory for viewing flatmaps of the data. It contains all of the volume, surface and contrast files needed. These are often duplicates of files found elsewhere, converted into the 1D format. Just like a normal SUMA folder, std.141 and std.60 files are created, as well as various spec files. To quantify the area of the traced regions, `ROI_?h.areas_*` folders are created (e.g., `ROI_lh.areas_CE`), which quantify region properties like surface area and the length of the traced lines. The manual tracings in the SUMA folder have the following file names (some, but not all, were made by both mappers):

  >> ??h.ortho\_lines.niml.roi: Dorsal vs. ventral and left vs. right files for lines running parallel to region boundaries. This file format was chosen because it preserves the order in which points were placed. These were made using the CE traces.

  >> ??h.lines.niml.roi: Dorsal vs. ventral and left vs. right files for lines running perpendicular to region boundaries. This file format was chosen because it preserves the order in which points were placed. These were made using the CE traces.

  >> ?h.areas\_\*.1D.dset: Left vs. right files for the manually traced regions, for either tracer (CE or MA). The numbers correspond to regions as follows: vV1: 1, vV2: 2, vV3: 3, vV4: 4, dV1: 5, dV2: 6, dV3: 7, dV3AB: 8. If we were unsure about a traced region, 10 was added to its number so that it could still be considered in analysis if desired. For instance, an uncertain tracing of vV3 was labeled 13.

  >> ?h.between\_areas\_CE.\*: Manually traced regions bridging V1, V2 and V3, with codes 31, 32, and 33, respectively. These files were only created by the CE mapper.

- surf: Surface files that are necessary for further analysis.
  The ?h.white and ?h.pial files are provided by iBEAT, and the other files are created from them. We do not edit the iBEAT files (e.g., by smoothing), but we do make duplicate files to be compliant with the FreeSurfer commands that are needed (e.g., ?h.smoothwm is created for FreeSurfer commands but is identical to ?h.wm). We also inflate the surfaces (?h.inflated) and make spheres (?h.sphere). This folder also contains GIFTI files of the relevant contrasts aligned to a 32k midthickness surface, to allow for the creation of CIFTI files. This folder also includes `fsaverage`, with its SUMA folder, set up as a participant; it is used for comparing convexity files.

**masks**: Contains the occipital masks (taken from the Harvard-Oxford atlas) aligned to each participant's highres space. This folder also contains the maximum probability masks from Wang, Mruczek, Arcaro & Kastner (2015), in std.141 space, in which each node is assigned the highest-probability label, if there is one.

**participant\_information.csv**: Summary table of the relevant participant information. The column labels are described below:

- ID: Unique identifier for the session. These names (e.g., sXXXX\_Y\_Z) comprise three parts: sXXXX is the unique family ID, the \_Y that follows is the sibling ID (counting up from the first child in the family to participate), and the final \_Z is the session number. Hence, s0001\_2\_4 would be the 4th session for the 2nd sibling in family s0001.
- Age: Age of the participant in months at the time of scanning.
- Sex: Sex assigned to the child by the parent.
- Vertical phases: How many usable vertical events there were for this session.
- Horizontal phases: How many usable horizontal events there were for this session.
- Low phases: How many usable low spatial frequency events there were for this session.
- High phases: How many usable high spatial frequency events there were for this session.
- Blocks: How many blocks were attempted for this participant.
- Runs: How many runs were attempted for this experiment.
- prop\_TR\_included: What proportion of TRs in the included blocks of this task were retained after motion exclusion (i.e., the proportion of TRs with less than 3 mm of framewise translational motion).
- prop\_eye\_included: What proportion of gaze frames in the included blocks of this task were included.
- Intraframe reliability: On what proportion of frames did coders report the same code for the participant's looking behavior?
- Coder number: How many coders coded the participant's gaze in this session?

**plots**: Where the notebook (Retinotopy.ipynb) stores the plots created in the analysis.

**raw\_nifti**: Raw functional data for each run in which Retinotopy task data were collected in these participants. If another task, not reported here, was completed in the same run, then a pseudorun was created in which the TRs corresponding to this task were sliced out and separated. See below for a description of the pseudorun nomenclature. These runs include burn-in TRs (always 3). See the description of run\_burn\_in.txt below for details on how to account for burn-in.

**raw\_timing**: Timing information for the start of each block and event for each participant. In each file, the first column is the onset of the event or block, the second column is the duration of the event or block, and the third column indicates whether the block is included:

- \*\_Retinotopy-\*.txt: Files indicating block-level timing, where the block names are used (i.e., horizontal\_first, vertical\_first, lowhigh, highlow).
  The onset column refers to the block start and the duration column to the block duration.
- \*\_Retinotopy-\*\_Events.txt: Same as above, but listing the events within these block types. If a block was completed and the data are usable, there should be two events per block.
- \*\_Retinotopy-Condition\_\*.txt: Files where the condition names are used (i.e., horizontal, vertical, low, high). The onset column corresponds to the event onsets and the duration column to the event durations for events in those conditions.

**run\_burn\_in.txt**: (not in the zip) Specifies the number of burn-in TRs for each run in the raw\_nifti folder. Specifically, the number stated for each session/run should be removed from the start of that run. In this study, it was always 3.

**surface\_QC.csv**: Contains the manually coded segmentation errors for each participant. The first column is the session number, the second column is the age in months, and the third column lists the regions that have an error according to the ENIGMA QC. This is used to evaluate the alignment.

**transformation\_mats**: The 4x4 affine transformation matrices (in .mat format) used to align the data. One type of file aligns each functional run in raw\_nifti to highres (files with \_highres, one for each run). The other type aligns highres to standard (files with \_highres2standard).

**Trial\_timecourse**: This folder contains MATLAB files with the raw timecourse data for each participant. Each file contains several cell arrays that define the coded looking data of the participant. The cell arrays are 3D: the first dimension is the block type (at least 4), the second is the block repetition (ranging from 1 to 4), and the third is the event/phase counter (always 2).

- timecourse\_all: The coded location the participant was looking at on each frame: 1 means left, 2 means right, 3 means center, 7 means up, 8 means down, 0 means off screen, and 6 means the eye was occluded.
- timestamps\_all: The time, relative to the onset of the experiment code, at which each frame was collected.
- phase\_start\_all: The time at which the phase began.
- phase\_end\_all: The time at which the phase ended. NaNs mean that the phase was quit before the end.
- phase\_name\_all: The name of the phase: horizontal, vertical, low or high.
- Include\_Block: Whether the block is included (1) or excluded (0). Blocks can be excluded because of eye movements or motion.
- Include\_Events: Whether the phase is included. The first element is whether the eye data are usable (more than 75% usable) and the second element is whether the motion is acceptable (more than 50% of TRs usable).
- Proportion\_EyeTracking\_Excluded: The precise proportion of excluded eye-tracking data.

**wholebrain\_tstat**: Contains the t-statistic maps across participants for the F-test (which tests whether any condition is significant). The tfce file is created using FSL's randomise. The t-statistic is calculated across participants from the raw file (although the raw file is masked by the intersection across participants).

Run numbers indicate the nth run that was retained in that participant's session. If the number is followed by a letter (e.g., functional03a), the run is a pseudorun and other data from this run have been removed (because they pertained to another task, not reported here). All data within a run are continuous; no interleaved time points were removed.

## Replicating analyses

The scripts in the infant\_neuropipe repository can be used to run the analyses reported in the paper. The Retinotopy.ipynb notebook can regenerate the figures, and the supervisor scripts in each participant's script directory can rerun the analyses.
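For quick scripting outside the notebook, several of the naming and coding conventions described above can be decoded with a few small helpers. The Python sketch below is hypothetical and is not part of infant\_neuropipe; all function names are illustrative. It assumes that run names take the form `functional` + digits + an optional letter (as in the functional03a example), that raw\_timing files are whitespace-separated text where a third-column value of 0 means excluded, that each gaze\_confounds file holds one 0/1 flag per time point, and that the .mat files in transformation\_mats are plain-text 4x4 matrices (FSL FLIRT style).

```python
import re

# Traced-region codes from the ?h.areas_*.1D.dset description.
# A code plus 10 (e.g., 13) marks a region the mapper was unsure about.
AREA_CODES = {1: "vV1", 2: "vV2", 3: "vV3", 4: "vV4",
              5: "dV1", 6: "dV2", 7: "dV3", 8: "dV3AB"}


def parse_session_id(session_id):
    """Split 'sXXXX_Y_Z' into (family ID, sibling number, session number)."""
    match = re.fullmatch(r"(s\d+)_(\d+)_(\d+)", session_id)
    if match is None:
        raise ValueError(f"unexpected session ID: {session_id!r}")
    family, sibling, session = match.groups()
    return family, int(sibling), int(session)


def parse_run_name(run_name):
    """Return (run number, is_pseudorun) for names like 'functional03a'.

    Assumption: run names are 'functional' + digits + an optional letter.
    """
    match = re.fullmatch(r"functional(\d+)([a-z]?)", run_name)
    if match is None:
        raise ValueError(f"unexpected run name: {run_name!r}")
    number, letter = match.groups()
    return int(number), letter != ""


def decode_area(code):
    """Map a traced-region code to (area name, uncertain flag)."""
    uncertain = 11 <= code <= 18
    base = code - 10 if uncertain else code
    if base not in AREA_CODES:
        # e.g., the between-area codes 31-33 are not handled here
        raise ValueError(f"not an area code: {code}")
    return AREA_CODES[base], uncertain


def read_timing(lines, included_only=True):
    """Parse onset / duration / inclusion rows from a raw_timing file.

    Assumption: whitespace-separated columns, with 0 in the third column
    meaning excluded and any other value meaning included.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        onset, duration, included = (float(x) for x in fields[:3])
        if included_only and included == 0:
            continue
        rows.append((onset, duration, included))
    return rows


def gaze_confound_matrix(excluded_flags):
    """Build one one-hot column per excluded time point (flag == 1).

    Returns a list of rows with shape (n_timepoints, n_excluded).
    """
    excluded = [t for t, flag in enumerate(excluded_flags) if float(flag) == 1]
    return [[1 if t == e else 0 for e in excluded]
            for t in range(len(excluded_flags))]


def read_affine(lines):
    """Read a 4x4 matrix stored as plain whitespace-separated text (assumed)."""
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    if len(rows) != 4 or any(len(row) != 4 for row in rows):
        raise ValueError("expected a 4x4 matrix")
    return rows


def compose_affines(second, first):
    """Return second @ first: the transform that applies `first`, then `second`."""
    return [[sum(second[i][k] * first[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

For example, `parse_session_id("s0001_2_4")` returns `("s0001", 2, 4)` and `decode_area(13)` returns `("vV3", True)`. The file-reading helpers take an iterable of lines, so they can be given an open file handle or a list of strings. Assuming FSL-style matrices, a functional-to-standard transform could be obtained as `compose_affines(highres2standard, func2highres)`, mirroring the two file types in transformation\_mats; the analysis itself builds its confound matrices and registrations within infant\_neuropipe.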