Data from: Movies reveal the fine-grained organization of infant visual cortex

Ellis, Cameron; Yates, Tristan; Arcaro, Michael; Turk-Browne, Nicholas

Published Oct 21, 2024 on Dryad. https://doi.org/10.5061/dryad.jm63xsjm3

Data files

Oct 21, 2024 version files 41.91 GB

0-predict_retinotopy.zip (3.96 GB)
Aeronaut.zip (14.51 GB)
Catepillar.zip (3.30 GB)
Child_Play.zip (13.86 GB)
Meerkats.zip (3.37 GB)
Mouseforsale.zip (2.91 GB)
README.md (19.72 KB)

Abstract

Studying infant minds with movies is a promising way to increase engagement relative to traditional tasks. However, the spatial specificity and functional significance of movie-evoked activity in infants remains unclear. Here we investigated what movies can reveal about the organization of the infant visual system. We collected fMRI data from 15 awake infants and toddlers aged 5–23 months who attentively watched a movie. The activity evoked by the movie reflected the functional profile of visual areas. Namely, homotopic areas from the two hemispheres responded similarly to the movie, whereas distinct areas responded dissimilarly, especially across dorsal and ventral visual cortex. Moreover, visual maps that typically require time-intensive and complicated retinotopic mapping could be predicted, albeit imprecisely, from movie-evoked activity in both data-driven analyses (i.e., independent components analysis) at the individual level and by using functional alignment into a common low-dimensional embedding to generalize across participants. These results suggest that the infant visual system is already structured to process dynamic, naturalistic information and that fine-grained cortical organization can be discovered from movie data.

Ellis, Yates, Arcaro, & Turk-Browne

https://doi.org/10.7554/eLife.92119.1

This directory contains data used for the analyses in the manuscript titled “Movies reveal the fine-grained organization of infant visual cortex” published in eLife. To analyze this data, you should refer to https://github.com/ntblab/infant_neuropipe/tree/predict_retinotopy/. In particular, https://github.com/ntblab/infant_neuropipe/tree/predict_retinotopy/scripts/predict_retinotopy/predict_retinotopy.ipynb is a notebook which utilizes this data to recreate the figures reported in the paper. This notebook can be adapted to explore other analyses, which in some cases will create new files.

To run the notebooks on this data, you should unzip the 0_predict_retinotopy file into the folder data/predict_retinotopy in the neuropipe project.

Participants saw two movie types (refer to the manuscript for more details):

MM: computer generated movies that are akin to Pixar shorts. Aeronaut was silent, but Catepillar, Meerkats, and Mouseforsale had sound. Some of the movies contained drops, in which the movie went blank for 10s.
ChildPlay: four photorealistic movies played in the same order.

The movies cannot be shared publicly due to copyright. Please contact the corresponding author if you would like a copy of the movies for research purposes.

Participants also completed retinotopy data collection, as described in this paper.

The scan sequences are as follows:

PETRA: TR1 = 3.32 ms, TR2 = 2250 ms, TE = 0.07 ms, flip angle = 6 degrees, matrix = 320 x 320, slices = 320, resolution = 0.94 mm isotropic, radial slices = 30,000
T2* gradient-echo EPI: TR = 2 s, TE = 30 ms, flip angle = 71 degrees, matrix = 64 x 64, slices = 34, resolution = 3 mm isotropic, interleaved slice acquisition

The dataset contained here is only partial: it does not contain all of the data used in these analyses. You will additionally need the Retinotopy data reported previously. This data must be stored in the data/Retinotopy/ directory in the neuropipe.

The folders movie_1d, IC_surfs, and SRM_surfs are empty because their files are very large and are created by the notebook. Once populated, these folders contain functional data that is aligned to surface space. This is a deterministic operation, so the results should be identical when you remake the files, as long as you have the right software (AFNI/SUMA). If you don’t want to or can’t recreate these files, you can set the parameter skip_surf_generation to 1 in the notebook to skip this step. The notebook should still run since the surfaces are intermediate files, and the end products are already created.

Infant-specific files are named with the participant name at the start. The infant participant names are composed of three parts: sXXXX is the unique family ID, the _X that follows is the sibling ID (counting up from the first child in the family to participate), and the final _X is the session number. Hence, s0001_2_4 would be the 4th session for the 2nd sibling in family s0001. These infant participant IDs are consistent across datasets from the NTB Lab. Adult participants are named based on whether they have retinotopy and movies (adult_ret) or just movies (mov); for example, the adult retinotopy participant names have the form adult_ret_XX, where XX is the participant number. Run numbers indicate the nth run that was retained in that participant’s session. If the number has a letter after it (e.g. functional03a), that indicates it is a pseudorun and there is other data from this run that has been removed (because it pertained to another task, not reported here).
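The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not part of infant_neuropipe: the names InfantID and parse_infant_id are hypothetical, and the pattern assumes four digits after the "s", as in the examples in this README.

```python
import re
from typing import NamedTuple


class InfantID(NamedTuple):
    family: str    # unique family ID, e.g. "s0001"
    sibling: int   # 1 = first child in the family to participate
    session: int   # session number for that child


# Assumption: the family ID is "s" followed by exactly four digits.
# The trailing group lets the ID be followed by "_", ".", or nothing,
# so the function also works on file names that start with the ID.
_INFANT_RE = re.compile(r"^(s\d{4})_(\d+)_(\d+)(?:[_.]|$)")


def parse_infant_id(name: str) -> InfantID:
    """Split an infant participant name (or a file name starting with one)."""
    match = _INFANT_RE.match(name)
    if match is None:
        raise ValueError(f"not an infant participant name: {name!r}")
    family, sibling, session = match.groups()
    return InfantID(family, int(sibling), int(session))


print(parse_infant_id("s0001_2_4"))
```

Adult names such as adult_ret_XX do not match the pattern and raise a ValueError, which makes it easy to separate infant files from adult files when looping over a folder.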

All files with the suffix ‘.nii.gz’ or ‘.nii’ can be opened using the freely available FMRIB Software Library (FSL) or the open-source software FreeSurfer.

File/directory descriptions:

0_predict_retinotopy: This zip file contains the main data for the project

adult_data: Equivalent adult data for what is stored in the data/Retinotopy/ folder for infants. Specifically there are three folders:

contrast_maps: This contains the contrast maps for the meridian and spatial frequency contrast for each adult participant. These data are stored in the high resolution anatomical space.
freesurfer: This contains a SUMA folder for each participant that is created based on the freesurfer for that participant.
SRM_prediction_native: This contains folders for each adult participant representing the output of the SRM prediction. These are an intermediate product that will be realigned to the freesurfer space, as described in the notebook.

adult_participants.csv: This is a csv storing the data for the adults who have retinotopy data. The column labels are: Age: recorded in months. Sex: the assigned sex at birth. Vertical phases: how many usable epochs of vertical stimulation from retinotopy. Horizontal phases: how many usable epochs of horizontal stimulation from retinotopy. Low phases: how many usable epochs of low spatial frequency from retinotopy. High phases: how many usable epochs of high spatial frequency from retinotopy. Blocks: how many blocks of retinotopy were run. Runs: how many runs contained retinotopy data. prop_TR_included: what proportion of TRs are usable. prop_eye_included: what proportion of frames from retinotopy are usable. Intraframe reliability: what proportion of retinotopy frames are coded the same between coders. Coder number: how many people coded the gaze of the participants.

concat_movies: Concatenated movie watching data used as the raw data for homotopy and ICA analyses. This stitches together movies with the timing specified in the concat_movies_timing_file folder. This data has been preprocessed through neuropipe. This data overlaps with what was uploaded in a previous publication; however, this submission includes more data. The file name includes the participant name and the movie data being aggregated, either MM or ChildPlay. Analyses include the rest epochs between the movie runs.

concat_movies_motion: The framewise displacement, in millimeters, of each TR, stored for each infant participant.

concat_movies_timing_file: Files containing the relevant timing and movie information for each participant run. These are arranged as 4 columns and each row represents a movie clip. The first column is the name of the movie being shown. The second column is the onset time of the movie (in seconds) in the corresponding concat_movies file. The third column is the duration of the movie (in seconds). The fourth column is the run of the raw data it comes from.
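As an illustration of the four-column layout just described, the sketch below reads a timing file and converts each clip's onset and duration into a TR range that could index the matching concat_movies data. It rests on two assumptions that are not stated in this README: the columns are whitespace-delimited, and onsets fall on TR boundaries. The TR of 2 s is taken from the EPI sequence above. The names Clip, read_timing, and tr_slice are hypothetical, and the example rows are made up.

```python
from typing import Iterable, List, NamedTuple

TR_SECONDS = 2.0  # EPI repetition time listed in the scan sequences


class Clip(NamedTuple):
    movie: str
    onset_s: float
    duration_s: float
    run: str  # kept as text, since pseudoruns carry a letter suffix


def read_timing(lines: Iterable[str]) -> List[Clip]:
    """Parse rows of: movie name, onset (s), duration (s), source run."""
    clips = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-delimited columns
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        movie, onset, duration, run = fields
        clips.append(Clip(movie, float(onset), float(duration), run))
    return clips


def tr_slice(clip: Clip) -> slice:
    """Convert a clip's onset and duration from seconds to a TR range."""
    start = int(round(clip.onset_s / TR_SECONDS))
    stop = start + int(round(clip.duration_s / TR_SECONDS))
    return slice(start, stop)


# Made-up rows, for illustration only.
example = ["Aeronaut 0 180 1", "Meerkats 186 120 2a"]
for clip in read_timing(example):
    print(clip.movie, tr_slice(clip))
```

With a real file, `read_timing(open(path))` returns the same structure, and `data[..., tr_slice(clip)]` would select that clip's volumes from a 4D array whose last axis is time.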

func_summaries_line: A summary of the response of a map along the gradient lines. For instance s1607_1_4_meridian_f-10_adult_avg_occipital.pkl means that for participant s1607_1_4 we take the gradients corresponding to the test for meridian maps (i.e., lines perpendicular to the region boundaries) and apply them to the SRM analysis (i.e., using 10 features trained on adults to predict voxels in the occipital lobe). Files exist for each participant’s real retinotopic maps, as well as the results of the ICA and the SRM analyses.

gaze_exclusion.txt: Proportion of gaze data where the participant was not looking at the stimulus, separated for ICA and SRM data (although these are usually the same). This file is needed because this information is not in the participant csv.

IC_codes: Manually labeled IC components. Each file is two columns where each row is a component that was selected. The first column refers to the IC number (i.e., the nth volume of the file in IC_vols, zero-indexed) and the second column is what the component was labeled as (1=meridian, 2=spatial frequency)

IC_mirror_flip: Contains the generated test for mirror flipping the ICs. This folder does not include the generated files (e.g. lh.s1607_1_4.1d.dset) because they are generated by the notebook. Instead, it includes the code for each of these files seed_file.csv and the manually labelled components for each participant. The notebook then uses this to see how many components were found for the mirrored ICs.

IC_surfs: Folders for each participant that contain surfaces for each of the ICs in IC_vols. These surfaces are intermediate products needed for some analyses. This will be empty until populated by the notebook.

IC_vols: Volumes containing the IC components. These volumes are the result of FSL’s melodic, and only the volume is stored to avoid bloat. The command to run the ICA was: melodic -i analysis/secondlevel_MM/default/NIFTI/func2highres_MM_Z.nii.gz -o analysis/secondlevel_MM/default/func2highres_MM_Z.ica -v --nobet --bgthreshold=1 --tr=2 -d 0 --mmthresh=0.5 --report --guireport=analysis/secondlevel_MM/default/func2highres_MM_Z.ica/report.html, which was run in each participant directory. The ICs are in descending order of variance explained. These volumes are aligned to high resolution anatomy but are in native resolution

infant_participants.csv: This is a csv storing the data for the infants who have retinotopy data. The column labels are: Age: recorded in months. Sex: the assigned sex at birth. Vertical phases: how many usable epochs of vertical stimulation from retinotopy. Horizontal phases: how many usable epochs of horizontal stimulation from retinotopy. Low phases: how many usable epochs of low spatial frequency from retinotopy. High phases: how many usable epochs of high spatial frequency from retinotopy. Blocks: how many blocks of retinotopy were run. Runs: how many runs contained retinotopy data. prop_TR_included: what proportion of TRs are usable. prop_eye_included: what proportion of frames from retinotopy are usable. Intraframe reliability: what proportion of retinotopy frames are coded the same between coders. Coder number: how many people coded the gaze of the participants.

masks: Binary mask volumes aligned to high resolution space from standard space. Used for SRM to reduce the dimensionality of the analysis.

movie_1d: The functional data in surface space. Each row is a voxel in the brain and each column is a timepoint of the movie data from concat_movies. This format is necessary for homotopy but is too data intensive to upload (the whole folder is about 20 GB). Until populated by the notebook, this folder will only contain a few files of the form homotopy*.npy, which summarize the results needed to produce the homotopy figures. Delete these numpy files if you create new data that you want to use.

plots: Directory containing the plots generated from the notebook. It is empty until the notebook is run

retinotopy_ppts_movies.txt: A file mapping participants on to the movies that they saw. The first word is the participant name and then each subsequent word on that line is a movie they saw
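A file with this layout can be read into a participant-to-movies dictionary and then inverted into a movie-to-participants dictionary. The sketch below assumes that words are separated by whitespace; the function names and the example lines are hypothetical.

```python
from typing import Dict, Iterable, List


def read_participant_movies(lines: Iterable[str]) -> Dict[str, List[str]]:
    """First word on each line is the participant; the rest are movies seen."""
    mapping = {}
    for line in lines:
        words = line.split()  # assumption: whitespace-separated words
        if words:
            mapping[words[0]] = words[1:]
    return mapping


def invert(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn participant -> movies into movie -> participants."""
    inverted: Dict[str, List[str]] = {}
    for participant, movies in mapping.items():
        for movie in movies:
            inverted.setdefault(movie, []).append(participant)
    return inverted


# Made-up lines, for illustration only.
example = ["s0001_1_1 Aeronaut Meerkats", "s0002_1_1 Aeronaut"]
ppt2movie = read_participant_movies(example)
print(invert(ppt2movie))
```

With the real file, `read_participant_movies(open("retinotopy_ppts_movies.txt"))` produces the same structure.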

SRM_participant_csv: Folder containing the participant summaries for infants that participated in the movies used for SRM. This information is needed to create descriptive statistics used in the reporting. The information follows the same format as infant_participants.csv.

SRM_prediction: Outputs of the SRM_predict_retinotopy.py script. This contains folders for each participant in which their data is held out while fitting SRM. The resulting files are predictions of their retinotopic maps based on other participants, as specified by the file name. The folder also contains text files that correlate the predicted and ground truth maps (using the entire masked region). Finally the folder contains pickle files that have the mapping between movies and participants. For instance, infant2movie.pkl has all the movies each infant retinotopy participant saw, and movie2infant.pkl has all participants (including non-retinotopy participants) who saw a movie.

SRM_surfs: Like IC_surfs, these are folders for each participant that contain the relevant SRM volumes (from SRM_prediction) converted into surface space. This will be empty until populated by the notebook.

time_segment_matching: Contains outputs of the time_segment_matching_features.py script. Time segment matching takes a segment of data from the movie (10 TRs) of one participant and tries to identify where in the movie it came from using the data from all other participants. Since there are many time points, the chance of success is quite low. This folder contains text files of the results, where each text file is a different parameter choice (e.g., how many features to use in the SRM, what mask is applied to the data). The first column of the text file is the participant name, the second column is the group used for fitting the SRM, the third column is the movie watched, the fourth column is the time segment matching accuracy (as a proportion), and the final column is the chance rate (it varies based on the movie duration and how many time points were excluded).

Movie folders Aeronaut, Child_Play, Catepillar, Meerkats, Mouseforsale: Each of these zip files contains the movie data used for SRM analyses. You should unzip these folders and store each of them in a folder within 0_predict_data called SRM_movies (e.g., 0_predict_data/SRM_movies/Aeronaut/). For each movie except Child_Play, all data within a run is continuous and no interleaved time points were removed. Child_Play is 4 movies stitched together. We have removed the rest periods between the movies, time shifting each movie by 4 s.

adult_participants.csv: Summary table of the participant information, including: the participant ID, participant age (in years), participant sex (male or female), the location of the scan, the session number, the total number of TRs collected (num_TR), the proportion of TRs that were usable after motion exclusion (prop_TR_motion), the proportion of TRs that were usable after eye-tracking exclusion (prop_TR_eye), the proportion of frames that were coded the same between gaze coders (eye_reliability; left blank if there was only one gaze coder), and the number of gaze coders (coder_num). If any values are left blank then they were not available.

anatomicals: Anatomical images used for alignment. Facial information has been stripped for anonymity. These were collected using the PETRA sequence defined above (for infants) or an MPRAGE sequence (for adults). In some cases, more than one scan has been averaged to improve quality.

eye_confounds: Text files with 1s for TRs that should be excluded for eye closure

infant_participants.csv: Summary table of the participant information, including: the participant ID, participant age (in months), participant sex (male or female), the location of the scan, the session number, the total number of TRs collected (num_TR), the proportion of TRs that were usable after motion exclusion (prop_TR_motion), the proportion of TRs that were usable after eye-tracking exclusion (prop_TR_eye), the proportion of frames that were coded the same between gaze coders (eye_reliability), and the number of gaze coders (coder_num). Eye-tracking data is missing from one participant. If any values are left blank then they were not available.

motion_confounds: Text files with 1s for TRs that should be excluded for excessive motion (>3mm translational motion)
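The eye_confounds and motion_confounds files can be combined into a single mask of usable TRs. The sketch below assumes that each row of a confound file corresponds to one TR and that a file may hold either a single column of 0s and 1s or several columns with one flagged TR each; neither layout is confirmed by this README. The function names are hypothetical and the in-memory files are made up.

```python
import io

import numpy as np


def excluded_trs(source) -> np.ndarray:
    """Boolean vector that is True for every TR (row) flagged with a 1."""
    # ndmin=2 gives a (TRs, columns) array even for a single-column file.
    flags = np.loadtxt(source, ndmin=2)
    return flags.astype(bool).any(axis=1)


def usable_trs(motion_source, eye_source) -> np.ndarray:
    """True for TRs that are excluded by neither motion nor eye closure."""
    return ~(excluded_trs(motion_source) | excluded_trs(eye_source))


# Made-up confound files for a four-TR run, for illustration only.
motion = io.StringIO("0\n1\n0\n0\n")
eye = io.StringIO("0 0\n0 0\n1 0\n0 0\n")
print(usable_trs(motion, eye))
```

`np.loadtxt` also accepts file paths, so the same functions can be pointed at files in motion_confounds and eye_confounds.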

preprocessed_native: Contains a folder called linear_alignment that has nifti files for preprocessed functional data during movie watching. All functional images were linearly aligned to native anatomical space

preprocessed_standard: Contains nifti files for preprocessed functional data during movie watching that have been aligned to standard MNI space either linearly with manual adjustment (subfolder linear_alignment) or nonlinearly with ANTs (subfolder nonlinear_alignment)

raw_nifti: Raw functional data for each run where movie task data was collected in these participants. If another task, not reported here, was completed in the same run then a pseudo-run was created in which the TRs corresponding to this task were sliced and separated.

raw_timing: Timing information for the start of each block and event for each participant. For each file the first column is the onset of the event or block in seconds, the second column is the duration of event or block in seconds and the third column is the weight. (Note that all participants but s5037_1_1 for Aeronaut only saw the movie once in the session)

run_burn_in.txt and run_burn_in_adults.txt: File with subject name, functional run, and number of TRs in the burn-in for that run (by default should be 3, but may differ)

transformation_mats: The 4x4 affine transformation matrix (in .mat format) to align the data. One type of file is for aligning each functional in raw_nifti to highres (files with _highres, with one for each run). The other type is for aligning from highres to standard (files with _highres2standard).
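The two kinds of matrices can be chained to go from a functional run to standard space in one step. The sketch below assumes the .mat files are plain-text 4x4 matrices in the FSL FLIRT convention, which this README does not state explicitly; the function names are hypothetical. Note that FLIRT matrices operate on FSL's scaled voxel coordinates, so a composed matrix is meant to be passed back to FSL tools rather than applied directly to world coordinates.

```python
import numpy as np


def load_affine(source) -> np.ndarray:
    """Read a plain-text 4x4 affine matrix (assumed FLIRT-style .mat)."""
    mat = np.loadtxt(source)
    if mat.shape != (4, 4):
        raise ValueError(f"expected a 4x4 matrix, got shape {mat.shape}")
    return mat


def compose(func2highres: np.ndarray, highres2standard: np.ndarray) -> np.ndarray:
    """Chain func -> highres -> standard into a single func -> standard affine."""
    # Affines apply right to left: func2highres first, then highres2standard.
    return highres2standard @ func2highres


# Toy matrices, for illustration only: shift x by 1, then double every axis.
shift = np.eye(4)
shift[0, 3] = 1.0
scale = np.diag([2.0, 2.0, 2.0, 1.0])
print(compose(shift, scale) @ np.array([1.0, 0.0, 0.0, 1.0]))
```

The result can be written back out with `np.savetxt` if it needs to be used by other tools.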

transformation_ANTs: Contains ANTs folders for each participant. These were created by run_ANTs_highres2standard.sh and were used to create the nonlinear registration to infant standard and linear registration to adult MNI standard. Note that for adults, the infant_standard2standard.mat is an identity matrix, meaning that alignment to ‘infant_standard’ vs ‘standard’ is identical, and all references to ‘infant_standard’ actually refer to adult MNI space.

example_func2highres.nii.gz: functional image of the centroid TR that minimizes the Euclidean distance between TRs aligned to highres anatomical space
example_func2infant_standard.nii.gz: functional image of the centroid TR that minimizes the Euclidean distance between TRs aligned to infant standard space
example_func2standard.nii.gz: functional image of the centroid TR that minimizes the Euclidean distance between TRs aligned to adult MNI standard space
example_func.nii.gz: functional image of the centroid TR in its native 3mm space
fs_alignment.mat: transformation matrix that aligns fs_vol.nii.gz to highres_brain.nii.gz (6 degrees of freedom)
fs_brain.nii.gz: freesurfer-outputted highres anatomical image rotated and masked to only show brain voxels
fs_vol.nii.gz: freesurfer-outputted highres anatomical image in 1mm space
highres2infant_standard_0GenericAffine.mat: transformation matrix used to move from highres to infant standard space
highres2infant_standard_1Warp.nii.gz: warp file used by ANTs to move from high resolution to infant standard space. Note that this was downsampled to 3mm to GREATLY reduce folder size, but we used 1mm warps in the data
highres2infant_standard_InverseWarped.nii.gz: infant standard image aligned to highres space via ANTs
highres2infant_standard_Warped.nii.gz: highres anatomical image aligned to infant standard space via ANTs
highres2standard.nii.gz: highres anatomical image aligned to adult MNI standard space
infant_standard2standard.mat: linear transformation matrix between infant standard and adult MNI standard space
highres_brain.nii.gz: highres anatomical image masked to only show brain voxels
infant_standard.nii.gz: infant standard image, determined based on the child’s age
mask.nii.gz: mask to facilitate anatomical alignment to standard, manually edited from freesurfer output