Data and code for: Spatial cell type enrichment predicts mouse brain connectivity
Data files
Sep 01, 2023 version; files total 3.31 GB
Abstract
A fundamental neuroscience topic is the link between the brain's molecular, cellular, and cytoarchitectonic properties and its structural connectivity (SC). Recent studies relate inter-regional connectivity to gene expression, but the relationship to regional cell-type distributions remains understudied. Here, we utilize whole-brain mapping of neuronal and non-neuronal subtypes via the Matrix Inversion and Subset Selection (MISS) algorithm to model inter-regional connectivity as a function of regional cell-type composition with machine learning. We deployed random forest algorithms for predicting connectivity from cell-type densities, demonstrating surprisingly strong prediction accuracy for cell types in general and for particular cell types such as oligodendrocytes. We found evidence of a strong distance dependency in the cell-connectivity relationship, with layer-specific excitatory neurons contributing the most to long-range connectivity, while vascular and astroglial cells are salient for short-range connections. Our results demonstrate a link between cell types and connectivity, providing a roadmap for examining this relationship in other species, including humans.
README: Data and code for: "Spatial Cell Type Enrichment Predicts Mouse Brain Connectivity"
These are the data files used to construct the results and figures for the manuscript: "Spatial Cell Type Enrichment Predicts Mouse Brain Connectivity" by Shenguan Sun and Justin Torok, et al.
Each .mat file is listed below. Where a file stores more than one variable, each variable is described individually.
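For readers working in Python (for example, alongside the Fig*.ipynb notebooks), the following is a minimal sketch of one way to inspect the variables stored in a .mat file. It assumes SciPy is installed. `load_mat` and `summarize_variables` are hypothetical helper names and are not part of this repository. If a file was saved in MATLAB's v7.3 (HDF5-based) format, `scipy.io.loadmat` cannot read it and an HDF5 reader would be needed instead; this README does not state which format was used.

```python
def load_mat(path):
    """Load a .mat file into a dict of variables (assumes a pre-v7.3 file)."""
    # Imported lazily so summarize_variables can be used without SciPy.
    from scipy.io import loadmat

    # squeeze_me drops singleton dimensions; struct_as_record=False exposes
    # MATLAB struct fields as attributes (e.g. outstruct.Bmeans).
    return loadmat(path, squeeze_me=True, struct_as_record=False)


def summarize_variables(mat):
    """Map each stored variable name to its shape (None if it has no shape)."""
    summary = {}
    for name, value in mat.items():
        if name.startswith("__"):  # skip loadmat metadata such as __header__
            continue
        summary[name] = getattr(value, "shape", None)
    return summary
```

Under these assumptions, `summarize_variables(load_mat("Connectomes.mat"))` would list the variables stored in that file, and `load_mat("CellDensity_Tasic_nG606.mat")["outstruct"].Bmeans` would return the regional density array, provided `outstruct` is stored as a single struct.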
Code files
- FIBoxplot.m: Function for constructing box plots
- Fig2.ipynb: Notebook for generating Figure 2 results
- Fig3.ipynb: Notebook for generating Figure 3 results
- Fig4.ipynb: Notebook for generating Figure 4 results
- Fig5.ipynb: Notebook for generating Figure 5 results
- Fig6.ipynb: Notebook for generating Figure 6 results
- FISuperclass.m: Function for parsing feature importance results from individual cell types into superclasses
- zeiselfeatureimp.m: General script for generating various figures and results
Data dependency files
- BrainFrame_Dependencies.mat: Data files relating to the 3D visualization of the connectome (see https://github.com/Raj-Lab-UCSF/Brainframe)
  - classkey: Cell array of the Tasic, et al. cell-type names
  - GENGDmod: 3D array of indices of the 212-region CCF atlas
  - input_struct: Struct array of default settings for Brainframe
  - nonzerovox: Vector of linear indices of brain voxels in GENGDmod
  - structIndex: Cell array of linear indices corresponding to the voxels in GENGDmod in each region
  - structList: Array of AIBS numerical identifiers for each region in GENGDmod
- CellDensity_Tasic_nG606.mat: Struct containing published, MISS-derived regional densities of the Tasic, et al. cell types. The regional densities used within this manuscript are contained in the array outstruct.Bmeans. (see DOI 10.1073/pnas.2111786119)
- Connectomes.mat: Connectomes derived from Oh, et al. (DOI: https://doi.org/10.1038/nature13186)
  - C_dir: Array of connectivity values for the 424 (ipsilateral + contralateral) CCF regions
  - C_sym: Symmetrized version of C_dir (not used here)
- Interregional_Distances_JLT.mat: Array of center-to-center distances between the CCF regions
- MRx3_L90_inds.mat: Published MRx3 gene order for Tasic, et al. (see https://github.com/Raj-Lab-UCSF/MISS-pipeline)
- regionvoxels.mat: Array of volumes of each of the 212 CCF regions
- Tasic_Inputs.mat: MISS input files, including gene expression information, for Tasic, et al. (see https://github.com/Raj-Lab-UCSF/MISS-pipeline)
  - C_indivcells: Array of gene expression values for each individual cell in the Tasic, et al. dataset
  - classkey: Cell array of the Tasic, et al. cell-type names
  - classkey_subt: Cell array of the Tasic, et al. cluster names
  - ct_labvec: Array of cell-type identifiers for each individual cell
  - gene_names: Cell array of gene symbols
  - GENGDmod: 3D array of indices of the 212-region CCF atlas
  - listBmap: 3D array of indices of the 212-region CCF atlas, renumbered from GENGDmod
  - nonzerovox: Vector of linear indices of brain voxels in GENGDmod
  - structIndex: Cell array of linear indices corresponding to the voxels in GENGDmod in each region
  - structList: Array of AIBS numerical identifiers for each region in GENGDmod
  - subt_labvec: Array of cluster identifiers for each individual cell
  - voxvgene: Array of voxel gene expression values from Lein, et al.
- Tasic_Ontology.mat: Cell array of names and superclasses of the Tasic, et al. cell types
- Tasic_scrambledC_nG606_new.mat: Array of scrambled matrices derived using the MISS pipeline for Tasic, et al. (see https://github.com/Raj-Lab-UCSF/MISS-pipeline)
- Taxonomic_Distance_Info.mat: Distance matrix and hierarchical clustering of the 212 CCF regions, derived from the AIBS developing mouse reference atlas (see https://developingmouse.brain-map.org/static/atlas)
  - devmat: Array of cluster indices for the development-based hierarchical clustering of CCF regions
  - taxonomic_distance_matrix: Array of inter-regional distances derived from devmat
- Zeisel_Inputs.mat: MISS input files, including gene expression information, for Zeisel, et al. (see https://github.com/Raj-Lab-UCSF/MISS-pipeline)
  - C_indivcells: Array of gene expression values for each individual cell in the Zeisel, et al. dataset
  - classkey: Cell array of the Zeisel, et al. cell-type names
  - ct_labvec: Array of cell-type identifiers for each individual cell
  - gene_names: Cell array of gene symbols
  - GENGDmod: 3D array of indices of the 212-region CCF atlas
  - listBmap: 3D array of indices of the 212-region CCF atlas, renumbered from GENGDmod
  - nonzerovox: Vector of linear indices of brain voxels in GENGDmod
  - structIndex: Cell array of linear indices corresponding to the voxels in GENGDmod in each region
  - structList: Array of AIBS numerical identifiers for each region in GENGDmod
  - voxvgene: Array of voxel gene expression values from Lein, et al.
- Zeisel_MRx3Inds.mat: Published MRx3 gene order for Zeisel, et al. (see https://github.com/Raj-Lab-UCSF/MISS-pipeline)
- Zeisel_Ontology.mat: Cell array of names and superclasses of the Zeisel, et al. cell types
- Zeisel_outstruct_nG1360.mat: Struct containing published, MISS-derived regional densities of the Zeisel, et al. cell types. The regional densities used within this manuscript are contained in the array outstruct.Bmeans. (see DOI 10.1073/pnas.2111786119)
- Zeisel_scrambledC_nG1360_new.mat: Array of scrambled matrices derived using the MISS pipeline for Zeisel, et al. (see https://github.com/Raj-Lab-UCSF/MISS-pipeline)
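Several of the variables listed above (nonzerovox, structIndex) hold linear indices into the 3D array GENGDmod. MATLAB linear indices are 1-based and column-major, so they need converting before use in a 0-based language. The following is a minimal pure-Python sketch of that conversion. The function name and the toy shape are hypothetical; the real shape of GENGDmod should be read from the file.

```python
import math


def matlab_linear_to_subscripts(linear_index, shape):
    """Convert a 1-based, column-major linear index to 0-based subscripts."""
    if not 1 <= linear_index <= math.prod(shape):
        raise IndexError(f"linear index {linear_index} out of range for {shape}")
    remainder = linear_index - 1  # shift to 0-based
    subscripts = []
    for dim in shape:  # the first dimension varies fastest in column-major order
        subscripts.append(remainder % dim)
        remainder //= dim
    return tuple(subscripts)


toy_shape = (2, 3, 4)  # hypothetical toy shape, not the real GENGDmod size
print(matlab_linear_to_subscripts(7, toy_shape))  # (0, 0, 1)
```

With NumPy available, `numpy.unravel_index(linear_index - 1, shape, order="F")` performs the same conversion and also accepts whole index vectors.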
Results files
- Centrality_Results.mat: Array of centrality results presented in the manuscript.
- Tasic_FeatureImportance.mat: Arrays of random forest FI results for the Tasic, et al. dataset. Each variable corresponds to a different set of 10-fold cross-validation results run under conditions described in the manuscript, with descriptive names. See also FIBoxPlot.m.
- Zeisel_FeatureImportance.mat: Arrays of random forest FI results for the Zeisel, et al. dataset. Each variable corresponds to a different set of 10-fold cross-validation results run under conditions described in the manuscript, with descriptive names. See also FIBoxPlot.m.
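To illustrate the kind of grouping that FISuperclass.m is described as performing, the sketch below pools per-cell-type feature importance (FI) values into superclasses, fold by fold. This is not the repository's implementation. It assumes one list of FI values per cross-validation fold, ordered like the cell-type names, and it assumes that pooling means summing; the actual array layout and pooling rule should be checked against the .mat files and FISuperclass.m. All names and values here are hypothetical.

```python
from collections import defaultdict


def aggregate_by_superclass(fold_importances, type_names, superclass_of):
    """Sum per-type importances within each superclass, separately per fold."""
    results = []
    for fold in fold_importances:
        if len(fold) != len(type_names):
            raise ValueError("each fold needs one importance value per cell type")
        totals = defaultdict(float)
        for name, value in zip(type_names, fold):
            totals[superclass_of[name]] += value
        results.append(dict(totals))
    return results


# Hypothetical toy input: three cell types, two folds.
type_names = ["TypeA", "TypeB", "TypeC"]
superclass_of = {"TypeA": "Neuron", "TypeB": "Neuron", "TypeC": "Glia"}
folds = [[0.5, 0.25, 0.25], [0.125, 0.375, 0.5]]
pooled = aggregate_by_superclass(folds, type_names, superclass_of)
```

In practice, the superclass labels would come from Tasic_Ontology.mat or Zeisel_Ontology.mat rather than a hand-written dictionary.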
Sharing/access Information
Below are the public resources used to create the datasets above:
- The connectome and parcellation information were derived from the Oh, et al. (AIBS) dataset (DOI: https://doi.org/10.1038/nature13186)
- The spatial gene expression data were derived from the Lein, et al. AGEA (DOI: https://doi.org/10.1038/nature05453)
- The 25-type scRNAseq dataset was derived from Tasic, et al. (AIBS) (DOI: 10.1038/s41586-018-0654-5)
- The 200-type scRNAseq dataset was derived from Zeisel, et al. (Linnarsson lab) (DOI: 10.1016/j.cell.2018.06.021)
- The processing of these data into cell-type densities is described in the original MISS manuscript from Mezias, et al. (DOI: 10.1073/pnas.2111786119)
- The taxonomic matrix was obtained using the AIBS developing mouse atlas (https://developingmouse.brain-map.org/static/atlas)