Searching for the cellular underpinnings of the selective vulnerability to tauopathic insults in Alzheimer's disease
Data files
Feb 04, 2025 version files (5.29 GB total):
- AD_Genes.mat (382 B)
- CCF_labels.mat (8.25 KB)
- CellDensity_Yao2021_all.mat (2.29 GB)
- kim_density_listB_order.mat (3.86 KB)
- Mouse_Celltype_SpatialNulls.zip (287.99 MB)
- Mouse_Tauopathy_Data_HigherQ.mat (22.24 KB)
- README.md (10.32 KB)
- Yao_Dependencies.mat (1.35 GB)
- Yao_Inputs.mat (1.36 GB)
- Yao_MRx3_inds.mat (6.64 KB)
Abstract
Neurodegenerative diseases such as Alzheimer's disease exhibit pathological changes in the brain that proceed in a stereotyped and regionally specific fashion. However, the cellular underpinnings of regional vulnerability are poorly understood, in part because whole-brain maps of a comprehensive collection of cell types have been inaccessible. Here, we deployed a recent cell-type mapping pipeline, Matrix Inversion and Subset Selection (MISS), to determine the brain-wide distributions of pan-hippocampal and neocortical cells in the mouse, and then used these maps to identify general principles of cell-type-based selective vulnerability in PS19 mouse models. We found that hippocampal glutamatergic neurons as a whole were significantly positively associated with regional tau deposition, suggesting vulnerability, while cortical glutamatergic and GABAergic neurons were negatively associated. We also identified oligodendrocytes as the single most strongly negatively associated cell type. Further, cell-type distributions were more predictive of end-time-point tau pathology than AD-risk-gene expression. Using gene ontology analysis, we found that the genes that are directly correlated to tau pathology are functionally distinct from those that constitutively embody the vulnerable cells. In short, we have elucidated cell-type correlates of tau deposition across mouse models of tauopathy, advancing our understanding of selective cellular vulnerability at a whole-brain level.
README: Searching for the cellular underpinnings of the selective vulnerability to tauopathic insults in Alzheimer's disease
https://doi.org/10.5061/dryad.h18931zwv
Description of the data and file structure
Files and variables
File: Yao_Dependencies.mat
Description: Contains all cell density and gene expression files required to run the analyses in the paper.
Variables
- classkey: 42 x 1 cell array of Yao cell-type names
- gene_names: 3763 x 1 cell array of gene symbols for the intersection set of genes between the scRNAseq (Yao et al, 2021, Cell) and ISH (Lein et al, 2007, Nature) datasets
- geneinds: 1 x 3763 numeric array of MRx3-reordered gene indices in "gene_names" (see Mezias et al, 2022, PNAS and https://github.com/Raj-Lab-UCSF/MISS-Pipeline)
- genevct: 3763 x 42 numeric array of cell-type-specific gene expression across all 42 Yao cell types and 3763 intersection-set genes
- genevct_allgenes: 31053 x 42 numeric array of cell-type-specific gene expression across all 42 Yao cell types and all genes profiled by Yao et al
- GENGDmod: 67 x 41 x 58 numeric array containing the AGEA atlas parcellation per 200-um voxel in the AIBS native region ID space for CCFv2 (Oh et al, 2014, Nature) (not used)
- listBmap: 67 x 41 x 58 numeric array containing the AGEA atlas parcellation per 200-um voxel in our renamed region ID space
- nonzerovox: 50246 x 1 numeric array of linear indices corresponding to cerebrum voxels in listBmap and GENGDmod
- outstruct: 1 x 1 MATLAB struct object containing the optimal Yao cell-type densities. This corresponds to nG = 1300, i.e., index 181 ("elbowind") of the "outstruct" in "CellDensity_Yao2021_all.mat" (see the description of that file below for an explanation of the struct fields)
- structIndex: 212 x 1 cell array containing all linear indices of each CCFv2 region in the AGEA (not used)
- structList: 212 x 1 numeric array of AIBS region IDs of CCFv2 regions (not used)
- voxvgene: 50246 x 3763 numeric array of gene expression for all 3763 genes across all 50246 cerebrum voxels, in "nonzerovox" index order
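For readers working in Python rather than MATLAB, the sketch below shows how a per-voxel array in "nonzerovox" order (for example, one column of "voxvgene" or "corrB") could be placed back into the 67 x 41 x 58 AGEA grid. It assumes that "nonzerovox" holds MATLAB-style linear indices (1-based, column-major) into an array shaped like "listBmap"; the function name voxels_to_volume is hypothetical and not part of the released code.

```python
import numpy as np


def voxels_to_volume(values, nonzerovox, shape=(67, 41, 58)):
    """Scatter per-voxel values (in "nonzerovox" order) into a 3D volume.

    Hypothetical helper. Assumes nonzerovox contains MATLAB linear indices
    (1-based, column-major) into an array with the same shape as listBmap.
    Voxels outside the cerebrum mask are filled with NaN.
    """
    values = np.asarray(values, dtype=float).ravel()
    idx = np.asarray(nonzerovox, dtype=np.int64).ravel() - 1  # 1-based -> 0-based
    if values.shape != idx.shape:
        raise ValueError("values and nonzerovox must have the same length")
    volume = np.full(shape, np.nan)
    # order="F" reproduces MATLAB's column-major linear indexing
    volume[np.unravel_index(idx, shape, order="F")] = values
    return volume


# Example: map one cell type's densities (one column of corrB) to the AGEA grid
# volume = voxels_to_volume(corrB[:, 0], nonzerovox)
```

The arrays could be read with scipy.io.loadmat; if a file was saved in MATLAB v7.3 (HDF5) format, loadmat cannot read it and an HDF5 reader such as h5py is needed instead, which typically returns arrays with their dimensions reversed.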
File: CellDensity_Yao2021_all.mat
Description: Contains the MISS predictions for the Yao cell-type densities
Variables
- outstruct: 1 x 249 MATLAB struct object containing all cell-type inference data, with the following fields:
  - resnorm: scalar, residual from `lsqnonneg` when inferring cell-type distributions
  - fronorm: scalar, Frobenius norm of the residual from `lsqnonneg` when inferring cell-type distributions
  - corrB: 50246 x 42 matrix of inferred densities for each of the 42 Yao cell types in each of the 50246 200-um voxels
  - nGen: scalar, number of MRx3 genes used in cell-type inference (see Mezias et al, 2022, PNAS for details)
  - lambda: scalar, MRx3-related hyperparameter for thresholding gene selection (see Mezias et al, 2022, PNAS for details)
  - Bsums: 426 x 42 matrix of the summed densities for each of the 42 Yao cell types in the 426-region CCFv2 parcellation
  - Bmeans: 426 x 42 matrix of the averaged densities for each of the 42 Yao cell types in the 426-region CCFv2 parcellation
- ng_param_list: 1 x 249 numeric array of nG values used for cell-type inference, corresponding to the "nGen" field of "outstruct"
- classkey: 42 x 1 cell array of cell-type names (duplicated in "Yao_Dependencies.mat")
- geneinds: 1 x 3763 numeric array (duplicated in "Yao_Dependencies.mat") of MRx3-reordered gene indices in "gene_names" ("Yao_Dependencies.mat"). For each entry of "outstruct" within this file, the first "nGen" elements of this array were used to select the rows of the scRNAseq gene expression matrix (derived from "genevct" in "Yao_Dependencies.mat") for cell-type inference. See also Mezias et al, 2022, PNAS and https://github.com/Raj-Lab-UCSF/MISS-Pipeline
- elbowind: scalar, index of "outstruct" at which the optimal cell densities were obtained (nGen = 1300), selected using `ElbowSelector_MRx3`
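The relationship between "geneinds", "nGen", and "elbowind" can be illustrated with a short sketch. It assumes that "geneinds" stores MATLAB (1-based) indices into the rows of "genevct" (equivalently, into "gene_names"); the helper names are hypothetical.

```python
import numpy as np


def inference_gene_rows(genevct, geneinds, n_gen):
    """Return the genevct rows used for MISS inference with the top n_gen genes.

    Hypothetical helper. geneinds is assumed to hold the MRx3 ordering as
    MATLAB 1-based row indices into genevct (3763 x 42).
    """
    order = np.asarray(geneinds, dtype=np.int64).ravel()
    selected = order[:n_gen] - 1  # first n_gen MRx3-ranked genes, 0-based
    return np.asarray(genevct)[selected, :]


def to_python_index(matlab_index):
    """Convert a MATLAB 1-based index (e.g., elbowind = 181) to 0-based."""
    return int(matlab_index) - 1


# With the released data (after loading the .mat files), the optimal entry
# would be outstruct[to_python_index(elbowind)], whose nGen is 1300, and the
# matching scRNAseq matrix would be inference_gene_rows(genevct, geneinds, 1300).
```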
File: Mouse_Celltype_SpatialNulls.zip
Description: Contains all spatial null maps ("spin nulls") created using the BrainSMASH toolbox (https://brainsmash.readthedocs.io/en/latest/) (Burt et al, 2020, NeuroImage), which generates surrogate maps that preserve the spatial autocorrelation of the original map. The Mouse_CellTypes_Null-AutoCorr.ipynb file is a Jupyter notebook containing all code required to generate these nulls.
Variables (per .mat file; each corresponds to the spin nulls for a different Yao cell type)
- celltype: string, name of the Yao cell type whose regional densities are being spun
- nulls: 426 x 10000 numeric array corresponding to the "densities" across the 426 CCFv2 brain regions for "celltype", where each of the 10000 columns is a different randomly generated spin null
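The README does not spell out how the null maps enter the statistics, but a common way to use such surrogates is to compare an observed correlation against the correlations obtained with each null map. The sketch below assumes that the observed density, the "nulls" array, and a tau vector have already been aligned to the same regions with no missing values (the tau experiments use their own region lists, so that alignment is a separate step); the function name and the two-sided, add-one p-value are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np


def _zscore(x, axis=0):
    """Standardize along an axis using the population standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)


def null_pvalue(density, nulls, tau):
    """Two-sided empirical p-value for the Pearson r between density and tau.

    density: (n_regions,) observed cell-type density
    nulls:   (n_regions, n_nulls) surrogate density maps, one per column
    tau:     (n_regions,) regional tau pathology, same region order
    """
    z_tau = _zscore(tau)
    # Pearson r equals the mean product of population z-scores
    r_obs = float(np.mean(_zscore(density) * z_tau))
    r_null = np.mean(_zscore(nulls, axis=0) * z_tau[:, None], axis=0)
    n_extreme = np.count_nonzero(np.abs(r_null) >= abs(r_obs))
    # add-one correction keeps the p-value strictly positive
    p = (1 + n_extreme) / (1 + r_null.size)
    return r_obs, p
```

With 10000 nulls per cell type, the smallest attainable p-value under this scheme is 1/10001.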
File: Mouse_Tauopathy_Data_HigherQ.mat
Description: Contains all regional tauopathy data used in this study (see the Tauopathy Experiments subsection within Description of the data and file structure for sources of each experiment's data).
Variables
- mousedata_struct: MATLAB struct object containing all tauopathy data, with field names corresponding to the 12 mouse experiments. Each field is itself a struct with the following fields:
  - data: number of regions x number of time points numeric array of quantified tau pathology
  - time_stamps: 1 x number of time points numeric array of the times post-injection at which tau pathology was quantified; all are given in months except for the "DS" experiments (Kaufman et al, 2016, Neuron), which are given in weeks
  - regions: number of regions x 2 cell array. The first column contains the names of the regions where tau was quantified, in the order of "data"; the suffixes "I" and "C" indicate regions ipsilateral and contralateral to the injection site, respectively
  - seed: 44 x 1 numeric array of logical values, where a "1" indicates a region in "regions" and "data" that was seeded with tau. If seed is NaN, then that experiment either had no seed (Hurtado) or the seed was outside of the CCFv2 parcellation (IbaP301S)
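Because the "DS" experiments report time points in weeks while all others use months, the time stamps need to be put on a common scale before experiments are compared. A minimal sketch, assuming that the Kaufman experiments' field names in "mousedata_struct" begin with "DS" and taking a month as 52/12 weeks (both are assumptions; the function name is hypothetical):

```python
import numpy as np

WEEKS_PER_MONTH = 52.0 / 12.0  # assumption: average month length in weeks


def time_stamps_in_months(experiment, time_stamps):
    """Return time_stamps in months for a mousedata_struct field name."""
    t = np.asarray(time_stamps, dtype=float)
    # Only the Kaufman et al. "DS" experiments are recorded in weeks.
    # startswith (rather than a substring test) avoids matching "BoludaDSAD".
    if experiment.startswith("DS"):
        return t / WEEKS_PER_MONTH
    return t
```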
File: AD_Genes.mat
Description: Contains the AD risk gene names from the Alzheimer's Disease Sequencing Project (ADSP) list (Bellenguez et al, 2022, Nature Genetics; Kunkle et al, 2019, Nature Genetics) that were also contained within the 3763-gene intersection set
Variables
- genelist: 1 x 24 cell array of AD risk gene symbols
File: kim_density_listB_order.mat
Description: Regional densities of interneurons from Kim et al, 2017, Cell
Variables
- kim_dense_reorder: 212 x 3 numeric array of Pvalb (column 1), Sst (column 2), and Vip (column 3) densities across 212 bilateral (i.e., not split up by hemisphere) regions in CCFv2 space. Region 12 of the 426-region parcellation (and its contralateral counterpart, region 225) is excluded because it was not quantified
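To compare the Kim et al. interneuron densities with the 426-region Yao densities, the 212 bilateral rows need to be mapped back onto both hemispheres. The sketch below assumes that the rows of "kim_dense_reorder" follow the per-hemisphere region order with region 12 removed, and that regions 1 to 213 and 214 to 426 of the 426-region parcellation are the two hemispheres in matching order (consistent with region 12 pairing with region 225); the helper name is hypothetical.

```python
import numpy as np

N_PER_HEMI = 213  # 426 regions / 2 hemispheres
MISSING = 12      # 1-based region index without Kim et al. quantification


def expand_kim_to_426(kim_dense_reorder):
    """Map 212 bilateral rows onto the 426-region order (NaN where missing)."""
    kim = np.asarray(kim_dense_reorder, dtype=float)  # (212, 3)
    # re-insert a NaN row at the position of region 12 (0-based 11)
    one_hemi = np.insert(kim, MISSING - 1, np.nan, axis=0)  # (213, 3)
    assert one_hemi.shape[0] == N_PER_HEMI
    # bilateral values are assigned to both hemispheres
    return np.vstack([one_hemi, one_hemi])  # (426, 3); rows 12 and 225 are NaN
```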
File: Yao_Inputs.mat
Description: Contains all inputs required for MISS inference of Yao cell types.
Variables
The variables mostly overlap with those in "Yao_Dependencies.mat", but this file does not contain cell densities or MRx3 gene indices (as these are outputs rather than inputs of MISS). There is one unique variable:
- gene_names_yao: 31053 x 1 cell array of gene symbols for all genes profiled within the scRNAseq dataset (Yao et al, 2021, Cell)
File: Yao_MRx3_inds.mat
Description: Contains the reordering of genes based on MRx3 information content.
Variables
- geneinds: 1 x 3763 numeric array of MRx3 indices for MISS (duplicated in "Yao_Dependencies.mat" and "CellDensity_Yao2021_all.mat")
File: CCF_labels.mat
Description: Contains the description of the CCFv2 parcellation, where regions are in the same order as the regional Yao cell-type densities.
Variables
- CCF_labels: 426 x 4 cell array; column 1 contains the names of the 426 regions, column 2 contains the major anatomical structure to which that region belongs, column 3 specifies whether the region is in the gray or white matter (note: all are gray-matter regions), column 4 indicates which hemisphere the region belongs to
Code/software
A stable version of all code for generating the analyses and figures in this study is located in the provided CellTypeVulnerability.zip file and in the following repository: https://github.com/Raj-Lab-UCSF/CellTypeVulnerability. To produce the figures in the paper programmatically, place the large (> 1 GB) files contained in this dataset in a folder outside of the repository (see line 5 of the script, the 'datapath' definition) and run Results_Script_CTVulnerability_clean.m. For all 3D glass brain visualizations, the Brainframe package (https://github.com/Raj-Lab-UCSF/Brainframe) must also be downloaded and placed outside of the main repository.
Inference of cell-type densities was performed using the MISS package (https://github.com/Raj-Lab-UCSF/MISS-Pipeline), specifically using the Wrapper_MISS_Yao2021_.m files contained within that repository.
For any questions, please contact Justin Torok (torok.justin.l@gmail.com) or Ashish Raj (ashish.raj@ucsf.edu)
Access information
Data was derived from the following sources:
- scRNAseq: https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-smart-seq
- AGEA: https://mouse.brain-map.org/agea
- Tauopathy:
- Boluda et al, 2015 (DOI: 10.1007/s00401-014-1373-0)
- Hurtado et al, 2010 (DOI: 10.2353/ajpath.2010.100346)
- Iba et al, 2013 (DOI: 10.1523/JNEUROSCI.2642-12.2013)
- Iba et al, 2015 (DOI: 10.1007/s00401-015-1458-4)
- Kaufman et al, 2016 (DOI: 10.1016/j.neuron.2016.09.055)
Methods
Gene expression
The scRNAseq data used to generate the cell-type maps come from Yao et al. for the Allen Institute for Brain Science (AIBS), who sequenced approximately 1.3 million individual cells sampled comprehensively throughout the neocortex and hippocampal formation using 10x Genomics single-cell sequencing (Yao et al, 2021, Cell). Using a standard Jaccard-Louvain clustering algorithm, the authors jointly and hierarchically clustered these samples at three taxonomic levels: class (n = 4), subclass (n = 42), and cluster (n = 387). The full annotation and gene expression profile of each sample, as well as trimmed mean expression across cell-type clusters, are publicly available (https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-whole-cortex-and-hippocampus-10x). Here we used the trimmed-means-by-cluster dataset, because the Matrix Inversion and Subset Selection (MISS) algorithm only requires a consensus expression profile for each cell type. Using the hierarchical taxonomy provided by the authors, we grouped the 387 individual clusters into subclasses as we have done previously (Mezias et al, 2022, PNAS), resulting in 42 unique neuronal and non-neuronal cell types spanning four major classes: cortical glutamatergic, hippocampal glutamatergic, GABAergic, and non-neuronal (herein referred to as the Yao cell types).
The spatial gene expression data come from the coronal series of the in situ hybridization (ISH)-based Allen Gene Expression Atlas (AGEA) (Lein et al, 2007, Nature). While the sagittal atlas has better gene coverage, we chose to use the coronal atlas because of its superior spatial coverage, which provides an isotropic resolution of 200 um per voxel. Furthermore, MISS uses a feature selection algorithm to remove uninformative and noisy genes, partly mitigating the effect of the reduced gene coverage. We performed unweighted averaging on genes for which multiple probes were available, resulting in a dataset of 4083 unique genes. Lastly, we removed the 320 genes that were not present in both the scRNAseq and ISH datasets, resulting in a final set of 3763 genes.
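The two preprocessing steps described above (unweighted averaging over probes, then restricting to genes shared with the scRNAseq data) can be sketched as follows. This is an illustration rather than the authors' code: it assumes a voxel x probe matrix with one gene symbol per probe, and the function names and the alphabetical ordering of the output genes are arbitrary choices.

```python
from collections import defaultdict

import numpy as np


def average_probes(probe_genes, probe_expr):
    """Unweighted mean over all probes that target the same gene.

    probe_genes: gene symbol for each column of probe_expr
    probe_expr:  (n_voxels, n_probes) ISH expression
    Returns (genes, (n_voxels, n_genes) array) with genes sorted alphabetically.
    """
    probe_expr = np.asarray(probe_expr, dtype=float)
    columns = defaultdict(list)
    for j, gene in enumerate(probe_genes):
        columns[gene].append(j)
    genes = sorted(columns)
    expr = np.column_stack([probe_expr[:, columns[g]].mean(axis=1) for g in genes])
    return genes, expr


def intersect_with_scrnaseq(ish_genes, ish_expr, sc_genes):
    """Keep only the ISH genes that are also present in the scRNAseq data."""
    shared = set(sc_genes)
    keep = [j for j, g in enumerate(ish_genes) if g in shared]
    return [ish_genes[j] for j in keep], ish_expr[:, keep]
```

On the real data, these two steps correspond to going from 4083 averaged genes to the 3763 genes listed in "gene_names".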
Tauopathy experiments
We queried five studies to obtain twelve individual mouse tauopathy datasets (which we refer to interchangeably as "experiments"):
- BoludaCBD and BoludaDSAD (Boluda et al, 2015, Acta Neuropathol.)
- DS4, DS6, DS7, DS9, DS6 110, DS9 110 (Kaufman et al, 2016, Neuron)
- Hurtado (Hurtado et al, 2010, Am J Pathol.)
- IbaHippInj and IbaStrInj (Iba et al, 2013, J Neurosci.)
- IbaP301S (Iba et al, 2015, Acta Neuropathol.)
We selected these studies for their spatial coverage (>40 regions quantified across both hemispheres) and because they all used the same mouse tauopathy model (PS19), which carries a P301S tau transgene on a C57BL/6 background. The only exception is the Hurtado experiment, whose mice additionally carried a mutation in the amyloid precursor protein (APP) gene.
Alzheimer's disease risk gene selection
We selected our 24 AD risk genes by taking the intersection of the AD risk gene list from the Alzheimer's Disease Sequencing Project (ADSP) (Bellenguez et al, 2022, Nature Genetics; Kunkle et al, 2019, Nature Genetics) and the genes in the AGEA (Lein et al, 2007, Nature).
Gene annotations were obtained from the UniProt database (Bateman et al, 2023, Nucleic Acids Res.) unless otherwise noted.
Matrix Inversion and Subset Selection (MISS)
We applied the MISS algorithm to the Yao, et al. scRNAseq dataset (Yao et al, 2021, Cell) and the AGEA ISH dataset (Lein et al, 2007, Nature) as described previously (Mezias et al, 2022, PNAS).
Briefly, MISS involves two steps: 1) subset selection, which uses a feature selection algorithm to remove low-information genes that add noise to the final prediction of cell-type density; and 2) matrix inversion, where the gene-subset ISH-based spatial gene expression matrix is regressed on the gene-subset scRNAseq-based gene expression matrix voxel-by-voxel, using nonnegative least squares (lsqnonneg), to obtain cell-type densities.
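The matrix-inversion step can be illustrated with a per-voxel nonnegative least-squares fit. The sketch uses scipy.optimize.nnls, which, like MATLAB's lsqnonneg, implements the Lawson-Hanson active-set method. It omits the gene-selection step and any normalization the MISS-Pipeline may apply, and the function name and argument layout are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def infer_densities(C, D):
    """Solve min ||C b - d_v||_2 subject to b >= 0 for every voxel v.

    C: (n_genes, n_types) scRNAseq expression for the selected genes
    D: (n_genes, n_voxels) ISH expression for the same genes, same order
    Returns (n_voxels, n_types), matching the layout of corrB.
    """
    C = np.asarray(C, dtype=float)
    D = np.asarray(D, dtype=float)
    B = np.empty((D.shape[1], C.shape[1]))
    for v in range(D.shape[1]):
        B[v], _ = nnls(C, D[:, v])  # second output is the residual norm
    return B
```

With the released data, C would correspond to the selected rows of "genevct" and D to the matching gene columns of "voxvgene", transposed so that genes are rows.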