Single-cell spatial transcriptomics and proteomics of APOE Christchurch in 5xFAD and PS19 mice
Data files
Jan 24, 2025 version (files total 9.95 GB)
- 5xApoeCh_Protein_annotated_seurat.rds: 1 GB
- 5xApoeCh_RNA_annotated.rds: 4.56 GB
- DGE_files.zip: 6.02 MB
- mouse_signature_matrix-updated.csv: 754 B
- PS19ApoeCh_Protein_annotated_seurat.rds: 866.33 MB
- PS19ApoeCh_RNA_annotated.rds: 3.52 GB
- README.md: 7.10 KB
Abstract
This collection of datasets comprises results from four single-cell spatial experiments conducted on mouse brains: two spatial transcriptomics experiments and two spatial proteomics experiments. These experiments were performed using the Bruker Nanostring CosMx technology on 10 µm coronal brain sections from the following mouse models: (1) 14-month-old male 5xFAD;ApoeCh mice and genotype controls, and (2) 9-month-old PS19;ApoeCh mice and genotype controls. Each dataset is provided as an RDS file that includes raw and corrected counts for the RNA data and mean fluorescence intensity for the protein data, along with comprehensive metadata. Metadata include mouse genotype, sample ID, cell type annotations, sex (for the PS19;ApoeCh dataset), and X-Y coordinates of each cell. Results from differential gene expression analysis for each cell type between genotypes using MAST are also included as .csv files.
README: Single-cell spatial transcriptomics and proteomics of APOE Christchurch in 5xFAD and PS19 mice
https://doi.org/10.5061/dryad.m63xsj4ck
Description of the data and file structure
We have submitted all processed RDS files (5xApoeCh_Protein_annotated_seurat.rds, PS19ApoeCh_Protein_annotated_seurat.rds, 5xApoeCh_RNA_annotated.rds, PS19ApoeCh_RNA_annotated.rds), which were analyzed using the R package Seurat. Sample metadata are stored in seurat@meta.data and are organized in the same way for the 5xFAD and PS19 cohorts. For spatial proteomics, we have included the .csv files containing the parameters used to perform automated cell typing with the CELESTA algorithm (mouse_signature_matrix-updated.csv, mouse_tuning_params.csv). Finally, we have submitted a .zip file containing the outputs from differential gene expression analysis (DGE_files.zip).
Files and variables
Single-cell spatial proteomics datasets
Rownames of metadata (accessed using rownames(seurat@meta.data)) contain unique identifiers for each single cell, formatted as c_[slide][fov][cell]. Additional relevant metadata columns are described below:
- fov: Field Of View (FOV) the cell is in
- Area: Number of pixels assigned to a given cell
- AspectRatio: Width divided by height
- x_FOV_px: x position of the cell center within the FOV, measured in pixels
- y_FOV_px: y position of the cell center within the FOV, measured in pixels
- Width: Cell’s maximum length in x dimension (pixels)
- Height: Cell’s maximum length in y dimension (pixels)
- Mean.DAPI: Mean DAPI fluorescence intensity within a given cell (AU)
- Max.DAPI: Maximum DAPI fluorescence intensity within a given cell (AU)
- Run_Tissue_name: Flowcell name
- slide_ID_numeric: SlideID
- x_slide_mm: x position of the cell center within the slide, measured in mm
- y_slide_mm: y position of the cell center within the slide, measured in mm
- nCount_RNA: Mean fluorescence intensity ("RNA" is a misnomer)
- nFeature_RNA: Number of unique proteins detected ("RNA" is a misnomer)
- nCount_negprobes: Number of negative probe counts
- nFeature_negprobes: Number of unique negative probe targets
- Area.um2: Area of cell (µm²)
- celesta_R1: Cell type annotations from round 1 of CELESTA cell typing
- celesta_R2: Cell type annotations from round 2 of CELESTA cell typing (All microglia are further classified into DAM or homeostatic)
- celesta_final: Final annotations from CELESTA
- celesta_broad: Broad cell type annotations (e.g., all neurons grouped together)
- celesta_cell_type_n: Number associated with cell type (from mouse_signature_matrix-updated.csv)
mouse_signature_matrix-updated.csv
User-defined cell-type signature matrix.
(1) The first column must contain the cell types to be inferred.
(2) The second column contains the lineage information for each cell type. The lineage information consists of three numbers connected by “_” (underscore). The first number indicates the round. Cell types with the same lineage level are inferred in the same round, and a higher number indicates higher cell-type resolution. Here, All_Microglia is resolved into DAM and Microglia (homeostatic).
(3) Starting from column three, each column is a protein marker. If the marker is known to be expressed in a cell type, it is denoted by “1”. If the marker is known not to be expressed in a cell type, it is denoted by “0”. If the marker is irrelevant to a cell type, or its expression in that cell type is uncertain, the entry is left blank. For example, CD11c is expressed in some but not all microglia, so it is left blank for All_Microglia.
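The layout described above can be read with a few lines of Python. This is a minimal sketch, not the CELESTA loader: the header names (cell_type, lineage), the marker names Iba1 and NeuN, and every lineage value and 0/1 entry in the excerpt are hypothetical placeholders, not copied from mouse_signature_matrix-updated.csv. Only the first lineage number (the round) is interpreted, since that is the only one described here.

```python
import csv
import io

# Hypothetical excerpt in the layout described above. Values are illustrative
# only and are NOT taken from mouse_signature_matrix-updated.csv.
EXAMPLE_CSV = """cell_type,lineage,Iba1,CD11c,NeuN
All_Microglia,1_0_1,1,,0
DAM,2_1_2,1,1,0
Microglia (homeostatic),2_1_3,1,0,0
"""


def parse_signature_matrix(text):
    """Return one dict per cell type with its round and marker expectations."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    markers = header[2:]  # protein markers start at column three
    rows = []
    for record in reader:
        if not record:
            continue  # skip empty lines
        cell_type, lineage = record[0], record[1]
        round_number = int(lineage.split("_")[0])  # first number = round
        expected = {}
        for marker, value in zip(markers, record[2:]):
            value = value.strip()
            # "1" = expressed, "0" = not expressed, blank = irrelevant/uncertain
            expected[marker] = int(value) if value else None
        rows.append(
            {"cell_type": cell_type, "round": round_number, "markers": expected}
        )
    return rows


signature = parse_signature_matrix(EXAMPLE_CSV)
for row in signature:
    print(row["round"], row["cell_type"], row["markers"])
```

Blank entries are kept as None rather than coerced to 0, so "uncertain" stays distinguishable from "known not to be expressed".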
Single-cell spatial transcriptomics datasets
Rownames of metadata (accessed using rownames(seurat@meta.data)) contain unique identifiers for each single cell, formatted as c_[slide][fov][cell]. Additional metadata columns are described below:
- fov: Field Of View (FOV) the cell is in
- Area: Number of pixels assigned to a given cell
- AspectRatio: Width divided by height
- x_FOV_px: x position of the cell center within the FOV, measured in pixels
- y_FOV_px: y position of the cell center within the FOV, measured in pixels
- Width: Cell’s maximum length in x dimension (pixels)
- Height: Cell’s maximum length in y dimension (pixels)
- Mean.DAPI: Mean DAPI fluorescence intensity within a given cell (AU)
- Max.DAPI: Maximum DAPI fluorescence intensity within a given cell (AU)
- Run_Tissue_name: Flowcell name
- slide_ID_numeric: SlideID
- x_slide_mm: x position of the cell center within the slide, measured in mm
- y_slide_mm: y position of the cell center within the slide, measured in mm
- nCount_RNA: Number of RNA counts
- nFeature_RNA: Number of unique RNA targets
- nCount_negprobes: Number of negative probe counts
- nFeature_negprobes: Number of unique negative probe targets
- Area.um2: Area of cell (µm²)
- annotation: Manual cell type annotation based on marker genes and location in space
- annotation_broad: Broad cell type annotation (e.g., all excitatory neurons grouped together)
- group: Genotype name
- sample_n: Sample name
- sex: Sample sex
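As a quick sanity check on the metadata columns listed above, cells can be tallied per genotype and broad cell type. The sketch below assumes the metadata table has first been exported from R to CSV (for example with write.csv(seurat@meta.data, "meta.csv")). The cell_id header, the genotype labels, the sample names, and the cell type labels in the excerpt are hypothetical placeholders.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an exported metadata table. Only the column names
# group, sample_n and annotation_broad come from the README; all values and
# the cell_id header are placeholders.
EXAMPLE_METADATA = """cell_id,group,sample_n,annotation_broad
c1,WT,s1,Microglia
c2,WT,s1,Microglia
c3,WT,s1,Neurons
c4,5xFAD,s2,Microglia
"""


def count_cells(text):
    """Count cells for each (genotype, broad cell type) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter((row["group"], row["annotation_broad"]) for row in reader)


counts = count_cells(EXAMPLE_METADATA)
for (group, cell_type), n in sorted(counts.items()):
    print(group, cell_type, n)
```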
DGE_files.zip
The .zip file contains output from differential gene expression analysis with the MAST test (https://github.com/RGLab/MAST), run using the Seurat FindMarkers() function. Files are organized by mouse data (single-cell spatial transcriptomics) and human data (snRNA-seq). Each filename encodes the specific comparison and the cell type that was analyzed (e.g., "5xApoeCh_vs_5xFAD_All_microglia.csv" contains the DGE results from the 5xFAD;ApoeCh vs. 5xFAD comparison for all microglia grouped together, including homeostatic and disease-associated microglia). Column names in each .csv file are described below:
- p_val: Raw p-value to determine if there is a significant difference between ident1 and ident2.
- avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group. If the slot is scale.data or a reduction is specified, the average difference is returned instead of the log fold change, and the column is named "avg_diff".
- pct.1: The percentage of cells where the gene is detected in the first group
- pct.2: The percentage of cells where the gene is detected in the second group
- p_val_adj: Adjusted p-value, based on Bonferroni correction using all genes in the dataset
- gene: Names of the genes analyzed in the DGE analysis
- group: Specific cell type being analyzed in the DGE analysis (should match the file name)
- ident1: Represents the first genotype in the analysis. The direction of avg_logFC or avg_diff values indicates expression in this genotype relative to ident2
- ident2: Represents the second genotype in the analysis, serving as the comparison baseline for ident1
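The columns above are enough to pull out significant genes and their direction of change. The following is a minimal sketch under stated assumptions: the gene names and numbers in the excerpt are made up, and the thresholds alpha and min_abs_logfc are arbitrary illustrative choices, not the cutoffs used by the authors. The bonferroni() helper shows the correction described for p_val_adj (raw p-value multiplied by the number of genes tested, capped at 1); the gene count of 1000 is likewise hypothetical.

```python
import csv
import io

# Hypothetical DGE rows using the column names described above.
EXAMPLE_DGE = """gene,p_val,avg_logFC,pct.1,pct.2,p_val_adj
GeneA,0.00001,0.8,0.6,0.3,0.01
GeneB,0.02,-0.5,0.2,0.4,1
GeneC,0.00004,-0.4,0.5,0.7,0.04
"""


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: raw p times number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def significant_genes(text, alpha=0.05, min_abs_logfc=0.25):
    """Return (gene, direction) pairs passing both illustrative thresholds."""
    hits = []
    for row in csv.DictReader(io.StringIO(text)):
        log_fc = float(row["avg_logFC"])
        if float(row["p_val_adj"]) < alpha and abs(log_fc) >= min_abs_logfc:
            # Positive avg_logFC means higher expression in ident1.
            direction = "higher in ident1" if log_fc > 0 else "lower in ident1"
            hits.append((row["gene"], direction))
    return hits


hits = significant_genes(EXAMPLE_DGE)
for gene, direction in hits:
    print(gene, direction)
```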
Methods
Sample preparation: Isopentane fresh-frozen brain hemispheres were embedded in optimal cutting temperature (OCT) compound (Tissue-Tek, Sakura Finetek, Torrance, CA), and 10 µm thick coronal sections were prepared using a cryostat (CM1950, Leica Biosystems, Deer Park, IL). Six hemibrains were mounted onto each VWR Superfrost Plus microscope slide (Avantor, 48311-703) and kept at -80°C until fixation. Two cohorts were used for transcriptomics and proteomics: 5xFAD (14 months old, males) and PS19 (9 months old, females and 1 male ApoeCh). Each cohort comprised four genotypes: wild-type, ApoeCh HO, 5xFAD HEMI or PS19 HEMI, and 5xFAD HEMI;ApoeCh HO or PS19 HEMI;ApoeCh HO. n=3 mice per genotype were used, except for n=2 for PS19;ApoeCh. The same mice were used for both transcriptomics and proteomics. Tissues were processed according to the Nanostring CosMx fresh-frozen slide preparation manual for RNA and protein assays (NanoString University).
Data processing: Spatial transcriptomics datasets were filtered using the AtoMx RNA Quality Control module to flag outlier negative probes (control probes targeting non-existent sequences to quantify non-specific hybridization), lowly-expressing cells, FOVs, and target genes. Datasets were then normalized and scaled using Seurat 5.0.1 SCTransform to account for differences in library size across cell types [31]. Principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) analysis were performed to reduce dimensionality and visualize clusters in space. Unsupervised clustering at 1.0 resolution yielded 33 clusters for the 5xFAD dataset and 40 clusters for the PS19 dataset. Clusters were manually annotated based on gene expression and spatial location.
Spatial proteomics data were filtered using the AtoMx Protein Quality Control module to flag unreliable cells based on segmented cell area, negative probe expression, and overly high or low protein expression. Mean fluorescence intensity data were hyperbolic arcsine transformed with the AtoMx Protein Normalization module. Cell types were automatically annotated based on marker protein expression using the CELESTA algorithm.
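For readers unfamiliar with the hyperbolic arcsine transform mentioned above, the sketch below shows the basic operation, asinh(x) = ln(x + sqrt(x² + 1)), which is roughly linear near zero and logarithmic for large intensities. It is not the AtoMx implementation: the cofactor parameter and the intensity values are assumptions for illustration, since the scaling used by the AtoMx Protein Normalization module is not stated here.

```python
import math


def arcsinh_transform(intensity, cofactor=1.0):
    """Hyperbolic arcsine of an intensity after dividing by a cofactor.

    The cofactor is a hypothetical scaling parameter; the value applied by
    the AtoMx module is not given in this README.
    """
    return math.asinh(intensity / cofactor)


# Made-up mean fluorescence intensities (AU), spanning several orders of magnitude.
raw_intensities = [0.0, 5.0, 50.0, 500.0, 5000.0]
transformed = [arcsinh_transform(x, cofactor=5.0) for x in raw_intensities]
for raw, value in zip(raw_intensities, transformed):
    print(f"{raw:8.1f} -> {value:.3f}")
```

Unlike a plain logarithm, the transform is defined at zero, so cells with no signal for a marker do not need a pseudocount.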