Single-cell spatial transcriptomics and proteomics of APOE Christchurch in 5xFAD and PS19 mice

Tran, Kristine¹; Kwang, Nellie¹; Green, Kim¹

Published Jan 24, 2025 on Dryad. https://doi.org/10.5061/dryad.m63xsj4ck

Data files

Jan 24, 2025 version files (9.95 GB total)

5xApoeCh_Protein_annotated_seurat.rds: 1 GB
5xApoeCh_RNA_annotated.rds: 4.56 GB
DGE_files.zip: 6.02 MB
mouse_signature_matrix-updated.csv: 754 B
PS19ApoeCh_Protein_annotated_seurat.rds: 866.33 MB
PS19ApoeCh_RNA_annotated.rds: 3.52 GB
README.md: 7.10 KB

Abstract

This collection of datasets comprises results from four single-cell spatial experiments conducted on mouse brains: two spatial transcriptomics experiments and two spatial proteomics experiments. The experiments were performed with the Bruker NanoString CosMx platform on 10 µm coronal brain sections from the following mouse models: (1) 14-month-old male 5xFAD;ApoeCh mice and genotype controls, and (2) 9-month-old PS19;ApoeCh mice and genotype controls. Each dataset is provided as an RDS file that includes raw and corrected counts for the RNA data or mean fluorescence intensity for the protein data, along with comprehensive metadata. Metadata include mouse genotype, sample ID, cell type annotations, sex (for the PS19;ApoeCh dataset), and X-Y coordinates of each cell. Results from differential gene expression analysis between genotypes for each cell type, computed with MAST, are also included as .csv files.


Description of the data and file structure

We have submitted all processed RDS files (5xApoeCh_Protein_annotated_seurat.rds, PS19ApoeCh_Protein_annotated_seurat.rds, 5xApoeCh_RNA_annotated.rds, PS19ApoeCh_RNA_annotated.rds), which were analyzed using the R package Seurat. Sample metadata are stored in seurat@meta.data and are organized in the same way for the 5xFAD and PS19 cohorts. For spatial proteomics, we have included the .csv files containing the parameters used to perform automated cell typing with the CELESTA algorithm (mouse_signature_matrix-updated.csv, mouse_tuning_params.csv). Finally, we have submitted a .zip file containing the outputs from differential gene expression analysis (DGE_files.zip).

Files and variables

Single-cell spatial proteomics datasets

5xApoeCh_Protein_annotated_seurat.rds, PS19ApoeCh_Protein_annotated_seurat.rds

Row names of the metadata (accessed using rownames(seurat@meta.data)) contain a unique identifier for each single cell, formatted as c_[slide]_[fov]_[cell]. Additional relevant metadata columns are described below:

fov: Field Of View (FOV) the cell is in
Area: Number of pixels assigned to a given cell
AspectRatio: Width divided by height
x_FOV_px: x position of the cell center within the FOV, measured in pixels
y_FOV_px: y position of the cell center within the FOV, measured in pixels
Width: Cell’s maximum length in x dimension (pixels)
Height: Cell’s maximum length in y dimension (pixels)
Mean.DAPI: Mean DAPI fluorescence intensity within a given cell (AU)
Max.DAPI: Maximum DAPI fluorescence intensity within a given cell (AU)
Run_Tissue_name: Flowcell name
slide_ID_numeric: SlideID
x_slide_mm: x position of the cell center within the slide, measured in mm
y_slide_mm: y position of the cell center within the slide, measured in mm
nCount_RNA: Mean fluorescence intensity ("RNA" is a misnomer)
nFeature_RNA: Number of unique proteins detected ("RNA" is a misnomer)
nCount_negprobes: Number of negative-control counts
nFeature_negprobes: Number of unique negative-control targets
Area.um2: Area of cell (µm²)
celesta_R1: Cell type annotations from round 1 of CELESTA cell typing
celesta_R2: Cell type annotations from round 2 of CELESTA cell typing (All microglia are further classified into DAM or homeostatic)
celesta_final: Final annotations from CELESTA
celesta_broad: Broad cell type annotations (e.g., all neurons grouped together)
celesta_cell_type_n: Number associated with cell type (from mouse_signature_matrix-updated.csv)
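
The cell identifiers described above can be split back into their slide, FOV, and cell components when the metadata are handled outside R. The following is a minimal Python sketch, assuming the three fields are integers separated by underscores (for example c_1_12_345, a made-up identifier); the CellId type and the parse_cell_id helper are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class CellId(NamedTuple):
    """Components of a cell identifier (hypothetical helper type)."""
    slide: int
    fov: int
    cell: int


# Assumed layout: "c_<slide>_<fov>_<cell>" with integer fields.
_CELL_ID = re.compile(r"^c_(\d+)_(\d+)_(\d+)$")


def parse_cell_id(cell_id: str) -> CellId:
    """Split an identifier such as 'c_1_12_345' into slide, FOV, and cell."""
    match = _CELL_ID.match(cell_id)
    if match is None:
        raise ValueError(f"unexpected cell identifier: {cell_id!r}")
    slide, fov, cell = (int(part) for part in match.groups())
    return CellId(slide, fov, cell)
```

If the identifiers in a given file follow a different layout, the regular expression is the only part that should need adjusting.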

mouse_signature_matrix-updated.csv

User-defined cell-type signature matrix.

(1) The first column contains the cell types to be inferred.

(2) The second column contains the lineage information for each cell type. The lineage information consists of three numbers connected by “_” (underscore). The first number indicates the round. Cell types with the same lineage level are inferred in the same round, and a higher number indicates higher cell-type resolution. Here, All_Microglia is subdivided into DAM and Microglia (homeostatic).

(3) Starting from column three, each column is a protein marker. If the marker is known to be expressed in a cell type, it is denoted by “1”. If the marker is known not to be expressed in a cell type, it is denoted by “0”. If the marker is irrelevant for a cell type, or its expression is uncertain, the entry is left blank. For example, CD11c is expressed in some but not all microglia, so it is left blank for All_Microglia.
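
The layout described in (1) to (3) can be read with a few lines of Python. This is a sketch under stated assumptions rather than the CELESTA loader itself: the two rows, the lineage strings, the column headers, and the IBA1 and NeuN markers in EXAMPLE are invented for illustration (only CD11c and the cell type names come from the description above), and load_signature_matrix is a hypothetical helper. Blank entries are kept as None so that "uncertain" stays distinct from "not expressed".

```python
import csv
import io

# Hypothetical excerpt; the values are illustrative, not taken from the
# deposited mouse_signature_matrix-updated.csv.
EXAMPLE = """cell_type,lineage,IBA1,CD11c,NeuN
All_Microglia,1_0_1,1,,0
DAM,2_1_2,1,1,0
"""


def load_signature_matrix(handle):
    """Return {cell_type: {"lineage", "round", "markers"}} from a CSV handle."""
    reader = csv.reader(handle)
    header = next(reader)
    markers = header[2:]  # protein markers start at column three
    signatures = {}
    for row in reader:
        if not row:
            continue
        cell_type, lineage = row[0], row[1]
        # The first number of the lineage string is the inference round.
        round_number = int(lineage.split("_")[0])
        values = row[2:] + [""] * (len(markers) - len(row[2:]))
        expected = {}
        for marker, value in zip(markers, values):
            value = value.strip()
            # "1" = expressed, "0" = not expressed, blank = uncertain.
            expected[marker] = None if value == "" else int(value)
        signatures[cell_type] = {
            "lineage": lineage,
            "round": round_number,
            "markers": expected,
        }
    return signatures


signatures = load_signature_matrix(io.StringIO(EXAMPLE))
```

To read the deposited file, pass an open file handle in place of the io.StringIO object.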

Single-cell spatial transcriptomics datasets

5xApoeCh_RNA_annotated.rds, PS19ApoeCh_RNA_annotated.rds

Row names of the metadata (accessed using rownames(seurat@meta.data)) contain a unique identifier for each single cell, formatted as c_[slide]_[fov]_[cell]. Additional metadata columns are described below:

fov: Field Of View (FOV) the cell is in
Area: Number of pixels assigned to a given cell
AspectRatio: Width divided by height
x_FOV_px: x position of the cell center within the FOV, measured in pixels
y_FOV_px: y position of the cell center within the FOV, measured in pixels
Width: Cell’s maximum length in x dimension (pixels)
Height: Cell’s maximum length in y dimension (pixels)
Mean.DAPI: Mean DAPI fluorescence intensity within a given cell (AU)
Max.DAPI: Maximum DAPI fluorescence intensity within a given cell (AU)
Run_Tissue_name: Flowcell name
slide_ID_numeric: SlideID
x_slide_mm: x position of the cell center within the slide, measured in mm
y_slide_mm: y position of the cell center within the slide, measured in mm
nCount_RNA: Number of RNA counts
nFeature_RNA: Number of unique RNA targets
nCount_negprobes: Number of negative probe counts
nFeature_negprobes: Number of unique negative probe targets
Area.um2: Area of cell (µm²)
annotation: Manual cell type annotation based on marker genes and location in space
annotation_broad: Broad cell type annotation (e.g., all excitatory neurons grouped together)
group: Genotype name
sample_n: Sample name
sex: Sample sex
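
As a quick check of the metadata columns listed above, cells can be tallied per genotype and broad cell type. The sketch below assumes the metadata table has been exported from R to CSV (for example with write.csv) with the row names stored in a column called cell_id; that column name, the count_cells helper, and every value in EXAMPLE are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata export; the rows are made up for illustration.
EXAMPLE = """cell_id,group,annotation_broad,nCount_RNA
c_1_1_1,WT,Microglia,120
c_1_1_2,WT,Excitatory neuron,340
c_1_2_1,5xFAD,Microglia,95
c_1_2_2,5xFAD,Microglia,210
"""


def count_cells(handle):
    """Count cells for each (group, annotation_broad) pair."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[(row["group"], row["annotation_broad"])] += 1
    return counts


counts = count_cells(io.StringIO(EXAMPLE))
for (group, cell_type), n in sorted(counts.items()):
    print(f"{group}\t{cell_type}\t{n}")
```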

DGE_files.zip

The .zip file contains the output of differential gene expression (DGE) analysis with the MAST test (https://github.com/RGLab/MAST), run using the FindMarkers() function. Files are organized by mouse data (single-cell spatial transcriptomics) and human data (snRNA-seq). Each filename identifies the comparison and the cell type that was analyzed (e.g., "5xApoeCh_vs_5xFAD_All_microglia.csv" contains the DGE results from the 5xFAD;ApoeCh vs. 5xFAD comparison for all microglia grouped together, including homeostatic and disease-associated microglia). Column names in each .csv file are described below:

p_val: Raw p-value to determine if there is a significant difference between ident1 and ident2.
avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group. If the slot is scale.data or a reduction is specified, the average difference is returned instead of the log fold change, and the column is named "avg_diff".
pct.1: The percentage of cells where the gene is detected in the first group
pct.2: The percentage of cells where the gene is detected in the second group
p_val_adj: Adjusted p-value, based on Bonferroni correction using all genes in the dataset
gene: Names of the genes analyzed in the DGE analysis
group: Specific cell type being analyzed in the DGE analysis (should match the file name)
ident1: Represents the first genotype in the analysis. The direction of avg_logFC or avg_diff values indicates expression in this genotype relative to ident2
ident2: Represents the second genotype in the analysis, serving as the comparison baseline for ident1
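
The columns above are enough to pull out the significant genes from one comparison file. The following Python sketch filters on p_val_adj and avg_logFC; the thresholds (alpha = 0.05 and a minimum absolute log fold-change of 0.25) are assumptions chosen for illustration, not values taken from the analysis, and the gene names and numbers in EXAMPLE are made up. significant_genes is a hypothetical helper.

```python
import csv
import io

# Hypothetical rows using the column names described above.
EXAMPLE = """p_val,avg_logFC,pct.1,pct.2,p_val_adj,gene,group,ident1,ident2
1e-8,0.62,0.80,0.55,1e-5,GeneA,All_microglia,5xApoeCh,5xFAD
0.03,-0.10,0.40,0.42,1,GeneB,All_microglia,5xApoeCh,5xFAD
2e-7,-0.45,0.30,0.60,2e-4,GeneC,All_microglia,5xApoeCh,5xFAD
"""


def significant_genes(handle, alpha=0.05, min_abs_logfc=0.25):
    """Return (gene, direction, avg_logFC) for rows passing both thresholds.

    "up" means higher expression in ident1 than in ident2.
    """
    hits = []
    for row in csv.DictReader(handle):
        logfc = float(row["avg_logFC"])
        if float(row["p_val_adj"]) < alpha and abs(logfc) >= min_abs_logfc:
            direction = "up" if logfc > 0 else "down"
            hits.append((row["gene"], direction, logfc))
    # Largest effect sizes first.
    hits.sort(key=lambda hit: abs(hit[2]), reverse=True)
    return hits


hits = significant_genes(io.StringIO(EXAMPLE))
```

Files that were generated with the scale.data slot or a reduction carry an avg_diff column instead, so the column name would need to be changed accordingly.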

Sample preparation: Isopentane fresh-frozen brain hemispheres were embedded in optimal cutting temperature (OCT) compound (Tissue-Tek, Sakura Finetek, Torrance, CA), and 10 µm thick coronal sections were prepared using a cryostat (CM1950, Leica Biosystems, Deer Park, IL). Six hemibrains were mounted onto each VWR Superfrost Plus microscope slide (Avantor, 48311-703) and kept at -80 °C until fixation. Both the 5xFAD cohort (14 months old, males) and the PS19 cohort (9 months old, females and 1 male ApoeCh) comprised four genotypes: wild-type, ApoeCh HO, 5xFAD HEMI or PS19 HEMI, and 5xFAD HEMI;ApoeCh HO or PS19 HEMI;ApoeCh HO. Transcriptomics and proteomics used n = 3 mice per genotype, except n = 2 for PS19;ApoeCh, and the same mice were used for both assays. Tissues were processed according to the NanoString CosMx fresh-frozen slide preparation manual for RNA and protein assays (NanoString University).

Data processing: Spatial transcriptomics datasets were filtered using the AtoMx RNA Quality Control module to flag outlier negative probes (control probes targeting non-existent sequences to quantify non-specific hybridization) as well as low-expressing cells, FOVs, and target genes. Datasets were then normalized and scaled using SCTransform in Seurat 5.0.1 to account for differences in library size across cell types [31]. Principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) were performed to reduce dimensionality and visualize clusters. Unsupervised clustering at a resolution of 1.0 yielded 33 clusters for the 5xFAD dataset and 40 clusters for the PS19 dataset. Clusters were manually annotated based on gene expression and spatial location.

Spatial proteomics data were filtered using the AtoMx Protein Quality Control module to flag unreliable cells based on segmented cell area, negative probe expression, and excessively high or low protein expression. Mean fluorescence intensity data were transformed with the hyperbolic arcsine (arcsinh) function using the AtoMx Protein Normalization module. Cell types were automatically annotated based on marker protein expression using the CELESTA algorithm.
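
For readers unfamiliar with the hyperbolic arcsine transform mentioned above: asinh(x) = ln(x + sqrt(x² + 1)) is roughly linear near zero and roughly logarithmic for large values, which compresses bright intensities while remaining defined at zero. The sketch below shows the transform in plain Python. The cofactor parameter is an assumption added for illustration, since the scaling applied by the AtoMx module is not stated here, and arcsinh_transform is a hypothetical helper, not the AtoMx implementation.

```python
import math


def arcsinh_transform(intensities, cofactor=1.0):
    """Apply asinh(x / cofactor) to each mean fluorescence intensity.

    The cofactor is an assumed scaling parameter; cofactor=1.0 gives the
    plain hyperbolic arcsine.
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(value / cofactor) for value in intensities]


# Made-up intensities spanning several orders of magnitude.
transformed = arcsinh_transform([0.0, 1.0, 100.0, 10000.0])
```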