Microglial replacement in a Sandhoff disease mouse model reveals myeloid-derived β-hexosaminidase is necessary for neuronal health

Tsourmas, Kate¹; Kwang, Nellie¹; Green, Kim¹

Published Aug 21, 2025 on Dryad. https://doi.org/10.5061/dryad.3tx95x6rq

Data files

Aug 21, 2025 version files (8.07 GB total)

HexbKO_full_with_allenref_labels_public.rds: 4.18 GB
HexbKO_protein_seurat_w_group_assignments_public.rds: 1.82 GB
HexbKO_wt_vs_con_with_allenref_labels_public.rds: 2.08 GB
mouse_signature_matrix.csv: 633 B
README.md: 7.37 KB

Abstract

Lysosomal storage disorders (LSDs) are a large class of diseases involving lysosomal dysfunction, often resulting in neurodegeneration. Sandhoff disease (SD) is an LSD caused by a deficiency in the β subunit of the β-hexosaminidase enzyme (Hexb). Although Hexb expression in the brain is specific to microglia, SD primarily affects neurons. Investigating how a microglial gene contributes to neuronal homeostasis, we show that β-hexosaminidase is secreted by microglia and integrated into the lysosomal compartment of neurons. To assess therapeutic relevance, we treat the Hexb-/- SD mouse model with bone marrow transplant and colony-stimulating factor 1 receptor inhibition, which broadly replaces Hexb-/- microglia with Hexb-sufficient cells. Microglial replacement reverses apoptotic gene signatures, improves behavior, restores β-hexosaminidase enzymatic activity and Hexb expression, prevents substrate buildup, and normalizes neuronal lysosomal phenotypes, underscoring the critical role of myeloid-derived β-hexosaminidase in maintaining neuronal health and establishing microglial replacement as a potential LSD therapy.


Description of the data and file structure

These data were collected to assess differences between experimental treatment strategies in Hexb knockout mice, a model of Sandhoff disease, using CosMx spatial transcriptomic and proteomic analyses. The deposit contains three datasets. The first is a spatial transcriptomics dataset from wild-type control and Hexb knockout control mice only. The second is a spatial transcriptomics experiment performed on wild-type control, Hexb knockout control, Hexb knockout treated with bone marrow transplant (BMT), and Hexb knockout treated with BMT and colony-stimulating factor 1 receptor (CSF1R) inhibition, which results in broad replacement of microglia with bone marrow-derived macrophages/monocytes. The third is a spatial proteomics experiment performed on the same four groups (WT control, Hexb knockout control, Hexb knockout BMT, and Hexb knockout BMT + CSF1R inhibition).

We have deposited all processed data as RDS files containing Seurat objects (HexbKO_wt_vs_con_with_allenref_labels_public.rds, HexbKO_full_with_allenref_labels_public.rds, HexbKO_protein_seurat_w_group_assignments_public.rds), analyzed using the R package Seurat. Cell-level metadata, including sample information, are stored in seurat@meta.data. For spatial proteomics, we have included the .csv files containing the parameters used to perform automated cell typing with the CELESTA algorithm (mouse_signature_matrix.csv, mouse_tuning_params.csv).

Files and variables

Single-cell spatial transcriptomics datasets

Files: HexbKO_wt_vs_con_with_allenref_labels_public.rds, HexbKO_full_with_allenref_labels_public.rds

Row names of metadata (accessed using rownames(seurat@meta.data)) contain unique identifiers for each single cell, formatted as c_[slide][fov][cell]. Additional metadata columns are described below:

fov: Field Of View (FOV) the cell is in
Area: Number of pixels assigned to a given cell
AspectRatio: Width divided by height
Width: Cell’s maximum length in x dimension (pixels)
Height: Cell’s maximum length in y dimension (pixels)
Mean.DAPI, Mean.Histone, Mean.G, Mean.GFAP: Mean fluorescence intensity within a given cell (AU)
Max.DAPI, Max.Histone, Max.G, Max.GFAP: Max fluorescence intensity within a given cell (AU)
Run_Tissue_name: Flowcell name
x_FOV_px: x position of the cell center within the FOV, measured in pixels
y_FOV_px: y position of the cell center within the FOV, measured in pixels
slide_ID_numeric: Numeric slide ID
cell_ID: cell identification number, formatted as c_[slide][fov][cell]
x_slide_mm: x position of the cell center within the slide, measured in mm
y_slide_mm: y position of the cell center within the slide, measured in mm
nCount_RNA: Total number of RNA transcripts (counts) detected in the cell
nFeature_RNA: Number of unique RNA targets detected in the cell
nCount_negprobes: Number of counts from negative control probes
nFeature_negprobes: Number of unique negative control probes detected
Area.um2: Area of cell (um^2)
nCount_SCT: Total counts per cell after SCTransform normalization
nFeature_SCT: Number of features (genes) with nonzero counts per cell in the SCTransform-normalized assay
SCT_snn_res.1: Cluster assignments from FindClusters() at resolution = 1.0
seurat_clusters: Final cluster assignments from FindClusters() (used for annotation)
predicted.celltype: label annotations from an Allen Brain Atlas single-cell RNA-seq reference dataset for cortex and hippocampus (https://doi.org/10.1016/j.cell.2021.04.021), predicting cell identity
predicted.celltype.score: prediction score (confidence) of the Allen Brain cell type label transfer
manual_clusters: manual cell type annotation based on Allen Brain cell type prediction, marker genes, and location in space
manual_clusters_collapsed: combination of manual cell type annotations into broader cell types while maintaining individual cell identities (e.g., VLMC 1, VLMC 2, etc. -> VLMC)
collapsed_simple: simplification of manual cell type annotations into broad cell types (e.g., VLMC -> Vascular; Pvalb -> Inhibitory Neuron; L2_3 IT CTX -> Excitatory Neuron)
group: Experimental group (genotype and treatment)
sample_n: Sample name
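The three annotation levels above are nested: manual_clusters -> manual_clusters_collapsed -> collapsed_simple. The Python sketch below illustrates that collapsing under stated assumptions: it assumes subclusters are named "<cell type> <number>" as in the VLMC example, and the helper names and the `BROAD_CLASS` lookup are hypothetical. The authoritative labels are the columns already stored in the metadata.

```python
import re

# Hypothetical lookup from collapsed labels to broad classes; the real
# collapsed_simple values are stored in the metadata and may differ.
BROAD_CLASS = {
    "VLMC": "Vascular",
    "Pvalb": "Inhibitory Neuron",
}


def collapse_subcluster(label: str) -> str:
    """Drop a trailing subcluster index, e.g. 'VLMC 2' -> 'VLMC'.

    Assumes subclusters are named '<cell type> <integer>'.
    """
    return re.sub(r"\s+\d+$", "", label.strip())


def to_broad(label: str) -> str:
    """Collapse a manual cluster label, then map it to a broad class if known."""
    collapsed = collapse_subcluster(label)
    return BROAD_CLASS.get(collapsed, collapsed)


for raw in ["VLMC 1", "VLMC 2", "Pvalb"]:
    print(raw, "->", collapse_subcluster(raw), "->", to_broad(raw))
```

Labels that are not in the lookup are returned in collapsed form unchanged, so the sketch never silently assigns a broad class it was not given.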

Single-cell spatial proteomics dataset

File: HexbKO_protein_seurat_w_group_assignments_public.rds

Row names of metadata (accessed using rownames(seurat@meta.data)) contain unique identifiers for each single cell, formatted as c_[slide][fov][cell]. Additional relevant metadata columns are described below:

fov: Field Of View (FOV) the cell is in
Area: Number of pixels assigned to a given cell
AspectRatio: Width divided by height
x_FOV_px: x position of the cell center within the FOV, measured in pixels
y_FOV_px: y position of the cell center within the FOV, measured in pixels
Width: Cell’s maximum length in x dimension (pixels)
Height: Cell’s maximum length in y dimension (pixels)
Mean.DAPI: Mean fluorescence intensity within a given cell (AU)
Max.DAPI: Max fluorescence intensity within a given cell (AU)
Run_Tissue_name: Flowcell name
slide_ID_numeric: Numeric slide ID
x_slide_mm: x position of the cell center within the slide, measured in mm
y_slide_mm: y position of the cell center within the slide, measured in mm
nCount_RNA: Mean fluorescence intensity ("RNA" is a misnomer)
nFeature_RNA: Number of unique proteins detected ("RNA" is a misnomer)
nCount_negprobes: Number of counts from negative control probes
nFeature_negprobes: Number of unique negative control probes detected
Area.um2: Area of cell (um^2)
celesta_R1: Cell type annotations from round 1 of CELESTA cell typing
celesta_R2: Cell type annotations from round 2 of CELESTA cell typing (all microglia are further classified as disease-associated microglia [DAM] or homeostatic)
celesta_final: Final annotations from CELESTA
celesta_broad: Broad cell type annotations (e.g., all neurons grouped together)
celesta_cell_type_n: Number associated with cell type (from mouse_signature_matrix.csv)
sex: Sample sex
group: Experimental group (genotype and treatment)
sample_n: Sample name

mouse_signature_matrix.csv

User-defined cell-type signature matrix.

(1) The first column contains the cell types to be inferred.

(2) The second column contains the lineage information for each cell type: three numbers joined by “_” (underscore). The first number indicates the round; cell types with the same lineage level are inferred in the same round, and increasing round numbers indicate increasing cell-type resolution.

(3) Starting from column three, each column is a protein marker. If the marker is known to be expressed in a cell type, it is denoted by “1”; if it is known not to be expressed, it is denoted by “0”; if its expression is irrelevant or uncertain for that cell type, the entry is left blank. For example, CD11c is expressed in some but not all microglia, so it is left blank for All_Microglia.
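To make this layout concrete, here is a small Python sketch that reads a signature matrix in the format described above. The toy CSV (cell types, markers, and lineage codes) is hypothetical and is not the content of mouse_signature_matrix.csv, and columns are read by position rather than by header name. Interpreting the second and third lineage numbers as the parent cell type number and the cell type's own number is an assumption based on the CELESTA documentation; the third number appears to correspond to celesta_cell_type_n.

```python
import csv
import io

# Toy signature matrix in the layout described above (hypothetical markers
# and cell types; NOT the contents of mouse_signature_matrix.csv).
TOY_CSV = """cell_type,lineage,NEUN,IBA1,GFAP,CD11c
Neuron,1_0_1,1,0,0,0
All_Microglia,1_0_2,0,1,0,
Astrocyte,1_0_3,0,0,1,0
"""

# Marker states: 1 -> expressed, 0 -> not expressed, blank -> irrelevant/uncertain.
STATES = {"1": True, "0": False, "": None}


def parse_signature_matrix(text):
    """Return {cell_type: {"round", "parent", "number", "markers"}}.

    Column 1 = cell type, column 2 = lineage code, columns 3+ = markers.
    The "parent"/"number" meaning of the 2nd/3rd lineage numbers is an
    assumption based on the CELESTA documentation.
    """
    reader = csv.reader(io.StringIO(text))
    markers = next(reader)[2:]
    parsed = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        cell_type, lineage = row[0], row[1]
        round_n, parent, number = (int(x) for x in lineage.split("_"))
        # Pad short rows so missing trailing cells count as blank (uncertain).
        values = row[2:] + [""] * (len(markers) - len(row[2:]))
        parsed[cell_type] = {
            "round": round_n,
            "parent": parent,
            "number": number,
            "markers": {m: STATES[v.strip()] for m, v in zip(markers, values)},
        }
    return parsed


sig = parse_signature_matrix(TOY_CSV)
print(sig["All_Microglia"]["markers"])
```

Grouping the parsed rows by their round value gives the order in which cell types would be inferred, with blank entries kept distinct from explicit negatives.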

Code/software

These data were produced using the RNA Quality Control Module in Nanostring's AtoMx software, and analyzed and annotated in R using the following open-source packages: Seurat 5.0.1 (including SCTransform), ggplot2, and MAST (Model-based Analysis of Single-cell Transcriptomes). Spatial proteomics cell typing was performed with CELESTA.

Spatial transcriptomic & proteomic analysis

Section preparation: One day prior to the experiment, PFA-fixed brain hemispheres were embedded in optimal cutting temperature (OCT) compound (Tissue-Tek, Sakura Finetek, Torrance, CA), and 10 μm sagittal sections were cut using a cryostat (CM1950, Leica Biosystems, Deer Park, IL). Six hemibrains were mounted onto VWR Superfrost Plus slides (Avantor, 48311–703) and kept at −80°C overnight. For both transcriptomics and proteomics, n = 3 mice were used per experimental condition (wild-type control, Hexb-/- control, Hexb-/- BMT, and Hexb-/- BMT + CSF1Ri). When selecting representative brains, we considered bone marrow-derived macrophage (BMDM) infiltration levels in both Hexb-/- BMT groups, choosing brains whose total forebrain GFP+ staining was similar to the group average. Tissue was processed in accordance with the Nanostring CosMx fresh-frozen slide preparation manual for RNA and protein assays (NanoString University).

Slide treatment, RNA, day 1: Slides were removed from -80°C and baked at 60°C for 30 min. Slides were then processed for CosMx: three washes in 1X PBS for 5 minutes each, 4% sodium dodecyl sulfate (SDS; CAT#AM9822) for 2 minutes, three washes in 1X PBS for 5 minutes each, 50% ethanol for 5 minutes, 70% ethanol for 5 minutes, and two washes in 100% ethanol for 5 minutes each, after which slides were air-dried for 10 minutes at room temperature. Antigen retrieval was performed in a pressure cooker maintained at 100°C for 15 min in preheated 1X CosMx Target Retrieval Solution (Nanostring, Seattle, WA). Slides were then transferred to DEPC-treated water (CAT#AM9922) and washed for 15 seconds, incubated in 100% ethanol for 3 minutes, and air-dried at room temperature for 30 minutes. Slides were incubated with digestion buffer (3 μg/mL Proteinase K in 1X PBS; Nanostring) for tissue permeabilization, then washed twice in 1X PBS for 5 minutes. Fiducials for imaging were diluted to 0.00015% in 2X SSC-T and incubated on slides for 5 minutes. Following fiducial treatment, slides were protected from light at all times. Tissues were then post-fixed with 10% neutral buffered formalin (NBF; CAT#15740) for 1 minute, washed twice with NBF Stop Buffer (0.1M Tris-Glycine Buffer, CAT#15740) for 5 minutes each, and washed with 1X PBS for 5 minutes. Next, NHS-Acetate mixture (100 mM; CAT#26777) was applied to each slide and incubated at room temperature for 15 minutes. Slides were washed twice with 2X SSC for 5 minutes each. Slides were then incubated for 16–18 hours in a hybridization oven at 37°C with a modified 1000-plex Mouse Neuroscience RNA panel (Nanostring), with an added rRNA segmentation marker, for in situ hybridization.

Slide treatment, RNA, day 2: Following in situ hybridization, slides were washed twice in preheated stringent wash solution (50% deionized formamide [CAT#AM9342], 2X saline-sodium citrate [SSC; CAT#AM9763]) at 37°C for 25 minutes each, then washed twice in 2X SSC for 2 minutes each. Slides were then incubated with DAPI nuclear stain for 15 minutes, washed with 1X PBS for 5 minutes, incubated with GFAP and histone cell segmentation markers for 1 hour, and washed three times in 1X PBS for 5 minutes each. Flow cells were adhered to each slide to create a fluidic chamber for spatial imaging. Slides were loaded into and processed automatically by the CosMx instrument. Approximately 300 fields of view (FOVs) were selected on each slide, capturing hippocampal, corpus callosum, upper thalamic, upper caudate, and cortical regions of each section. Slides were imaged for approximately 7 days, and data were automatically uploaded to the Nanostring AtoMx online platform. Pipeline-preprocessed data were exported as a Seurat object for analysis in R 4.3.1.

Slide treatment, protein, day 1: Slides were removed from -80°C and baked at 60°C for 30 min, then washed three times with 1X Tris-buffered saline with Tween (TBS-T; CAT#J77500.K2) for 5 minutes each. Antigen retrieval was performed in a pressure cooker held at 80°C in preheated Tris-EDTA buffer (10 mM Tris Base [CAT#10708976001], 1 mM EDTA solution, 0.05% Tween 20, pH 9.0) for 7 minutes. Following antigen retrieval, slides were allowed to cool to room temperature for 5 minutes, then washed three times in 1X TBS-T for 5 minutes each. Slides were incubated with Buffer W (Nanostring) for 1 hour at room temperature. Slides were then incubated for 16–18 hours at 4°C with the CosMx 64-plex protein panel and segmentation markers (GFAP, IBA1, NEUN, and S6).

Slide treatment, protein, day 2: Following incubation, slides were washed three times with 1X TBS-T for 10 minutes each, then washed with 1X PBS for 2 minutes. Fiducials for imaging were diluted to 0.00005% in 1X TBS-T and incubated on the slide for 5 minutes. Slides were then washed with 1X PBS for 5 minutes, incubated in 4% PFA for 15 minutes, and washed three times with 1X PBS for 5 minutes each. Slides were incubated with DAPI nuclear stain for 10 minutes, then washed twice with 1X PBS for 5 minutes each. Slides were then incubated with 100 mM NHS-Acetate for 15 minutes and washed with 1X PBS for 5 minutes. Flow cells were adhered to each slide to create a fluidic chamber for spatial imaging. Slides were loaded into and processed automatically by the CosMx instrument. Approximately 600 FOVs were selected per slide, capturing each full section. Slides were imaged for ~6 days, and data were automatically uploaded to the Nanostring AtoMx online platform. Pipeline-preprocessed data were exported as a Seurat object for analysis in R 4.3.1.

Spatial transcriptomics data analysis: Spatial transcriptomics datasets were processed as previously described (Tran et al., Mol Neurodegen 2025). Principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) were used to reduce the dimensionality of the dataset and to visualize clusters in two dimensions. Unsupervised clustering at resolution 1.0 yielded 39 clusters for the WT control versus Hexb-/- control dataset and 38 clusters for the dataset that included WT controls, Hexb-/- controls, Hexb-/- BMT, and Hexb-/- BMT + CSF1Ri. Clusters were annotated with a combination of automated and manual approaches: (1) label annotations from the Allen Brain Atlas single-cell RNA-seq reference dataset for cortex and hippocampus (Yao et al., *Cell* 2021) were projected onto our spatial transcriptomics dataset, and (2) cluster identities were further refined by manual annotation based on expression of known marker genes and location in XY space. Cell proportion plots were generated by counting the number of cells of each broad cell type in each group and then scaling the values for each cell type to sum to 1 in two steps: (1) normalized proportions were calculated for each group by dividing the number of cells in a given cell type-group pair by the total number of cells in that group, and (2) each proportion was divided by the sum of the proportions across groups for that cell type, to account for differences in sample sizes. Differential gene expression analysis per cell type between groups was performed on scaled expression data using MAST (Finak et al., Genome Biol 2015) to calculate the average difference, defined as the difference in log-scaled average expression between the two groups for each broad cell type. DEG scores were calculated between group pairs for each subcluster by summing the absolute log₂ fold change values of all genes that were significantly differentially expressed (i.e., p_adj < 0.05) between the two groups. Data visualizations were generated using ggplot2 3.4.4 (Wickham 2016).
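The cell-proportion scaling and the DEG score described above reduce to simple arithmetic. The sketch below restates them in Python for illustration only: the analysis itself was performed in R, the function names and toy counts are hypothetical, and `deg_score` assumes per-gene log₂ fold changes and adjusted p-values have already been computed (for example, by MAST).

```python
from collections import defaultdict


def scaled_proportions(counts):
    """Two-step cell proportion scaling.

    counts: {group: {cell_type: n_cells}}
    Step 1: divide each cell type-group count by the group's total cells.
    Step 2: divide each step-1 proportion by the sum of step-1 proportions
    across groups for that cell type, so each cell type sums to 1.
    """
    props = {}
    for group, by_type in counts.items():
        total = sum(by_type.values())
        props[group] = {ct: (n / total if total else 0.0) for ct, n in by_type.items()}

    type_sums = defaultdict(float)
    for by_type in props.values():
        for ct, p in by_type.items():
            type_sums[ct] += p

    return {
        group: {ct: (p / type_sums[ct] if type_sums[ct] else 0.0) for ct, p in by_type.items()}
        for group, by_type in props.items()
    }


def deg_score(results, alpha=0.05):
    """Sum of |log2 fold change| over genes with adjusted p < alpha.

    results: iterable of (gene, log2_fold_change, p_adj) for one subcluster
    and one pair of groups.
    """
    return sum(abs(lfc) for _gene, lfc, p_adj in results if p_adj < alpha)


# Toy numbers for illustration only (not data from this deposit).
counts = {
    "WT control": {"Microglia": 100, "Neuron": 900},
    "Hexb-/- control": {"Microglia": 300, "Neuron": 700},
}
print(scaled_proportions(counts))

toy_results = [("Gfap", 1.5, 0.001), ("Cd68", -2.0, 0.01), ("Actb", 0.3, 0.4)]
print(deg_score(toy_results))  # 3.5
```

After step 2, bar heights are comparable across groups within a cell type, but the values are no longer fractions of any single group's cells.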