Transcriptomic analysis of skeletal muscle regeneration across mouse lifespan identifies altered stem cell states
Data files
Nov 27, 2024 version files 27.37 GB
-
Aged_SkM_mm10_v1-1.RData
13.46 GB
-
Geriatric_TA_ST_5dpi_pp.h5ad
4.31 GB
-
Myo_Aged_SkM_mm10_v1-1.RData
1.20 GB
-
README.md
2.49 KB
-
Young_TA_ST_5dpi_pp.h5ad
8.40 GB
Abstract
Skeletal muscle regeneration relies on the orchestrated interaction of myogenic and non-myogenic cells with spatial and temporal coordination. The regenerative capacity of skeletal muscle declines with aging due to alterations in myogenic stem/progenitor cell states and functions, non-myogenic cell contributions, and systemic changes, all of which accrue with age. A holistic network-level view of the cell-intrinsic and -extrinsic changes influencing muscle stem/progenitor cell contributions to muscle regeneration across the lifespan remains poorly resolved. To provide a comprehensive atlas of regenerative muscle cell states across mouse lifespan, we collected a compendium of 273,923 single-cell transcriptomes from hindlimb muscles of young, old, and geriatric (4-7, 20, and 26 months old, respectively) mice at six closely sampled time-points following myotoxin injury. We identified 29 muscle-resident cell types, eight of which exhibited accelerated or delayed dynamics in their abundances between age groups, including T and NK cells and multiple macrophage subtypes, suggesting that the age-related decline in muscle repair may arise from temporal miscoordination of the inflammatory response. We performed a pseudotime analysis of myogenic cells across the regeneration timespan and found age-specific myogenic stem/progenitor cell trajectories in old and geriatric muscles. Given the critical role that cellular senescence plays in limiting cell contributions in aged tissues, we built a series of tools to bioinformatically identify senescence in these single-cell data and assess their ability to identify senescence within key myogenic stages. By comparing single-cell senescence scores to co-expression of hallmark senescence genes Cdkn2a and Cdkn1a, we found that an experimentally derived gene list derived from a muscle foreign body response (FBR) fibrosis model accurately (receiver-operator curve AUC = 0.82-0.86) identified senescent-like myogenic cells across mouse ages, injury time-points, and cell-cycle states, in a manner comparable to curated gene-lists. Further, this scoring approach in both single-cell and spatial transcriptomic datasets pinpointed transitory senescent-like subsets within the myogenic stem/progenitor cell trajectory that are associated with stalled MuSC self-renewal states across all ages of mice. This new resource on mouse skeletal muscle aging provides a comprehensive portrait of the changing cellular states and interactions underlying skeletal muscle regeneration across the mouse lifespan.
-
Author Information
A. Principal Investigator Contact Information
Name: Benjamin Cosgrove
Institution: Cornell University
Email: bdc68@cornell.eduB. First Author Contact Information
Name: Lauren Walter
Institution: Cornell University
Email: ldw64@cornell.edu - Date of data collection:
scRNA-seq: Aug. 2020 - Dec. 2020
Slide-seq: Dec. 2023 - Jan. 2024 - Information about funding sources that supported the collection of the data:
A. National Institutes of Health - R01AG058630
B. National Institutes of Health - R01AR081449
C. National Institutes of Health - R01AI176681
D. National Institutes of Health - U54AG07977
E. National Institutes of Health - T32HD057854
F. National Institutes of Health - F30OD032097
G. National Institutes of Health - R01AI105265
H. National Institutes of Health - DP1AR076959
DATA & FILE OVERVIEW
-
File List:
A. Aged_SkM_mm10_v1-1.RData
i. Format: Seurat object, saved as an RData file
ii. Description: Fully processed Seurat object containing 65 individual samples and 273,923 cells. Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript.B. Myo_Aged_SkM_mm10_v1-1.RData
i. Format: Seurat object, saved as an RData file
ii. Description: Subset myogenic cells containing 27,234 cells. Details on processing can be found in the manuscript.C. Young_TA_ST_5dpi_pp.h5ad
i. Format: Scanpy object, saved as .h5ad file
ii. Description: Fully processed Scanpy object of the young spatial transcriptomics data. Includes spatial coordinates, cell types, and other metadata features. Details on processing can be found in the manuscript.D. Geriatric_TA_ST_5dpi_pp.h5ad
i. Format: Scanpy object, saved as .h5ad file
ii. Description: Fully processed Scanpy object of the geriatric spatial transcriptomics data. Includes spatial coordinates, cell types, and other metadata features. Details on processing can be found in the manuscript. -
People involved with sample collection:
Lauren Walter collected the additional young and geriatric scRNA-seq samples (GSE232106)
Ioannis Ntekas collected the Slide-seq samples(GSE266933)
Mouse muscle injury and single-cell isolation. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols (approval # 2014-0085), and experiments were performed in compliance with its institutional guidelines. Mice were maintained at 70-73°F on a 14/10-h light/dark with humidity mainly at 40%. Muscle injury was induced in young (4-7 months-old [mo]), old (20 mo), and geriatric (26 mo) C57BL/6J mice (Jackson Laboratory # 000664; NIA Aged Rodent Colonies) by injecting both tibialis anterior (TA) muscles with 10 µl of notexin (10 µg/ml; Latoxan, France). The mice were sacrificed, and TA muscles were collected at 0, 1, 2, 3.5, 5, and 7 days post-injury (dpi) with n = 3-4 biological replicates per sample. Each TA was processed independently to generate single-cell suspensions. At each time point, the young and old samples are biological replicates of TA muscles from distinct mice, and the geriatric samples are biological replicates of two TA muscles from each of the two mice. A mixture of male and female mice was used. See Supplemental Table 1 for additional details. Muscles were digested with 8 mg/ml Collagenase D (Roche, Basel, Switzerland) and 10 U/ml Dispase II (Roche, Basel, Switzerland) and then manually dissociated to generate cell suspensions. Myofiber debris was removed by filtering the cell suspensions through a 100 µm and then a 40 µm filter (Corning Cellgro # 431752 and # 431750). After filtration, erythrocytes were removed by incubating the cell suspension inan erythrocyte lysis buffer (IBI Scientific # 89135-030).
Single-cell RNA-sequencing library preparation. After digestion, the single-cell suspensions were washed and resuspended in 0.04% BSA in PBS at a concentration of 106 cells/ml. A hemocytometer was used to manually count the cells to determine the concentration of the suspension. Single-cell RNA-sequencing libraries were prepared using the Chromium Single Cell 3’ reagent kit v3 (10x Genomics, Pleasanton, CA) following the manufacturer’s protocol (10x Genomics: Resolving Biology to Advance Human Health, 2020). Cells were diluted into the Chromium Single Cell A Chip to yield a recovery of 6,000 single-cell transcriptomes with <5% doublet rate. Libraries were sequenced on the NextSeq 500 (Illumina, San Diego, CA) (Illumina | Sequencing and array-based solutions for genetic research, 2020). The sequencing data was aligned to the mouse reference genome (mm10) using CellRanger v5.0.0 (10x Genomics) (10x Genomics: Resolving Biology to Advance Human Health, 2020).
Preprocessing single-cell RNA-sequencing data. From the gene expression matrix, the downstream analysis was carried out in R (v3.6.1). First, the ambient RNA signal was removed using the default SoupX (v1.4.5) workflow (autoEstCounts and adjustCounts; github.com/constantAmateur/SoupX) (Young and Behjati, 2020). Samples were then preprocessed using the standard Seurat (v3.2.3) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat) (Stuart et al., 2019). Cells with fewer than 200 genes, with fewer than 750 UMIs, and more than 25% of unique transcripts derived from mitochondrial genes were removed. After preprocessing, DoubletFinder (v2.0.3) was used to identify putative doublets in each dataset (McGinnis, Murrow, and Gartner, 2019). The estimated doublet rate was 5% according to the 10x Chromium handbook. The putative doublets were removed from each dataset. Next, the datasets were merged and then batch-corrected with Harmony (github.com/immunogenomics/harmony) (v1.0) (Korsunsky et al., 2019). Seurat was then used to process the integrated data. Dimensions accounting for 95% of the total variance were used to generate SNN graphs (FindNeighbors) and SNN clustering was performed (FindClusters). A clustering resolution of 0.8 was used resulting in 24 initial clusters.
Cell type annotation in single-cell RNA-sequencing data. Cell types were determined by expression of canonical genes. Each of the 24 initial clusters received a unique cell type annotation. The nine myeloid clusters were challenging to differentiate between, so these clusters were subset out (Subset) and re-clustered using a resolution of 0.5 (FindNeighbors, FindClusters) resulting in 15 initial clusters. More specific myeloid cell type annotations were assigned based on the expression of canonical myeloid genes. This did not help to clarify the monocyte and macrophage annotations, but it did help to identify more specific dendritic cell and T cell subtypes. These more specific annotations were transferred from the myeloid subset back to the complete integrated object based on the cell barcode.
Analysis of cell type dynamics. We generated a table with the number of cells from each sample (n = 65) in each cell type annotation (n = 29). We removed the erythrocytes from this analysis because they are not a native cell type in skeletal muscle. Next, for each sample, we calculated the percent of cells in each cell type annotation. The mean and standard deviation were calculated from each age and time point for every cell type. The solid line is the mean percentage of the given cell type, the ribbon is the standard deviation around the mean, and the points are the values from individual replicates. We evaluated whether there was a significant difference in the cell type dynamics over all six-time points using non-linear modeling. The dynamics for each cell type were fit to some non-linear equation (e.g., quadratic, cubic, quartic) independent and dependent on age. The type of equation used for each cell type was selected based on the confidence interval and significance (p < 0.05) of the leading coefficient. If the leading coefficient was significantly different from zero, it was concluded that the leading coefficient was needed. If the leading coefficient was not significantly different than zero, it was concluded that the leading coefficient was not needed, and the degree of the equation went down one. No modeling equation went below the second degree. The null hypothesis predicted that the coefficients of the non-linear equation were the same across the age groups while the alternative hypothesis predicted that the coefficients of the non-linear equation were different across the age groups. We conducted a One-Way ANOVA to see if the alternative hypothesis fits the data significantly better than the null hypothesis and we used FDR as the multiple comparison test correction (using the ANOVA and p.adjust (method = fdr) functions in R, respectively).
T cell exhaustion scoring. We grouped the three T cell populations (this includes Cd3e+ cycling and non-cycling T cells and Cd4+ T cells) and z-scored all genes. The T cell exhaustion score was calculated using a transfer-learning method developed by Cherry et al 2023 and a T cell exhaustion gene list from Bengsch et al 2018 (Bengsch et al., 2018; Cherry et al., 2023). The Mann-Whitney U-test was performed on the T cell exhaustion score between ages.
Senescence scoring. We tested two senescence-scoring methods along with fourteen senescence gene lists (Supplemental Table 2) to identify senescent-like cells within the scRNA-seq dataset. The Two-way Senescence Score (Sen Score) was calculated using a transfer-learning method developed by Cherry et al 2023 (Cherry et al., 2023). With this method, we tested fourteen gene lists: Stimulation Independent, Replicative, Oncogene, Ion-Radiation Induced, Senescence-Associated Core, Aged-Chondrocyte, Two-way foreign body response (FBR), SPiDER Young MuSCs, and SPiDER Geriatric MuSCs. The One-way Sen Score was calculated using Single-Sample GSEA with the R package escape (v1.4.2) (Borcherding et al., 2021) as per Saul et al., 2022 (Saul et al., 2022). With this method, we tested six gene lists: One-way FBR, SenMayo, GO: Senescence, GO: SASP, Lung uninjured, and Lung injured.
We evaluated the ability of the two methods and the fourteen gene lists to accurately identify senescent-like MuSCs and progenitors by calculating a receiver operating characteristic (ROC) curve. For each MuSCs and progenitor cell, we evaluated whether it expressed both Cdkn2a and Cdkn1a. This analysis was done for all MuSCs and progenitors (Figure 3M-P), MuSCs and progenitors split by age (Figures 3P and 4A-F), and MuSCs and progenitors split by G1-Status (Figures 4G-J and 6B). The area under the curve (AUC) was calculated for each ROC curve (Figures 3M-P, 4K, and 5B).
Given that the One-way Sen Score with the FBR gene list performed the best (AUC = 0.86), we focused on that for further analyses. We next set a threshold of senescence based on the One-way FBR Sen Score where 50% of the MuSCs and progenitors with at least that score co-express Cdkn2a and Cdkn1a was set as the senescence threshold. For the One-way FBR Sen Score, MuSCs and progenitors with a score >= 2412.562 were called “senescent-like” while all other cells were called “not senescent-like” (Figure 6A, Extended Data Figure 10G-H). We refer to the Supplemental Methods for more details on how the Senescence scoring was performed and evaluated.
Cell cycle scoring. To each cell in the final dataset, we assigned an S phase score, a G2M phase score, and a discrete phase classification (G1/S/G2M) using Seurat’s standard Cell-Cycling Scoring method (Stuart et al., 2019). We have treated the S phase and G2M phase scores as polar coordinates to help us visualize how cells are progressing through the cell cycle (Figure 5D). We have converted the polar coordinates to cartesian coordinates and normalized the theta values so that they range from 0 to 1 so that cells in G1 have the lowest theta values followed by cells in S and G2M (Figure 5E, Extended Data Figure 10A-B). This enables us to see how cells are progressing linearly through the cell cycle. Seurat’s standard cutoff between cells classified as G1 versus cells classified as S is at the normalized theta value of 0.25. Looking at the distribution of cells across the normalized theta values as well as the expression of cell cycle markers Cdk1 and Cdk4, we decided to extend the G1 to S cutoff to 0.375 (Figure 5E-G, Extended Data Figure 10C-D). Cells with a normalized theta value >= 0.375 are considered Non-G1 (S/G2/M).
Myogenic cell subsets. From the final dataset, the cells with the cell type IDs ‘MuSCs and progenitors’ and ‘Myonuclei’ were subset out and the Seurat workflow was partially re-run (ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP). Dimensions accounting for 95% of the total variance were used to generate SNN graphs (FindNeighbors) and SNN clustering was performed (FindClusters) (Stuart et al., 2019). A clustering resolution of 0.7 was used resulting in 9 clusters. These 9 clusters were assigned general cell type IDs based on canonical myogenic markers. Of the 9 clusters, 4 were progenitor subtypes, 3 were myonuclei subtypes, and 2 were doublets (Figure 5A).
To more specifically ID the doublet clusters we looked at the co-expression of myogenic and non-myogenic markers (Extended Data Figure 7). Using GetAssayData, we extracted the log-normalized expression values of Pax7, Pecam1, Acta1, and C1qa in each cell in the 9 myogenic clusters. For each of the 9 clusters, we plotted cells by their expression values of Pax7 and Pecam1 and by their expression values of Acta1 and C1qa. A density plot was plotted along the x- and y-axes using ggmarginal(type = “density”) (Extended Data Figure 7). The two clusters identified as doublets were excluded from the remaining myogenic subset analyses.
To identify the myonuclei clusters more specifically, we looked at the expression of myonuclei markers, markers of high metabolic activity, and the percent of unique transcripts derived from mitochondrial genes (Extended Data Figure 8). Using GetAssayData, we extracted the log-normalized expression values of Myh1 and Myh4 in each cell in the three myonuclei clusters. For every cell, as defined by the cell barcode, we determined whether the expression value equaled zero (no expression) or exceeded zero (expression) for both Myh1 and Myh4 independently. For each of the three myonuclei clusters, the fraction of cells that expressed Myh1 and Myh4, only Myh1, only Myh4, and neither Myh1 nor Myh4 were calculated by dividing the number of cells that expressed Myh1 and Myh4, only Myh1, only Myh4, and neither Myh1 nor Myh4 by the total number of cells within each myonuclei cluster (Extended Data Figure 8D-G).
Harmony embedding values from the dimensions accounting for 95% of the total variance were used for further dimensional reduction with PHATE, using phateR (v1.0.7) (Figure 5B-C) (Moon et al., 2019). The PHATE embedding values were used by Monocle 3 (v1.0.0) (Trapnell et al., 2014; Qiu et al., 2017; Cao et al., 2019). The normal Monocle 3 workflow was used (cluster_cells, estimate_size_factors, learngraph, order_cells) where L1.sigma = 0.4 and the root cell was in the Progenitor 1 cluster. The pseudotime values for each cell, as defined by Monocle 3, were transferred from the Monocle 3 CDS object to the myogenic cells only Seurat object by cell barcode. The pseudotime values were divided into 25 bins with approximately equal numbers of cells (1089-1090 cells per bin) (Figure 5C). We assigned myogenic cell type IDs based on known myogenic marker expression in each pseudotime bin and dpi (Extended Data Figure 9). We refer to the Supplemental Methods for more details on how the myogenic cell type IDs were assigned.
We focused on dpi 3.5 and compared the cells in MuSCs 1, MuSCs 2-6, and all MPCs (this includes cells classified as Cycling MPCs and cells classified as Committing MPCs). For each sample, we calculated the fraction of cells that had log-normalized Cdkn2a and Cdkn1a counts greater than 0 (Figure 6E-G). Within these same groupings, we also calculated the fraction of cells that had a One-Way FBR Sen Score greater than 2412.562 (Figure 7H-J). We conducted two-sided Welch’s t-tests to evaluate whether there was a significant difference between ages.
Differential Expression and Gene Set Enrichment Analysis. For select comparisons, we used Seurat’s FindAllMarkers() function to identify genes that were differentially expressed between groups. In the myogenic subset with all ages at day 3.5, we did this analysis between the cells in ‘MuSCs 1’ and the cells in ‘MuSCs 2-6’ (Figure 5L). In the MuSCs and progenitors with all ages at day 3.5, we did this analysis between cells that co-expressed Cdkn2a and Cdkn1a (we refer to these cells as ‘Double positive’) and all other cells (we refer to these cells as ‘other’). Genes that had an FDR-corrected p-value <=0.05 were ranked by average log2 fold change and used in a Gene Set Enrichment Analysis (GSEA, v4.1.0). The gene set databases used included h.all.v2023.1.Hs.symbols.gmt, c2.all.v2023.1.Hs.symbols.gmt, 5.all.v2023.1.Hs.symbols.gmt, and c8.all.v2023.1.Hs.symbols.gmt. Significant GO Terms (FDR q-value <= 0.25) were ranked by the normalized enrichment score (enrichment scores normalized by the size of the gene set) (Figure 3H).
Tissue collection for spatial transcriptomics. Muscle injury was induced in young and geriatric C57BL/6J female mice by injecting both TA muscles with 10 μl of notexin (10 μg/ml; Latoxan, France). The mice were sacrificed, and TA muscles were collected at 5 dpi. The TA muscles were coated in Tissue-Tek O.C.T. Compound (Sakura Finetek #4583), snap-frozen in liquid nitrogen-cooled isopentane (Thermo Scientific Chemicals # AA19387AP), and then stored at -80C. Frozen TA muscles were sectioned with a cryostat transversely at 10 µm thick slices at -20C. Slide-seq spatial transcriptomics(Stickels et al., 2021) experiments were performed using the Curio Seeker Kit (Curio Bioscience). The sections were mounted on Curio Seeker tiles (Curio Bioscience) and further processed to generate spatial transcriptomics libraries.
Library preparation for spatial transcriptomics. Sequencing libraries were generated for the mounted young (Curio Tile ID: A0018_011) and geriatric (Curio Tile ID: A0018_012) TA muscle sections with the Curio Seeker Spatial Transcriptomics kit. After RNA hybridization and reverse transcription, the tissue sections underwent enzymatic digestion, and the beads were detached from the glass tile and suspended in solution. This was followed by second-strand cDNA synthesis and amplification. Sequencing libraries were prepared using the Nextera XT DNA sample preparation kit, pooled, and sequenced on an Illumina NextSeq 2K system using a P2 flow cell and a 100-cycle kit (Read 1 = 50 bp, Read 2 = 72 bp, Index 1 = 8 bp, and Index 2 = 8 bp). The sequencing data were processed with the slidesnake pipeline (https://github.com/mckellardw/slide_snake). The ‘seeker_v3.1_internalTrim’ recipe was used to trim adapters (cutadapt v4.6) (Martin, 2011) and low-quality reads before mapping to the mouse genome (GRCm39) using the STAR Solo (STAR v2.7.10b) pipeline that generates a feature-by-bead barcode expression matrix.
Pre-processing Slide-seq data for quality control. AnnData (v0.10.3) (Virshup et al., 2021) objects were generated from the count matrices of each sample and Scanpy (v1.9.3) (Wolf, Angerer, and Theis, 2018) was used to load the spatial coordinates of each bead as provided by Curio Bioscience. Beads with fewer than 20 transcripts and genes detected in less than 10 beads were removed. The expression data was then scaled and log-normalized. Additionally, the spatial distances between all pairs of beads were calculated and beads with less than 10 neighboring beads within 100 μm were removed (Mantri et al., 2024).
Cell type deconvolution in Slide-seq data. The Curio Seeker technology utilizes a dense array of beads,10 μm in diameter, that capture polyadenylated mRNA from a tissue section placed on the array (Stickels et al., 2021). The beads collect transcriptomes predominantly from cells in direct contact with them. Although the size of the beads allows for close to single-cell spatial profiling, multiple cells contribute to the transcriptome captured on each bead (Kleshchevnikov et al., 2022). Deconvolution approaches, where the gene expression signature from a reference single-cell dataset is used to predict the cell type contribution of each bead transcriptome, are used to construct cell type maps. Here, we applied the cell2location model (v0.1.3) (Kleshchevnikov et al., 2022) to deconvolve the spatial transcriptomics data against the aged scRNA-seq dataset presented in this paper. The scRNA-seq reference was filtered to retain only genes that are highly expressed and informative for identifying rare cell types (cell_count_cutoff = 5, cell_percent_cutoff = 0.01, and nonz_mean_cutoff = 1.12). Using these refined gene lists, cell-type-specific expression signatures were derived using negative binomial regression. The resulting signatures were applied to the Slide-seq data to assign cell-type identities based on the highest prediction scores, with hyperparameters set to N_cells_per_location = 1 and detection_alpha = 20.
Defining the injury zone in Slide-seq data. To define the injury zone, we employed a k-nearest neighbors (kNN) algorithm to calculate immune cell density around each tissue spot. Each spot was assigned an Immune_neighbors score based on the number of immune cells (T cells, B cells, NK cells, dendritic cells, neutrophils, and monocytes/macrophages) within its 300 nearest neighbors. We established injury zones by identifying spots with an Immune_neighbors score exceeding 20. Additionally, to focus on the high-density injury zone, additional specific spatial criteria were applied: spots with y-coordinates less than 1500 μm for the young sample and spots with x-coordinates greater than 1945 μm for the geriatric sample were classified as within the injury zone. Spots failing to meet these criteria were designated as a non-injury zone. To refine the injury zone identification and remove isolated spots, we applied a kNN algorithm with k=20, focusing on neighbors classified as 'injury' spots. We reclassified spots with more than 10 'injury' neighbors as part of the injury zone and spots with 10 or fewer 'injury' neighbors as 'non-injury', effectively discarding isolated spots and small clusters. This step ensured that the defined injury zones represented coherent clusters of high-density injury spots.
Senescence score calculation in Slide-seq data. The One-way FBR Sen Score was calculated on a refined FBR gene list. This refined list contained 44 genes that were captured in more than 2 beads assigned with the MuSCs and progenitors label in both samples. To calculate the Sen Score, we used the Scanpy tool score_genes function with the parameters set to 1000 bins and a control size of 50. The score was calculated on the normalized matrix of each sample. Age-specific significance was evaluated using a two-sided Mann-Whitney U-test.
Statistics and Reproducibility. For the newly generated scRNA-seq data, we collected four replicates for each injury time point, for a total of 32 samples. The additional young samples are biological replicates of TA muscles from distinct mice, and the geriatric samples are biological and technical replicates of two TA muscles from two distinct mice. The details for statistical analyses performed in this study are provided in the respective sections of the Results and Methods sections.