Extraocular muscle stem cells exhibit distinct cellular properties associated with non-muscle molecular signatures
Data files
Jan 25, 2024 version files 4.52 GB
- Q_ACT_scRNAseq.zip
- README.md
Abstract
The muscle stem cell (MuSC) population is recognized as functionally heterogeneous. Cranial muscle stem cells, which originate from head mesoderm, can have greater proliferative capacity in culture and higher regenerative potential in transplantation assays when compared to those in the limb. The basis of these functional differences in phenotypic outputs remains unresolved, as a comprehensive understanding of the underlying mechanisms is lacking. We addressed this issue using a combination of clonal analysis, live imaging, and scRNA-seq, identifying critical biological features that distinguish extraocular (EOM) and limb (Tibialis anterior, TA) MuSC populations. Time-lapse studies using a Myogenin^tdTomato reporter showed that the increased proliferation capacity of EOM MuSCs is accompanied by a differentiation delay in vitro. Unexpectedly, in vitro activated EOM MuSCs expressed a large array of distinct extracellular matrix (ECM) components, growth factors, and signaling molecules that are typically associated with mesenchymal non-muscle cells. These unique features are regulated by a specific set of transcription factors that constitute a coregulating module. This transcription factor network, which includes Foxc1 as one of the major players, appears to be hardwired to EOM identity, as it is present in quiescent adult MuSCs and in their activated counterparts during growth, and is retained upon passages in vitro. These findings provide insights into how high-performing MuSCs regulate myogenic commitment by active remodeling of their local environment.
README: Extraocular muscle stem cells exhibit distinct cellular properties associated with non-muscle molecular signatures
https://doi.org/10.5061/dryad.b8gtht7k0
Description of the data and file structure
The shared object is composed of two distinct folders corresponding to the Quiescent cells and Activated cells. Each folder is organized the same way:
- One folder "CellRanger_outs" containing raw and filtered count matrices as output by CellRanger
- One folder "PREPROCESSED" containing the Seurat-processed R object and a subfolder with the R object used for the scVelo and pySCENIC analyses.
Q and ACT refer to Quiescent and Activated cells, respectively.
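The layout described above can be checked after unzipping with a short script. This is a minimal sketch, not part of the deposited data: the subfolder names "CellRanger_outs" and "PREPROCESSED" come from the description above, while the function name `check_layout` and the idea of scanning every top-level folder are assumptions made for illustration (the exact top-level folder names are not spelled out here, so the sketch inspects whatever it finds).

```python
from pathlib import Path

# Subfolder names as given in the README description.
EXPECTED_SUBFOLDERS = ("CellRanger_outs", "PREPROCESSED")


def check_layout(root):
    """Map each top-level folder under `root` to the expected subfolders it lacks.

    Hypothetical helper: an empty list means the folder matches the described layout.
    """
    report = {}
    for condition_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        missing = [
            name
            for name in EXPECTED_SUBFOLDERS
            if not (condition_dir / name).is_dir()
        ]
        report[condition_dir.name] = missing
    return report
```

Calling `check_layout` on the directory where the archive was extracted returns one entry per top-level folder; any non-empty list points to an incomplete download or extraction.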
Methods
scRNAseq data generation
MuSCs were isolated on BD FACSAria™ III based on GFP fluorescence and cell viability from Tg:Pax7-nGFP mice (Sambasivan et al., 2009). Quiescent MuSCs were manually counted using a hemocytometer and immediately processed for scRNA-seq. For activated samples, MuSCs were cultured in vitro as described above for four days. Activated MuSCs were subsequently trypsinized and washed in DMEM/F12 2% FBS. Live cells were re-sorted, manually counted using a hemocytometer and processed for scRNA-seq.
Prior to scRNAseq, RNA integrity was assessed using an Agilent Bioanalyzer 2100 to validate the isolation protocol (RIN>8 was considered acceptable). 10X Genomics Chromium microfluidic chips were loaded with approximately 9000 cells and cDNA libraries were generated following the manufacturer’s protocol. Concentrations and fragment sizes were determined using Agilent Bioanalyzer and Invitrogen Qubit. cDNA libraries were sequenced on a NextSeq 500 using High Output v2.5 (75 cycles) kits. Count matrices were subsequently generated with the 10X Genomics Cell Ranger pipeline.
Following normalisation and quality control, we obtained an average of 5792 ± 1415 cells/condition.
Seurat preprocessing
scRNAseq datasets were processed using Seurat (https://satijalab.org/seurat/) (Butler et al., 2018). Cells with a mitochondrial gene fraction above 10% were discarded. On average, 4000-5000 genes were detected across all 4 datasets. Dimensionality reduction and UMAPs were generated following the Seurat workflow. The top 100 DEGs were determined using the Seurat "FindAllMarkers" function with default parameters. When processed independently (scvelo), the datasets were first regressed on cell cycle genes, mitochondrial fraction, number of genes, and number of UMIs following the dedicated Seurat vignette, and doublets were removed using DoubletFinder v3 (McGinnis et al., 2019). A "StressIndex" score was generated for each cell based on the list of stress genes previously reported (Machado et al., 2021) using the Seurat "AddModuleScore" function. 94 out of 98 genes were detected in the combined datasets. UMAPs were generated (1) after StressIndex regression and (2) after complete removal of the detected stress genes from the gene expression matrix before normalization. In both cases, the overall aspect of the UMAP did not change significantly (Figure S5). Although immeasurable confounding effects of cell stress following isolation cannot be ruled out, we reasoned that our datasets did not show a significant effect of stress with respect to the conclusions of our study.
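The mitochondrial-fraction filter can be illustrated outside Seurat with plain Python. This is only a sketch of the rule stated above (discard cells above 10%), not the code used for the deposited objects. It assumes that mitochondrial genes are recognized by the mouse symbol prefix "mt-" and that each cell is a mapping from gene symbol to raw count; the function names and the toy counts in the usage note are hypothetical.

```python
MITO_PREFIX = "mt-"        # assumption: mouse mitochondrial gene symbols start with "mt-"
MAX_MITO_FRACTION = 0.10   # threshold stated in the text


def mito_fraction(counts):
    """Fraction of a cell's counts that come from mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty cells are left for other QC steps to handle
    mito = sum(n for gene, n in counts.items() if gene.startswith(MITO_PREFIX))
    return mito / total


def filter_cells(cells):
    """Keep only cells whose mitochondrial fraction does not exceed the threshold."""
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if mito_fraction(counts) <= MAX_MITO_FRACTION
    }
```

For example, with made-up counts, a cell `{"mt-Co1": 5, "GeneA": 95}` has a fraction of 0.05 and is kept, whereas `{"mt-Co1": 20, "GeneA": 80}` has a fraction of 0.20 and is discarded.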
Matrisome analysis
After subsetting for the features of the Matrisome database (Naba et al., 2015) present in our single-cell dataset, the matrisome score was calculated by assessing the overall expression of its constituents using the "AddModuleScore" function from Seurat (Butler et al., 2018).
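To make the idea of a module score concrete, the sketch below computes a simplified score in plain Python: the average expression of the gene set in each cell minus the average expression of control genes drawn from bins of similar overall expression. It is a deliberately reduced illustration and not a reimplementation of Seurat's "AddModuleScore"; the bin count, number of controls, exclusion of set genes from the control pool, seed, function name, and input format are all assumptions chosen to keep the example small.

```python
import random
from statistics import mean


def module_scores(expr, gene_set, n_bins=4, n_ctrl=2, seed=0):
    """Simplified module score per cell.

    expr: {gene: [normalized expression value per cell]} (hypothetical input format).
    gene_set: genes of interest; genes absent from `expr` are ignored.
    """
    rng = random.Random(seed)
    present = [g for g in gene_set if g in expr]
    if not present:
        raise ValueError("none of the genes in the set were detected")
    n_cells = len(next(iter(expr.values())))

    # Rank all genes by average expression and cut the ranking into bins.
    ranked = sorted(expr, key=lambda g: mean(expr[g]))
    bin_size = max(1, -(-len(ranked) // n_bins))  # ceiling division
    bin_of = {g: i // bin_size for i, g in enumerate(ranked)}

    # For each gene of interest, draw control genes from the same expression bin.
    controls = set()
    for g in present:
        pool = [c for c in ranked if bin_of[c] == bin_of[g] and c not in present]
        controls.update(rng.sample(pool, min(n_ctrl, len(pool))))

    scores = []
    for i in range(n_cells):
        set_mean = mean(expr[g][i] for g in present)
        ctrl_mean = mean(expr[g][i] for g in controls) if controls else 0.0
        scores.append(set_mean - ctrl_mean)
    return scores
```

Subtracting expression-matched controls is what keeps the score from simply tracking sequencing depth: a cell scores high only when the gene set is expressed above comparable background genes.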
RNA velocity and driver genes
scVelo was used to calculate RNA velocities (Bergen et al., 2020). Unspliced and spliced transcript matrices were generated using the velocyto (La Manno et al., 2018) command line function. Seurat-generated filtering, annotations and cell embeddings (UMAP, tSNE, PCA) were then added to the output objects. These datasets were then processed following the scVelo online guide and documentation. Velocity was calculated based on the dynamical model (using scv.tl.recover_dynamics(adata) and scv.tl.velocity(adata, mode='dynamical')), and differential kinetics calculations were added to the model (using scv.tl.velocity(adata, diff_kinetics=True)). Specific driver genes were identified by determining the top likelihood genes in the selected cluster. The lists of the top 100 drivers for EOM and TA progenitors are given in Suppl Tables 10 and 11.
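The driver-gene selection step amounts to ranking genes by the likelihood of their fitted dynamics within a cluster and keeping the top entries. The sketch below shows only that ranking logic in plain Python. It assumes the per-gene likelihoods for the selected cluster have already been exported as a mapping from gene to value, with unfitted genes stored as NaN or None; the function name and this input format are hypothetical and do not reproduce the scVelo API.

```python
import math


def top_likelihood_genes(likelihoods, n_top=100):
    """Return up to `n_top` genes ordered by decreasing fit likelihood.

    likelihoods: {gene: likelihood or NaN/None if the gene was not fitted}
    (hypothetical export format for one cluster).
    """
    fitted = {
        gene: value
        for gene, value in likelihoods.items()
        if value is not None and not math.isnan(value)
    }
    return sorted(fitted, key=fitted.get, reverse=True)[:n_top]
```

With `n_top=100`, applying this to each progenitor cluster would yield lists of the same shape as the top 100 driver lists mentioned above.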
Gene regulatory network inference and transcription factor modules
Gene regulatory networks were inferred using pySCENIC (Aibar et al., 2017; Van de Sande et al., 2020). This algorithm regroups sets of correlated genes into regulons (i.e. a transcription factor and its targets) based on binding motifs and co-expression patterns. The top 35 regulons for each cluster were determined using the scanpy "scanpy.tl.rank_genes_groups" function (method='t-test'). Note that this function can yield fewer than 35 results depending on the cluster. UMAP and heatmap were generated using the regulon AUC (Area Under Curve) matrix, which gives the activity level of each regulon in a given cell. Visualizations were performed using scanpy (Wolf et al., 2018). The output list of regulons and their targets was subsequently used to create a transcription factor network. To do so, only target genes that are themselves regulon transcription factors were kept. This results in a visual representation where each node is an active transcription factor and each edge is an inferred regulation between 2 transcription factors. When placed in a force-directed environment, these nodes aggregate based on the number of shared edges. This operation greatly reduced the number of genes involved, while highlighting co-regulating transcriptional modules. Visualization of this network was performed in a force-directed graph using the Gephi "ForceAtlas2" algorithm (https://gephi.org/).
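The reduction from regulons to a transcription factor network can be sketched in a few lines of Python. The example assumes the regulons are available as a mapping from transcription factor to its target genes, with regulon names already reduced to bare gene symbols. The function names, the choice to drop self-loops by default, and the "Source,Target" edge-list CSV layout for import into Gephi are illustrative assumptions, not a description of the exact procedure used.

```python
import csv
import io


def tf_only_edges(regulons, keep_self_loops=False):
    """Keep only TF -> target edges where the target is itself a regulon TF.

    regulons: {tf_symbol: iterable of target gene symbols} (hypothetical format).
    keep_self_loops: assumption; self-regulation edges are dropped by default.
    """
    tfs = set(regulons)
    edges = set()
    for tf, targets in regulons.items():
        for target in targets:
            if target not in tfs:
                continue  # target is not a regulon TF, so it is removed
            if target == tf and not keep_self_loops:
                continue
            edges.add((tf, target))
    return sorted(edges)


def edges_to_csv(edges):
    """Serialize edges as a two-column edge list with a Source,Target header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()
```

Because every non-TF target is dropped, the resulting graph contains only transcription factors as nodes, which is what lets a force-directed layout pull co-regulating factors into visible modules.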