Single-cell RNA-seq unveils fibroblast-T cell interplay in muscle-invasive bladder cancer
Data files
Version of Apr 08, 2025 (4.22 GB total)
- BCa-cellranger.zip (4.22 GB)
- README.md (7.79 KB)
- Table_S1-EJC.csv (1.95 KB)
Abstract
Muscle-invasive bladder cancer (MIBC) is characterized by a complex tumor microenvironment (TME) that drives aggressive progression and treatment resistance. Previous studies have highlighted the roles of cancer-associated fibroblasts (CAFs) and exhausted T (Tex) cells in MIBC, but how these cells interact remains poorly understood. Here, single-cell RNA sequencing of 19 tissue samples from 12 patients (7 MIBC, 3 non-muscle-invasive bladder cancer [NMIBC], and 9 normal tissue samples) identified 13 transcriptionally distinct fibroblast clusters and 10 functionally heterogeneous T-cell subsets. Two interferon (IFN)-responsive fibroblast populations, F-ISG15 (inflammatory CAFs) and F-POSTN (myofibroblastic CAFs), were shown to predominate in the MIBC TME. In vivo experiments demonstrated that IFN-γ secreted by Tex cells polarizes CAFs to secrete CXCL12, which recruits CXCR4-expressing T cells via the CXCL12-CXCR4 chemotactic axis. Spatial analysis revealed a bidirectional loop: Tex-derived IFN-γ sustains CAF activation, whereas CAF-secreted CXCL12 amplifies Tex infiltration. Clinically, activated CAF signatures correlate with advanced disease stage and reduced patient survival in MIBC. These findings establish CXCL12 and IFN signaling as critical therapeutic targets, offering new strategies to disrupt immunosuppressive crosstalk in the TME and improve outcomes for patients with MIBC.
Summary of experimental efforts underlying this dataset
This dataset was generated to explore the interactions between cancer-associated fibroblasts (CAFs) and exhausted T (Tex) cells within the tumor microenvironment (TME) of muscle-invasive bladder cancer (MIBC). Tissue samples were collected from 12 patients and comprised 7 MIBC, 3 non-muscle-invasive bladder cancer (NMIBC), and 9 normal tissue samples. Single-cell RNA sequencing (scRNA-seq) was performed on a total of 127,391 cells isolated from these samples. The data were then analyzed to identify the major cell types and the fibroblast and T-cell subtypes, and to uncover the molecular mechanisms governing their interactions. The scRNA-seq findings were further validated by immunohistochemistry, immunofluorescence, ELISA, and other assays.
https://doi.org/10.5061/dryad.nzs7h450j
Abbreviations
- MIBC: Muscle-Invasive Bladder Cancer
- NMIBC: Non-Muscle-Invasive Bladder Cancer
- TME: Tumor Microenvironment
- CAFs: Cancer-Associated Fibroblasts
- Tex: Exhausted T cells
- scRNA-seq: Single-Cell RNA Sequencing
- IFN: Interferon
- ELISA: Enzyme-Linked Immunosorbent Assay
- IHC: Immunohistochemistry
- TCGA: The Cancer Genome Atlas
Description of the data and file structure
Single-cell sequencing of bladder cancer
Cell Ranger
Folder Naming Convention and Explanation
The main dataset folder is named “BCa-cellranger”. Its sub-folders are named mainly by sample type followed by a sample number:
- Sample Type: “LymphNode” represents lymph node samples; “Normal” represents normal samples; “Tumor” represents tumor samples; “PBMC” represents peripheral blood mononuclear cell samples.
- Numbering: A numeric suffix such as “05” or “06” distinguishes individual samples. For example, “LymphNode05” is the folder for lymph node sample 05.
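When iterating over the unpacked archive, the naming convention can be applied programmatically. The sketch below uses a hypothetical helper, `parse_sample_folder`, which assumes each sub-folder name is exactly one of the four type prefixes above followed by digits. Because names follow this pattern only "mainly", any other name raises an error so that it can be inspected by hand. The number is returned as a string to preserve leading zeros such as "05".

```python
import re

# Sample-type prefixes taken from the list above. The pattern assumes a folder
# name is exactly <type><digits>, e.g. "LymphNode05" (hypothetical helper).
SAMPLE_TYPES = ("LymphNode", "Normal", "Tumor", "PBMC")
_FOLDER_RE = re.compile(r"^(%s)(\d+)$" % "|".join(map(re.escape, SAMPLE_TYPES)))


def parse_sample_folder(name):
    """Split a sub-folder name into (sample_type, sample_number_as_string)."""
    match = _FOLDER_RE.match(name)
    if match is None:
        # Names outside the convention are surfaced rather than guessed at.
        raise ValueError(f"folder name does not follow <type><number>: {name!r}")
    return match.group(1), match.group(2)
```

For example, `parse_sample_folder("LymphNode05")` returns `("LymphNode", "05")`.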
Explanation of Files within Subfolders
The “filtered_feature_bc_matrix” folder within the “LymphNode05” sub-folder, for example, contains the following three files:
barcodes.tsv: Lists the cell barcodes, one per line. Each barcode identifies a single cell and distinguishes it from the other cells in the sample. Downstream analyses, such as cell type identification and gene expression analysis, use these barcodes to locate each cell's expression profile.
features.tsv: Lists the gene (feature) annotations, one per line. In the standard Cell Ranger (v3 and later) layout, each line contains a gene ID (an Ensembl ID), a gene symbol, and the feature type (e.g., “Gene Expression”). These annotations are required to interpret the expression data in analyses such as differential gene expression and gene function enrichment.
matrix.mtx: The core data file of the single-cell analysis: a sparse matrix in Matrix Market coordinate format that stores the gene expression data. Rows correspond to genes, in the order listed in features.tsv; columns correspond to cells, in the order listed in barcodes.tsv; and each value is the UMI count of that gene in that cell. Only nonzero entries are stored.
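To make the link between the three files concrete, the following is a minimal, standard-library-only sketch of how they fit together. The function name `read_10x_mtx_folder` is hypothetical. It assumes the file names shown above and, as an assumption, also accepts the gzip-compressed `.gz` variants that newer Cell Ranger versions write. Counts are returned as a dictionary keyed by 0-based (gene, cell) indices, which mirrors the 1-based row and column indices stored in matrix.mtx.

```python
import gzip
import os


def _open_text(path):
    """Open `path` as text, falling back to a gzip-compressed `path + '.gz'`."""
    if os.path.exists(path):
        return open(path, "rt")
    if os.path.exists(path + ".gz"):
        return gzip.open(path + ".gz", "rt")
    raise FileNotFoundError(path)


def read_10x_mtx_folder(folder):
    """Read a filtered_feature_bc_matrix folder (hypothetical helper).

    Returns (features, barcodes, counts):
      features: one tuple per line of features.tsv (gene ID, symbol, type)
      barcodes: one cell barcode per line of barcodes.tsv
      counts:   dict (gene_index, cell_index) -> UMI count, 0-based; zeros absent
    """
    with _open_text(os.path.join(folder, "barcodes.tsv")) as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    with _open_text(os.path.join(folder, "features.tsv")) as fh:
        features = [tuple(line.rstrip("\r\n").split("\t")) for line in fh if line.strip()]

    counts = {}
    dims = None
    with _open_text(os.path.join(folder, "matrix.mtx")) as fh:
        for line in fh:
            if line.startswith("%") or not line.strip():
                continue  # banner line ("%%MatrixMarket ...") and comments
            fields = line.split()
            if dims is None:
                dims = tuple(int(x) for x in fields)  # n_genes, n_cells, n_nonzero
                continue
            row, col, value = int(fields[0]), int(fields[1]), int(fields[2])
            counts[(row - 1, col - 1)] = value  # Matrix Market indices are 1-based

    if dims is None:
        raise ValueError("matrix.mtx has no size line")
    if dims != (len(features), len(barcodes), len(counts)):
        raise ValueError("matrix.mtx size line does not match features/barcodes")
    return features, barcodes, counts
```

A dictionary of tuples is memory-hungry for full-size samples; for real analyses, `scanpy.read_10x_mtx` (Python) or `Read10X` from Seurat (R) load the same three files into a sparse matrix.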
Table_S1-EJC Explanation
Table_S1-EJC.csv contains patient information, sample annotations, and per-sample sequencing and analysis metrics.
Patient-related Information:
- Patient: Serves as a unique identifier for each patient in the study.
- Gender: Indicates whether the patient is male or female.
- Age: Records the age of the patient at the time of sample collection.
- History of treatment: All entries in this table are “Treatment naïve”, meaning that no patient had received relevant treatment before sample collection.
Sample Information:
- Sample: Specifies the type of sample, including Tumor, PBMC, Normal, Lymph Node, and samples from different tumor sites such as Tumor Site1 and Tumor Site2.
- Tumor Stage: The pathological TNM stage of the tumor, such as pT2aN0M0 or pT2bN0M0. The T category describes how deeply the primary tumor has invaded, N describes regional lymph node involvement, and M describes distant metastasis.
- Tumor Grade: Classifies the tumor as Low or High grade, reflecting the degree of tumor cell differentiation and malignancy.
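The Tumor Stage strings use compact pathological TNM notation. As a minimal sketch, assuming every entry has the form p + T category + N category + M category with no spaces (true of the two examples above, but not verified for the whole table), the hypothetical helpers below split a stage string into its T, N, and M parts and flag muscle-invasive disease, which in bladder cancer corresponds to T2 or higher.

```python
import re

# Compact pathological TNM strings such as "pT2aN0M0" (hypothetical helper).
# T: Ta, Tis, T0-T4 with optional a-c suffix, or TX; N: N0-N3 or NX;
# M: M0/M1 with optional a-c suffix, or MX.
_TNM_RE = re.compile(r"^p?T(is|a|[0-4][a-c]?|X)N([0-3]|X)M([01][a-c]?|X)$")


def parse_ptnm(stage):
    """Return a dict with the T, N and M categories of a pTNM string."""
    match = _TNM_RE.match(stage.strip())
    if match is None:
        raise ValueError(f"unrecognized pTNM string: {stage!r}")
    t, n, m = match.groups()
    return {"T": t, "N": n, "M": m}


def is_muscle_invasive(stage):
    """True if the T category is T2, T3 or T4 (invasion of the muscle layer)."""
    return parse_ptnm(stage)["T"][0] in "234"
```

For instance, `parse_ptnm("pT2aN0M0")` returns `{"T": "2a", "N": "0", "M": "0"}`.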
Sequencing and Analysis Metrics:
- Cell Number: Represents the total number of cells detected in the corresponding sample, providing an indication of the sample’s cellular content.
- Mean Reads per Cell: Reflects the average number of sequencing reads obtained for each cell, which is an important indicator of sequencing depth.
- Medium Gene per Cell: The median number of genes detected per cell (“Medium” in the column name stands for median), indicating how many genes are captured in a typical cell.
- Medium UMI per Cell: The median number of Unique Molecular Identifiers (UMIs) per cell. Because each UMI tags a single captured transcript molecule, UMI counts remove PCR amplification duplicates and measure expression more accurately than raw read counts.
- Sequencing Saturation: Expressed as a percentage, the fraction of reads that come from an already-observed UMI. Values close to 100% indicate that additional sequencing would detect few new transcripts.
- After Filtering: The value obtained after quality-control filtering. The table does not specify the metric; it is probably the number of cells retained after filtering.
- Features: The number of features (typically genes) detected in the sample.
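For a first look at Table_S1-EJC.csv, a minimal standard-library sketch can total the cell counts per sample. The column headers "Sample" and "Cell Number" are assumptions based on the field names above; check them against the actual header row before use. The function name `cells_per_sample` is hypothetical. Blank cells are skipped, and values such as "Tumor Site1" and "Tumor Site2" are kept as separate keys.

```python
import csv
import io
from collections import Counter


def cells_per_sample(csv_text):
    """Sum the 'Cell Number' column for each distinct 'Sample' value.

    Assumes (hypothetically) that the headers read exactly 'Sample' and
    'Cell Number'; thousands separators such as "1,200" are tolerated.
    """
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get("Cell Number") or "").replace(",", "").strip()
        if raw:  # skip rows with an empty cell count
            totals[row["Sample"].strip()] += int(raw)
    return dict(totals)
```

Opening the file with `encoding="utf-8-sig"` strips a byte-order mark if one is present, e.g. `with open("Table_S1-EJC.csv", encoding="utf-8-sig") as fh: print(cells_per_sample(fh.read()))`.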
Access information
Other publicly accessible locations of the data:
- The transcriptome data reported in this paper have been deposited in the China National Center for Bioinformation under accession number PRJCA029328 and are publicly accessible at http://bigd.big.ac.cn/gsa.
Data was derived from the following sources:
- n/a
Human subjects data
I hereby declare that explicit consent has been obtained from all participants in this study to publish the de-identified data in the public domain. During the sample collection phase, each participant was provided with a detailed explanation of the research objectives, data processing methods, and potential implications of data publication. This ensured that they fully understood the study and voluntarily signed an informed consent form.
Regarding data de-identification, a series of stringent measures were implemented. At the time of sample collection, all samples were labeled with unique codes, and no directly identifiable personal information, such as names, ID numbers, or medical record numbers, was recorded on the samples. During the data processing and storage stages, the codes used to identify the sample sources and the participants’ personal information were stored in separate systems with strict access controls. Only authorized researchers had access to this information. Additionally, during the data analysis phase, all data related to individual characteristics, such as age and gender, were presented in the form of aggregated statistics to avoid any disclosure of information that could potentially identify a specific individual. Through these measures, we have ensured that the human subjects data in this dataset has been properly de-identified, protecting the privacy and rights of the participants to the greatest extent possible and complying with applicable legal and ethical guidelines.