Single-cell multi-modal analysis of tumor microenvironment in human non-small cell lung cancer tissues
Data files
May 07, 2025 version files (8.54 GB)
- deident_clinical.csv (4.18 KB)
- Histology.zip (7.84 GB)
- mIF_data.h5ad (691.97 MB)
- README.md (3.29 KB)
- RNA-seq.zip (25.78 KB)
- tissue_patient_map.csv (1.73 KB)
Abstract
To understand how the tumor microenvironment affects clinical outcomes, we generated tissue-matched multiplex immunofluorescence (mIF) images, H&E-stained histopathological images, and RNA-seq data from human non-small cell lung cancer (NSCLC) tissues.
Components
- mIF_data.h5ad: Multiplex immunofluorescence imaging data of NSCLC tissues. Data are stored in an AnnData object (denoted as "adata").
  - adata.X: Immunofluorescence signals quantified for each cell, with dimension Ncells x Nbiomarkers
  - adata.obs: Metadata for each cell, containing the following variables
    - CellID: Unique cell identifiers
    - Centroid X µm: Nucleus centroid coordinate (X-axis)
    - Centroid Y µm: Nucleus centroid coordinate (Y-axis)
    - Nucleus detection probability as determined by the StarDist model (version 0.4.0)
    - Nucleus: *: Nuclear morphology and texture measurements quantified from the nucleus segmentation results
    - Cell: *: Cellular morphology and texture measurements quantified from the cell segmentation results
    - phenotype: Cell type as determined by the hierarchical lineage assignment strategy
    - RCN: Membership in a recurrent spatial cell neighborhood, as determined by the neighborhood cell-type composition
    - patient_id: De-identified patient identifier
    - core_id: Tissue core identifier
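
As a rough illustration of how the per-cell metadata might be used, the sketch below tallies cell phenotypes within each tissue core. It assumes the third-party `anndata` package is installed and that `phenotype` and `core_id` appear in `adata.obs` under exactly those names. The helper names `phenotype_counts_per_core` and `load_phenotype_counts` are hypothetical and not part of the dataset.

```python
from collections import Counter, defaultdict


def phenotype_counts_per_core(core_ids, phenotypes):
    """Count how many cells of each phenotype fall in each tissue core."""
    counts = defaultdict(Counter)
    for core, phenotype in zip(core_ids, phenotypes):
        counts[core][phenotype] += 1
    return {core: dict(counter) for core, counter in counts.items()}


def load_phenotype_counts(path="mIF_data.h5ad"):
    """Read the AnnData file and tally phenotypes per core (needs anndata)."""
    import anndata as ad  # third-party dependency, imported lazily

    adata = ad.read_h5ad(path)
    # adata.X is Ncells x Nbiomarkers; adata.obs holds one metadata row per cell.
    obs = adata.obs
    return phenotype_counts_per_core(obs["core_id"], obs["phenotype"])
```

Calling `load_phenotype_counts()` from the directory holding the downloaded file should return a dictionary keyed by core identifier. The counting helper is kept free of `anndata` so it can also be applied to any pair of equal-length sequences.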
- Histology folder: Nucleus segmentation and cell-type classification results from NucSegAI on H&E-stained whole-slide images.
  Each file is named by its tissue identifier, as listed in tissue_patient_map.csv.
  Each nucleus record contains the following fields:
  - bbox: Two-dimensional coordinates of the rectangular bounding box
  - centroid: Nucleus centroid coordinates
  - contour: Nucleus contour coordinates
  - type_prob: Probability of the predicted cell-type class
  - cell_type: Predicted cell-type class
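
The on-disk format of the per-nucleus results is not specified here, so the following is only a sketch. It assumes each nucleus has already been loaded into a mapping with the field names listed above, and that `contour` is an ordered list of (x, y) vertex pairs. The helper names, the 0.5 probability threshold, and the example `cell_type` labels are hypothetical; `bbox` is left out of the example because its layout is not described.

```python
def contour_area(contour):
    """Area enclosed by a closed polygon, computed with the shoelace formula."""
    area = 0.0
    n_points = len(contour)
    for i in range(n_points):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n_points]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def confident_nuclei(nuclei, min_prob=0.5):
    """Keep nuclei whose predicted class probability is at least min_prob."""
    return [nucleus for nucleus in nuclei if nucleus["type_prob"] >= min_prob]


# Made-up records for illustration only; real values come from the Histology files.
example_nuclei = [
    {"centroid": [1.0, 1.0], "contour": [[0, 0], [2, 0], [2, 2], [0, 2]],
     "type_prob": 0.9, "cell_type": "tumor"},
    {"centroid": [5.0, 5.0], "contour": [[4, 4], [6, 4], [5, 6]],
     "type_prob": 0.3, "cell_type": "lymphocyte"},
]

kept = confident_nuclei(example_nuclei)
areas = [contour_area(nucleus["contour"]) for nucleus in kept]
```

Areas are in whatever units the contour coordinates use, which this description does not state.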
- RNA-seq folder: Gene set variation analysis of the FFPE RNA-seq data.
  A CSV file of dimension Nsamples x Npathways. Pathway scores were computed using the GSEApy package (version 1.1).
- tissue_patient_map.csv: Mapping from tissue IDs to patient IDs
- deident_clinical.csv: De-identified clinical information with the following variables
  - patient_id: Patient identifiers
  - Gender (0: Female, 1: Male)
  - Race (0: Asian, 1: Black, 2: White, 3: Other/Unknown)
  - Smoking Status (0: Never, 1: Former, 2: Daily)
  - Age (0: <70, 1: >=70)
  - Stage: Cancer stage
  - Histology class (0: LUAD, 1: LUSC, 2: Other)
  - IO_received (0: No, 1: Yes)
  - IO_response (0: No, 1: Yes)
  - Deceased (0: No, 1: Yes): Whether the patient died within the follow-up period
  - Survival or loss to follow-up (months)
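
The integer codes above can be mapped back to readable labels when loading the clinical table. The sketch below uses only the Python standard library. It assumes the CSV header names match the variable names listed above exactly, which has not been checked against the file, and that codes are stored as plain integers such as `0` rather than `0.0`. `CODEBOOK` and `decode_clinical` are hypothetical names.

```python
import csv
import io

# Code-to-label mapping transcribed from the variable list above.
CODEBOOK = {
    "Gender": {"0": "Female", "1": "Male"},
    "Race": {"0": "Asian", "1": "Black", "2": "White", "3": "Other/Unknown"},
    "Smoking Status": {"0": "Never", "1": "Former", "2": "Daily"},
    "Age": {"0": "<70", "1": ">=70"},
    "Histology class": {"0": "LUAD", "1": "LUSC", "2": "Other"},
    "IO_received": {"0": "No", "1": "Yes"},
    "IO_response": {"0": "No", "1": "Yes"},
    "Deceased": {"0": "No", "1": "Yes"},
}


def decode_clinical(handle):
    """Yield one dict per patient with integer codes replaced by labels."""
    for row in csv.DictReader(handle):
        decoded = dict(row)
        for column, labels in CODEBOOK.items():
            if column in decoded:
                # Codes missing from the codebook (or blank cells) are left as-is.
                decoded[column] = labels.get(decoded[column], decoded[column])
        yield decoded


# Tiny in-memory stand-in for the real file, with made-up values.
demo = io.StringIO("patient_id,Gender,Race,Age\n7,1,2,0\n")
demo_rows = list(decode_clinical(demo))
```

For the real file, the same function can be applied to `open("deident_clinical.csv", newline="")`. Columns absent from the codebook, such as Stage, pass through unchanged.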
Authors
- Yuanning Zheng
Stanford University
eric2021@stanford.edu
- Olivier Gevaert
Stanford University
ogevaert@stanford.edu
Human subjects data
The use of human samples in this study was approved by Stanford’s Institutional Review Board (IRB). The IRB waived the requirement for informed consent because the study is retrospective and uses de-identified data. Data were de-identified in accordance with HIPAA privacy regulations. Patient and tissue identifiers were anonymized by replacing them with randomly assigned integers (1-136), and ages were grouped into two categories: younger than 70, and 70 or older.
