Dysregulation of cell state dynamics during early stages of serous endometrial carcinogenesis
Data files
Sep 05, 2025 version files 10.37 GB
- allDiestrus_Epi_mU7_mU30_recluster_final_01182025.rds (1.33 GB)
- cont_Epi_mU7_mU30_Final_01182025.rds (348.65 MB)
- cont_mU7_mU30_Final_01182025_simple.rds (474.46 MB)
- DiestrusMice_mU7_mU30_Final_01182025_simple.rds (2.08 GB)
- Ear_Epi_mU7_mU30_recluster_01182025.rds (444.06 MB)
- EarCan_recluster_mU7_mU30_01182025_simple.rds (821.67 MB)
- FinalSamples_preDF_02232024.rds (1.59 GB)
- FinalSamples_preQC_02232024.rds (1.64 GB)
- Late_Epi_mU7_mU30_recluster_01182025.rds (539.49 MB)
- LateCan_recluster_mU7_mU30_01182025_simple.rds (788.25 MB)
- README.md (13.42 KB)
- S01_Normal_cloupe.cloupe (31.26 MB)
- S05_Normal_cloupe.cloupe (22.39 MB)
- S06_Normal_cloupe.cloupe (16.14 MB)
- S08_D_cloupe.cloupe (20.78 MB)
- S10_Normal_cloupe.cloupe (20.98 MB)
- S11_D_cloupe.cloupe (28.32 MB)
- S13_PD_cloupe.cloupe (22.82 MB)
- S15_PD_cloupe.cloupe (23.46 MB)
- S16_PD_cloupe.cloupe (20.92 MB)
- S17_PD_cloupe.cloupe (24.49 MB)
- S18_D_cloupe.cloupe (29.66 MB)
- S19_D_cloupe.cloupe (28.09 MB)
- S20_D_cloupe.cloupe (23.11 MB)
Abstract
We have compiled single-cell transcriptomes of the mouse endometrium before and after induction of serous endometrial carcinoma (SEC) using our genetically engineered mouse model. In this model, the Pax8 promoter drives the targeted inactivation of Trp53 and Rb1 in the mouse endometrium after the administration of doxycycline, recapitulating SEC. To prepare a census of single-cell RNA and spatial RNA expression profiles, we collected uterine horns from 18 mice (6 weeks to 6 months old) including 5 normal, 7 pre-dysplastic (30-50 days post-induction, DPI), and 6 dysplastic (80-150 DPI) samples. For spatial transcriptome evaluation, we collected uterine horns from 13 mice including 4 normal, 4 pre-dysplastic, and 5 dysplastic samples. All mice were at late diestrus according to vaginal smears followed by morphological verification and estimation of Ki67 index in paraffin sections of adjacent uterus. Additionally, the pathology of each sample was validated by trained comparative pathologists to determine whether it showed dysplastic lesions.
The datasets included in this upload are Seurat objects of all samples combined, as well as those separated by SEC stage (Normal, Pre-dysplastic, or Dysplastic). Each stage has one object containing all cells and one containing the epithelial subset. Additionally, two RDS objects capture the combined sample dataset at two stages of the quality control process. The Visium datasets included are .cloupe files compatible with LoupeBrowser 7.
This README.txt file was generated on 2024-06-06 by Matalin Pirtz
GENERAL INFORMATION
- Title of Dataset: Dysregulation of cell state dynamics during early stages of serous endometrial carcinogenesis
- Description of the Data and File Structure:
These are prepared Seurat objects for quick analysis of the cell states present in the normal, pre-dysplastic, and dysplastic mouse uterus after the induction of serous endometrial carcinoma through the inactivation of Trp53 and Rb1 in the endometrium. These files are intended to complement R code deposited on GitHub (https://github.com/PirtzM/EarlySEC_scRNA) for further exploration of the cell types/states present in the tissue in the single-cell RNA sequencing (scRNA-seq) samples.
There are also .cloupe files for quick exploration of spatial RNA sequencing samples of the normal, pre-dysplastic, and dysplastic mouse uterus after the induction of serous endometrial carcinoma through the inactivation of Trp53 and Rb1 in the endometrium. These samples can be viewed in the 10X Genomics LoupeBrowser application.
- Author Information
A. Principal Investigator Contact Information
Name: Alexander Yu. Nikitin
Institution: Cornell University
Email: an58@cornell.edu
B. First Author Contact Information
Name: Matalin Pirtz
Institution: Cornell University
Email: mgp73@cornell.edu
Name: Andrea Flesken-Nikitin
Institution: Cornell University
Email: af78@cornell.edu
- Date of data collection:
scRNA-seq: Aug. 2021 - Jun. 2023
Visium seq: Mar. 2022 - Dec. 2023
- Information about funding sources that supported the collection of the data:
A. National Institutes of Health - CA248524
B. National Institutes of Health - CA260115
DATA & FILE OVERVIEW
- File List:
A. FinalSamples_preQC_02232024.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Seurat object including all 18 scRNA-seq samples (42,694 cells) before removing low quality cells or doublets. Used for Supplemental Figure 2A-C.
B. FinalSamples_preDF_02232024.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Seurat object including all 18 scRNA-seq samples (38,298 cells) after removing low quality cells but before removing doublets. Used for Supplemental Figure 2D-E.
C. DiestrusMice_mU7_mU30_Final_01182025_simple.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing all 18 scRNA-seq samples (37,543 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
D. allDiestrus_Epi_mU7_mU30_recluster_final_01182025.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing the epithelial subset of all 18 scRNA-seq samples (19,449 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
E. cont_mU7_mU30_Final_01182025_simple.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing only the 5 control/normal scRNA-seq samples (7,614 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
F. cont_Epi_mU7_mU30_Final_01182025.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing only the epithelial subset of the 5 control/normal scRNA-seq samples (4,689 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
G. EarCan_recluster_mU7_mU30_01182025_simple.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing only the 7 early SEC/Pre-dysplastic scRNA-seq samples (17,212 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
H. Ear_Epi_mU7_mU30_recluster_01182025.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing only the epithelial subset of the 7 early SEC/Pre-dysplastic scRNA-seq samples (7,713 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
I. LateCan_recluster_mU7_mU30_01182025_simple.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing only the 6 late SEC/Dysplastic scRNA-seq samples (12,717 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
J. Late_Epi_mU7_mU30_recluster_01182025.rds
i. Format: Seurat object, saved as .rds file
ii. Description: Fully processed Seurat object containing only the epithelial subset of the 6 late SEC/Dysplastic scRNA-seq samples (7,063 cells). Includes dimensional reduction values after integration with Harmony, as well as cell types, quality control information, and many other metadata features. Details on processing can be found in the manuscript and my GitHub.
K. S01_Normal_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Normal sample, S01. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
L. S05_Normal_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Normal sample, S05. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
M. S06_Normal_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Normal sample, S06. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
N. S10_Normal_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Normal sample, S10. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
O. S13_PD_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Pre-dysplastic sample, S13. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
P. S15_PD_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Pre-dysplastic sample, S15. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
Q. S16_PD_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Pre-dysplastic sample, S16. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
R. S17_PD_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Pre-dysplastic sample, S17. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
S. S08_D_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Dysplastic sample, S08. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
T. S11_D_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Dysplastic sample, S11. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
U. S18_D_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Dysplastic sample, S18. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
V. S19_D_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Dysplastic sample, S19. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
W. S20_D_cloupe.cloupe
i. Format: LoupeBrowser Object, saved as .cloupe
ii. Description: The Visium Spatial object compatible with LoupeBrowser viewing and analysis of Dysplastic sample, S20. Includes clustering to only include the endometrially associated barcodes for expression quantification. Details on processing can be found in the manuscript.
Cell Type Variables
Here is a quick reference of the cell type labels found within the scRNA-seq .rds file metadata. All cell types were determined based on gene expression commonly used to identify the cell types in the adult mouse uterus. These gene lists can be found both on my GitHub and in the manuscript.
- LE (1-5) - Luminal Epithelium
- GE - Glandular Epithelium
- DDP - Putative Ciliated Epithelium
- CE - Cycling Epithelium
- CM - Cox1/Cox2/Malat1+ Epithelium
- Meso - Mesothelium
- SM - Smooth Muscle
- Fib (1-3) - Fibroblast
- Lymph - Lymphatic Endothelium
- Vascular - Vascular Endothelium
- Im - Macrophage, Lymphocyte, and/or T cell, NK cell
Note that cluster numbering within a specific cell type (e.g., LE 1 or LE 2) was determined by the size of the cluster in the normal samples, and gene expression for that cluster was then matched across stages. Cell types in the stage-specific epithelial subsets (files F, H, J) also carry a stage-specific modifier (e.g., N - LE 1) to acknowledge stage-specific variation in gene expression. Further information on this can be found in the Methods.
Code/Software
- R with the Seurat package is required to open all .rds files. All analysis was done with R version 4.1.1 and Seurat version 4.3.0. To access the files in RStudio, load the Seurat library into your session using library(Seurat), and read the files using the base R function readRDS([file name here]).
Scripts to follow the generation of figures in the manuscript (including the loading of specific, applicable files), can be found on GitHub (https://github.com/PirtzM/EarlySEC_scRNA/tree/main/Figure_Scripts).
- 10X Genomics' LoupeBrowser application is required to view all .cloupe files. LoupeBrowser version 7.0.1 was used for analysis in the manuscript. Install the application from https://www.10xgenomics.com/support/software/loupe-browser/downloads/previous-versions, then open the .cloupe files directly in the application.
Contributions
All sequencing samples were collected by Andrea Flesken-Nikitin.
All dataset alignment, preprocessing, and analysis were done by Matalin G. Pirtz.
Experimental Animals
Tg(Pax8-rtTA2SM2)1Koes/J (Pax8-rtTA) Tg(tetO-Cre)1Jaw/J (Tre-Cre) mice (Perets et al., 2013) and Gt(ROSA)26Sortm9(CAG-tdTomato)Hze (Ai9) mice (JAX stock no. 007909) were obtained from The Jackson Laboratory (Bar Harbor, ME, USA). Trp53loxP/loxP and Rb1loxP/loxP mice, which have loxP alleles flanking their respective genes, Trp53 and Rb1, were a gift from Dr. Anton Berns (The Netherlands Cancer Institute, Amsterdam, The Netherlands). For all experiments, mice were collected in late diestrus, also described by some as early proestrus. Briefly, criteria for this stage included observing almost exclusively leukocytes in vaginal smears, a medium-wide lumen, dense/early edematous stroma, and medium-high proliferation of the glandular and luminal epithelia (Ki67 index 10-95%). All experiments and maintenance of the mice were performed following ethical regulations for animal testing and research. The Cornell University Institutional Animal Care and Use Committee (IACUC) approved all animal protocols, and experiments were performed in compliance with its institutional guidelines.
Doxycycline induction
Doxycycline was administered through a single intraperitoneal injection (i.p.) to Pax8-rtTA Tre-Cre Trp53loxP/loxPRb1loxP/loxP Ai9 mice and control mice at 6 weeks to 6 months of age. Doxycycline was administered at a dose of 12 μl g^-1^ body weight at a concentration of 6.7 mg ml^-1^ in sterile PBS. Mice were identified to be in diestrus phase of the estrous cycle before induction using vaginal smears. All mice were euthanized by CO2, and further analyses were carried out.
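The dosing figures above imply a per-mouse injection volume and an effective dose per unit body weight. The following is a minimal Python sketch of that arithmetic, not part of the deposited analysis code; the function name and the 25 g example weight are hypothetical. At 12 μl per gram and 6.7 mg per ml, the dose works out to roughly 80.4 mg per kg.

```python
def doxycycline_dose(body_weight_g, volume_ul_per_g=12.0, conc_mg_per_ml=6.7):
    """Return (injection volume in ul, doxycycline mass in mg, dose in mg/kg).

    Defaults are the values stated in the text; body_weight_g is supplied
    by the user (the weight used below is purely illustrative).
    """
    volume_ul = volume_ul_per_g * body_weight_g
    mass_mg = (volume_ul / 1000.0) * conc_mg_per_ml  # ul -> ml, then ml * mg/ml
    dose_mg_per_kg = mass_mg / body_weight_g * 1000.0  # mg per g -> mg per kg
    return volume_ul, mass_mg, dose_mg_per_kg


# Hypothetical 25 g mouse: about 300 ul injected, about 2 mg doxycycline.
volume, mass, dose = doxycycline_dose(25.0)
print(round(volume, 1), round(mass, 2), round(dose, 1))
```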
Pathological evaluation
All mice underwent gross pathology evaluation at the time of necropsy. Potential sites of endometrial carcinoma metastasis were evaluated carefully, including the local lymph nodes, omentum, peritoneum, and lungs.
Histology, immunohistochemistry, and image analysis
All tissues were fixed in buffered 4% paraformaldehyde overnight at 4˚C followed by standard tissue processing and paraffin embedding. Histology and immunohistochemistry staining was carried out on 4-μm-thick tissue sections. For immunohistochemistry, antigen retrieval was performed as needed by incubation of deparaffinized and rehydrated tissue sections in boiling 10 mM sodium citrate buffer (pH 6.0) for 15 min. All primary antibodies used for immunostaining are listed in Supplementary Table 2. The primary antibodies were incubated at 4˚C overnight or for 1 hr at room temperature (RT), followed by incubation with secondary biotinylated or fluorophore-conjugated antibodies for 1 hr at RT. For slides incubated with biotinylated antibodies, this was followed by the modified Elite avidin-biotin peroxidase (ABC) method (Vector Laboratories, Burlingame, CA, USA; pk-6100) for 30 minutes at RT. Stained sections were scanned with a ScanScope CS2 (Leica Biosystems, Vista, CA) with a 40X objective, followed by analysis with ImageJ software (National Institutes of Health, Bethesda, MD, USA). After mounting in Vectashield Vibrance Antifade Mounting Medium with DAPI (H-1800, Vector Laboratories, Newark, CA), immunofluorescence samples were tile-scanned (Zeiss 710 upright Confocal, Cornell University's Biotechnology Resource Center) at 40X. Analysis was performed with ImageJ software.
Single-cell isolation
For collection, induced mice were sacrificed at various time points in the diestrus phase of the estrous cycle. Each mouse was collected and processed independently to generate single-cell suspensions. Both uterine horns of individual mice were collected and placed in sterile 1X PBS containing 100 IU ml^-1^ of penicillin and 100 μg ml^-1^ streptomycin (Corning, 30-002-Cl). A portion of one uterine horn per mouse was set aside in 4% paraformaldehyde at 4˚C for paraffin embedding. The remaining uterine horns were placed in a 200 μL drop of the same PBS solution, cut open lengthwise, and minced into 1.5-2.5 mm pieces with scalpels. Minced tissues were transferred with the help of a sterile, wide bore 200 μl pipette tip into a 15 ml centrifuge tube containing 2 ml of the same PBS solution and then centrifuged for 6 minutes at 400 rcf at 4˚C. Then, the minced tissues for individual mice were digested using a Papain Dissociation System (Worthington Biochemical Corporation, New Jersey). Tissue was digested in papain mixture from the kit at 37˚C for 1.5 hours with periodic mechanical perturbation. After papain digestion, the papain was inhibited with 3 ml of our stop solution, a DMEM/FBS solution (DMEM Ham’s F12, Corning 10-092-CV; 20% fetal bovine serum [FBS], Sigma-Aldrich F4135; and 0.1 mg ml^-1^ DNase I, Stem Cell Technologies 07900). Cell suspension was placed in a new tube. Cells were centrifuged as before, and the supernatant was removed. Next, 1.35 ml of the albumin-ovomucoid inhibitor provided by the kit was added to the suspended cells, and the cells were resuspended gently. The reaction was stopped using 3 ml of our stop solution, mixed, and then cells were spun down at the same rate as before. The supernatant was removed, and the pellet was resuspended in 2 ml of 0.25% TrypLE Express Enzyme solution (Invitrogen, 12604013) prewarmed to 37˚C. The tube was then incubated at 37˚C with a loose cap for 10 minutes in a 5% CO2 incubator.
Cells were resuspended with a 1-ml pipette tip 40 times, and then digestion was inhibited with our stop solution. Cells were spun down and supernatant was removed before the addition of 1 ml Dispase-mUE (Dispase II [7 mg ml^-1^ DMEM Ham’s F12], Neutral Protease, Worthington NPRO2; and 10 μg μL^-1^ DNase). Cells were aspirated in the solution gently 40 times using a 1 ml pipette tip. Then suspensions were filtered through a 35 μm filter (Falcon 08-771-23) to remove debris, and remaining cells were spun down. Cells were resuspended in 100 μl A-scR-Medium (5% FBS in DMEM Ham’s F12 with 1 mM Y-27632 [Tocris 1257]), placed on ice, and transferred to Cornell’s Genomics Core for single-cell RNA sequencing library preparation.
Single-cell RNA sequencing library preparation
To begin single-cell RNA sequencing library preparation, cell aliquots were stained with trypan blue for live and dead cell calculation. Live cell preparations with a target cell recovery of 5,000 cells were loaded on a Chromium Controller (10X Genomics, Single Cell 3’ v3 chemistry) to perform single-cell partitioning and barcoding using the microfluidic platform device. The NextSeq500 System was used to sequence the cDNA library samples after barcode preparation.
Download and alignment of RNA sequencing data
For sample sequence alignment, a custom reference for mm39 which included the Ai9-tdTomato gene annotation was built using the CellRanger (v7.1.0, 10X Genomics) mkref function. From UCSC Genome Browser downloads, mm39.fa soft-masked assembly sequence and mm39.ncbi.RefSeq.gtf file for gene annotations (last modified 2022-11-30) were downloaded for the reference. These files were appended with the Ai9-TdTomato annotation. The Ai9-tdTomato sequence included both the tdTomato sequence and WPRE sequence and was annotated as TdTomato-UTR. The fasta and gtf files for these sequences were shared with us from the Baker Lab at the University of Edinburgh (10.1093/cvr/cvab296). These files were concatenated with the mm39 fasta and gtf files. Finally, raw reads were aligned to the mm39+tdTomato reference genome using CellRanger (v7.0.1, 10X Genomics).
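The concatenation step described above can be sketched as follows. This is an illustrative Python helper, not the authors' script: the function name and the file names in the usage comment are hypothetical, and the only behavior assumed is that the transgene FASTA and GTF records are appended after the mm39 files with a newline separating them. The combined files would then be supplied to CellRanger's mkref step.

```python
def concatenate_files(paths, out_path, chunk_size=1 << 20):
    """Stream each input file into out_path in order.

    A newline is added after any non-empty input that does not already end
    with one, so the last genome record and the first transgene record
    never end up on the same line.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            last_byte = b"\n"  # empty inputs contribute nothing
            with open(path, "rb") as src:
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
    return out_path


# Hypothetical usage (file names other than mm39.fa and
# mm39.ncbi.RefSeq.gtf are placeholders):
# concatenate_files(["mm39.fa", "tdTomato.fa"], "mm39_tdTomato.fa")
# concatenate_files(["mm39.ncbi.RefSeq.gtf", "tdTomato.gtf"], "mm39_tdTomato.gtf")
```

Streaming in chunks avoids loading the whole genome FASTA into memory.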
Single-cell RNA sequencing preprocessing and batch correction
First, SoupX (v1.6.2; github.com/constantAmateur/SoupX) was used to remove ambient RNA signals using their default workflow (autoEstCont and adjustCounts). The standard Seurat (v4.3.0) workflow (NormalizeData, ScaleData, FindVariableFeatures, RunPCA, FindNeighbors, FindClusters, and RunUMAP; github.com/satijalab/seurat) was used for Seurat object preparation and preprocessing. DoubletFinder (v2.0.3; github.com/chris-mcginnis-ucsf/DoubletFinder) was used to predict doublets in each dataset individually. The default pN value (0.25) was used, and BCmvn optimization was used to identify ideal pK values for each sample. Doublet rate was informed by the 10x Chromium Handbook. Then, all datasets were merged. Cells with fewer than 200 features (nFeature), fewer than 750 transcripts (nCounts), or more than 25% of unique transcripts related to mitochondrial genes, and predicted doublets were removed. After preprocessing and merging of the datasets, batch correction was performed using Harmony (v0.1.1; github.com/immunogenomics/harmony). We then used Seurat to process the integrated data.
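The cell-level filter stated above can be written as a single predicate. The sketch below is a plain-Python illustration rather than the authors' Seurat code; the `Cell` record and its field names are hypothetical stand-ins for the per-cell metadata, and the thresholds are the ones given in the text.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    n_feature: int      # number of detected genes
    n_count: int        # number of transcripts (UMIs)
    pct_mito: float     # percent of transcripts from mitochondrial genes
    is_doublet: bool    # doublet prediction for this cell


def passes_qc(cell, min_features=200, min_counts=750, max_pct_mito=25.0):
    """Keep a cell unless it falls below either minimum, exceeds the
    mitochondrial ceiling, or was predicted to be a doublet."""
    return (
        cell.n_feature >= min_features
        and cell.n_count >= min_counts
        and cell.pct_mito <= max_pct_mito
        and not cell.is_doublet
    )


# Made-up cells for illustration only.
cells = [
    Cell("AAAC", 1500, 4000, 5.0, False),   # kept
    Cell("AAAG", 150, 900, 5.0, False),     # too few features
    Cell("AAAT", 1200, 3000, 40.0, False),  # too much mitochondrial signal
    Cell("AACA", 1800, 5000, 3.0, True),    # predicted doublet
]
kept = [c.barcode for c in cells if passes_qc(c)]
```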
Clustering parameters and annotations
Cell types were determined using multiple subsets: as a fully integrated object (Integrated) and as individually integrated objects based on SEC stage (normal, predysplastic, and dysplastic). Each SEC-specific object was a subset (Seurat::subset) from the original integrated dataset. Once subset, the data were rescaled (Seurat::FindVariableFeatures and Seurat::ScaleData), but original normalization was maintained. Following that, principal component analysis (Seurat::RunPCA) was rerun and the dimensions accounting for 95% of variability in the datasets were used to generate SNN graphs (Seurat::FindNeighbors). Cell clustering was determined using Louvain clustering on the output graphs (Seurat::FindClusters). Clustering resolution and K values were adjusted for each combination of samples to best represent the data. Cell types were determined based on commonly used genes for the healthy, adult mouse uterus. In particular, references from the mouse uterus at diestrus were used when possible. The same process was followed for epithelial subsets from their associated original dataset. A core list of genes was used to compare datasets and validate that general cell types were commonly named. If needed, clusters that had similar canonical gene expression patterns underwent differential expression analysis (Seurat::FindMarkers) for better identification. Cell types with multiple associated clusters were numbered in order based on their abundance in the normal dataset (e.g., LE 1, LE 2) and then carried to other datasets based on similar expression. To highlight that there were differences in overall expression profiles between SEC stages in these clusters, they were named based on the SEC stage from which they originated (e.g., N – LE 1 from the normal dataset). Genes that had been associated with stem/progenitor-like phenotypes, but without documentation of specific expression in the LE or GE, were labeled as “putative progenitor-like” in DotPlots.
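One way to read "the dimensions accounting for 95% of variability" is as the smallest number of leading principal components whose cumulative share of variance reaches 0.95. The Python sketch below illustrates that reading only; it assumes the share is computed over the PCs that were actually calculated, which the text does not specify, and the function name is hypothetical.

```python
def n_pcs_for_variance(stdevs, target=0.95):
    """Return the smallest k such that the first k components explain at
    least `target` of the variance summed over the supplied components.

    `stdevs` holds per-component standard deviations in decreasing order.
    """
    variances = [s * s for s in stdevs]
    total = sum(variances)
    if total <= 0:
        raise ValueError("component variances must sum to a positive value")
    cumulative = 0.0
    for k, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return k
    return len(variances)


# Made-up standard deviations: variances 9, 4, 1, 1 (total 15).
example_k = n_pcs_for_variance([3.0, 2.0, 1.0, 1.0])
```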
Differential gene expression analysis
The fully integrated dataset including all samples was used for differential gene expression analysis. The dataset was subset (Seurat::subset) to retain only the LE, GE, DDP, and cycling populations. The COX/MAL population was excluded because these may be cells damaged during cell isolation. Seurat::FindMarkers() was used to identify the top 100 differentially expressed genes (based on positive Log2 Fold Change) between each combination of SEC stages.
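Selecting the top 100 genes by positive Log2 Fold Change amounts to a filter followed by a sort. The Python sketch below illustrates this on a list of marker records; it is not the authors' R code, the function name is hypothetical, and the records are assumed to carry a gene name and an `avg_log2FC` value as in a FindMarkers result table.

```python
def top_upregulated(markers, n=100):
    """Return up to n gene names with positive avg_log2FC, largest first."""
    positive = [m for m in markers if m["avg_log2FC"] > 0]
    positive.sort(key=lambda m: m["avg_log2FC"], reverse=True)
    return [m["gene"] for m in positive[:n]]


# Made-up marker table for illustration only.
markers = [
    {"gene": "GeneA", "avg_log2FC": 0.8},
    {"gene": "GeneB", "avg_log2FC": -1.2},
    {"gene": "GeneC", "avg_log2FC": 2.5},
    {"gene": "GeneD", "avg_log2FC": 0.0},
]
top_two = top_upregulated(markers, n=2)
```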
Gene set enrichment analysis
The barcodes for the cells labeled LE 4 in the predysplastic epithelial subset and the complete LE population in the normal epithelial subset were extracted from the metadata. Then, the barcode list was filtered to only contain cells that were also included in the fully integrated epithelial subset. Seurat::FindMarkers() was used to identify differentially expressed genes in the predysplastic LE 4 population relative to the normal LE population with logfc.threshold = 0. The resulting list was ranked from largest to smallest based on the average Log2 Fold Change column. The ranked list was analyzed using fgsea::fgsea (v1.20.0; https://github.com/alserglab/fgsea, last accessed 06/08/2025).
The gene lists used for analysis were downloaded as .gmt files from gsea-msigdb.org (last accessed 06/08/2025). They were loaded into the R session using fgsea::gmtPathways. The following mouse-specific collections were utilized: the MH:hallmark gene set (mh.all.v2025.1.Mm.symbols.gmt), the Reactome subset of CP from M2:curated gene sets (m2.cp.reactome.v2025.1.Mm.symbols.gmt), the M3:regulatory target gene sets (m3.all.v2025.1.Mm.symbols.gmt), and the BP (m5.go.bp.v2025.1.Mm.symbols.gmt) and MF (m5.go.mf.v2025.1.Mm.symbols.gmt) subsets of GO, and MPT:tumor phenotype ontology (m5.mpt.v2025.1.Mm.symbols.gmt) from the M5:ontology gene sets. The resulting enriched pathways were filtered to only include those with an adjusted p-value less than 0.05. Enrichment plots were made for pathways of interest (fgsea::plotEnrichment).
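The two bookkeeping steps around the enrichment call, ranking genes before it and filtering pathways after it, can be sketched in a few lines. This Python illustration does not run GSEA itself; the function names, gene names, and pathway names are hypothetical, and the result records are assumed to carry a `padj` field holding the adjusted p-value.

```python
def rank_genes(log2fc_by_gene):
    """Order genes from largest to smallest average log2 fold change."""
    return sorted(log2fc_by_gene.items(), key=lambda item: item[1], reverse=True)


def significant_pathways(results, alpha=0.05):
    """Keep pathways whose adjusted p-value is strictly below alpha."""
    return [r for r in results if r["padj"] < alpha]


# Made-up inputs for illustration only.
ranked = rank_genes({"GeneA": -0.4, "GeneB": 1.7, "GeneC": 0.2})
results = [
    {"pathway": "PATHWAY_X", "padj": 0.01},
    {"pathway": "PATHWAY_Y", "padj": 0.05},
    {"pathway": "PATHWAY_Z", "padj": 0.30},
]
kept_pathways = [r["pathway"] for r in significant_pathways(results)]
```

Because logfc.threshold = 0 keeps genes with small fold changes, the ranked list spans both up- and down-regulated genes, which is what a ranked enrichment test expects.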
Visium spatial RNA sequencing sample preparation
Mouse uteri from doxycycline-induced Pax8-rtTA Tre-Cre Trp53loxP/loxPRb1loxP/loxP Ai9 experimental mice and control littermates were dissected and frozen in O.C.T. (Tissue Tek O.C.T. Compound, VWR, 25608-930) filled molds (Disposable Based Molds, 15 x 15 mm, VWR, 60872-488) on a metal block chilled in dry ice for 10 minutes. Molds were wrapped in aluminum foil, placed and sealed in zip lock plastic bags and stored at -80°C (10xGenomics, protocol CG000240RevD). Embedded samples and all materials for cryosectioning were equilibrated in a cryostat (Leica CM1950) at a chamber temperature of -21°C (object head, -16°C) for 30 minutes. Transverse uterus sections of 10 µm thickness were placed on chilled Visium Tissue Optimization Slides (10xGenomics, 1000191 Slide Kit), and Visium Spatial Gene Expression Slides (10xGenomics, 1000187 Slide Kit). Tissues processed on the Visium Tissue Optimization Slides (10xGenomics, protocol CG000238RevD) indicated the optimal uterine tissue permeabilization time was 12 minutes. For tissue optimization experiments, TIFF fluorescent images (Microscope Olympus EX51, Camera Olympus XM10) were captured with a TRITC filter cube, 2x objective, and 700-ms exposure time. Uterine tissues on Visium Spatial Gene Expression Slides were processed for Hematoxylin & Eosin (H&E) staining (10xGenomics, protocol CG000160RevC). Immediately after staining, digital brightfield histology images were taken at 40x (Leica BioSystems, Aperio ScanScope CS2). Without delay, cDNA Synthesis, Second Strand Synthesis & Denaturation, and cDNA Amplification were performed (10xGenomics, protocol CG000239RevE). Samples were frozen at -20°C and transferred to Cornell Genomics Facility for Quality Control (QC), Gene Expression Library Construction and sequencing.
The Illumina sequencing parameters for all Visium samples were as follows. Visium library preparations were run 2 times and were re-balanced and re-pooled between the 2 runs. The estimated dPCR was 2.5 nMol. Samples were run on a NextSeq2K instrument with a P2 100 bp flow cell (sequencing kit NextSeq 2K P2 100 bp), read length 28+10+10+90, sequencing primer type TruSeq Compatible DNA/RNA, and barcode type Dual barcode i7 and i5.
Visium Spatial RNA sequencing Alignment
Raw FASTQ files and histology images (Brightfield, 16-bit JPEG, 2025 x 2074 pixels, 16 µm, hourglass dots of fiducial frame in upper left corner) were processed by SpaceRanger (v2.1.1, 10X Genomics). For samples S01 through S12, LoupeBrowser (V6.2.0, 10X Genomics) was used to manually select relevant barcodes for samples that were mounted to an acquisition area, and the resulting JSON file of barcodes was supplied through the loupe-alignment argument of the SpaceRanger count function. The raw data from our Visium samples were aligned to our custom reference genome of mm39+TdTomato, the same as our single-cell RNA sequencing samples.
Spatial RNA sequencing preprocessing
Aligned files were opened in LoupeBrowser (V7.0.1, 10X Genomics), where clustering occurred. Low quality spots were identified using a UMI count cutoff of 1000 and a Feature count cutoff of 500; these spots were removed, and the samples were reclustered. All further analysis was performed in LoupeBrowser with log normalized expression values for each individual sample.
Identification of genes of interest in Visium samples
Differentially expressed genes in mouse scRNA-seq data were screened in Visium samples for visual signs of differential expression in the uterus of SEC samples compared to normal samples. After genes of interest were identified, the maximum log-normalized expression value for each individual gene across all samples was identified. A cutoff of 25% of maximum expression was set to minimize variance across the majority of selected genes, as well as to show a balanced view of changes in the number of positive spots with changes in expression level per spot.
To quantify clusters that were associated with the endometrium according to the H&E staining of individual samples, all barcodes except those in the endometrial clusters were removed. Any spot that remained positive for each individual sample after the expression cutoff was applied was counted, unless it clearly did not overlap with the tissue. This value was compared to the total number of spots within the endometrium to identify a percentage of positive spots for each gene.
For the combined signature, the scale value was set to Log Normalized and the Feature average of the Signature gene list was plotted. This displays the average Log Normalized expression of all genes in the list per barcode. The same 25% expression cutoff was applied for this analysis, and positive spots were quantified as above.
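The spot quantification described in this section reduces to a threshold and a ratio. The Python sketch below illustrates it under two stated assumptions: a spot counts as positive when its log-normalized expression is at or above 25% of the gene's maximum across all samples (the text does not say whether the boundary is inclusive), and the input already contains only endometrial spots that overlap tissue. The function name and values are hypothetical; the original quantification was done in LoupeBrowser.

```python
def percent_positive_spots(spot_expression, max_expression, fraction=0.25):
    """Percentage of spots whose expression reaches fraction * max_expression.

    spot_expression: log-normalized values for the endometrial spots of one sample.
    max_expression: the gene's maximum log-normalized value across all samples.
    """
    if not spot_expression:
        raise ValueError("no endometrial spots supplied")
    if max_expression <= 0:
        return 0.0  # gene not detected anywhere, so no spot can be positive
    threshold = fraction * max_expression
    positive = sum(1 for value in spot_expression if value >= threshold)
    return 100.0 * positive / len(spot_expression)


# Made-up sample: threshold is 0.25 * 4.0 = 1.0, so two of four spots are positive.
example_pct = percent_positive_spots([0.0, 0.5, 1.0, 2.0], max_expression=4.0)
```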
Identification of clinically relevant genes
Genes identified through mouse datasets were converted to their human homologs and entered into the KM Plotter interface to visualize differential survival based on low or high expression of the genes. In the interface, samples were stratified into high/low expression groups based on a cutoff determined by all samples (Auto Select Best cutoff: all). To capture the majority of SEC patients in the dataset, a subset of samples with high expression of CDKN2A (n = 271) were used for most comparisons. Only genes with significant differences in survival between the high and low expression groups and with a false discovery rate of lower than 60% were included in our final gene signature of seven genes. All statistical testing was performed by the KM Plotter interface.
Further testing of the seven-gene signature was performed on 526 samples of the TCGA PanCancer Atlas with the cBioPortal interface (https://www.cbioportal.org/, last accessed 06/08/2025). To explore Signature gene alterations in EC patients with TP53 mutations or RB1 pathway alterations, we subset this dataset to only include either SEC or EEC. Then we further stratified the dataset to only include patients with or without TP53 mutations. This list of patients was extracted and used within the “Query by Gene” function. We then selected for samples with mRNA expression data, within the mRNA expression z-scores relative to diploid samples. All cutoffs were preset by cBioPortal. Our signature genes were entered into the gene query box. Once the OncoPrint populated, we counted alterations in specific genes within the signature of the dataset in TP53 mutated samples. This was repeated with EEC cases for comparison.
For samples with RB1 alterations, we utilized the SEC or EEC specific samples and populated the gene query with a list of RB1 pathway-related genes (CCND1, CCNE1, CDK2, CDK4, CDK6, CDKN2A, E2F1, and RB1) alongside our gene signature genes. We included all samples that displayed any alteration (mRNA over/underexpression, amplifications, or mutations classified as putative drivers) for further analysis. Then, alterations of our gene signature genes in these samples were calculated.
- Flesken-Nikitin, Andrea; Pirtz, Matalin G.; Ashe, Christopher S. et al. (2024). Dysregulation of cell state dynamics during early stages of serous endometrial carcinogenesis [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2024.03.15.585274
