Processed data objects for snPATHO-seq paper
Data files
Oct 09, 2024 version files 5.32 GB
- 4066_integrated_seuarat_object.rds (521.54 MB)
- 4066FFPE_Visium.rds (121.83 MB)
- 4399_integrated_seuarat_object.rds (663.22 MB)
- 4399FFPE_Visium.rds (102.21 MB)
- 4411_integrated_seuarat_object.rds (839.62 MB)
- 4411FFPE_Visium.rds (88.21 MB)
- Colon_213641_annotated.rds (138.52 MB)
- ColonCRC_1328A_annotated.rds (62.66 MB)
- Dryad_metadata.xlsx (34.76 KB)
- EndocervicalAdenocarcinoma_6707_annotated.rds (83.31 MB)
- Endometrium_220952_annotated.rds (140.11 MB)
- Glioblastoma_1773A_annotated.rds (232.56 MB)
- Kidney_1305272B_annotated.rds (110.57 MB)
- Kidney_FHIL_annotated.rds (485.07 MB)
- liver_8754A_annotated.rds (275.73 MB)
- Lung_20011329NL_annotated.rds (159.45 MB)
- lungcancer_20011329LC_annotated.rds (172.18 MB)
- melanoma_7167A_annotated.rds (401.64 MB)
- Ovary_230303_annotated.rds (51.47 MB)
- PBMC_integrated_annotation_modifed_by_subclustering.rds (671.93 MB)
- README.md (5.42 KB)
Abstract
Formalin-fixed paraffin-embedded (FFPE) samples are valuable but underutilized in single-cell omics research due to their low DNA and RNA quality. In this study, leveraging a recent advance in single-cell genomic technology, we introduce snPATHO-seq, a versatile method to derive high-quality single-nucleus transcriptomic data from FFPE samples. We benchmarked the performance of the snPATHO-seq workflow against the existing 10x 3’ and Flex assays designed for frozen or fresh samples and highlighted the consistency of the snRNA-seq data produced by all workflows. The snPATHO-seq workflow also demonstrated high robustness when tested across a wide range of normal and diseased FFPE tissue samples. When combined with FFPE spatial transcriptomic technologies such as FFPE Visium, snPATHO-seq provides a multi-modal sampling approach for FFPE samples, allowing more comprehensive transcriptomic characterization.
This repository contains the processed snRNA-seq data (as Seurat v4 objects) and Visium data (as STutility objects) used to generate the visualizations in the snPATHO-seq manuscript (preprint: https://www.biorxiv.org/content/10.1101/2023.12.07.570700v1).
Overview of data processing
For snRNA-seq data, cellranger outputs were first filtered using the CellBender package v0.2.0 to remove ambient background and identify cells/nuclei before being processed into a Seurat object (Seurat v4.3.0.9002). Low-quality cells/nuclei were defined as those with fewer than 200 UMIs, more than 8000 UMIs, or more than 10% mitochondrial gene products. Additional doublets were identified using the DoubletFinder package (v2.0.3) with default parameters. For samples processed using different snRNA-seq methods, the data were integrated using the Seurat CCA method with default parameters and annotated manually to derive a common cell type representation that can be applied to all datasets.
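The quality-control thresholds above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which used Seurat in R; see the linked GitHub repository for the actual scripts). The `NucleusQC` record, the barcodes, and the function names are hypothetical and exist only for illustration; only the three thresholds come from the description above.

```python
from dataclasses import dataclass

# Thresholds quoted from the processing overview above.
MIN_UMIS = 200        # fewer UMIs than this -> low quality
MAX_UMIS = 8000       # more UMIs than this -> low quality
MAX_PCT_MITO = 10.0   # more mitochondrial content (%) than this -> low quality


@dataclass
class NucleusQC:
    """Hypothetical per-nucleus QC record (not a structure from the dataset)."""
    barcode: str
    n_umis: int
    pct_mito: float


def is_low_quality(nucleus: NucleusQC) -> bool:
    """Return True if the nucleus fails any of the three stated thresholds."""
    return (
        nucleus.n_umis < MIN_UMIS
        or nucleus.n_umis > MAX_UMIS
        or nucleus.pct_mito > MAX_PCT_MITO
    )


def filter_nuclei(nuclei):
    """Keep only the nuclei that pass all thresholds."""
    return [n for n in nuclei if not is_low_quality(n)]


# Made-up example values, chosen so that each failure mode appears once.
example = [
    NucleusQC("AAAC-1", 150, 2.0),    # too few UMIs
    NucleusQC("AAAG-1", 3500, 4.5),   # passes
    NucleusQC("AACT-1", 9200, 1.0),   # too many UMIs
    NucleusQC("AAGG-1", 1200, 15.0),  # too much mitochondrial content
]
kept = filter_nuclei(example)
print([n.barcode for n in kept])
```

The comparisons are strict, so a nucleus with exactly 200 UMIs, exactly 8000 UMIs, or exactly 10% mitochondrial content is retained. That is one reading of "fewer than" and "more than"; the deposited scripts are the authority on how the boundaries were handled.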
For Visium data, spaceranger outputs were formatted into STutility (v1.1.1, https://github.com/jbergenstrahle/STUtility) objects for processing and visualization.
A detailed description of the methods used for data generation and processing can also be found in the manuscript (paper in preparation, preprint: https://www.biorxiv.org/content/10.1101/2023.12.07.570700v1).
Description of the data and file structure
File: Dryad_metadata.xlsx
Description: Summary of all snRNA-seq and Visium spatial transcriptomics datasets generated in this study.
File: PBMC_integrated_annotation_modifed_by_subclustering.rds
Description: Seurat object used for Figure 1 and Supplementary Figure 2 (manuscript in preparation). Data was collected from two technical replicates using samples from the same donor. The 10x 3’ and Flex methods were used for data collection.
Key metadata columns:
- sample_id: dataset IDs (per snRNA-seq method, per technical replicate).
- new_annotation: Cell type annotations.
Files: [sample_id]_integrated_seuarat_object.rds
Files following this naming structure:
- 4399_integrated_seuarat_object.rds
- 4411_integrated_seuarat_object.rds
- 4066_integrated_seuarat_object.rds
Description: Seurat objects used for Figures 2 and 3 and Supplementary Figures 4, 6, 7, 8, and 9 (manuscript in preparation). The 10x 3’, Flex, and snPATHO-seq methods were used for data collection.
Key metadata columns:
- sample_id: dataset IDs (per sample, per processing method).
- processing_method: snRNA-seq methods used to collect the data.
- major_annotation: Cell type annotations.
Files: [sample_id]_Visium.rds
Files following this naming structure:
- 4399FFPE_Visium.rds
- 4411FFPE_Visium.rds
- 4066FFPE_Visium.rds
Description: STutility objects used for Figure 3 and Supplementary Figure 7 (manuscript in preparation). The 10x Visium FFPE CytAssist workflow was used for data collection.
Files: [sample_id]_annotated.rds
Files following this naming structure:
- melanoma_7167A_annotated.rds
- lungcancer_20011329LC_annotated.rds
- Ovary_230303_annotated.rds
- liver_8754A_annotated.rds
- Lung_20011329NL_annotated.rds
- Kidney_1305272B_annotated.rds
- Kidney_FHIL_annotated.rds
- Endometrium_220952_annotated.rds
- Glioblastoma_1773A_annotated.rds
- EndocervicalAdenocarcinoma_6707_annotated.rds
- ColonCRC_1328A_annotated.rds
- Colon_213641_annotated.rds
Description: Seurat objects used for Figure 4 and Supplementary Figure 9 (manuscript in preparation). The 10x scFFPE and snPATHO-seq methods were used for data collection.
Key metadata columns:
- sample_id: Sample ID as documented in the metadata file.
- dataset_id: dataset IDs (per sample, per processing method).
- processing_method: snRNA-seq methods used to collect the data.
- initial_annotation: Cell type annotations.
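The naming patterns and annotation columns described above can be summarized in a short Python sketch that maps a file name to its group and to the metadata column holding the cell type annotations. The group labels (`pbmc_snRNA`, `integrated_snRNA`, `visium`, `annotated_snRNA`) and the helper `classify` are hypothetical names introduced here; the file name patterns and column names are taken from the descriptions above. No annotation column is listed for the Visium objects, so none is assumed.

```python
import re

# (group label, pattern) pairs; the group labels are illustrative only.
PATTERNS = [
    ("pbmc_snRNA",
     re.compile(r"^PBMC_integrated_annotation_modifed_by_subclustering\.rds$")),
    ("integrated_snRNA",
     re.compile(r"^(?P<sample_id>.+)_integrated_seuarat_object\.rds$")),
    ("visium",
     re.compile(r"^(?P<sample_id>.+)_Visium\.rds$")),
    ("annotated_snRNA",
     re.compile(r"^(?P<sample_id>.+)_annotated\.rds$")),
]

# Cell type annotation column documented for each group of Seurat objects.
ANNOTATION_COLUMN = {
    "pbmc_snRNA": "new_annotation",
    "integrated_snRNA": "major_annotation",
    "annotated_snRNA": "initial_annotation",
}


def classify(filename):
    """Return (group, sample_id prefix or None, annotation column or None)."""
    for group, pattern in PATTERNS:
        match = pattern.match(filename)
        if match:
            sample_id = match.groupdict().get("sample_id")
            return group, sample_id, ANNOTATION_COLUMN.get(group)
    return "other", None, None


for name in [
    "4399_integrated_seuarat_object.rds",
    "4399FFPE_Visium.rds",
    "Kidney_FHIL_annotated.rds",
    "PBMC_integrated_annotation_modifed_by_subclustering.rds",
    "Dryad_metadata.xlsx",
]:
    print(name, "->", classify(name))
```

The file names are matched exactly as deposited, including the `seuarat` and `modifed` spellings, so the patterns should not be "corrected".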
Code/software
For the Seurat package, please see https://satijalab.org/seurat/ for more details. Seurat v4.3.0.9002 was used to generate the snRNA-seq data objects.
For the STutility package, please see https://github.com/jbergenstrahle/STUtility for more details. STutility v1.1.1 was used to generate the Visium data objects.
For custom scripts used to process the data and generate the visualizations in the manuscript, please see https://github.com/TaopengWang/snPATHO-seq_public for more details.
Access information
Data was derived from the following sources:
- Gene Expression Omnibus (GEO) accession codes: GSE268426 (snRNA-seq data) and GSE268427 (Visium data).
Sharing/Access information
To access the source data, including the fastq/bam files and the processed cellranger/spaceranger files, please access the Gene Expression Omnibus (GEO) under the accession codes GSE268426 (snRNA-seq) and GSE268427 (Visium).
Code/Software
Scripts used to process the data and generate the illustrations included in the manuscript can be accessed through:
https://github.com/TaopengWang/snPATHO-seq_public
Methods
Frozen PBMC samples were thawed and processed directly for 10x 3' and Flex chemistry.
For frozen breast cancer tissue samples, samples were dissociated into single-nucleus suspension before being processed for 10x 3' and Flex chemistry for gene expression analysis.
For FFPE tissue samples, nuclei were extracted using either the snPATHO-seq protocol or the 10x scFFPE protocol followed by 10x Flex chemistry processing for gene expression detection.
FFPE Visium data was generated using the 10x FFPE Visium CytAssist workflow according to the manufacturer's recommendation.