Processed data objects for snPATHO-seq paper

Wang, Taopeng 1; Roach, Michael 2; Martelotto, Luciano 2; Swarbrick, Alexander 1

Published Oct 09, 2024 on Dryad. https://doi.org/10.5061/dryad.7m0cfxq4s

Abstract

Formalin-fixed paraffin-embedded (FFPE) samples are valuable but underutilized in single-cell omics research due to their low DNA and RNA quality. In this study, leveraging a recent advance in single-cell genomic technology, we introduce snPATHO-seq, a versatile method to derive high-quality single-nucleus transcriptomic data from FFPE samples. We benchmarked the performance of the snPATHO-seq workflow against existing 10x 3’ and Flex assays designed for frozen or fresh samples and highlighted the consistency of the snRNA-seq data produced by all workflows. The snPATHO-seq workflow also demonstrated high robustness when tested across a wide range of normal and diseased FFPE tissue samples. When combined with FFPE spatial transcriptomic technologies such as FFPE Visium, snPATHO-seq provides a multi-modal sampling approach for FFPE samples, allowing more comprehensive transcriptomic characterization.

This repository contains the processed snRNA-seq data (as Seurat v4 objects) and Visium data (as STutility objects) used to generate the visualizations in the snPATHO-seq manuscript (preprint: https://www.biorxiv.org/content/10.1101/2023.12.07.570700v1).

Overview of data processing

For snRNA-seq data, cellranger outputs were first filtered using the CellBender package (v0.2.0) to remove ambient background and identify cells/nuclei before being processed into a Seurat object (Seurat v4.3.0.9002). Low-quality cells/nuclei were defined as those with fewer than 200 UMIs, more than 8000 UMIs, or more than 10% mitochondrial gene products. Additional doublets were identified using the DoubletFinder package (v2.0.3) with default parameters. For samples processed using different snRNA-seq methods, the data were integrated using the Seurat CCA method with default parameters and annotated manually to derive a common cell type representation that can be applied to all datasets.
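
The quality-control thresholds stated above can be written as a simple predicate. The following is a minimal sketch in Python, not the Seurat (R) code that was used to build the objects in this repository. The class name NucleusQC, the function is_low_quality, and the example barcodes and values are hypothetical and only illustrate the stated cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucleusQC:
    """Per-nucleus QC summary (hypothetical container, for illustration only)."""
    barcode: str
    n_umi: int        # total UMI count for the nucleus
    pct_mito: float   # percentage of UMIs from mitochondrial genes


def is_low_quality(nucleus, min_umi=200, max_umi=8000, max_pct_mito=10.0):
    """Return True if the nucleus fails any threshold described in the text."""
    return (
        nucleus.n_umi < min_umi
        or nucleus.n_umi > max_umi
        or nucleus.pct_mito > max_pct_mito
    )


# Made-up example values, not taken from the datasets.
nuclei = [
    NucleusQC("AAAC-1", n_umi=150, pct_mito=2.0),    # too few UMIs
    NucleusQC("AAAG-1", n_umi=3500, pct_mito=4.5),   # within all thresholds
    NucleusQC("AACC-1", n_umi=9200, pct_mito=1.0),   # too many UMIs
    NucleusQC("AAGT-1", n_umi=2100, pct_mito=15.0),  # high mitochondrial content
]

kept = [n.barcode for n in nuclei if not is_low_quality(n)]
print(kept)
```

The comparisons are strict, so a nucleus with exactly 200 UMIs, exactly 8000 UMIs, or exactly 10% mitochondrial content is retained in this sketch. Whether the original pipeline treated the boundary values the same way is an assumption.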

For Visium data, spaceranger outputs were formatted into STutility (v1.1.1, https://github.com/jbergenstrahle/STUtility) objects for processing and visualization.

A detailed description of the methods used for data generation and processing can also be found in the manuscript (paper in preparation, preprint: https://www.biorxiv.org/content/10.1101/2023.12.07.570700v1).

Description of the data and file structure

File: Dryad metadata.xlsx

Description: Summary of all snRNA-seq and Visium spatial transcriptomics datasets generated in this study.

File: PBMC_integrated_annotation_modifed_by_subclustering.rds

Description: Seurat object used for Figure 1 and Supplementary Figure 2 (manuscript in preparation). Data was collected from two technical replicates using samples from the same donor. The 10x 3' and Flex methods were used for data collection.

Key metadata columns:

sample_id: dataset IDs (per snRNA-seq method, per technical replicate).
new_annotation: Cell type annotations.

Files: [sample_id]_integrated_seuarat_object.rds

Files following this naming structure:

4399_integrated_seuarat_object.rds
4411_integrated_seuarat_object.rds
4066_integrated_seuarat_object.rds

Description: Seurat objects used for Figures 2 and 3 and Supplementary Figures 4, 6, 7, 8, and 9 (manuscript in preparation). The 10x 3', Flex, and snPATHO-seq methods were used for data collection.

Key metadata columns:

sample_id: dataset IDs (per sample, per processing method).
processing_method: snRNA-seq methods used to collect the data.
major_annotation: Cell type annotations.

Files: [sample_id]_Visium.rds

Files following this naming structure:

4399FFPE_Visium.rds
4411FFPE_Visium.rds
4066FFPE_Visium.rds

Description: STutility objects used for Figure 3 and Supplementary Figure 7 (manuscript in preparation). The 10x Visium FFPE CytAssist workflow was used for data collection.
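
The integrated snRNA-seq objects and the Visium objects share the same three sample numbers, so the matching files can be paired programmatically. The sketch below is a minimal Python illustration that assumes the shared numbers denote the same samples. The helper name paired_files is hypothetical; the file names are built from the naming patterns listed above, including the spelling "seuarat", which is part of the actual file names.

```python
# Sample numbers shared by the integrated snRNA-seq and Visium file names.
SAMPLE_NUMBERS = ["4399", "4411", "4066"]


def paired_files(sample_number):
    """Return the (snRNA-seq, Visium) file names for one sample number."""
    snrna = f"{sample_number}_integrated_seuarat_object.rds"
    visium = f"{sample_number}FFPE_Visium.rds"
    return snrna, visium


pairs = {s: paired_files(s) for s in SAMPLE_NUMBERS}
for sample, (snrna, visium) in pairs.items():
    print(sample, snrna, visium)
```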

Files: [sample_id]_annotated.rds

Files following this naming structure:

melanoma_7167A_annotated.rds
lungcancer_20011329LC_annotated.rds
Ovary_230303_annotated.rds
liver_8754A_annotated.rds
Lung_20011329NL_annotated.rds
Kidney_1305272B_annotated.rds
Kidney_FHIL_annotated.rds
Endometrium_220952_annotated.rds
Glioblastoma_1773A_annotated.rds
EndocervicalAdenocarcinoma_6707_annotated.rds
ColonCRC_1328A_annotated.rds
Colon_213641_annotated.rds

Description: Seurat objects used for Figure 4 and Supplementary Figure 9 (manuscript in preparation). The 10x scFFPE and snPATHO-seq methods were used for data collection.

Key metadata columns:

sample_id: Sample ID as documented in the metadata file.
dataset_id: dataset IDs (per sample, per processing method).
processing_method: snRNA-seq methods used to collect the data.
initial_annotation: Cell type annotations.
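
When working with the twelve annotated objects, it can help to split each file name into its leading label and trailing identifier. The following Python sketch does this for the file names listed above. The function split_name and the grouping by lower-cased label are hypothetical conveniences; in particular, whether the sample_id column in the metadata file corresponds to the identifier alone or to the full "label_identifier" stem is not stated here and is left as an assumption.

```python
from collections import defaultdict

# File names exactly as listed in this README.
ANNOTATED_FILES = [
    "melanoma_7167A_annotated.rds",
    "lungcancer_20011329LC_annotated.rds",
    "Ovary_230303_annotated.rds",
    "liver_8754A_annotated.rds",
    "Lung_20011329NL_annotated.rds",
    "Kidney_1305272B_annotated.rds",
    "Kidney_FHIL_annotated.rds",
    "Endometrium_220952_annotated.rds",
    "Glioblastoma_1773A_annotated.rds",
    "EndocervicalAdenocarcinoma_6707_annotated.rds",
    "ColonCRC_1328A_annotated.rds",
    "Colon_213641_annotated.rds",
]

SUFFIX = "_annotated.rds"


def split_name(filename):
    """Split '<label>_<identifier>_annotated.rds' into (label, identifier)."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {filename}")
    stem = filename[: -len(SUFFIX)]
    label, identifier = stem.split("_", 1)
    return label, identifier


# Group identifiers by lower-cased label (capitalisation varies between files).
by_label = defaultdict(list)
for name in ANNOTATED_FILES:
    label, identifier = split_name(name)
    by_label[label.lower()].append(identifier)

for label, identifiers in sorted(by_label.items()):
    print(label, identifiers)
```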

Code/software

For the Seurat package, please see https://satijalab.org/seurat/ for more details. Seurat v4.3.0.9002 was used to generate the snRNA-seq data objects.

For the STutility package, please see https://github.com/jbergenstrahle/STUtility for more details. STutility v1.1.1 was used to generate the Visium data objects.

For custom scripts used to process the data and generate the visualization in the manuscript, please see https://github.com/TaopengWang/snPATHO-seq_public for more details.

Access information

Data was derived from the following sources:

Gene Expression Omnibus (GEO) accession code: GSE268426 (snRNA-seq data) and GSE268427 (Visium data).

Sharing/Access information

To access the source data, including the fastq/bam files and the processed cellranger/spaceranger files, please access the Gene Expression Omnibus (GEO) under the accession codes GSE268426 (snRNA-seq) and GSE268427 (Visium).

Code/Software

Scripts used to process the data and generate the illustrations included in the manuscript can be accessed through:
https://github.com/TaopengWang/snPATHO-seq_public
