Single cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis
Data files
Feb 28, 2022 version files (652.66 MB)
- README.txt (2.07 KB)
- Supplementalfile1_CellRangerOutputInformation.xlsx (16.68 KB)
- Supplementalfile2_Annotation1_scISOrSeq.gtf (37.17 MB)
- Supplementalfile3_Annotation2_Bulk_Iso_Seq.gtf (40.26 MB)
- Supplementalfile4_Annotation3_Pooled_Iso_Seqs_merged_with_ensembl.gtf (162.19 MB)
- Supplementalfile5_Annotation4_Pooled_Iso_Seqs_merged_with_ensembl_for_cellranger.gtf (162.19 MB)
- Supplementalfile6_IsoSeq_Processing_Data.xlsx (21.70 KB)
- Supplementalfile7_EnsemblModified_manual_changes.xlsx (18.21 KB)
- Supplementalfile8_Annotation5_ensembl_modified.gtf (250.79 MB)
Abstract
Single cell RNA sequencing (scRNAseq) is a powerful technique that continues to expand across various biological applications. However, incomplete 3' UTR annotations can impede single cell analysis, leaving genes partially or completely uncounted. Performing scRNAseq with incomplete 3' UTR annotations can hinder the identification of cell identities and gene expression patterns and lead to erroneous biological inferences. We demonstrate that performing single cell isoform sequencing (ScISOr-Seq) in tandem with scRNAseq can rapidly improve 3' UTR annotations. Using threespine stickleback fish (Gasterosteus aculeatus), we show that gene models resulting from a minimal embryonic ScISOr-Seq dataset retained 26.1% more scRNAseq reads than gene models from Ensembl alone. Furthermore, pooling our ScISOr-Seq isoforms with a previously published adult bulk Iso-Seq dataset from stickleback, and merging the annotation with the Ensembl gene models, resulted in a marginal improvement (+0.8%) over the ScISOr-Seq only dataset. In addition, isoforms identified by ScISOr-Seq included thousands of new splicing variants. The improved gene models obtained using ScISOr-Seq led to successful identification of cell types and increased the number of reads assigned to many genes in our scRNAseq stickleback dataset. Our work illuminates ScISOr-Seq as a cost-effective and efficient mechanism to rapidly annotate genomes for scRNAseq.
This dataset originates from an experiment in which 70 hpf (hours post fertilization) embryos were dissociated into single cells and then captured with the 10x Genomics Single Cell 3' Gene Expression mRNA-Seq prep using v3.1 Next GEM chemistry. After capture, the library was split into two libraries: one was sequenced on Illumina's NovaSeq 6000 and the other on the PacBio Sequel II. The single cell isoform sequencing (ScISOr-Seq) data were processed using PacBio's SMRT Analysis software, custom scripts (https://github.com/hopehealey/scISOseq_processing), cDNA Cupcake, and SQANTI3. Additional sequencing data (from Naftaly, Pau, and White 2021) were also processed with PacBio's SMRT Analysis software, cDNA Cupcake, and SQANTI3. The resulting annotation file was merged with the stickleback annotation (BROAD S1, database version 104.1, downloaded from Ensembl) using TAMA. The new annotations were tested with Cell Ranger to see how well they captured the generated stickleback scRNAseq reads.
We used raw sequencing data from Naftaly, Pau, and White (2021) to create the Supplementalfile3_Annotation2_Bulk_Iso_Seq.gtf, Supplementalfile4_Annotation3_Pooled_Iso_Seqs_merged_with_ensembl.gtf, and Supplementalfile5_Annotation4_Pooled_Iso_Seqs_merged_with_ensembl_for_cellranger.gtf annotations. Information on accessing the sequencing data from our experiment can be found in our manuscript (Single cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis, in Genetics).
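Before choosing one of the GTF annotations in this dataset, it may help to check how many gene and transcript models each file contains. The Python sketch below is only an illustration and is not part of the authors' pipeline. It assumes the files follow the standard nine-column, tab-separated GTF layout with quoted gene_id and transcript_id attributes. The function name summarize_gtf is hypothetical.

```python
import re
from collections import defaultdict

# Matches quoted GTF attribute pairs such as: gene_id "some_id";
# Unquoted attributes (e.g. exon_number 1;) are ignored by this sketch.
_ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def summarize_gtf(lines):
    """Count distinct gene_id and transcript_id values in GTF lines.

    `lines` is any iterable of strings, for example an open file handle.
    Assumes standard 9-column, tab-separated GTF records.
    """
    genes = set()
    transcripts_per_gene = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed records rather than guessing
        attrs = dict(_ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        genes.add(gene_id)
        transcript_id = attrs.get("transcript_id")
        if transcript_id:
            transcripts_per_gene[gene_id].add(transcript_id)
    n_transcripts = sum(len(t) for t in transcripts_per_gene.values())
    return {"genes": len(genes), "transcripts": n_transcripts}


# Example use with a file from this dataset (path is wherever you saved it):
# with open("Supplementalfile2_Annotation1_scISOrSeq.gtf") as handle:
#     print(summarize_gtf(handle))
```

Running the same function on each annotation gives a quick side-by-side comparison of model counts. It does not measure how many scRNAseq reads each annotation captures, which in this work was assessed with Cell Ranger.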