Simulated data from: Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2

Nip, Ka Ming

Research facility: Canada's Michael Smith Genome Sciences Centre

Published Sep 07, 2022 on Dryad. https://doi.org/10.5061/dryad.cc2fqz68w

Data files

Sep 07, 2022 version files: 43.07 GB

mouse_cdna.tar.gz    14.75 GB
mouse_drna.tar.gz    28.32 GB
README.txt           1.18 KB

Abstract

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, which can span entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2, a reference-free assembly method for long-read transcriptome sequencing data. RNA-Bloom2 is available on GitHub at: https://github.com/bcgsc/RNA-Bloom.

Methods

We benchmarked the assembly quality and the computational performance of RNA-Bloom2 on simulated data. We prepared two simulated mouse datasets with Trans-NanoSim, one for the cDNA and one for the dRNA sequencing protocol, each modeled on experimental ONT data. The datasets were simulated based on the mouse Ensembl annotation for GRCm39. To investigate the effect of sequencing depth, we subsampled each dataset to 2, 10, and 18 million reads, resulting in a total of six sets of reads for our benchmarking experiments. Using the simulated data, we showed that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods.
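
The subsampling step described above can be illustrated with a small sketch. The source does not say which tool produced the 2, 10, and 18 million read subsets, so this is only an illustration under stated assumptions: `subsample_fasta` is a hypothetical helper, it assumes FASTA input whose headers contain no tab characters, and it selects records by sorting on a random key rather than by whatever procedure was actually used.

```shell
subsample_fasta() {
    # Usage: subsample_fasta INPUT.fasta N [SEED]
    # Hypothetical helper (not part of the dataset or of RNA-Bloom2).
    # Prints N records chosen at random; multi-line sequences are joined onto one line.
    local input="$1" n="$2" seed="${3:-42}"

    # Step 1: put each record on one line as "random_key<TAB>header<TAB>sequence".
    awk -v seed="$seed" '
        BEGIN { srand(seed) }
        /^>/  { if (rec != "") printf "%.12f\t%s\n", rand(), rec; rec = $0 "\t"; next }
              { rec = rec $0 }
        END   { if (rec != "") printf "%.12f\t%s\n", rand(), rec }
    ' "$input" |
        # Step 2: order records by their random key and keep the first N.
        LC_ALL=C sort -k1,1n |
        head -n "$n" |
        # Step 3: drop the key and restore the two-line FASTA layout.
        awk -F '\t' '{ print $2; print $3 }'
}

# Example call (file names are placeholders, not files from this dataset):
# subsample_fasta all_reads.fasta 2000000 1 > reads_2M.fasta
```

Sorting every record is memory- and disk-hungry at this scale, so a dedicated subsampling tool would likely be a better fit for tens of millions of long reads; the sketch only shows the idea.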

Usage notes

Decompress the tarballs:

tar -zxf mouse_cdna.tar.gz
tar -zxf mouse_drna.tar.gz
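
Because the archives are large, it may be useful to preview their contents before unpacking. The sketch below lists the first entries of each tarball without extracting anything. `peek_archive` is a hypothetical helper name, and the loop assumes the two archives sit in the current directory.

```shell
peek_archive() {
    # Usage: peek_archive ARCHIVE.tar.gz [MAX_ENTRIES]
    # Hypothetical helper. tar flags: -t lists entries instead of extracting,
    # -z reads through gzip, -f names the archive file.
    tar -ztf "$1" | head -n "${2:-20}"
}

# Preview each archive only if it is present in the current directory.
for archive in mouse_cdna.tar.gz mouse_drna.tar.gz; do
    if [ -f "$archive" ]; then
        echo "== $archive =="
        peek_archive "$archive"
    fi
done
```

Since `head` stops reading after the requested number of lines, `tar` normally ends early instead of scanning the whole archive, and it may print a harmless write-error message when that happens.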

Extract the sample read files:

cd mouse_cdna
bash extract_reads.sh
cd ..

cd mouse_drna
bash extract_reads.sh
cd ..
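
After extraction, a quick sanity check is to count the reads in each output file and compare the totals with the 2, 10, and 18 million read subsets described for this dataset. The sketch below is an assumption-based illustration: `count_reads` is a hypothetical helper, the output is assumed to be FASTA or four-line FASTQ (optionally gzipped), and the `*.fa*` file pattern is a guess, since the names written by `extract_reads.sh` are not listed here.

```shell
count_reads() {
    # Usage: count_reads FILE
    # Hypothetical helper: counts records in a FASTA or 4-line FASTQ file.
    # Files ending in .gz are decompressed on the fly.
    local file="$1" reader="cat"
    case "$file" in
        *.gz) reader="gzip -dc" ;;
    esac
    $reader "$file" | awk '
        NR == 1 { fastq = (substr($0, 1, 1) == "@") }   # first character decides the format
        fastq   { if (NR % 4 == 1) n++; next }          # FASTQ: one record per 4 lines
        /^>/    { n++ }                                 # FASTA: one record per header line
        END     { print n + 0 }'
}

# Report counts for any matching files; the glob is an assumed naming pattern.
for f in mouse_cdna/*.fa* mouse_drna/*.fa*; do
    [ -e "$f" ] || continue
    printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```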
