Dryad

Simulated data from: Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2

Cite this dataset

Nip, Ka Ming (2022). Simulated data from: Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2 [Dataset]. Dryad. https://doi.org/10.5061/dryad.cc2fqz68w

Abstract

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, which can span entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2, a reference-free assembly method for long-read transcriptome sequencing data. RNA-Bloom2 is available on GitHub at: https://github.com/bcgsc/RNA-Bloom.

We benchmarked the assembly quality and computational performance of RNA-Bloom2 on simulated data. We used Trans-NanoSim to prepare two simulated mouse datasets, one for each of the cDNA and dRNA sequencing protocols, modeled on experimental ONT data. The datasets were simulated from the mouse ENSEMBL annotation for GRCm39. To investigate the effect of sequencing depth, we subsampled each dataset to 2, 10, and 18 million reads, yielding six read sets in total (two protocols × three depths) for our benchmarking experiments. On the simulated data, the transcriptome assembly quality of RNA-Bloom2 was competitive with that of reference-based methods.
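The description above does not say how the subsampling was done. As a hypothetical illustration only, the bash sketch below draws a fixed number of reads uniformly at random from a FASTA file. It assumes FASTA-formatted reads and GNU coreutils `shuf`; the function name `subsample_fasta` and the file names in the usage line are placeholders, not files or tools shipped with this dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the authors' documented procedure):
# randomly keep N reads from a FASTA file. Assumes GNU awk/coreutils.
set -euo pipefail

subsample_fasta() {
  local in_fa="$1" n_reads="$2" out_fa="$3"
  # Put each record on one line as "header<TAB>sequence",
  # joining wrapped sequence lines together.
  awk 'BEGIN { ORS = "" }
       /^>/ { if (NR > 1) print "\n"; print $0 "\t"; next }
       { print $0 }
       END { if (NR > 0) print "\n" }' "$in_fa" \
    | shuf -n "$n_reads" --random-source=<(yes) \
    | tr '\t' '\n' > "$out_fa"
  # The fixed random source (output of `yes`) makes reruns pick the same
  # reads with the same coreutils version; tr restores two-line records.
}
```

For example, `subsample_fasta reads.fasta 2000000 reads_2M.fasta` would keep 2 million reads. Sampling whole records (rather than lines) keeps each header paired with its sequence.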

Methods

The reads were simulated with Trans-NanoSim version 3.1.0: https://github.com/bcgsc/NanoSim

Usage notes

Decompress and unpack the tarballs:

tar -zxf mouse_cdna.tar.gz
tar -zxf mouse_drna.tar.gz

Extract the sample read files:

cd mouse_cdna
bash extract_reads.sh
cd ..

cd mouse_drna
bash extract_reads.sh
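The steps above can also be run as one script. This is a minimal sketch, assuming both tarballs are in the current directory and unpack into `mouse_cdna/` and `mouse_drna/` as the usage notes imply; the function name `extract_dataset` is hypothetical and not part of the dataset.

```shell
#!/usr/bin/env bash
# Sketch: unpack each tarball and run its bundled extract_reads.sh.
set -euo pipefail

extract_dataset() {
  local name="$1"                  # e.g. mouse_cdna
  tar -zxf "${name}.tar.gz"        # -x extracts; -c would create an archive
  # Run the bundled script from inside its directory in a subshell,
  # so the caller's working directory is left unchanged.
  ( cd "$name" && bash extract_reads.sh )
}

for dataset in mouse_cdna mouse_drna; do
  if [[ -f "${dataset}.tar.gz" ]]; then
    extract_dataset "$dataset"
  else
    echo "skipping ${dataset}: ${dataset}.tar.gz not found" >&2
  fi
done
```

Using a subshell for the `cd` avoids having to return to the parent directory between the two datasets.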

Funding

Genome Canada, Award: 243FOR

Genome British Columbia, Award: 243FOR

National Human Genome Research Institute, Award: 2R01HG007182-04A1

Natural Sciences and Engineering Research Council of Canada

Canadian Institutes of Health Research