Simulated data from: Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2
Cite this dataset
Nip, Ka Ming (2022). Simulated data from: Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2 [Dataset]. Dryad. https://doi.org/10.5061/dryad.cc2fqz68w
Abstract
Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based and, to date, there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2, a reference-free assembly method for long-read transcriptome sequencing data. RNA-Bloom2 is available on GitHub at: https://github.com/bcgsc/RNA-Bloom.
We benchmarked the assembly quality and the computational performance of RNA-Bloom2 on simulated data. We prepared two simulated mouse datasets with Trans-NanoSim, one for the cDNA and one for the dRNA sequencing protocol, each modeled on experimental ONT data. The datasets were simulated based on the mouse ENSEMBL annotation for GRCm39. To investigate the effect of sequencing depth, we subsampled each dataset to 2, 10, and 18 million reads, resulting in a total of six sets of reads for our benchmarking experiments. Using the simulated data, we showed that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods.
Methods
The reads were simulated with Trans-NanoSim version 3.1.0: https://github.com/bcgsc/NanoSim
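The abstract describes subsampling each dataset to 2, 10, and 18 million reads but does not state how this was done. As an illustration only, the sketch below draws a uniform random subsample of records from a FASTA file using reservoir sampling in awk. The function name subsample_fasta, its argument order, and the FASTA input format are assumptions made for this example, not the procedure used to build the dataset.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the dataset or of Trans-NanoSim):
# print a uniform random sample of N records from a FASTA file.
# Usage: subsample_fasta N SEED INPUT.fa
subsample_fasta() {
    local n="$1" seed="$2" input="$3"
    awk -v n="$n" -v seed="$seed" '
        BEGIN { srand(seed) }
        /^>/ {
            # A header line starts a new record.
            count++
            if (count <= n) {
                # Fill the reservoir with the first n records.
                slot = count
            } else {
                # Record number count replaces a random slot
                # with probability n / count.
                r = int(rand() * count) + 1
                slot = (r <= n) ? r : 0
            }
            if (slot) { hdr[slot] = $0; seq[slot] = "" }
            next
        }
        # Sequence lines are kept only for records held in the reservoir.
        slot { seq[slot] = seq[slot] $0 "\n" }
        END {
            kept = (count < n) ? count : n
            for (i = 1; i <= kept; i++) printf "%s\n%s", hdr[i], seq[i]
        }
    ' "$input"
}
```

For example, `subsample_fasta 2000000 1 reads.fa > reads.2M.fa` would write two million records, where reads.fa is a placeholder file name. The sketch holds all sampled records in memory, so it is meant to show the idea rather than to be efficient at this scale.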
Usage notes
Decompress the tarballs:
tar -zxf mouse_cdna.tar.gz
tar -zxf mouse_drna.tar.gz
Extract the sample read files:
cd mouse_cdna
bash extract_reads.sh
cd ../mouse_drna
bash extract_reads.sh
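The steps above can be combined into one script. This is a minimal sketch, assuming that each tarball unpacks into a directory with the same base name (mouse_cdna, mouse_drna) containing extract_reads.sh, as the commands above imply. The function name unpack_and_extract is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: unpack one tarball and run its extraction script.
# Assumes NAME.tar.gz unpacks into ./NAME/ containing extract_reads.sh.
unpack_and_extract() {
    local name="$1"
    # -x extracts an archive (-c would create one instead).
    tar -zxf "${name}.tar.gz"
    # The subshell leaves the caller's working directory unchanged.
    (cd "${name}" && bash extract_reads.sh)
}

for dataset in mouse_cdna mouse_drna; do
    if [ -f "${dataset}.tar.gz" ]; then
        unpack_and_extract "${dataset}"
    else
        echo "skipping ${dataset}: ${dataset}.tar.gz not found" >&2
    fi
done
```

Run the script from the directory that holds the downloaded tarballs. Running each extraction in a subshell avoids having to change back to the parent directory between datasets.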
Funding
Genome Canada, Award: 243FOR
Genome British Columbia, Award: 243FOR
National Human Genome Research Institute, Award: 2R01HG007182-04A1
Natural Sciences and Engineering Research Council of Canada
Canadian Institutes of Health Research