Simulated eukaryotic genomic sequencing, long and short reads
Data files
Jan 12, 2023 version files 45.67 GB
- a_thal_TAIR10.art_illumina1.fq.gz (3.17 GB)
- a_thal_TAIR10.art_illumina2.fq.gz (3.73 GB)
- a_thal_TAIR10.badread_nanopore.fastq.gz (12.66 GB)
- d_mel_ISO1.badread_nanopore.fasta.gz (15.92 GB)
- d_mel_ISO1.illumina1.fq.gz (3.76 GB)
- d_mel_ISO1.illumina2.fq.gz (4.44 GB)
- README.md (1.91 KB)
- s_cere_S288c.art_illumina1.fq.gz (324.81 MB)
- s_cere_S288c.art_illumina2.fq.gz (382.77 MB)
- s_cere_S288c.badread_nanopore.fastq.gz (1.29 GB)
Abstract
As the accuracy and throughput of nanopore sequencing improve, it is increasingly common to perform long-read-first de novo genome assemblies and then polish them with accurate short reads (Kim et al. 2021). We briefly introduce FMLRC2, the successor to the original FM-index Long Read Corrector (FMLRC), and illustrate its performance as a fast and accurate de novo assembly polisher for both bacterial and eukaryotic genomes.
Methods
We simulated ONT data and the corresponding paired-end Illumina datasets using Badread v0.2.0 (Wick, 2019) and ART v2016-06-05 (Huang et al., 2011), respectively, as previously described (Wick and Holt, 2022). Briefly, short reads were simulated using ART with the HiSeqX TruSeq preset at 100X effective sequencing depth, with a 150 bp read length and a fragment length of 400 ± 50 bp (mean ± sd). Long reads were simulated using Badread with the parameters “--length 20000,12000 --identity 90,98,4”.
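
The simulation settings above can be made concrete with a small sketch. It is not the authors' pipeline: the function names, the reference file name, and the 12 Mb genome size are hypothetical, and the ART short flags (-ss, -p, -l, -f, -m, -s, -i, -o) reflect our understanding of the art_illumina command line rather than anything stated in this record. The sketch shows how many read pairs a given depth implies, how the Badread parameter strings break down (--length as mean,stdev and --identity as mean,max,stdev, per the Badread documentation), and what an ART invocation with the stated settings might look like.

```python
import math


def read_pairs_for_depth(genome_size_bp, depth, read_length_bp):
    """Read pairs needed so that total sequenced bases = depth * genome size.

    Each pair contributes two reads of read_length_bp bases.
    """
    return math.ceil(depth * genome_size_bp / (2 * read_length_bp))


def parse_badread_params(length, identity):
    """Split Badread-style parameter strings into named values.

    Assumes --length is "mean,stdev" (bp) and --identity is
    "mean,max,stdev" (percent).
    """
    mean_len, sd_len = (float(x) for x in length.split(","))
    mean_id, max_id, sd_id = (float(x) for x in identity.split(","))
    return {
        "length_mean_bp": mean_len,
        "length_sd_bp": sd_len,
        "identity_mean_pct": mean_id,
        "identity_max_pct": max_id,
        "identity_sd_pct": sd_id,
    }


def art_command(reference, out_prefix):
    """Build an art_illumina argument list for the settings in Methods.

    Flag names are an assumption about the ART CLI; check `art_illumina`
    help output before running.
    """
    return [
        "art_illumina",
        "-ss", "HSXt",   # HiSeqX TruSeq preset
        "-p",            # paired-end
        "-l", "150",     # read length (bp)
        "-f", "100",     # fold coverage
        "-m", "400",     # mean fragment length (bp)
        "-s", "50",      # fragment length standard deviation (bp)
        "-i", reference,
        "-o", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical 12 Mb genome, for illustration only.
    print(read_pairs_for_depth(12_000_000, 100, 150))
    print(parse_badread_params("20000,12000", "90,98,4"))
    print(" ".join(art_command("reference.fasta", "sample.art_illumina")))
```

Building the command as an argument list rather than a single string keeps it safe to pass to subprocess.run without shell quoting issues.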