Data from: Prevention, diagnosis, and treatment of high-throughput sequencing data pathologies

Zhou X, Rokas A

Date Published: January 23, 2014

DOI: http://dx.doi.org/10.5061/dryad.h988s


Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0; Open Data).

Title High-quality Illumina short-read dataset used for the illustration in Figure 1
Description The high-quality Illumina short-read data were generated on the Illumina HiSeq 2000 platform with a 100bp read length for a fungal genomic sequencing experiment. The first 7bp of each read were barcode sequence and have been removed.
Download 96.131.4.filtered.fastq.bz2 (919.8 Mb)
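The reads in this file are stored as bzip2-compressed FASTQ, and their 7bp barcodes have already been removed. For readers who want to stream the file or apply the same kind of fixed-length barcode trim to their own data, the following is a minimal Python sketch. It assumes standard four-line FASTQ records with no line-wrapped sequences. The helper names `iter_fastq`, `trim_prefix`, and `count_reads` are hypothetical and are not part of the authors' pipeline. The sketch only shows what removing a fixed 7bp prefix involves.

```python
import bz2
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str  # "@..." line
    seq: str     # base calls
    plus: str    # "+..." separator line
    qual: str    # per-base quality string, same length as seq


def iter_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records (assumes no line-wrapped sequences)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header, seq, plus, qual)


def trim_prefix(rec: FastqRecord, n: int = 7) -> FastqRecord:
    """Drop the first n bases and their quality scores, e.g. an inline barcode."""
    return rec._replace(seq=rec.seq[n:], qual=rec.qual[n:])


def count_reads(path: str) -> int:
    """Count records in a bzip2-compressed FASTQ file without decompressing to disk."""
    with bz2.open(path, "rt") as fh:
        return sum(1 for _ in iter_fastq(fh))
```

Because the file is decompressed as a stream, memory use stays flat even for the roughly 0.9 Gb compressed download. Slicing `seq` and `qual` together keeps each base aligned with its quality score.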
Title Commands used for all QC tool evaluations
Download commands.docx (17.67 Kb)
Title Simulated Illumina short read datasets used for the evaluation of adapter trimming tools
Description All data are 100bp paired-end Illumina reads simulated from the human genome (hg19.fa.masked) using pIRS, which simulates reads with a realistic sequencing-error profile and GC bias. For paired-end simulation, pIRS randomly selects insert sizes from a normal distribution with a preset mean and standard deviation. To simulate adapter contamination, pIRS was modified so that whenever a selected insert size is shorter than the preset read length (100bp in this study), adapter sequence is concatenated to the end of the genomic sequence before sequencing errors are simulated. Multiple simulations were performed with mean insert sizes ranging from 80 to 130bp to approximate high to low levels of adapter contamination (the mean used for each simulation is included in the file name of the corresponding dataset). All simulations used a standard deviation of 10bp and 0.1x coverage.
Download simulated_reads.tar.bz2 (2.390 Gb)
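The adapter-contamination scheme described above can be illustrated with a short Python sketch. Insert sizes are drawn from a normal distribution with mean μ and SD 10bp. Any insert shorter than the 100bp read length is padded with adapter sequence, so the expected fraction of contaminated reads is roughly Φ((100 − μ)/10). For example, μ = 80 gives Φ(2) ≈ 0.977 and μ = 130 gives Φ(−3) ≈ 0.0013, which is why the 80 to 130bp range spans high to low contamination. This is not pIRS: it models no sequencing errors or GC bias and simulates only one read of each pair. The adapter string (a TruSeq-like `AGATCGGAAGAGC` prefix, repeated) and all function names are hypothetical placeholders.

```python
import math
import random

READ_LEN = 100   # read length used in the study (bp)
INSERT_SD = 10   # insert-size standard deviation used in the study (bp)

# Hypothetical adapter for illustration only; repeated so it can pad any short insert.
ADAPTER = "AGATCGGAAGAGC" * 8


def expected_contaminated_fraction(mean_insert, sd=INSERT_SD, read_len=READ_LEN):
    """Normal-CDF estimate of P(insert < read_len), i.e. the share of reads running into adapter."""
    z = (read_len - mean_insert) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_read(genome, mean_insert, rng, sd=INSERT_SD, read_len=READ_LEN, adapter=ADAPTER):
    """Return (read, insert_size) for one simulated read, without sequencing errors."""
    insert = round(rng.gauss(mean_insert, sd))
    insert = min(max(insert, 1), len(genome))          # keep the insert size valid
    start = rng.randrange(len(genome) - insert + 1)    # random fragment start
    fragment = genome[start:start + insert]
    if insert < read_len:
        # The read runs past the end of the short fragment and into the adapter.
        read = fragment + adapter[: read_len - insert]
    else:
        read = fragment[:read_len]
    return read, insert


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))  # toy random "genome"
    for mean in (80, 100, 130):
        reads = [simulate_read(genome, mean, rng) for _ in range(5000)]
        observed = sum(insert < READ_LEN for _, insert in reads) / len(reads)
        # Observed values differ slightly from the estimate because insert sizes are rounded.
        print(mean, round(expected_contaminated_fraction(mean), 4), round(observed, 4))
```

Every simulated read has exactly 100bp. The number of adapter bases equals 100 minus the insert size, so lower mean insert sizes produce both more contaminated reads and longer adapter tails.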
Title Illumina short read dataset used for the illustration in Figure 3
Description The Illumina short-read dataset was generated for a fungal genomic sequencing experiment and likely contains cross-contamination from other samples prepared in the same experiment.
Download contamination_dataset.fq.bz2 (1.392 Gb)

When using this data, please cite the original publication:

Zhou X, Rokas A (2014) Prevention, diagnosis, and treatment of high-throughput sequencing data pathologies. Molecular Ecology 23(7): 1679-1700. http://dx.doi.org/10.1111/mec.12680

Additionally, please cite the Dryad data package:

Zhou X, Rokas A (2014) Data from: Prevention, diagnosis, and treatment of high-throughput sequencing data pathologies. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.h988s