Data from: In-solution hybridization for mammalian mitogenome enrichment: pros, cons and challenges associated with multiplexing degraded DNA
Data files
Jul 21, 2015 version files 889.49 MB
-
AllProbes_good.fasta
-
chimera_ID.py
-
combiner-2.0.py
-
Dryad_ModernSquirrelDemux.txt
-
Enrichment Assemblies.zip
-
H3FTPWM01.sff
-
H3HWW5Q01.sff
-
pero_in.454.fasta.qual
-
sequenceGetter-4.0.py
-
WMG_Alignment26Ind_FINAL01.09.14.txt
Abstract
Here, we present a set of RNA-based probes for whole mitochondrial genome in-solution enrichment, targeting a diversity of mammalian mitogenomes. This probes set was designed from seven mammalian orders and tested to determine the utility for enriching degraded DNA. We generated 63 mitogenomes representing five orders and 22 genera of mammals that yielded varying coverage ranging from 0 to >5400X. Based on a threshold of 70% mitogenome recovery and at least 10× average coverage, 32 individuals or 51% of samples were considered successful. The estimated sequence divergence of samples from the probe sequences used to construct the array ranged up to nearly 20%. Sample type was more predictive of mitogenome recovery than sample age. The proportion of reads from each individual in multiplexed enrichments was highly skewed, with each pool having one sample that yielded a majority of the reads. Recovery across each mitochondrial gene varied with most samples exhibiting regions with gaps or ambiguous sites. We estimated the ability of the probes to capture mitogenomes from a diversity of mammalian taxa not included here by performing a clustering analysis of published sequences for 100 taxa representing most mammalian orders. Our study demonstrates that a general array can be cost and time effective when there is a need to screen a modest number of individuals from a variety of taxa. We also address the practical concerns for using such a tool, with regard to pooling samples, generating high quality mitogenomes and detail a pipeline to remove chimeric molecules.