Data from: Impact of enrichment conditions on cross-species capture of fresh and degraded DNA

Paijmans, Johanna L. A.; Fickel, Joerns; Courtiol, Alexandre; Hofreiter, Michael; Förster, Daniel W.

Published Apr 28, 2015 on Dryad. https://doi.org/10.5061/dryad.cd711

Data files

Apr 28, 2015 version files: 47.98 KB

convert_callsfile_to_one_line_fasta.pl (1.46 KB)
felids_ancestral_29.11.2013.fasta (17.35 KB)
filter_multiple_sample_gatk1.5_vcf_mtDNA_indels.pl (17.87 KB)
LSE-NNE_incomplete-mitogenomes.zip (6.60 KB)
Tree-files.zip (4.70 KB)

Abstract

By combining high-throughput sequencing with target-enrichment (“hybridization capture”), researchers are able to obtain molecular data from genomic regions of interest for projects that are otherwise constrained by sample quality (e.g. degraded and contamination-rich samples) or a lack of a priori sequence information (e.g. studies on non-model species). Despite the use of hybridization capture in various fields of research for many years, the impact of enrichment conditions on capture success is not yet thoroughly understood. We evaluated the impact of a key parameter – hybridization temperature – on the capture success of mitochondrial genomes across the carnivoran family Felidae. Capture was carried out for a range of sample types (fresh, archival, ancient) with varying levels of sequence divergence between bait and target (i.e. across a range of species) using pools of individually indexed libraries on Agilent SureSelect arrays. Our results suggest that hybridization capture protocols require specific optimization for the sample type that is being investigated. Hybridization temperature affected the proportion of on-target sequences following capture: for degraded samples, we obtained the best results with a hybridization temperature of 65 °C, while a touchdown approach (65 °C down to 50 °C) yielded the best results for fresh samples. Evaluation of capture performance at a regional scale (sliding window approach) revealed no significant improvement in the recovery of DNA fragments with high sequence divergence from the bait at any of the tested hybridization temperatures, suggesting that hybridization temperature may not be the critical parameter for enrichment of divergent fragments.

Ancestral mitogenome reconstruction

As interspecific comparisons featured heavily in our analyses, it was important to choose a reference sequence that would limit the introduction of a bias during the mapping of sequences (“mapping bias”). Mapping algorithms become less effective when the reference is very dissimilar (Prüfer et al. 2010; Schubert et al. 2012), which affects the number of sequences identified as ‘on-target’. As some of the species in this study currently do not have a reference sequence available from GenBank (Table 1), we reconstructed an ancestral mitogenome of all Felidae (Appendix S1; Supporting Table S2) to be used as reference. To reconstruct the ancestral sequence, the most appropriate model of substitution for the alignment of the felid mitogenomes was estimated using jModeltest 2.1.4 under the Bayesian Information Criterion (Posada 2008); this model was then implemented in PhyML (Guindon et al. 2010) to infer the topology of the phylogeny (as shown in Fig. 1A). Reconstruction of the ancestral sequence was performed using the webserver of Ancestors v1.1 (Diallo et al. 2007, 2010). For additional information, please refer to the manuscript and its supplements.

felids_ancestral_29.11.2013.fasta
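
To make the notion of a "dissimilar" reference concrete, the following minimal Python sketch computes the proportion of differing sites (p-distance) between a reference and an aligned query sequence. It is an illustration only and is not part of the deposited analysis; the function name `p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for this example.

```python
def p_distance(ref, query):
    """Proportion of mismatching sites between two aligned sequences.

    Assumption for this sketch: only sites where both sequences carry an
    unambiguous base (A, C, G or T) are compared; gaps and N's are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r in "ACGT" and q in "ACGT":
            compared += 1
            if r != q:
                mismatches += 1
    # Return NaN when no site could be compared.
    return mismatches / compared if compared else float("nan")
```

A larger value means the query is more divergent from the reference, which is the situation in which mapping bias is expected to increase.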

Perl script 1

Perl script written by Kanchon Dasmahapatra (University of York; kanchon.dasmahapatra@york.ac.uk). This script uses a variant file (VCF) generated by GATK to call the individual bases per position for a haploid genome. After running this script, run the "convert_callsfile_to_one_line_fasta.pl" script to convert the base calls to a fasta file.

filter_multiple_sample_gatk1.5_vcf_mtDNA_indels.pl
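
The general idea of calling one base per position for a haploid genome from a multi-sample VCF can be sketched in Python as follows. This is not a translation of the Perl script, whose filters and thresholds are not described here. The function name `call_haploid_bases` and the rule of writing "N" for missing, heterozygous, or non-single-base (indel) genotypes are assumptions made for illustration.

```python
def call_haploid_bases(vcf_lines):
    """Return {sample: {position: base}} from the text lines of a VCF file.

    Assumptions for this sketch: a genotype is accepted only if all of its
    allele indices agree and the chosen allele is a single A/C/G/T at a
    site whose REF is a single base; everything else is called as "N".
    """
    samples = []
    calls = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            calls = {sample: {} for sample in samples}
            continue
        pos = int(fields[1])
        ref = fields[3]
        alleles = [ref] + fields[4].split(",")
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index]
            indices = set(gt.replace("|", "/").split("/"))
            base = "N"
            if len(indices) == 1 and "." not in indices:
                allele = alleles[int(indices.pop())].upper()
                if len(ref) == 1 and allele in ("A", "C", "G", "T"):
                    base = allele
            calls[sample][pos] = base
    return calls
```

With a file on disk, the function could be called as `call_haploid_bases(open("calls.vcf"))`, where `calls.vcf` is a placeholder file name.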

Perl script 2

Perl script written by Kanchon Dasmahapatra (University of York; kanchon.dasmahapatra@york.ac.uk). This script uses the output from "filter_multiple_sample_gatk1.5_vcf_mtDNA_indels.pl" and writes the base calls to a fasta file.

convert_callsfile_to_one_line_fasta.pl
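
Writing per-position base calls as a FASTA record with the whole sequence on one line can be sketched in Python as below. The format of the intermediate calls file is not described here, so the input structure (a dictionary mapping each sample name to a `{position: base}` dictionary with 1-based positions), the function name `calls_to_one_line_fasta`, and the use of "N" for positions without a call are all assumptions of this sketch rather than the behaviour of the Perl script.

```python
def calls_to_one_line_fasta(calls, length):
    """Format per-position base calls as FASTA text, one line per sequence.

    calls  -- hypothetical structure: {sample: {position: base}}, 1-based
    length -- total sequence length; uncalled positions are filled with "N"
    """
    records = []
    for sample in sorted(calls):
        bases = calls[sample]
        sequence = "".join(bases.get(pos, "N") for pos in range(1, length + 1))
        records.append(f">{sample}\n{sequence}\n")
    return "".join(records)
```

Keeping each sequence on a single line makes the output easy to process line by line, since every header is followed by exactly one sequence line.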

Tree files

Tree files for the phylogenies displayed in Supporting Figure 5. Phylogenies were generated using PhyML.

Tree-files.zip

Incomplete mitogenomes for Neofelis nebulosa and Leptailurus serval

This zip file contains two fasta files with the incomplete mitogenomes for Neofelis nebulosa ("NNE_Fell") and Leptailurus serval ("LSE_Togo"). These mitogenomes were of too low quality for upload to GenBank: LSE_Togo is 44.5% complete and NNE_Fell is 51.0% complete. The lengths of the stretches of N's were estimated based on a closely related individual. For more information regarding these sequences, feel free to contact the authors.

LSE-NNE_incomplete-mitogenomes.zip
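
A completeness figure of this kind can be checked with a short Python sketch such as the one below. The exact definition used by the authors is not stated here; the sketch assumes that completeness is the percentage of positions carrying an unambiguous base (A, C, G or T), and the function names `read_fasta` and `completeness` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}."""
    sequences = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = []
        elif header is not None:
            sequences[header].append(line)
    return {name: "".join(parts) for name, parts in sequences.items()}


def completeness(sequence):
    """Percentage of positions that are A, C, G or T.

    Assumption for this sketch: N's and any other ambiguity codes count
    as missing data.
    """
    if not sequence:
        return 0.0
    called = sum(1 for base in sequence.upper() if base in "ACGT")
    return 100.0 * called / len(sequence)
```

After extracting the zip file, each fasta file could be read with `read_fasta(open(path).read())` and each sequence passed to `completeness`, where `path` stands for the extracted file.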