Assessment of targeted enrichment locus capture across time and museums using odonate specimens
Data files
May 18, 2023 version; files total 14.44 GB
Abstract
The use of gDNA isolated from museum specimens for high-throughput sequencing, especially targeted sequencing in the context of phylogenetics, is common practice. Yet little attention has been paid to comparing DNA quality and sequencing results across museum specimens. Dragonflies and damselflies are ubiquitous in freshwater ecosystems and are commonly collected and preserved in museum collections, hence their use in this study. However, the history of odonate preservation across time and museums has resulted in wide variability in the success of viable DNA extraction, necessitating an assessment of their usefulness in genetic studies. Using Anchored Hybrid Enrichment probes, we sequenced DNA from samples at two museums: 48 from the American Museum of Natural History (AMNH) in New York City, USA, and 46 from the Naturalis Biodiversity Center (RMNH) in Leiden, Netherlands. The samples came from global collection localities and spanned 120 years. We recovered at least 4 loci out of a >1000-locus probe set for all samples, with an average capture of ~385 loci. Neither specimen age nor size was a good predictor of locus capture, but recapture rates differed significantly between museums. Samples from the AMNH had lower overall locus capture than those from the RMNH, perhaps due to differences in specimen storage over time.
Methods
For taxon sampling, damselflies and dragonflies from the RMNH and AMNH were selected with an emphasis on covering a breadth of sizes, families, and ages. We initially selected samples collected between 1909 (~112 years old) and 2001 (~20 years old). We chose 94 specimens: 64 Anisoptera and 30 Zygoptera, 48 from the AMNH and 46 from the RMNH. Genomic DNA was isolated and sent to RAPID Genomics (Gainesville, Florida) for library preparation and sequencing using Anchored Hybrid Enrichment probes. We trimmed adapters from the raw reads of each sample using fastp and checked read quality using MultiQC. Following trimming, we assembled and assigned orthology to each targeted capture locus. Following assembly, we screened each locus for orthology by ensuring that the locus did not have BLAST hits to multiple places in the genome and by confirming reciprocal best hits between the reference and the query sequence. We generated multiple sequence alignments, concatenated them using FASconCAT (Kück and Meusemann 2010), and generated an optimal partitioning scheme using relaxed clustering with the model fixed to GTR+G for each subset in IQ-TREE v.2.1.3 (Minh et al. 2020). We selected a model for each subset in the partitioning scheme using ModelFinder and estimated a maximum likelihood tree with 1,000 ultrafast bootstrap replicates in IQ-TREE.
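The reciprocal best hit screen described above can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the dataset; it assumes BLAST results in the standard 12-column tabular format (-outfmt 6, with the bit score in the last column), and the function names and example identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(blast_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query ID to the subject ID with the highest bit score.

    Assumes BLAST tabular output (-outfmt 6): tab-separated columns with
    the query ID first, the subject ID second, and the bit score twelfth.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    forward: Iterable[str], reverse: Iterable[str]
) -> Set[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit.

    `forward` holds reference-vs-assembly hits and `reverse` holds
    assembly-vs-reference hits (an assumed search layout).
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}
```

A locus would be retained under this sketch only if its pair appears in the returned set. The separate check for hits to multiple genomic locations is not covered here.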
Usage notes
IQ-Tree v.2.1.3 (Data matrix - fasta file)
UNIX/Command line or a Text Editor for viewing (fastq files - raw data)
FigTree (Tree file - .treefile)
BBEdit (Partition files - Nexus)
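For users who would rather inspect the FASTA data matrix programmatically than in a text editor, the following Python sketch reads FASTA records and reports the fraction of each sequence that is not gap or missing data. It is an illustration only: the function names are hypothetical, and treating "-", "?", and "N" as missing is an assumption about how the matrix encodes absent data.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence name: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep the ID up to the first space
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def data_fraction(sequence: str, missing: str = "-?Nn") -> float:
    """Fraction of alignment columns holding data rather than gaps or missing symbols."""
    if not sequence:
        return 0.0
    informative = sum(1 for base in sequence if base not in missing)
    return informative / len(sequence)
```

To apply this to a file, open it and pass the file handle to read_fasta, then call data_fraction on each sequence to compare completeness across specimens.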