Data from: Minimally destructive hDNA extraction method for retrospective genetics of pinned historical Lepidoptera specimens
Data files
May 22, 2024 version files 226.32 MB
- butterfly_v1.asm.bp.p_ctg_mtDNA_masked.fa.gz
- README.md
Abstract
The millions of specimens stored in entomological collections provide a unique opportunity to study historical insect diversity. Current technologies make it possible to sequence entire genomes of historical specimens and estimate the past genetic diversity of present-day endangered species, advancing our understanding of anthropogenic impacts on genetic diversity and enabling the implementation of conservation strategies. A limiting challenge is the extraction of historical DNA (hDNA) of adequate quality for sequencing platforms. We tested four hDNA extraction protocols on five body parts of pinned false heath fritillary butterflies, Melitaea diamina, aiming to minimise specimen damage, preserve their scientific value to the collections, and maximise DNA quality and yield for whole-genome re-sequencing. We developed an effective approach that successfully recovers hDNA appropriate for short-read sequencing from a single leg of pinned specimens using silica-based DNA extraction columns and an extraction buffer that includes SDS, Tris, Proteinase K, EDTA, NaCl, PTB, and DTT. We observed substantial variation in the ratio of nuclear to mitochondrial DNA in extractions from different tissues, indicating that optimal tissue choice depends on project aims and anticipated downstream analyses. We found that sufficient DNA for whole-genome re-sequencing can reliably be extracted from a single leg, opening the possibility to monitor changes in genetic diversity while maintaining the scientific value of specimens and supporting current and future conservation strategies.
Methods
The draft de novo Melitaea diamina reference genome (butterfly_v1.asm.bp.p_ctg_mtDNA_masked.fa) is based on 6.2 Gb of PacBio HiFi reads and was sequenced and assembled at the Functional Genomics Center Zurich (FGCZ). The genome was assembled and purged of duplicates with hifiasm (Cheng et al. 2021) using the parameters -l3 -s 0.55. The assembled genome is 805 Mb long, comprises 3,918 contigs, and has a BUSCO value of 96.2% (v5.2.2, arthropoda_odb10; Manni et al. 2021).
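Users who download the assembly may want to check that their copy matches the contig count and total length reported above. The following is a minimal sketch of such a check, not part of the authors' pipeline; the function names `read_fasta` and `assembly_stats` are hypothetical, and it assumes only that the file is a standard (optionally gzip-compressed) FASTA.

```python
import gzip
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the contig name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(path: str) -> Dict[str, int]:
    """Return contig count, total length, and N50 for a FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lengths = sorted((len(seq) for _, seq in read_fasta(handle)), reverse=True)
    total = sum(lengths)
    # N50: length of the contig at which the cumulative sum of
    # length-sorted contigs first reaches half of the assembly.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "total_bp": total, "n50_bp": n50}
```

Calling `assembly_stats("butterfly_v1.asm.bp.p_ctg_mtDNA_masked.fa.gz")` on the downloaded file should return values that can be compared with the figures reported in this section. Masked bases, if written as N, still count towards contig length in this sketch.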
All mtDNA contigs except one contiguous mtDNA sequence were masked to ensure consistent mapping of reads to the mtDNA.
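As an illustration of the masking idea, the sketch below hard-masks a chosen set of contigs by replacing their bases with N while leaving all other contigs unchanged. It rests on assumptions: the record does not state how the masking was performed (hard masking with N is only one option), and the contig names and the function name `mask_contigs` are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def mask_contigs(
    records: Iterable[Tuple[str, str]], to_mask: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, replacing masked contigs with N."""
    for name, seq in records:
        if name in to_mask:
            # Hard-mask: keep the contig and its length, hide its bases.
            yield name, "N" * len(seq)
        else:
            yield name, seq


# Hypothetical toy input: one nuclear contig and two mtDNA contigs,
# of which only "mt_a" is kept as the unmasked mtDNA sequence.
toy = [("ptg1", "ACGT"), ("mt_a", "ACG"), ("mt_b", "AC")]
masked = list(mask_contigs(toy, to_mask={"mt_b"}))
```

Keeping masked contigs at their original length, rather than deleting them, preserves contig names and coordinates, so reads that would have mapped to redundant mtDNA copies can only align to the single unmasked sequence.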