On the potential of Angiosperms353 for population genomics studies
Cite this dataset
Johnson, Matthew; Slimp, Madeline; Williams, Lindsay D.; Hale, Haley (2021). On the potential of Angiosperms353 for population genomics studies [Dataset]. Dryad. https://doi.org/10.5061/dryad.76hdr7sv3
Premise of the Study: Targeted sequencing using Angiosperms353 has emerged as a low-cost tool for phylogenetics, with early results spanning from all flowering plants to within genera. The use of universal markers at narrower scales (within populations) would eliminate the need for specific marker development while retaining the benefits of full-gene sequences. However, whether the Angiosperms353 markers provide sufficient variation within species to calculate demographic parameters is untested.
Methods: Using herbarium specimens from a 50-year-old floristic survey of Guadalupe Mountains National Park, we sequenced 95 samples from 24 species using Angiosperms353. We adapted a data workflow to process targeted sequencing data that calls variants within each species and prepares data for population genetic analysis. We calculated genetic diversity using standard metrics (e.g. heterozygosity, pi).
Key Results: Angiosperms353 gene recovery was associated with genomic library concentration, with limited phylogenetic bias. We identified over 1000 segregating variants with zero missing data within 22 of 24 species. A subset of these variants was filtered to remove linked SNPs, revealing high heterozygosity in many species. Pairwise nucleotide diversity (pi) was typically between 0.002 and 0.010, with much of the variation in noncoding regions flanking the targeted sequences.
Conclusions: Despite sequencing few individuals per species, the Angiosperms353 markers contained sufficient variation to calculate demographic parameters. Larger sampling within species will allow for estimating gene flow and population dynamics in any angiosperm. Our study will benefit conservation genetics, where Angiosperms353 provides universal, repeatable markers, low missing data, and haplotype information, and permits the use of herbarium specimens.
Target capture sequencing using Angiosperms353. Sequences were assembled using HybPiper to identify the longest supercontig sequence per species. Reads for each species were aligned to these reference sequences using bwa, and variants were called with the GATK Exome Best Practices Pipeline.
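
For readers unfamiliar with the diversity metrics named above, the following is a minimal sketch of how pairwise nucleotide diversity (pi) and observed heterozygosity could be computed from called diploid genotypes. It is an illustration only, not the workflow used to produce this dataset. The function names, the input layout (one list of allele-pair tuples per variant site), and the `seq_length` parameter (total number of aligned base pairs considered, including invariant sites) are all assumptions introduced here.

```python
from itertools import combinations


def site_pi(alleles):
    """Mean number of differences per pair of sampled allele copies at one site."""
    n = len(alleles)
    if n < 2:
        return 0.0
    differing_pairs = sum(1 for a, b in combinations(alleles, 2) if a != b)
    return differing_pairs / (n * (n - 1) / 2)


def nucleotide_diversity(genotypes_by_site, seq_length):
    """Per-base-pair pi: per-site pi summed over variant sites, divided by length.

    genotypes_by_site: hypothetical layout, one list per variant site, each
    holding a (allele1, allele2) tuple for every diploid individual.
    seq_length: assumed total number of sites examined (variant + invariant).
    """
    total = 0.0
    for genotypes in genotypes_by_site:
        # Flatten diploid genotypes into individual allele copies.
        alleles = [allele for genotype in genotypes for allele in genotype]
        total += site_pi(alleles)
    return total / seq_length


def observed_heterozygosity(genotypes_by_site):
    """Fraction of genotype calls at the supplied variant sites that are heterozygous."""
    het = 0
    calls = 0
    for genotypes in genotypes_by_site:
        for a, b in genotypes:
            calls += 1
            if a != b:
                het += 1
    return het / calls if calls else 0.0


if __name__ == "__main__":
    # Toy data: 4 individuals, 2 variant sites, in 1000 bp of sequence.
    toy = [
        [(0, 1), (0, 0), (0, 1), (1, 1)],
        [(0, 0), (0, 0), (0, 1), (0, 0)],
    ]
    print(nucleotide_diversity(toy, 1000))
    print(observed_heterozygosity(toy))
```

This sketch assumes no missing genotypes, which matches the zero-missing-data variant sets described above; handling missing calls would require adjusting the per-site sample size.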