Conservation genomic ddRadseq data for Hymenocallis henryae, a federally petitioned spiderlily endemic to the Florida panhandle; SNP data of 279 individuals from 19 populations
Data files
Aug 14, 2024 version files 5.79 MB
- populations.haps.vcf
- populations.snps.vcf
- README.md
Abstract
Hymenocallis henryae is a rare, charismatic spider-lily endemic to the Florida panhandle. The species is currently under review to determine whether listing under the Endangered Species Act is warranted, yet its genetic diversity, information crucial to the listing process, has not been described. We conducted field observations of 21 historic populations across the species' geographical range and performed genomic analyses of 279 individuals from 19 extant populations. Most populations had fewer than 40 individuals, while populations with >100 individuals were found exclusively on managed lands. Genetic diversity was uniformly low within populations (HE: 0.074–0.093), with low to moderate inbreeding coefficients (FIS: 0.068–0.431). Genetic differentiation was relatively low among most populations (FST: 0–0.098), although there was statistical support for isolation by distance. In addition, we found high genetic similarity and a lack of population structure across the species' range. Clonal propagation through fused bulbs is a common reproductive strategy. We confirmed current threats (habitat change, residential development, fire suppression) and identified several coastal populations threatened by sea level rise. We recommend continuing in situ protection and management and establishing ex situ living collections to preserve the populations most at risk of extirpation from habitat loss and degradation.
README: Hymenocallis henryae ddRAD-Seq VCF files
https://doi.org/10.5061/dryad.m63xsj4bb
These are haplotype and SNP VCF files derived from ddRAD-seq of 279 individuals across 19 populations of Hymenocallis henryae, Henry's spiderlily, in the Amaryllidaceae. The final dataset that we used for downstream analysis included 838 loci and 687 variable sites that were each present in at least 60% of individuals.
Description of the data and file structure
populations.haps.vcf: Individual genotypes by haplotype, organized by chromosome number (here, the ddRAD-seq sequenced fragment). Individuals are labeled by population and individual number. For example, Hh.31.10 is Population 31, individual 10. Some populations had multiple individuals from the same "clump". In this case, multiple samples from the same clump are given a letter designation to distinguish them, e.g., Hh11.8a and Hh11.8b.
populations.snps.vcf: Individual genotypes by SNP. Individuals are labeled by population and individual number. For example, Hh.31.10 is Population 31, individual 10. Some populations had multiple individuals from the same "clump". In this case, multiple samples from the same clump are given a letter designation to distinguish them, e.g., Hh11.8a and Hh11.8b. In general, samples from the same clump are genetic clones.
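The labeling scheme described above can be split into its parts programmatically. The following is a minimal sketch, not part of the dataset: the pattern is an assumption inferred only from the two example forms given in this README (Hh.31.10 and Hh11.8a), and the names `LABEL_RE`, `parse_label`, and `same_clump` are hypothetical. Labels in the actual VCF headers may need a different pattern.

```python
import re

# Assumed pattern (inferred from the README examples, not confirmed against the files):
# "Hh", an optional ".", population number, ".", individual number, optional clump letter.
LABEL_RE = re.compile(r"^Hh\.?(\d+)\.(\d+)([a-z]?)$")


def parse_label(label):
    """Split a sample label into (population, individual, clump_letter or None)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized sample label: {label!r}")
    population, individual, letter = match.groups()
    return int(population), int(individual), letter or None


def same_clump(label_a, label_b):
    """True when both labels carry a clump letter and share population and individual number."""
    pop_a, ind_a, letter_a = parse_label(label_a)
    pop_b, ind_b, letter_b = parse_label(label_b)
    if letter_a is None or letter_b is None:
        return False
    return (pop_a, ind_a) == (pop_b, ind_b) and letter_a != letter_b
```

Grouping samples this way makes it straightforward to check whether samples from the same clump are also genetically identical in the genotype data.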
Methods
DNA Isolation and Sequencing
Total genomic DNA was extracted from 279 selected samples. Briefly, frozen tissue was ground using a mortar and pestle in an extraction buffer containing 100 mM Tris (pH 8), 50 mM EDTA, 500 mM NaCl, and 0.1% (w/v) PVP-40. DNA was precipitated with 5 M potassium acetate followed by an isopropanol wash and resuspension in Tris-EDTA (TE) buffer. DNA quantity was assessed using a Thermo Fisher Scientific Qubit 4.0. DNA quality was assessed on a 1% agarose gel, and purified DNA was stored at -80 °C.
Samples that displayed adequate quality and reached a minimum DNA concentration of 20 ng/µL were then sent to Floragenex (Floragenex, Inc, 4640 SW Macadam Ave, Portland, OR), where double-digest restriction site associated DNA sequencing (ddRAD-Seq) was carried out. To summarize, DNA was first digested using the restriction endonucleases PstI and MseI. Samples were diluted for PCR amplification and the product was used to construct a ddRAD-Seq library. The library was sequenced at the University of Oregon Genomics and Cell Characterization Core Facility (GC3F) on a NovaSeq 6000 with a SP100 chip, generating 118 bp single-end reads with a mean 27.5x effective coverage per sample. The sequence data were run through the pipeline STACKS (version 2.60) to assemble the short-read sequences from all the samples (via the process_radtags program), and to align reads into loci that are genotyped (via the gstacks program). Single nucleotide polymorphism data were exported in VCF version 4.2 file format for downstream data analysis. Three quality cut-off filters were applied, allowing for genotypes present in 40%, 60%, or 80% of individuals. We used a dataset in which each locus was represented in at least 60% of individuals; the dataset with less missing data (loci found in at least 80% of individuals) resulted in a loss of informative loci.
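The presence threshold described above (a site retained only if it is genotyped in at least 60% of individuals) can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions, not the filtering actually used: the filters in this study were applied within the STACKS pipeline, and the function names `call_rate` and `filter_by_call_rate` are hypothetical. The sketch assumes tab-delimited VCF 4.2 data lines with a `GT` key in the FORMAT column and treats any genotype containing `.` as missing.

```python
def call_rate(vcf_line):
    """Return the fraction of samples with a non-missing genotype on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
    samples = fields[9:]  # per-sample columns follow the FORMAT column
    called = 0
    for sample in samples:
        parts = sample.split(":")
        genotype = parts[gt_index] if gt_index < len(parts) else "."
        alleles = genotype.replace("|", "/").split("/")
        if all(allele != "." for allele in alleles):
            called += 1
    return called / len(samples)


def filter_by_call_rate(vcf_lines, min_rate=0.6):
    """Keep header lines and any data line whose call rate is at least min_rate."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or call_rate(line) >= min_rate:
            kept.append(line)
    return kept
```

Running the same function with `min_rate` set to 0.4, 0.6, and 0.8 on a VCF shows how quickly the number of retained sites drops as the threshold is raised, which is the trade-off described in the paragraph above.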
Genetic Diversity and Clonality Assessment
To assess within-population genetic diversity, we calculated heterozygosity and inbreeding coefficients for each population using the R package hierfstat. To assess genetic differentiation between populations, we calculated pairwise FST for populations using the package StAMPP. To investigate isolation by distance, we ran a Mantel test for a significant relationship between pairwise FST and geographic distance between populations using the package ade4. We estimated ancestry coefficients for individuals via an sNMF analysis using the package LEA and performed a principal component analysis (PCA) with the function ‘dudi.pca’ found in the ade4 package. Clonality was tested using the function ‘bitwise.dist’ in the package poppr.
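For readers unfamiliar with the within-population statistics named above, the sketch below shows the textbook definitions of observed heterozygosity (HO), expected heterozygosity (HE), and the inbreeding coefficient FIS = 1 - HO/HE for a single population. It is an illustration only and is not the code used in this study, which relied on the R package hierfstat; package estimators typically include sample-size corrections omitted here, so values may differ slightly. The function name `diversity_stats` and the input layout are assumptions made for the example.

```python
def diversity_stats(genotypes_by_locus):
    """Return (mean HO, mean HE, FIS) for one population.

    genotypes_by_locus: list of loci; each locus is a list of diploid genotypes
    given as (allele_1, allele_2) tuples, with None marking a missing genotype.
    """
    ho_values = []
    he_values = []
    for locus in genotypes_by_locus:
        called = [genotype for genotype in locus if genotype is not None]
        if not called:
            continue  # skip loci with no genotyped individuals
        n = len(called)
        # Observed heterozygosity: share of individuals carrying two different alleles.
        ho_values.append(sum(1 for a, b in called if a != b) / n)
        # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
        counts = {}
        for a, b in called:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        he_values.append(1.0 - sum((c / (2 * n)) ** 2 for c in counts.values()))
    if not he_values:
        raise ValueError("no genotyped loci")
    mean_ho = sum(ho_values) / len(ho_values)
    mean_he = sum(he_values) / len(he_values)
    fis = 1.0 - mean_ho / mean_he if mean_he > 0 else float("nan")
    return mean_ho, mean_he, fis
```

A population in Hardy-Weinberg proportions gives FIS near 0, while a deficit of heterozygotes, as expected under inbreeding or clonal spread, pushes FIS toward 1.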