Data from: Targeted genome-wide SNP genotyping in feral horses using non-invasive fecal swabs
Gavriliuc, Stefan et al. (2022), Data from: Targeted genome-wide SNP genotyping in feral horses using non-invasive fecal swabs, Dryad, Dataset, https://doi.org/10.5061/dryad.0vt4b8h1n
The development of high-throughput sequencing has prompted a transition in wildlife genetics from using microsatellites toward sets of Single Nucleotide Polymorphisms (SNPs). However, genotyping large numbers of targeted SNPs using non-invasive samples remains challenging due to relatively large DNA input requirements. Recently, target enrichment has emerged as a promising approach requiring little template DNA. We assessed the efficacy of Tecan Genomics’ Allegro Targeted Genotyping (ATG) for generating genome-wide SNP data in feral horses using DNA isolated from fecal swabs. Total and host-specific DNA were quantified for 989 samples collected as part of a long-term individual-based study of feral horses on Sable Island, Nova Scotia, Canada, using dsDNA fluorescence and a host-specific qPCR assay, respectively. Forty-eight samples representing 44 individuals containing at least 10ng of host DNA (ATG’s recommended minimum input) were genotyped using a custom multiplex panel targeting 279 SNPs. Genotyping accuracy and consistency were assessed by contrasting ATG genotypes with those obtained from the same individuals with SNP microarrays, and from multiple samples from the same horse, respectively. 62% of swabs yielded the minimum recommended amount of host DNA for ATG. Ignoring samples that failed to amplify, ATG recovered an average of 86.7% targeted sites per sample, while genotype concordance between ATG and SNP microarrays was 98.5%. The repeatability of genotypes from the same individual approached unity with an average of 99.9%. This study demonstrates the suitability of ATG for genome-wide, non-invasive targeted SNP genotyping, and will facilitate further ecological and conservation genetics research in equids and related species.
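The two summary metrics reported above (proportion of targeted sites recovered, and genotype concordance between platforms) can be illustrated with a minimal sketch. The function names, the genotype encoding (two-letter strings, `None` for a missing call) and the toy data are assumptions made for illustration; the exact definitions used in the study may differ.

```python
def normalize(genotype):
    """Treat genotypes as unordered allele pairs, so 'GA' equals 'AG'."""
    return None if genotype is None else "".join(sorted(genotype))


def call_rate(genotypes):
    """Proportion of targeted sites with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def concordance(calls_a, calls_b):
    """Proportion of matching genotypes among sites called in both sets."""
    shared = [
        (normalize(a), normalize(b))
        for a, b in zip(calls_a, calls_b)
        if a is not None and b is not None
    ]
    if not shared:
        return float("nan")
    return sum(a == b for a, b in shared) / len(shared)


# Hypothetical genotypes for one horse at five targeted SNPs.
atg_calls = ["AG", "GG", None, "CT", "TT"]
array_calls = ["GA", "GG", "AA", "CC", "TT"]

print(f"call rate: {call_rate(atg_calls):.2f}")
print(f"concordance: {concordance(atg_calls, array_calls):.2f}")
```

The same `concordance` function could be applied to two samples from the same horse to estimate repeatability.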
Targeted SNP sequencing panel design
To design the ATG panel, we selected 300 SNPs that are present on the Illumina Equine GGP65 Plus array and were shown to be polymorphic in the Sable Island population. Selected SNPs had a minor allele frequency (MAF) > 0.30 and exhibited limited linkage disequilibrium, as determined by the PLINK --indep-pairwise command with a window size of 50, a step size of 5 and a pairwise r² threshold of 0.5 (Purcell et al., 2007). Tecan Genomics (Redwood City, United States) then developed an assay covering 279 of the 300 submitted SNPs (Appendix 1). In this assay, 237 targets were covered by two probes, while 42 were covered by a single probe.
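The selection logic described above can be sketched in a few lines. This is a simplified greedy illustration, not PLINK's actual implementation: it keeps a SNP only if its MAF exceeds the cutoff and its squared genotype correlation with each of the most recently retained SNPs stays at or below the threshold. It treats the third --indep-pairwise argument as an r² threshold, ignores the step size, and uses hypothetical SNP names and genotypes (alternate-allele counts 0/1/2, `None` for missing).

```python
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts (0, 1, 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def r_squared(geno_a, geno_b):
    """Squared Pearson correlation between two genotype vectors."""
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    xs, ys = zip(*pairs)
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic in the shared samples
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy ** 2 / (sxx * syy)


def select_snps(snps, min_maf=0.30, max_r2=0.5, window=50):
    """Greedy filter over (name, genotypes) tuples given in map order."""
    kept = []
    for name, geno in snps:
        if minor_allele_frequency(geno) <= min_maf:
            continue  # fails the MAF > min_maf criterion
        recent = kept[-window:]  # compare only with nearby retained SNPs
        if all(r_squared(geno, other) <= max_r2 for _, other in recent):
            kept.append((name, geno))
    return [name for name, _ in kept]


# Hypothetical genotypes for six horses at four SNPs.
toy_snps = [
    ("snpA", [0, 1, 2, 1, 0, 2]),
    ("snpB", [0, 1, 2, 1, 0, 2]),  # perfectly correlated with snpA
    ("snpC", [0, 0, 0, 0, 1, 0]),  # MAF below the cutoff
    ("snpD", [1, 2, 0, 1, 1, 1]),
]
selected = select_snps(toy_snps)
print(selected)
```

For real data, PLINK itself remains the appropriate tool; the sketch only shows what the two filtering criteria do.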
Fecal DNA sample collection
Horse DNA was collected by swabbing the mucus layer surrounding freshly deposited feces using a polyester swab attached to a 5ml vial (SIMPORT T307-5A). Vials were preloaded with 400µl of Aquastool™ solution (MultiTarget Pharmaceuticals). After collection in the field, vials were kept in insulated bags containing ice packs and transferred to -20°C on return to the laboratory the same day. Samples were transported frozen by air to the mainland at the end of each field season and archived at -80°C until DNA extraction.
Fecal DNA extraction
DNA was isolated using a modified version of the protocol recommended for Aquastool™ solution (MultiTarget Pharmaceuticals). First, thawed swab vials were vortexed at full speed for 1 minute, and 200µl of homogenized solution was transferred to a 1.5ml microfuge tube. Samples were then incubated at room temperature for 15 minutes, vortexed for 60 seconds, and centrifuged at full speed in a microcentrifuge (14,000 rpm) for 5 minutes to pellet debris. The clear supernatant (~200µl) was transferred to a 1.5ml tube preloaded with 160µl of isopropanol and vortexed for 10 seconds. Tubes were then centrifuged at full speed for 5 minutes, and the supernatant was removed and discarded. DNA pellets were rinsed twice with 70% ethanol, air dried, and resuspended in 60µl of molecular grade water. Once the DNA pellets had dissolved, samples were centrifuged at full speed for 5 minutes to pellet contaminants. The clear supernatant containing DNA was transferred to new cryotubes and archived at -80°C.
Fecal DNA quantification
We quantified total (host + exogenous) DNA concentration in samples using 2µl of template DNA and a Qubit 4 or BioTek Synergy LX Multi-Mode Microplate Reader with a Qubit or Quant-iT dsDNA High-Sensitivity or Broad Range Assay Kit (Thermo Fisher Scientific) according to manufacturer protocols. To assess how much of the total DNA was attributable to the host, we applied a qPCR approach targeting the single-copy nuclear F2 gene using equine-specific primers known to be effective across horse breeds (Forward: 5’-GCCAGCAGGCTGAGAACG-3’, Reverse: 5’-TGGTGCAGTTGATTCTGGAATAGGAAATTT-3’; Floren et al., 2015), with horse DNA extracted from muscle tissue as a standard (10-fold dilution series from 20ng/µl to 0.0002ng/µl). Samples, standards, and negative controls were run in duplicate on a Bio-Rad CFX96 qPCR System. Each 20µl reaction contained 2µl of template, 10.0µl of 2x SensiFAST SYBR mix, 0.8µl of 10µM forward primer, 0.8µl of 10µM reverse primer, and 6.4µl of molecular grade water. Thermocycling conditions consisted of 95°C for 3 minutes (polymerase activation) followed by 40 amplification cycles (95°C for 5 seconds, 60°C for 10 seconds, and 72°C for 10 seconds).
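Host DNA concentration is typically read off a qPCR standard curve by regressing quantification cycle (Cq) on log10 concentration of the standards and then inverting the fitted line for each sample. The following is a minimal sketch of that calculation under stated assumptions: the Cq values are hypothetical, the function names are ours, and the instrument software may fit and report the curve differently.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means doubling every cycle."""
    return 10 ** (-1 / slope) - 1


def concentration_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate concentration (ng/µl)."""
    return 10 ** ((cq - intercept) / slope)


# 10-fold dilution series from 20 ng/µl down to 0.0002 ng/µl (six points).
standards = [20 / 10 ** i for i in range(6)]

# Hypothetical mean Cq values of the duplicate standard reactions.
standard_cq = [20.7, 24.0, 27.3, 30.7, 34.0, 37.3]

slope, intercept = fit_standard_curve(standards, standard_cq)

# Hypothetical duplicate Cq values for one fecal swab extract.
sample_cq = sum([29.1, 29.3]) / 2
host_ng_per_ul = concentration_from_cq(sample_cq, slope, intercept)

print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.2f}")
print(f"estimated host DNA: {host_ng_per_ul:.3f} ng/µl")
```

Multiplying the estimated concentration by the volume of extract available gives the mass of host DNA that can be compared against the 10ng input recommended for ATG.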
The data presented here were used in our pilot study of Allegro Targeted Genotyping and consist of paired raw sequence reads in .fastq format. Code to call variants is available at https://github.com/sgavril/Nugen-Pilot-Analysis
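Before running the variant-calling code, it can be useful to confirm that the forward and reverse read files of a sample are properly paired. The sketch below is one way to do this with the Python standard library; it is not part of the linked repository, it assumes four-line FASTQ records, and the read names and sequences in the demonstration are invented.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from an open FASTQ text handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record near {header!r}")
        yield header[1:], sequence, quality


def base_id(read_id):
    """Strip the mate suffix so that R1 and R2 names can be compared."""
    name = read_id.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_pairs(r1_handle, r2_handle):
    """Yield mate pairs in lockstep, checking that read names agree."""
    records = zip_longest(read_fastq(r1_handle), read_fastq(r2_handle))
    for rec1, rec2 in records:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if base_id(rec1[0]) != base_id(rec2[0]):
            raise ValueError(f"name mismatch: {rec1[0]!r} vs {rec2[0]!r}")
        yield rec1, rec2


# Invented in-memory records standing in for a pair of .fastq files.
r1 = io.StringIO("@read1 1:N:0:1\nACGT\n+\nIIII\n@read2 1:N:0:1\nGGCA\n+\nIIII\n")
r2 = io.StringIO("@read1 2:N:0:1\nTTGA\n+\nIIII\n@read2 2:N:0:1\nCCAT\n+\nIIII\n")

pairs = list(read_pairs(r1, r2))
print(f"{len(pairs)} read pairs")
```

For files on disk, the handles can be obtained with `open(path)` or, for gzip-compressed files, `gzip.open(path, "rt")`.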