Probabilistic genetic identification of wild boar hybridization to support control of invasive wild pigs (Sus scrofa)
Data files
Dec 20, 2023 version files (65.47 MB)
- 03_1c_gscramble_deltaLikelihood_results.csv
- All_DeltaLikelihoodGenotypes_MetadataFile.csv
- assignPop_ranking_InformativeLoci_1k_iterations_K5_referencePops.csv
- K5_1421RefSet_6566FeSw_435AltWePi.bed
- K5_1421RefSet_6566FeSw_435AltWePi.bim
- K5_1421RefSet_6566FeSw_435AltWePi.fam
- README.md
Jan 02, 2024 version files (65.49 MB)
- 03_1c_gscramble_deltaLikelihood_results.csv
- All_DeltaLikelihoodGenotypes_MetadataFile.csv
- assignPop_ranking_InformativeLoci_1k_iterations_K5_referencePops.csv
- K5_1421RefSet_6566FeSw_435AltWePi.bed
- K5_1421RefSet_6566FeSw_435AltWePi.bim
- K5_1421RefSet_6566FeSw_435AltWePi.fam
- README.md
Abstract
The rapid expansion of wild pigs (Sus scrofa) throughout the United States (US) has been fueled by unlawful introductions, with invasive populations causing extensive crop losses, damaging native ecosystems, and serving as a reservoir for disease. Multiple states have passed laws prohibiting the possession or transport of wild pigs. However, genetic and phenotypic similarities between domestic pigs and invasive wild pigs – which overwhelmingly represent domestic pig-wild boar hybrids – pose a challenge for the enforcement of such regulations. We sought to exploit wild boar ancestry as a common attribute among the vast majority of invasive wild pigs as a means of genetically differentiating wild pigs from breeds of domestic pigs found within the US. We organized reference high-density single nucleotide polymorphism genotypes (1,039 samples from 33 domestic breeds and 382 samples from 16 wild boar populations) into five genetically cohesive reference groups: mixed-commercial breeds, Durocs, heritage breeds, primitive breeds, and wild boar. Building upon well-established genetic clustering approaches, we structured the test statistic to describe the difference in the likelihood of a given genotype’s ancestry vectors (sensu genetic clustering analysis) if derived strictly from the four described domestic pig reference groups versus allowing for admixture from the wild boar group. By fitting statistical distributions to test statistics of reference domestic pigs, we characterized the distribution of the null hypothesis – that a given genotype descends strictly from domestic pig reference groups. We tested the approach with simulated genotypes and empirical data from an additional 29 breeds of domestic pig represented by 435 unique genotypes; all associated test statistics for simulated and empirical domestic pig challenge sets fell within the distribution of reference domestic pigs. 
We then evaluated 6,566 invasive wild pigs sampled across the contiguous United States, of which 63% exceeded the maximum threshold for domestic pigs and could be statistically classified as possessing wild boar ancestry. This approach provides a scientific foundation to enforce regulations prohibiting the possession of this destructive invasive species. Further, this computationally efficient and generalizable approach could be readily adapted to quantify gene flow among ecological systems of conservation or management concern.
README
We present data files and computer code necessary to recreate the analyses described in:
"Probabilistic genetic identification of wild boar hybridization to support control of invasive wild pigs (Sus scrofa)" by Timothy J. Smyser, Peter Pfaffelhuber, Rachael M. Giglio, Matthew G. DeSaix, Amy J. Davis, Courtney F. Bowden, Michael A. Tabak, Arianna Manunza, Valentin Adrian Balteanu, Marcel Amills, Laura Iacolina, Pamela Walker, Carl Lessard, and Antoinette J. Piaggio and published in Ecosphere
The files include genotypes for 8,422 Sus scrofa: 1,421 reference samples organized into five (K = 5) reference groups (mixed-commercial breeds, Duroc, heritage breeds, primitive breeds, and European wild boar); 6,566 wild pigs sampled across the invaded range within the contiguous United States; and 435 domestic pigs sampled from 29 Western breeds that were excluded from the reference set to serve as a test set. Genotypes are compiled in the PLINK binary PED file format (*.bed/*.bim/*.fam). All genotypes were produced with Illumina BeadChip microarrays (San Diego, California) developed for pigs (PorcineSNP60 v1 and v2, or Genomic Profiler for Porcine HD, exclusively licensed to GeneSeek, a Neogen Corporation, Lincoln, Nebraska). Many of the reference genotypes have been published previously (full citations below, with abbreviated citations listed in the metadata file "All_DeltaLikelihoodGenotypes_MetadataFile.csv").
Additionally, we provide the following scripts and supporting files, necessary to recreate our analyses:
01_DeltaLikelihood_empirical.Rmd - This R markdown (Rmd) file has R code that performs the linkage disequilibrium pruning and Delta Likelihood calculation for the empirical data. As a component of this script, we read in a file listing all evaluated loci, ranked by their relative informativeness in differentiating the 5 reference groups as delineated in the manuscript (file "assignPop_ranking_InformativeLoci_1k_iterations_K5_referencePops.csv"; columns represent locus name and rank order, from most informative [1] to least informative [28,545]). Ranking of loci was performed in a separate analysis using the R package assignPOP (function assign.MC; Chen et al., 2018; R Core Team, 2023; Appendix S1: Figure S1). Specifically, we conducted 1,000 Monte Carlo bootstrap iterations, retaining 90% of reference genotypes as training data to rank loci by FST within each iteration and concatenating all iterations to rank the informativeness of loci.
02_1_Bootstrap_prep.Rmd - This Rmd file has R code that prepares an Rdata file used in the subsequent file for running the bootstrap iterations.
02_2_Bootstrap_run.R - This R code performs the bootstrap calculation of the Delta Likelihood statistics and is intended to be run across many computers or cores.
03_1a_get-gscramble-HPC.sh - This Bash script submits the code to a Slurm cluster to run the gscramble simulations for Delta Likelihood calculations.
03_1b_run-gscramble-HPC.R - This R code performs the gscramble simulations for Delta Likelihood calculations.
03_1c_gscramble_deltaLikelihood_results.csv - This text file presents the results of the simulations described in the manuscript, with one row per simulated genotype. Columns give the individual ID for the simulated genotype (IndID), the domestic pig reference group from which empirical genotypes were drawn for the creation of the 16-member pedigree (ReferenceGroup), the number of empirical domestic pig genotypes included in the pedigree (# Dom Pig), the number of empirical wild boar genotypes included in the pedigree (# Wild Boar), the proportion of the simulated genome originating from the empirical domestic pig genotypes following four generations of recombination as described in the pedigree (Dom Pig Proportion), the proportion of the simulated genome originating from the empirical wild boar genotypes (Wild Boar Proportion), and the corresponding value of the Delta Likelihood statistic (DeltaLikelihood Value).
03_2_gscramble-DL-summary-paper.Rmd - This Rmd file has the R code to produce the plots used in the manuscript from the simulations.
DeltaLikelihood_tools.R - This R code has the functions written specifically to perform the calculations described in this manuscript.
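For readers who want a feel for the Delta Likelihood statistic before opening the R code, the following is a minimal Python sketch and not a port of DeltaLikelihood_tools.R. It assumes a standard binomial admixture likelihood with fixed reference allele frequencies and estimates the ancestry vector with a simple EM update, once with the wild boar group excluded and once with it allowed; the statistic is the difference between the two maximized log-likelihoods. The function names, the group ordering, and the toy allele frequencies are all hypothetical and chosen only for illustration.

```python
import math


def log_likelihood(genotype, freqs, q):
    """Log-likelihood of a diploid genotype (allele counts 0/1/2 per locus)
    given group allele frequencies freqs[k][l] and ancestry vector q.
    The binomial coefficient is omitted because it cancels in the difference."""
    total = 0.0
    for l, g in enumerate(genotype):
        p = sum(q[k] * freqs[k][l] for k in range(len(q)))
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def fit_ancestry(genotype, freqs, allowed, n_iter=1000):
    """EM estimate of the ancestry vector, restricted to the groups in `allowed`.
    Groups outside `allowed` start at zero and therefore stay at zero."""
    n_groups, n_loci = len(freqs), len(genotype)
    q = [1.0 / len(allowed) if k in allowed else 0.0 for k in range(n_groups)]
    for _ in range(n_iter):
        new_q = [0.0] * n_groups
        for l, g in enumerate(genotype):
            p = sum(q[k] * freqs[k][l] for k in range(n_groups))
            for k in allowed:
                f = freqs[k][l]
                new_q[k] += g * q[k] * f / p + (2 - g) * q[k] * (1 - f) / (1 - p)
        q = [value / (2 * n_loci) for value in new_q]
    return q


def delta_likelihood(genotype, freqs, wild_boar_index):
    """Maximized log-likelihood with wild boar allowed minus domestic-only."""
    groups = list(range(len(freqs)))
    domestic = [k for k in groups if k != wild_boar_index]
    q_null = fit_ancestry(genotype, freqs, domestic)
    q_full = fit_ancestry(genotype, freqs, groups)
    return (log_likelihood(genotype, freqs, q_full)
            - log_likelihood(genotype, freqs, q_null))


# Toy allele frequencies (made up, NOT taken from the dataset): four domestic
# groups followed by one wild boar group, at six loci.
TOY_FREQS = [
    [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    [0.8, 0.9, 0.6, 0.3, 0.2, 0.2],
    [0.7, 0.7, 0.8, 0.1, 0.3, 0.3],
    [0.6, 0.8, 0.7, 0.2, 0.2, 0.4],
    [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],
]
WILD_BOAR = 4

if __name__ == "__main__":
    for label, genotype in [("domestic-like", [2, 2, 2, 0, 0, 0]),
                            ("hybrid-like", [1, 1, 1, 1, 1, 1]),
                            ("wild-boar-like", [0, 0, 0, 2, 2, 2])]:
        print(label, delta_likelihood(genotype, TOY_FREQS, WILD_BOAR))
```

In this toy setting the statistic is essentially zero for the domestic-like genotype and grows as the genotype resembles the wild boar group more closely. The scripts listed above remain the reference for how the published statistic, the linkage disequilibrium pruning, and the null distribution were actually computed.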
REFERENCES
Alexandri, P., H.-J. Megens, R. P. M. A. Crooijmans, M. A. M. Groenen, D. J. Goedbloed, J. M. Herrero-Medrano, L. A. Rund, L. B. Schook, E. Chatzinikos, C. Triantaphyllidis, and A. Triantafyllidis. 2017. "Distinguishing Migration Events of Different Timing for Wild Boar in the Balkans." Journal of Biogeography 44: 259-270.
Burgos-Paz, W., C. A. Souza, H.-J. Megens, Y. Ramayo-Caldas, M. Melo, C. Lemus-Flores, E. Caal, H. W. Soto, R. Martinez, L. A. Alvarez, L. Aguirre, V. Iniguez, M. A. Revidatti, O. R. Martinez-Lopez, S. Llambi, A. Esteve-Codina, M. C. Rodriguez, R. P. M. A. Crooijmans, S. R. Paiva, L. B. Schook, M. A. M. Groenen, and M. Perez-Enciso. 2013. "Porcine Colonization of the Americas: a 60k SNP Story." Heredity 110: 321-330.
Chen, K.-Y., E. A. Marschall, M. G. Sovic, A. C. Fries, H. L. Gibbs, and S. A. Ludsin. 2018. "assignPOP: An R Package for Population Assignment Using Genetic, Non-genetic, or Integrated Data in a Machine-learning Framework." Methods in Ecology and Evolution 9: 439-446.
Goedbloed, D. J., H.-J. Megens, P. van Hooft, J. M. Herrero-Medrano, W. Lutz, P. Alexandri, R. P. M. A. Crooijmans, M. Groenen, S. E. van Wieren, R. C. Ydenberg, and H. H. T. Prins. 2013. "Genome-wide Single Nucleotide Polymorphism Analysis Reveals Recent Genetic Introgression from Domestic Pigs into Northwest European Wild Boar Populations." Molecular Ecology 22: 856-866.
Iacolina, L., M. Scandura, D. J. Goedbloed, P. Alexandri, R. P. M. A. Crooijmans, G. Larson, A. Archibald, M. Apollonio, L. B. Schook, M. A. M. Groenen, and H.-J. Megens. 2016. "Genomic Diversity and Differentiation of a Managed Island Wild Boar Population." Heredity 116: 60-67.
Lukic, B., M. Ferencakovic, D. Salamon, M. Cacic, V. Orehovacki, L. Iacolina, I. Curik, and V. Cubric-Curik. 2020. "Conservation Genomic Analysis of the Croatian Indigenous Black Slavonian and Turopolje Pig Breeds." Frontiers in Genetics 11: 261.
Manunza, A., M. Amills, A. Noce, B. Cabrera, A. Zidi, S. Eghbalsaied, E. C. de Albornoz, M. Portell, A. Mercade, A. Sanchez, and V. Balteanu. 2016. "Romanian Wild Boars and Mangalitza Pigs Have a European Ancestry and Harbour Genetic Signatures Compatible with Past Population Bottlenecks." Scientific Reports 6: 29913.
Manunza, A., A. Zidi, S. Yeghoyan, V. A. Balteanu, T. C. Carsai, O. Scherbakov, O. Ramirez, S. Eghbalsaied, A. Castello, A. Mercade, and M. Amills. 2013. "A High Throughput Genotyping Approach Reveals Distinctive Autosomal Genetic Signatures for European and Near Eastern Wild Boar." PLoS ONE 8: e55891.
R Core Team. 2023. "R: A Language and Environment for Statistical Computing." https://www.R-project.org/.
Roberts, K. S., and W. R. Lamberson. 2015. "Relationships Among and Variation Within Rare Breeds of Swine." Journal of Animal Science 93: 3810-3813.
Smyser, T. J., M. A. Tabak, C. Slootmaker, M. S. Robeson, R. S. Miller, M. Bosse, H. J. Megens, M. A. M. Groenen, S. R. Paiva, D. A. de Faria, H. D. Blackburn, B. S. Schmit, and A. J. Piaggio. 2020. "Mixed Ancestry from Wild and Domestic Lineages Contributes to the Rapid Expansion of Invasive Feral Swine." Molecular Ecology 29: 1103-1119.
Yang, B., L. L. Cui, M. Perez-Enciso, A. Traspov, R. P. M. A. Crooijmans, N. Zinovieva, L. B. Schook, A. Archibald, K. Gatphayak, C. Knorr, A. Triantafyllidis, P. Alexandri, G. Semiadi, O. Hanotte, D. Dias, P. Dovc, P. Uimari, L. Iacolina, M. Scandura, M. A. M. Groenen, L. S. Huang, and H.-J. Megens. 2017. "Genome-wide SNP Data Unveils the Globalization of Domesticated Pigs." Genetics Selection Evolution 49: 71.
Methods
We assembled the Sus scrofa reference set from previously published high-resolution SNP genotypes, restricting analysis to genotypes produced with Illumina BeadArray technology (San Diego, California) across multiple commercially available arrays (Illumina PorcineSNP60, Illumina PorcineSNP60 v2, Genomic Profiler for Porcine HD, licensed exclusively to GeneSeek, a Neogen Corporation, Lansing, Michigan; Ramos et al., 2009). We augmented previously published genotypes (detailed in Smyser et al., 2020) with a subset of novel genotypes produced for this study (Appendix S1: Table S1). We restricted our analyses to loci that were available across all datasets (influenced by loci shared across arrays and the extent to which publicly available datasets were filtered by authors prior to publication) and mapped to autosomes (Sscrofa11.1 genome assembly; Warr et al., 2020). In sum, we included 33 domestic breeds and 16 populations of European wild boar, representing a total of 1,421 reference samples (Table 1) genotyped at 28,545 biallelic loci.
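The locus filter described above (keep only loci present in every dataset and mapped to an autosome) can be illustrated with a short Python sketch. It is not the code used to build the reference set: the function name, the dictionary layout, and the toy locus names are hypothetical, and the sketch assumes chromosome labels are the strings "1" to "18" for the 18 pig autosomes.

```python
def shared_autosomal_loci(datasets, autosomes=range(1, 19)):
    """Return the sorted locus IDs present in every dataset and on an autosome.

    datasets: list of dicts mapping locus ID -> chromosome label (a string),
              one dict per array or published dataset.
    """
    autosome_labels = {str(chrom) for chrom in autosomes}
    shared = set(datasets[0])
    for dataset in datasets[1:]:
        shared &= set(dataset)  # keep only loci seen in every dataset so far
    return sorted(locus for locus in shared
                  if datasets[0][locus] in autosome_labels)


# Toy inputs with hypothetical locus names, for illustration only.
dataset_a = {"snp1": "1", "snp2": "7", "snp3": "X", "snp4": "18"}
dataset_b = {"snp1": "1", "snp3": "X", "snp4": "18", "snp5": "2"}
dataset_c = {"snp1": "1", "snp2": "7", "snp3": "X", "snp4": "18"}

kept = shared_autosomal_loci([dataset_a, dataset_b, dataset_c])
print(kept)
```

Here snp2 and snp5 drop out because they are missing from at least one dataset, and snp3 drops out because it sits on the X chromosome.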
Usage notes
Genotype files are presented as a PLINK binary file family (*.bed/*.bim/*.fam). They can be opened with PLINK (freely available at https://www.cog-genomics.org/plink/) or with many other software platforms.
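As a quick integrity check after download, the sketch below reads the sample and locus tables and confirms that the .bed file has the size they imply. It relies only on the documented PLINK 1 layout: whitespace-delimited .fam and .bim files with six columns each, and a .bed file that starts with the three bytes 0x6C 0x1B 0x01 and then stores each locus in ceil(n_samples / 4) bytes. The helper names and the toy files written by `demo()` are hypothetical; for real analyses PLINK itself or an established reader is the safer choice.

```python
import math
import tempfile
from pathlib import Path

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, variant-major mode


def read_fam(path):
    """Return (family ID, individual ID) for each sample line of a .fam file."""
    samples = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            samples.append((fields[0], fields[1]))
    return samples


def read_bim(path):
    """Return (chromosome, locus ID, base-pair position) for each .bim line."""
    loci = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            loci.append((fields[0], fields[1], int(fields[3])))
    return loci


def bed_is_consistent(path, n_samples, n_loci):
    """Check the .bed header and that the file size matches the two tables."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(3)
    expected_size = 3 + n_loci * math.ceil(n_samples / 4)
    return header == BED_MAGIC and path.stat().st_size == expected_size


def demo():
    """Write a toy 5-sample, 2-locus file family and run the checks on it."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "toy"
        fam_lines = [f"fam{i} ind{i} 0 0 0 -9" for i in range(5)]
        bim_lines = ["1 snpA 0 1000 A G", "2 snpB 0 2000 C T"]
        prefix.with_suffix(".fam").write_text("\n".join(fam_lines) + "\n")
        prefix.with_suffix(".bim").write_text("\n".join(bim_lines) + "\n")
        # 5 samples need ceil(5 / 4) = 2 bytes per locus, so 4 genotype bytes.
        prefix.with_suffix(".bed").write_bytes(BED_MAGIC + bytes(4))
        samples = read_fam(prefix.with_suffix(".fam"))
        loci = read_bim(prefix.with_suffix(".bim"))
        ok = bed_is_consistent(prefix.with_suffix(".bed"), len(samples), len(loci))
        return len(samples), len(loci), ok


if __name__ == "__main__":
    print(demo())
```

Pointed at the K5_1421RefSet_6566FeSw_435AltWePi files, `read_fam` should return 8,422 samples if the download matches the README description.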