Feral swine genotypes and metadata used for identifying translocations in the United States
Data files
Jul 24, 2024 version files 162.53 MB
- MasterDatabase_GenoData_Giglio_etal2024.bed
- MasterDatabase_GenoData_Giglio_etal2024.bim
- MasterDatabase_GenoData_Giglio_etal2024.fam
- MasterDatabase_MetaData_Giglio_etal2024.csv
- README.md
Abstract
Globalization has led to the frequent movement of species out of their native habitat. Some of these species become highly invasive and capable of profoundly altering invaded ecosystems. Feral swine (Sus scrofa × domesticus) are recognized as being among the most destructive invasive species, with populations established on all continents except Antarctica. Within the United States (US), feral swine are responsible for extensive crop damage, the destruction of native ecosystems, and the spread of disease. Purposeful human-mediated movement of feral swine has contributed to their rapid range expansion over the past 30 years. Patterns of deliberate introduction of feral swine have not been well described as populations may be established or augmented through small, undocumented releases. By leveraging an extensive genomic database of 18,789 samples genotyped at 35,141 single nucleotide polymorphisms (SNPs), we used deep neural networks to identify translocated feral swine across the contiguous US. We classified 20% (3,364/16,774) of sampled animals as having been translocated and described general patterns of translocation using measures of centrality in a network analysis. These findings unveil extensive movement of feral swine well beyond their dispersal capabilities, including individuals with predicted origins >1,000 km away from their sampling locations. Our study provides insight into the patterns of human-mediated movement of feral swine across the US and from Canada to the northern areas of the US. Further, our study validates the use of neural networks to study the spread of invasive species.
README: Feral swine genotypes and metadata used for identifying translocations in the United States
https://doi.org/10.5061/dryad.b2rbnzsq9
This dataset contains genotypes at 35,141 single nucleotide polymorphisms (SNPs) for 18,248 individual feral swine sampled across the contiguous United States and 27 feral swine sampled in Canada (Alberta = 13, Saskatchewan = 14). We also provide metadata giving, for each animal, the subject ID, the state and county in which it was sampled, and the genetic cluster to which it was assigned for downstream analyses.
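As a quick way to explore the metadata, the sketch below tallies rows of a CSV by one column using only the Python standard library. The header names in the toy example (`SubjectID`, `State`, `County`, `Cluster`) are hypothetical placeholders, not the actual column names in the metadata file; check the real header and pass the matching name.

```python
import csv
import io
from collections import Counter


def count_by(handle, column):
    """Tally the rows of a CSV file by the value found in `column`."""
    reader = csv.DictReader(handle)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Tiny made-up example. The header names are assumptions for illustration
# only and may differ from those in the real metadata file.
example = io.StringIO(
    "SubjectID,State,County,Cluster\n"
    "A1,TX,Bexar,3\n"
    "A2,TX,Travis,3\n"
    "A3,FL,Polk,7\n"
)
counts = count_by(example, "State")
```

For the real file, open it with `open(path, newline="")` and pass the handle to `count_by` together with the actual column name.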
Description of the data and file structure
Genotype data are provided in .bed, .bim, and .fam format with the prefix "MasterDatabase_GenoData_Giglio_etal2024_pruned" (the files listed for this version omit the "_pruned" suffix). The metadata are saved as a .csv file named "MasterDatabase_MetaData_Giglio_etal2024.csv".
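The following is a minimal sketch for sanity-checking the genotype files after download, assuming they are a standard PLINK binary fileset: six whitespace-delimited columns in the .fam and .bim files, and a variant-major .bed file that starts with three magic bytes and packs four genotypes per byte. The function names and the toy fileset in `demo` are illustrative and are not part of the dataset.

```python
import math
import os
import tempfile

FAM_COLUMNS = ["fid", "iid", "father", "mother", "sex", "phenotype"]
BIM_COLUMNS = ["chrom", "snp_id", "cm", "bp", "allele1", "allele2"]
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK .bed header, variant-major mode


def read_table(path, columns):
    """Read a whitespace-delimited PLINK text file into a list of dicts."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                rows.append(dict(zip(columns, fields)))
    return rows


def expected_bed_size(n_samples, n_variants):
    """Bytes in a variant-major .bed: 3 header bytes plus 2 bits per genotype,
    with each variant's block padded to a whole number of bytes."""
    return 3 + n_variants * math.ceil(n_samples / 4)


def check_fileset(prefix):
    """Return (n_samples, n_variants, bed_is_consistent) for a fileset prefix."""
    fam = read_table(prefix + ".fam", FAM_COLUMNS)
    bim = read_table(prefix + ".bim", BIM_COLUMNS)
    with open(prefix + ".bed", "rb") as handle:
        magic_ok = handle.read(3) == BED_MAGIC
    size_ok = os.path.getsize(prefix + ".bed") == expected_bed_size(len(fam), len(bim))
    return len(fam), len(bim), magic_ok and size_ok


def demo():
    """Build a toy fileset (5 samples, 2 variants) in a temp dir and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "toy")
        with open(prefix + ".fam", "w") as f:
            for k in range(5):
                f.write(f"fam{k} ind{k} 0 0 0 -9\n")
        with open(prefix + ".bim", "w") as f:
            f.write("1 snp1 0 1000 A G\n1 snp2 0 2000 C T\n")
        with open(prefix + ".bed", "wb") as f:
            f.write(BED_MAGIC + bytes(2 * 2))  # 2 variants x ceil(5/4) bytes
        return check_fileset(prefix)
```

To check the downloaded data, call `check_fileset` with the path prefix of the .bed/.bim/.fam files (without the extension); the sample and variant counts it returns can be compared with the numbers stated in this README.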
Sharing/Access information
Wild boar genotypes were derived from the following sources:
- Yang, Bin et al. (2018). Data from: Genome-wide SNP data unveils the globalization of domesticated pigs [Dataset]. Dryad. https://doi.org/10.5061/dryad.30tk6
Methods
Biological samples (n = 18,789) were collected from feral swine throughout their invaded range within the US as an extension of damage management and disease surveillance efforts led by the USDA along with cooperative agencies. Overwhelmingly, samples were collected by USDA-Animal and Plant Health Inspection Service-Wildlife Services personnel. Feral swine were lethally removed through trapping or targeted sharpshooting from 2001 to 2022 as an extension of control efforts to reduce threats to agriculture, natural resources, property, and the health of humans and livestock. To identify potential translocations from Canada to the US, biological samples were collected from feral swine in Alberta (n = 13) and Saskatchewan (n = 14), Canada, by the University of Saskatchewan under Animal Use Protocol Number 21050024. DNA was extracted by GeneSeek (Neogen Corporation [Lincoln, Nebraska, USA]) from various biological sample types (hair, pinna, and kidney) using the MagMAX™ DNA Multi-Sample Ultra Kit (Thermo Fisher Scientific Inc. [Waltham, MA, USA]). Samples were genotyped using GeneSeek’s Genomic Profiler (GGP) for the Porcine 80k array (68,516 loci; Illumina BeadChip microarray [San Diego, California] licensed exclusively to GeneSeek, a Neogen Corporation [Lincoln, Nebraska]), and loci were aligned to the Sscrofa11.1 genome assembly (Warr et al. 2020).
As part of our quality control process, we removed individuals presumed to be escaped or released domestic pigs from production farms or the pet trade (e.g., Vietnamese potbellied pigs). To distinguish domestic pigs from genetically typical feral swine, we estimated the ancestry profiles of individuals following the methods described in Smyser et al. (2020) and removed any individual with a combined ancestry of >0.4 from domestic pig breeds (Berkshire, Hampshire, Chester White, Duroc, Landrace, Yorkshire/Large White, Meishan, and miniature Siberian). After removing these individuals, we applied standard genotype quality control filters using PLINK 2.0 (Chang et al. 2015). First, we removed loci that were unmapped or non-autosomal based on the Sscrofa11.1 reference genome assembly (Warr et al. 2020). We then removed loci with call rates <0.95 or minor allele frequencies <0.05. Finally, individuals missing >5% of their genotype data were removed from downstream analyses. The resulting set of individual genotypes was considered the ‘master’ dataset.
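The thresholds above can be illustrated with a small pure-Python sketch. This is not the authors' pipeline, which used PLINK 2.0; it only shows how the stated cutoffs act on a toy genotype table. It assumes genotypes are coded as allele counts (0, 1, 2, or `None` for a missing call) and that individual missingness is computed over the loci that pass the locus filters; both are assumptions made for illustration, as are all function and variable names.

```python
DOMESTIC_ANCESTRY_MAX = 0.4    # individuals above this combined ancestry are removed
LOCUS_CALL_RATE_MIN = 0.95     # loci below this call rate are removed
MAF_MIN = 0.05                 # loci below this minor allele frequency are removed
INDIVIDUAL_MISSING_MAX = 0.05  # individuals missing more than this are removed


def keep_feral(domestic_ancestry):
    """Return IDs whose combined domestic-breed ancestry is at most 0.4.

    domestic_ancestry maps sample ID -> combined ancestry proportion.
    """
    return [sid for sid, q in domestic_ancestry.items() if q <= DOMESTIC_ANCESTRY_MAX]


def locus_stats(column):
    """Return (call rate, minor allele frequency) for one locus."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0, 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return len(called) / len(column), min(freq, 1 - freq)


def filter_genotypes(genotypes):
    """Apply locus filters, then the individual missingness filter.

    genotypes maps sample ID -> list of calls (0, 1, 2, or None), one per locus.
    Returns (kept sample IDs, indices of kept loci).
    """
    ids = list(genotypes)
    n_loci = len(genotypes[ids[0]])
    kept_loci = []
    for j in range(n_loci):
        call_rate, maf = locus_stats([genotypes[i][j] for i in ids])
        if call_rate >= LOCUS_CALL_RATE_MIN and maf >= MAF_MIN:
            kept_loci.append(j)
    kept_ids = []
    for i in ids:
        missing = sum(genotypes[i][j] is None for j in kept_loci)
        if kept_loci and missing / len(kept_loci) <= INDIVIDUAL_MISSING_MAX:
            kept_ids.append(i)
    return kept_ids, kept_loci
```

In this sketch the locus filters run before the individual filter, mirroring the order in which the steps are described above; a different order can change which individuals and loci are retained.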