Human impacts on Great Lakes walleye Sander vitreus structure, diversity, and local adaptation
Data files
May 02, 2025 version files 312.61 MB
Abstract
Artificial propagation and wild release may influence the genetic integrity of wild populations. This practice has been prevalent in fisheries for centuries and is often termed “stocking”. In the Laurentian Great Lakes (Great Lakes here-on), walleye populations faced declines from the 1950s to the 1970s, prompting extensive stocking efforts for restoration. By the mid-2010s, walleye populations showed signs of recovery, but the genetic legacy of stocking on population structure at the genomic level remains unclear. Using a dataset of 45,600 genome-aligned SNP loci genotyped in 1,075 walleye individuals, we investigated the genetic impacts of over 50 years of stocking across the Great Lakes. Population structure was associated with both natural geographic barriers and stocking from non-native sources. Admixture between Lake Erie walleye and walleye from the re-populated Tittabawassee River indicate that stocking may have re-distributed putatively adaptive alleles around the Great Lakes. Genome scans identified FST outliers and evidence of selective sweeps, indicating local adaptation of spawning populations is likely. Notably, one genomic region showed strong differentiation between Muskegon River and walleye from the Tittabawassee River which was re-populated by Muskegon strain walleye, suggesting admixture and selection both impact the observed genetic diversity. Overall, our study underscores how artificial propagation and translocations can significantly alter the evolutionary trajectory of populations. The findings highlight the complex interplay between stocking practices and population genetic diversity, emphasizing the need for careful management strategies to preserve the genetic integrity of wild populations amidst conservation efforts.
https://doi.org/10.5061/dryad.zkh1893k9
Description of the data and file structure
These data were collected as part of a population genetic assessment of Walleye (Sander vitreus) in the Great Lakes. All sequence data was collected using BestRAD sequencing and PSTI restriction site enzyme digest. Sequence data was processed using STACKS2 and a reference aligned pipeline. Detailed methods can be found in the open access publication Euclide et al., 2024: https://doi.org/10.1111/mec.17558.
Files and variables
File: Filt1.mac4.R10.mm20.minDP10.ind90.mm80.ind20.maf05.ind20.HDPlot.recode.vcf.gz
Description: This file contains all of the genotype information used in the final analysis.
Genotyped SNP variants were filtered to retain high-quality loci using VCFTOOLs v.0.1.16 (Danecek et al., 2011). Filtering took place in a stepwise fashion: (1) remove all SNPs missing greater than 80% of genotype calls (--max-missing 0.2), (2) remove all SNPs containing an mean depth of coverage less than 10 (--min-meanDP 10), (3) remove all individuals missing >90% of calls at the remaining sites, (4) remove all SNPs missing more than 20% of genotype calls at remaining sites and individuals (--max-missing 0.8), (5) remove all individuals missing >20% of remaining sites, (6) remove all SNPs with a minor allele frequency <0.05 in the total dataset (--maf 0.05), (7) remove all individuals missing >20% of calls at the remaining sites. (8) Possible paralogs were identified using HDPlot (McKinney et al., 2017) and all markers that were outside of thresholds H<0.6; D >0.45 and <0.57 (Figure S1) were removed.
File: popmap.txt
Description: this file contains the sample IDs (column 1) and population information (column 2) for samples included in analysis.
Variables
- E_BB_19-12141: Sample names/IDs
- E_BB: Sample collection location code
File: Filt1.mac4.R10.mm20.minDP10.ind90.mm80.ind20.maf05.ind20.HDPlot.num.LD02.recode.vcf.gz
Description: This file contains a linkage disequilibrium filtered version of the file Filt1.mac4.R10.mm20.minDP10.ind90.mm80.ind20.maf05.ind20.HDPlot.recode.vcf.gz. This linkage disequilibrium filtered version was used in the final analysis of genetic diversity and population structure. This file contains fewer total variants (SNPs), but will be less impacted by linkage among loci which can bias certain population genetic analyses.
VCF File Naming Strategy
VCF Files are named purposefully to describe filtering parameters -
- mac4 = minor-allele-count 4
- R10 = Retention of SNPs contained in at least 10% of individuals
- mm20 = Retention of SNPs containing genotype calls in more than 20% of individuals
- minDP10 = Retention of genotype calls with a depth of coverage > 10
- ind90 = Removal of individuals missing >90% of genotype calls
- mm80 = Retention of SNPs containing genotype calls in more than 80% of individuals
- ind20 = Removal of individuals missing >20% of genotype calls
- maf05 = Removal of SNPs with a minor allele frequency < 0.05
- ind20 = Removal of individuals missing >20% of genotype calls
- HDPlot = Retention of SNPs that passed quality filters identified using HDplot (McKinney et al. 2017)
- LD02 = Retention of SNPs with linkage disequilibrium score R2 < 0.2
VCF files can be summarized using the software VCFtools (Danecek et al., 2011; https://vcftools.github.io/index.html) or opened using a text editor such as BBEdit (https://www.barebones.com/products/bbedit/index.html). Alternative methods for analyzing and viewing VCF files include the R package VCFr (https://cran.r-project.org/web/packages/vcfR/index.html).
Sampling design and DNA extraction
Major walleye spawning populations throughout the Great Lakes were identified through expert knowledge and discussion with state and provincial agency biologists. Many of the selected populations had been sampled for previous studies that have shown local genetic structure (e.g., Wilson et al., 2007; Stepien et al., 2009). Hatchery stocking took place at or near almost all our identified spawning sites at least once in the past with the exception of Lake Erie spawning sites (Table S1). Walleye continue to be stocked in many locations around the Great Lakes today with hundreds-of-thousands of yearling-equivalent walleye stocked annually (USFWS/GLFC, 2023). From the list of identified spawning populations, 29 were selected as target sample locations based on the availability of DNA or tissue archives, historical importance to fisheries, and geographic distribution (Figure 1). Two populations outside of the Great Lakes Lake Gogebic, Michigan and Oneida Lake, New York, were also included (Figure 1). Lake Gogebic was the location of a major hatchery for the Michigan Department of Natural Resources (DNR) for many years and was stocked with many strains of walleye from the 1950s through the 1980s. Importantly, it was heavily stocked with Saginaw Bay reef-spawning walleye prior to the Saginaw Bay population crash in the 1960s and so likely represents a mixture of older Lake Huron and Lake Michigan walleye ancestry (Norcross, 1986). Oneida Lake has been used as a major gamete source for walleye stocked into Lake Ontario and has been shown in previous studies to be genetically similar to Lake Ontario walleye (Stepien et al., 2009). Samples of tissue (scales, spines, or fin clips) or extracted DNA for 27 of the 29 selected sample sites were shipped to the University of Wisconsin Stevens Point for processing. All the samples from the Great Lakes were collected at known spawning locations during the spawning season. These samples had been collected for previous walleye aging studies or genetic research projects during walleye spawning runs (April to May) in the decade leading up to our analysis under local protocols. Additional fin clips were collected for this project from walleye spawning runs in the Black River, Lake Ontario (2020) and Kakagon River (2019), Lake Superior following local protocols of the New York Department of Environmental Conservation (NYDEC) and Bad River Tribe of the Lake Superior Chippewa, respectively. All tissue samples were extracted for DNA using Qiagen DNeasy Blood and Tissue 96 Kits (Qiagen) following standard Qiagen protocols. All DNA was quantified using PicoGreen assays conducted on a QuantIt plate reader prior to the preparation of sequencing libraries. Detailed metadata for all samples used in the present study can be found at Geome-db (Genomic Observatories MetaDatabase: https://n2t.net/ark:/21547/R2299; Riginos et al., 2020).
Rapture sequencing
Samples were sequenced using the Rapture BestRAD sequencing protocol outlined in Ali et al. (2016) with the panel first described in Euclide et al. (2022). This panel includes a single 80 nucleotide bait with 2 x tiling for each of 99,636 SNP loci identified from a preliminary PstI RAD-sequencing survey of 48 walleye collected in the Great Lakes (N = 9 – 10 from each of lakes Superior, Huron, Michigan, Erie, and Ontario) designed by Arbor Biosciences (Ann Arbor, MI). Target loci were selected based on minor allele frequency (>0.01) and alignment position along a draft walleye genome assembly (Euclide et al., 2022). Rapture sequencing libraries (N = 14) were constructed for 1,289 walleyes spanning the 29 walleye spawning locations. All individuals were labeled with both a unique 8-bp sample barcodes (N = 96), and a secondary 6-bp plate barcode denoting which library the individual was contained within. Libraries were sent to Novogene (Sacramento, CA) and sequenced with a target coverage of 20 to 25X in equal concentration across three NovaSeq S4 lanes.
Sequence data were processed using a reference genome aligned genotype calling pipeline in STACKS v. 2.0 (Rochette et al., 2019). Sequence reads for each individual were demultiplexed using process_radtags (-c -q -r –filter_illumina --bestrad -t 140) and then adapters and low-quality reads were trimmed in trimmomatic (--phred33 LLUMINACLIP:TruSeq_adaptors.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:50; Bolger et al., 2014). Trimmed paired-end reads were aligned to the draft walleye genome assembly using the BWA mem algorithm (Li & Durbin, 2009) and sorted and filtered using samtools (-b -h -q 20 -f 2 -F 2308; Li et al., 2009). The genome assembly used was produced using Oxford Nanopore sequence data collected from walleye sampled from Lake Erie, Ohio (Kuhl et al., 2024; GenBank accession: GCA_031162955.1). The reference contained 12X long-read and 75X short-read coverage. Assembled contig N50 was 6.2 Mbp, and 96.5% of the assembled sequence could be assigned to 24 chromosomal scaffolds (chr.scf) (BUSCO scores: C:98.6%[S:95.5%,D:3.1%],F:0.8%,M:0.6%,n:4584). Bam files for each individual were then filtered for PCR duplicates and variants were genotyped using gstacks (--rm-pcr-duplicates). Individual genotypes were then output into VCF format using populations (flags: --mac 4 -R 0.1).
Genotyped SNP variants were filtered to retain high-quality loci using VCFTOOLs v.0.1.16 (Danecek et al., 2011). Filtering took place in a stepwise fashion: (1) remove all SNPs missing greater than 80% of genotype calls (--max-missing 0.2), (2) remove all SNPs containing an mean depth of coverage less than 10 (--min-meanDP 10), (3) remove all individuals missing >90% of calls at the remaining sites, (4) remove all SNPs missing more than 20% of genotype calls at remaining sites and individuals (--max-missing 0.8), (5) remove all individuals missing >20% of remaining sites, (6) remove all SNPs with a minor allele frequency <0.05 in the total dataset (--maf 0.05), (7) remove all individuals missing >20% of calls at the remaining sites. (8) Possible paralogs were identified using HDPlot (McKinney et al., 2017) and all markers that were outside of thresholds H<0.6; D >0.45 and <0.57 (Figure S1) were removed. The final dataset contained 1075 individuals genotyped at 45,600 SNPs.
