Neutral and adaptive drivers of genomic change in introduced brook trout (Salvelinus fontinalis) populations revealed by pooled sequencing
Brookes, Brent et al. (2022), Neutral and adaptive drivers of genomic change in introduced brook trout (Salvelinus fontinalis) populations revealed by pooled sequencing, Dryad, Dataset, https://doi.org/10.5061/dryad.np5hqbzvd
Understanding the drivers of successful species invasions is important for conserving native biodiversity and for mitigating the economic impacts of introduced species. However, whole-genome resolution investigations of the underlying contributions of neutral and adaptive genetic variation in successful introductions are rare. Increased propagule pressure should result in greater neutral genetic variation, while environmental differences should elicit selective pressures on introduced populations, leading to adaptive differentiation. We investigated neutral and adaptive variation among nine introduced brook trout (Salvelinus fontinalis) populations using whole-genome pooled sequencing. The populations inhabit isolated alpine lakes in western Canada and descend from a common source, with an average of ~19 (range of 7-41) generations since introduction. We found some evidence of bottlenecks without recovery, no strong evidence of purifying selection, and little support that varying propagule pressure or differences in local environments shaped observed neutral genetic variation differences. Putative adaptive loci analysis revealed non-convergent patterns of adaptive differentiation among lakes with minimal putatively adaptive loci (0.001%-0.15%) that did not correspond with tested environmental variables. Our results suggest that (i) introduction success is not always strongly influenced by genetic load, (ii) observed differentiation among introduced populations can be idiosyncratic, population-specific, or stochastic, and (iii) conservatively, in some introduced species, colonization barriers may be overcome by support through one aspect of propagule pressure or benign environmental conditions.
This dataset was collected by sampling Salvelinus fontinalis from lakes in the Rocky Mountains of Alberta and British Columbia (BC). DNA was extracted from fin tissue using Qiagen blood and tissue kits (Qiagen, Germany) following the manufacturer's protocol. To ensure equal quantities of DNA in the pooled samples, DNA quality and quantity were first assessed by 1% agarose gel electrophoresis, run at 100 V alongside HindIII-digested Lambda DNA, to check for possible DNA degradation. Each individual was then tested multiple times on a Qubit Fluorometer 2.0 (Invitrogen, USA), selecting for concentrations >20 ng/µL, and confirmed on a NanoDrop spectrophotometer (Thermo Scientific, USA), selecting for 260/230 and 260/280 ratios >1.8.
Individual DNA was then pooled by sex and population (18 pools in total), with 20 individuals per pool; the exceptions were Cobb females (n=17) and both McNair sexes (n=8), due to small population and sample sizes. Twenty individuals per pool were chosen to work within the available population samples and to give equal representation of the sexes (two pools of 20 per population) while maintaining accurate, unbiased allele frequency estimates (Anand et al., 2016; Boitard et al., 2012). Although the two pools came from different sexes, using two pools per population also provided a degree of sampling replication for some population genomic analyses, such as genome-wide diversity and genetic differentiation. Fifty µL of each individual sample, diluted to 10 ng/µL, was added to each pool, with DNA concentrations confirmed on a Qubit Fluorometer both prior to and after this step. DNA was pooled at a final concentration of 3 ng/µL, confirmed using a Qubit Fluorometer.
Genomic libraries of these pooled DNA samples were prepared by the Génome Québec Innovation Centre (Montréal, Québec, Canada) using a shotgun approach with PCR and Illumina TruSeq LT adaptors (Illumina, USA). All pools passed quality and quantity requirements, and each was sequenced on two lanes of a NovaSeq 6000 S4 flowcell (Illumina) with 100 base pair (bp) paired-end reads. Coverage was estimated assuming a brook trout genome size of approximately 3 Gb, based on the Animal Genome Size Database (http://www.genomesize.com/).
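The coverage estimate described above is simple arithmetic: total sequenced bases divided by the assumed genome size. The sketch below shows that calculation for paired-end data. The 100 bp read length and 3 Gb genome size come from the text; the read-pair count is a hypothetical placeholder, not a value from this dataset, and the function name is ours.

```python
def expected_coverage(read_pairs: int,
                      read_length_bp: int = 100,
                      genome_size_bp: float = 3e9) -> float:
    """Expected mean depth = total sequenced bases / genome size.

    Each read pair contributes two reads of read_length_bp bases.
    """
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


if __name__ == "__main__":
    # Hypothetical number of read pairs for one pool (NOT from the dataset).
    hypothetical_pairs = 600_000_000
    depth = expected_coverage(hypothetical_pairs)
    print(f"Expected mean depth: {depth:.1f}x")
```

This is a nominal figure only: realized depth after mapping, duplicate removal, and quality filtering will be lower.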
A reference genome of charr (Salvelinus sp.) available from NCBI (ASM291031v2, Christensen et al., 2018; genome size = ~2.4 Gb, scaffold N50 = 1.02 Mbp, contig N50 = 55.6 kbp, masked mapping) was used due to its close phylogenetic and karyotypic relationship with brook trout (Timusk et al., 2011). To permit sequence alignment, the reference genome was prepared using Burrows-Wheeler Aligner (BWA) v 0.7.12 (Li & Durbin, 2009) and indexed with SAMtools v 1.5 (Li et al., 2009), and a sequence dictionary was created using Picard tools v 2.17.11 (http://broadinstitute.github.io/picard/, accessed 20-11-2019).
SNP discovery was performed using the PPalign module of the PoolParty pipeline v 0.8 (Micheletti & Narum, 2018); the methods and packages in this module are detailed below. Read mapping to the charr reference genome, alignment trimming, and filtering were performed with BWA-MEM v 0.7.12, SAMtools v 1.5 (mapping quality threshold of 10), and SAMblaster v 0.1.24 (Faust & Hall, 2014); reads were filtered at a quality score threshold of 20 with BBMap v 37.93 (Bushnell, sourceforge.net/projects/bbmap/, accessed 20-11-2019), and results were summarized with FastQC v 0.11.7 (Andrews, 2010). SNP filtering used parameters common for salmonid species (Horn et al., 2020), as follows. Duplicate sequences and unpaired reads were filtered using SAMtools, BBMap, and Picard tools, with a minimum fastq trimming length of 25 bp. An indel window of 15 bp was used to mask SNPs around indel regions. SNPs were called conservatively with BCFtools v 1.5 (Li et al., 2009) using a quality score of 20, a minimum global allele frequency of 0.05, and a minimum global coverage of 10. Raw reads were checked for quality using FastQC and MultiQC v 1.7 (Ewels et al., 2016). After SNP calling, multiallelic SNPs that could be paralogs were removed following Létourneau et al. (2018), Narum et al. (2017), and Terekhanova et al. (2019). Finally, for all analyses, the PPanalyze module of PoolParty was used to remove duplicated loci and retain only loci common to all populations, with a minimum global coverage of 20, a maximum global coverage of 100, and a minimum allele frequency of 0.05. Of all tested SNPs, the proportion of putatively adaptive loci (i.e., loci with greater deviation from the average) was negligible (0.15%), and therefore putatively adaptive loci were not removed. Following alignment, mpileup files were run through the PPstats module of PoolParty to estimate depth of coverage, alignment statistics, and genome coverage.
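To make the final thresholds concrete (minimum coverage 20, maximum coverage 100, minimum allele frequency 0.05, biallelic sites only), here is a minimal sketch of how such a filter could be applied to one site's allele counts. It is an illustration only and is not PoolParty's implementation: the function name is hypothetical, and whether PoolParty applies "global" thresholds to counts summed over pools or in some other way is not specified here, so the function simply takes a single set of counts however they were aggregated.

```python
from typing import Sequence


def site_passes(counts: Sequence[int],
                min_cov: int = 20,
                max_cov: int = 100,
                min_maf: float = 0.05) -> bool:
    """Return True if a site's allele counts pass all three filters.

    counts: read counts for each allele (e.g. A, T, C, G) at one site.
    Assumption: counts are already aggregated at the level the thresholds
    are meant to apply to; this sketch does not decide that for you.
    """
    coverage = sum(counts)
    # Coverage window: drop poorly covered and suspiciously deep sites.
    if not (min_cov <= coverage <= max_cov):
        return False
    # Keep only biallelic sites (exactly two alleles observed).
    observed = sorted((c for c in counts if c > 0), reverse=True)
    if len(observed) != 2:
        return False
    # Minor allele frequency is the rarer of the two observed alleles.
    maf = observed[1] / coverage
    return maf >= min_maf


if __name__ == "__main__":
    # Hypothetical allele counts (A, T, C, G), not taken from the dataset.
    for example in [(30, 0, 10, 0), (5, 0, 1, 0), (98, 0, 2, 0), (20, 10, 10, 0)]:
        print(example, site_passes(example))
```

The upper coverage bound matters as much as the lower one: unusually deep sites often reflect collapsed repeats or paralogs, which are a known concern in salmonid genomes.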
In total, 362,493 SNPs remained that were common among all populations in the dataset (and biallelic, with scaffold removed); these were used for all subsequent genomic analyses, except the Cochran-Mantel-Haenszel (CMH) tests described below.
These .sync files are labeled for the entire dataset and for the common loci dataset. The latter is used for the majority of our published work. Information on the format of the files can be found in the README file.
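As a starting point for working with the files, the sketch below parses one line of a synchronized (.sync) file and computes a per-pool allele frequency. It assumes the standard PoPoolation2 layout: tab-separated chromosome, position, and reference base, followed by one column per pool holding counts as A:T:C:G:N:del. Check the README for the actual column order and pool labels in this dataset; the example line and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

ALLELES = ("A", "T", "C", "G")  # first four count fields in a sync column


class SyncSite(NamedTuple):
    chrom: str
    pos: int
    ref: str
    counts: List[Tuple[int, ...]]  # one (A, T, C, G, N, del) tuple per pool


def parse_sync_line(line: str) -> SyncSite:
    """Parse one tab-separated sync line into a SyncSite."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    counts = [tuple(int(x) for x in col.split(":")) for col in fields[3:]]
    return SyncSite(chrom, pos, ref, counts)


def allele_frequency(pool_counts: Tuple[int, ...], allele: str) -> Optional[float]:
    """Frequency of `allele` in one pool, ignoring N and deletion counts.

    Returns None when the pool has no A/T/C/G reads at the site.
    """
    depth = sum(pool_counts[:4])
    if depth == 0:
        return None
    return pool_counts[ALLELES.index(allele)] / depth


if __name__ == "__main__":
    # Hypothetical line with two pools; not taken from the dataset.
    line = "scaffold1\t1234\tA\t18:0:2:0:0:0\t10:0:10:0:0:0\n"
    site = parse_sync_line(line)
    for i, pool in enumerate(site.counts, start=1):
        print(f"pool {i}: freq(A) = {allele_frequency(pool, 'A')}")
```

For the full files, streaming line by line (rather than loading everything into memory) is advisable given the number of sites and pools.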
Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada, Award: Special Projects Grant 2016
Fonds de recherche du Québec – Nature et technologies
Groupe de recherche interuniversitaire en limnologie
Quebec Centre for Biodiversity Science