Data from: Modeling multilocus selection in an individual-based, spatially-explicit landscape genetics framework

Citation

Landguth, Erin et al. (2019), Data from: Modeling multilocus selection in an individual-based, spatially-explicit landscape genetics framework, Dryad, Dataset, https://doi.org/10.5061/dryad.d2547d7zp

Abstract

We implemented multilocus selection in a spatially-explicit, individual-based framework that enables multivariate environmental gradients to drive selection in many loci as a new module for the landscape genetics programs, CDPOP and CDMetaPOP. Our module simulates multilocus selection using a linear additive model, providing a flexible platform to evaluate a wide range of genotype-environment associations. Importantly, the module allows simulation of selection in any number of loci under the influence of any number of environmental variables. We validated the module with individual-based selection simulations under Wright-Fisher assumptions (Figure 3 and data provided here). We then evaluated results for simulations under a simple landscape selection model (Figure 4 and data provided here). Next, we simulated individual-based multilocus selection across a complex selection landscape with three loci linked to three different environmental variables (Figure 5 and data provided here). Finally, we demonstrated how the program can be used to simulate multilocus selection under varying selection strengths across different levels of gene flow in a landscape genetics framework (Figure 6 and data provided here). This new module provides a valuable addition to the study of landscape genetics, allowing for explicit evaluation of the contributions and interactions between gene flow and selection-driven processes across complex, multivariate environmental and landscape conditions.

Methods

Simulated datasets are provided for Figures 3-6 of the manuscript. 

Figure 3: The simulations for the single (A) and double (B) diallelic locus selection models. For the single diallelic locus selection model, we used F = X1(b111 A111 + b112 A112), and set the average effects to b111 = 10 and b112 = -10. For the double diallelic locus selection model, we used F = X1([b111 A111 + b112 A112] + [b121 A121 + b122 A122]), and set b111 = 10, b112 = -10, b121 = 10, and b122 = -10. X1 was a uniform spatial selection surface with all values equal to 1. Each simulated dataset contains 50 replicates. For both the single and double diallelic locus selection model simulations, the Wright-Fisher model was assumed (i.e., random mating; sexual reproduction with both females and males sampled with replacement; offspring dispersing randomly until a constant population size with an equal sex ratio is reached; no mutation; and non-overlapping generations) with one exception: each mated pair produced 2 offspring to ensure a constant population size. We simulated individual genetic exchange across 100 non-overlapping generations among 1000 randomly spatially located individuals in a 1024 x 1024 gridded landscape for each selection model. All simulated populations contained an additional 50 diallelic neutral loci.
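
As an illustration of the linear additive model described above, the following Python sketch evaluates F for one individual. It assumes that each A term is the number of copies (0, 1, or 2) of the corresponding allele carried by the individual, which this description does not state explicitly. The function name and data layout are hypothetical and are not part of CDPOP.

```python
def additive_fitness(x_values, betas, allele_counts):
    """Evaluate F = sum_i X_i * sum_j sum_k b_ijk * A_ijk for one individual.

    x_values:      environmental values X_i at the individual's location
    betas:         betas[i][j][k] = effect b_ijk of allele k at locus j
                   under environmental variable i
    allele_counts: allele_counts[j][k] = copies of allele k at locus j
                   (ASSUMPTION: coded as 0, 1, or 2)
    """
    total = 0.0
    for x, beta_i in zip(x_values, betas):
        inner = sum(
            b * a
            for beta_j, counts_j in zip(beta_i, allele_counts)
            for b, a in zip(beta_j, counts_j)
        )
        total += x * inner
    return total


# Single-locus model: b111 = 10, b112 = -10, uniform surface X1 = 1.
single_betas = [[[10, -10]]]
f_AA = additive_fitness([1], single_betas, [[2, 0]])
f_Aa = additive_fitness([1], single_betas, [[1, 1]])
f_aa = additive_fitness([1], single_betas, [[0, 2]])

# Double-locus model: both loci use effects of 10 and -10.
double_betas = [[[10, -10], [10, -10]]]
f_AABB = additive_fitness([1], double_betas, [[2, 0], [2, 0]])
```

Under the allele-count assumption, the three single-locus genotypes AA, Aa, and aa give F values of 20, 0, and -20 on the uniform surface. How F is converted into survival or mortality is not covered by this sketch.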

Figure 4: Three simulation datasets of a single spatially-variable selection landscape and a single diallelic locus. These datasets were produced with the same simulation parameters as the Figure 3 datasets, but with the uniform spatial selection surface replaced by a spatially-variable selection surface. Using a 1024 x 1024 gridded raster, we created a categorical landscape that included an upper triangle with values of 1, a lower triangle with values of -1, and diagonal cells with a value of 0. The three simulated datasets varied in how the genotypes were initialized: the simulations started with (i) only AA, (ii) only aa, or (iii) random assignment. 50 replicates are included in each dataset.
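
The categorical landscape described above can be sketched in a few lines of Python. This is only an illustration: it assumes the diagonal runs from the top-left to the bottom-right cell and that row 0 is the top of the raster, neither of which is specified here, and the function name is hypothetical.

```python
def triangle_landscape(n=1024):
    """Return an n x n grid as a list of rows.

    Cells above the main diagonal are 1, cells below it are -1,
    and cells on the diagonal are 0.
    ASSUMPTION: the diagonal runs top-left to bottom-right.
    """
    return [
        [1 if col > row else -1 if col < row else 0 for col in range(n)]
        for row in range(n)
    ]


# A small grid is easier to inspect than the full 1024 x 1024 raster.
grid = triangle_landscape(8)
```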

Figure 5: Simulation dataset of a complex landscape and three loci. These data considered three environmental variables that affect fitness, as shown in Figure 1 of the manuscript, with three loci and two alleles per locus operating in the selection process. The first environmental variable was the previously described categorical landscape used for the Figure 4 datasets. The second environmental variable was a gradient landscape with continuous values ranging from 1 to -1 from north to south. The third environmental variable represented a fragmented landscape with equal proportions of values of 1 (e.g., favored habitat) and -1 (e.g., non-favored habitat), created in the program QRULE. For simplicity, we set b111 = 10 and b112 = -10 for the first locus and environmental variable, X1; b221 = 10 and b222 = -10 for the second locus and environmental variable, X2; and b331 = 10 and b332 = -10 for the third locus and environmental variable, X3, where X1, X2, and X3 are the diagonal, gradient, and habitat variables shown in Figure 1, respectively. The remaining betas were set to 0. Unlike the previous panmictic movement simulations, we restricted movement of the individuals in these simulations to follow an inverse-square probability function constrained to a maximum threshold of 25% of the entire landscape. We initialized all genotypes randomly at the start of the simulations. 3 replicates are included.
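
To make the restricted-movement rule more concrete, here is a minimal Python sketch of an inverse-square kernel with a distance cutoff. It is not CDPOP's implementation: the normalization, the interpretation of the threshold as 25% of a 1024-cell extent, and the function name are all assumptions made for illustration.

```python
def inverse_square_probabilities(distances, max_distance):
    """Return normalized weights proportional to 1/d^2, zero beyond max_distance.

    ASSUMPTION: every distance is positive. How CDPOP treats d = 0 or
    scales the kernel is not described in this document.
    """
    weights = [1.0 / d**2 if d <= max_distance else 0.0 for d in distances]
    total = sum(weights)
    if total == 0:
        return weights  # no candidate location within reach
    return [w / total for w in weights]


# Hypothetical threshold: 25% of a 1024-cell landscape extent.
threshold = 0.25 * 1024
probs = inverse_square_probabilities([64, 128, 256, 512], threshold)
```

In this example the candidate at distance 512 lies beyond the threshold and receives probability 0, while closer candidates are weighted in proportion to the inverse square of their distance.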

Figure 6: Multilocus selection simulations under three selection strengths (strong, moderate, and weak) and three dispersal scenarios (5%, 10%, and 15% of the landscape). These simulations include 1000 loci with two alleles per locus: 100 loci under selection in response to a single environmental variable and 900 neutral loci. We used a gradient landscape with continuous values ranging from 1 to -1 from north to south. We set the effect sizes of the first 20 loci (l = 1, 2, …, 20) to b1l1 = 0.15 and b1l2 = -0.15, those of the following 30 loci (l = 21, 22, …, 50) to b1l1 = 0.10 and b1l2 = -0.10, and those of the last 50 loci (l = 51, 52, …, 100) to b1l1 = 0.05 and b1l2 = -0.05 (reflecting “strong” [n=20], “moderate” [n=30], and “weak” [n=50] selection, respectively). We increased the population size to 5000 for these simulations within the same 1024 x 1024 simulation landscape used previously. We restricted movement of the individuals in these final simulations to follow an inverse-square probability function constrained to maximum thresholds of 5%, 10%, and 15% of the entire landscape. We initialized all genotypes randomly at the start of the simulations and ran the simulations for 200 generations, using the first 100 generations as a ‘burn-in’ period during which no selection was operating.
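
The three tiers of effect sizes described above can be written out as a small lookup, which may help when matching loci to selection strengths during analysis. The function name and the dictionary layout are hypothetical; only the locus ranges and effect values come from the description.

```python
def effect_size(locus):
    """Return (b1l1, b1l2) for a selected locus l in 1..100."""
    if 1 <= locus <= 20:
        b = 0.15  # "strong" selection
    elif 21 <= locus <= 50:
        b = 0.10  # "moderate" selection
    elif 51 <= locus <= 100:
        b = 0.05  # "weak" selection
    else:
        raise ValueError("only loci 1-100 are under selection")
    return (b, -b)


# Map each selected locus index to its pair of effect sizes.
effects = {locus: effect_size(locus) for locus in range(1, 101)}
```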

Usage Notes

CDPOP individual files are included in folders (e.g., batchrun0mcrun0) and are labeled grid0.csv, ..., grid{final time}.csv. Each grid{time}.csv file includes many fields (X, Y, age, sex, genotypes, etc.). For more information on the format of these files, please refer to the CDPOP user manual found at github.com/ComputationalEcologyLab/CDPOP/docs.
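
One way to load the grid{time}.csv files from a single run folder is sketched below using only the Python standard library. It assumes that the first row of each file is a header naming the fields; the function name is hypothetical, and no specific column names are assumed.

```python
import csv
import re
from pathlib import Path


def load_grid_files(run_dir):
    """Return {time: list of row dicts} for every grid{time}.csv in run_dir.

    ASSUMPTION: the first row of each file is a header row.
    """
    grids = {}
    for path in Path(run_dir).glob("grid*.csv"):
        match = re.fullmatch(r"grid(\d+)\.csv", path.name)
        if match is None:
            continue  # skip files that do not follow the grid{time}.csv pattern
        with path.open(newline="") as handle:
            grids[int(match.group(1))] = list(csv.DictReader(handle))
    # Sort numerically so that grid10.csv comes after grid2.csv.
    return dict(sorted(grids.items()))
```

For example, load_grid_files("batchrun0mcrun0") would return a dictionary keyed by time step, with each value holding one dictionary per individual.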

Funding

National Science Foundation, Award: EF-1442486

National Science Foundation, Award: EF-1442597

National Science Foundation, Award: DEB-1340852