Pitfalls and windfalls of detecting demographic declines using population genetics in long-lived species
Data files
Jul 20, 2024 version files 96.88 GB
- nWF_slim_output_10.tar.gz (17.43 GB)
- nWF_slim_output_2.tar.gz (19.72 GB)
- nWF_slim_output_20.tar.gz (16.80 GB)
- nWF_slim_output_5.tar.gz (18.27 GB)
- pWF_slim_output.tar.gz (24.40 GB)
- README.md (6.31 KB)
- scripts.zip (38.06 KB)
- slim.zip (7.05 KB)
- sum_stat_output.tar.gz (270.84 MB)
Abstract
Detecting recent demographic changes is a crucial component of species conservation and management, as many natural populations face declines due to anthropogenic habitat alteration and climate change. Genetic methods allow researchers to detect changes in effective population size (Ne) from sampling at a single timepoint. However, in species with long lifespans, there is a lag between the start of a decline in a population and the resulting decrease in genetic diversity. This lag slows the rate at which diversity is lost, and therefore makes it difficult to detect recent declines using genetic data. However, the genomes of old individuals can provide a window into the past, and can be compared to those of younger individuals, a contrast that may help reveal recent demographic declines. To test whether comparing the genomes of young and old individuals can help infer recent demographic bottlenecks, we use forward-time, individual-based simulations with varying mean individual lifespans and extents of generational overlap. We find that age information can be used to aid in the detection of demographic declines when the decline has been severe. When average lifespan is long, comparing young and old individuals from a single timepoint has greater power to detect a recent (within the last 50 years) bottleneck event than comparing individuals sampled at different points in time. Our results demonstrate how longevity and generational overlap can be both a hindrance and a boon to detecting recent demographic declines from population genomic data.
https://doi.org/10.5061/dryad.w0vt4b91p
This repository details the generation and analysis of simulated data for exploring the application of age-aware sampling to detecting demographic declines. There are no empirical data associated with this study, but simulated data files are uploaded and detailed below. All code required to reproduce the analyses in the paper is described below. Please reach out to Meaghan with questions at meaghaniclark (at) gmail.com.
Data
pWF_slim_output.tar.gz
nWF_slim_output_2.tar.gz
nWF_slim_output_5.tar.gz
nWF_slim_output_10.tar.gz
nWF_slim_output_20.tar.gz
These directories contain simulated data output from SLiM. Perennial model outputs (“nWF”) are split by average age. File names denote the average age, bottleneck severity, and replicate number, in that order. For example, “tree_pWF_1_10_5.trees” is a tree sequence file from the annual SLiM model (average age = 1), with a bottleneck severity of 10, for replicate 5.
sum_stats_output.tar.gz
This directory contains genetic diversity data from simulations. The file format convention is the same as above. Each replicate of the simulation is associated with five output files: “age_bin”, “temporal”, “permute_age_bin”, “permute_temporal” and “summary”.
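The naming convention above can be unpacked programmatically. The following is a minimal sketch, not part of the archived scripts: the `SimFile` type and `parse_sim_filename` function are hypothetical names, and the code assumes the `.trees` pattern shown in the example (`tree_<model>_<age>_<severity>_<replicate>.trees`) with integer fields. Summary statistic files may carry different prefixes and are not handled here.

```python
from typing import NamedTuple


class SimFile(NamedTuple):
    """Fields encoded in a simulation file name (hypothetical helper type)."""
    model: str                # "pWF" (annual) or "nWF" (perennial)
    average_age: int
    bottleneck_severity: int  # assumed to be an integer, as in the example
    replicate: int


def parse_sim_filename(name: str) -> SimFile:
    """Split a name like 'tree_pWF_1_10_5.trees' into its fields."""
    stem = name.rsplit(".", 1)[0]  # drop the file extension
    # The last four underscore-separated fields carry the parameters.
    _prefix, model, age, severity, rep = stem.rsplit("_", 4)
    return SimFile(model, int(age), int(severity), int(rep))


print(parse_sim_filename("tree_pWF_1_10_5.trees"))
```

For the example name, the parsed fields are the annual model, average age 1, bottleneck severity 10, and replicate 5.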
Software Information
Required software
- SLiM v.3.7.1
- msprime v.1.0.2
- pyslim v.1.0.4
- numpy v.1.21.6
- pandas v.1.2.5
- tskit v.0.5.3
- R v.4.1.1
Required R packages:
- ape
- magick
- pdftools
- MetBrewer
- vioplot
- scales
Additional resources
For more information about SLiM:
For more information about tree sequences and tree sequence processing:
- tskit: https://tskit.dev/
- pyslim: https://tskit.dev/pyslim/docs/latest/introduction.html
- msprime: https://tskit.dev/msprime/docs/stable/intro.html
Helpful information about installing slim and python packages required for tree sequence processing here.
Simulation code:
demo_change_nWF.slim: SLiM model for the perennial simulation (non-Wright-Fisher). Output of the simulation is a .trees file detailing genealogical relationships between individuals in the population, individual ages, and population census size over time.
demo_change_pWF.slim: SLiM model for the annual simulation (“pseudo-Wright-Fisher”). Output of the simulation is a .trees file detailing genealogical relationships between individuals in the population and population census size over time.
no_bottleneck_nWF.slim: Perennial SLiM simulation with no bottleneck and no tree sequence recording; used for calculating pre-bottleneck diversity to ensure equal Ne between simulation replicates.
Scripts for running simulations
All simulations were run on a high-throughput computing cluster with a slurm job scheduler.
wrapper-run_slim_all.sh: Submission script that starts a slurm array job to run SLiM simulations. Key parameter values (average age (A), bottleneck severity (R), replicate number, census population size, and number of burn-in generations) are defined in a “sim_block” file, specified as a command line argument to the wrapper script. The type of simulation (nWF or pWF) is also provided via the command line.
run_slim_all.sbatch: Executable script to run simulations. Key parameters are passed to this script from wrapper-run_slim_all.sh.
wrapper-run_slim_no_bottleneck.sh: Submission script that starts a slurm array job with the executable run_slim_nobottleneck.sbatch. Simulation type (pWF or nWF), census population size, and average age are specified via command line arguments.
run_slim_nobottleneck.sbatch: Executable script to run the simulation without a bottleneck (no_bottleneck_nWF.slim).
Files required for array jobs:
array_index_key_1.txt
array_index_key_2_5.txt
array_index_key_10_20.txt
Each of these files contains five columns (average age (A), bottleneck severity (R), replicate number, census population size, and number of burn-in generations) and is used to start a slurm array job to run simulations. array_index_key_1.txt starts annual simulation jobs. array_index_key_2_5.txt starts perennial simulation jobs for average ages 2 and 5. array_index_key_10_20.txt starts perennial simulation jobs for average ages 10 and 20.
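To illustrate how a key file can drive an array job, here is a minimal sketch of looking up one parameter row by array task ID. It is not the archived submission logic: the function name `params_for_task`, the whitespace-delimited layout with no header row, and the example values are all assumptions. `SLURM_ARRAY_TASK_ID` is the environment variable slurm sets for each array task.

```python
import os

# Hypothetical key-file contents: one row per job, columns in the order
# listed above (A, R, replicate, census size, burn-in generations).
EXAMPLE_KEY = """\
2 10 0 1000 5000
2 10 1 1000 5000
5 10 0 1000 5000
"""

COLUMNS = ("average_age", "bottleneck_severity", "replicate",
           "census_size", "burn_in_generations")


def params_for_task(key_text: str, task_id: int) -> dict:
    """Return the parameter row for a 1-based array task ID."""
    rows = [line.split() for line in key_text.splitlines() if line.strip()]
    row = rows[task_id - 1]
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
    # Values are kept as strings so they can be passed straight to a command.
    return dict(zip(COLUMNS, row))


# Fall back to task 1 when run outside of a slurm array job.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
print(params_for_task(EXAMPLE_KEY, task_id))
```

Keeping the values as strings avoids guessing at their types; each field can then be forwarded to the simulation as a command line constant.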
Tree sequence processing:
tree_2_sum.py: Python script that reads in a .trees file generated by SLiM, recapitates it, overlays mutations, and loads metadata output by demo_change_nWF.slim or demo_change_pWF.slim. It then loops through relevant timepoints in the simulation, sampling bins of saved individuals based on age or time, and outputs Watterson’s theta and pi values for each bin.
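For readers unfamiliar with the two statistics, the sketch below computes them from a small set of haplotypes in plain Python. It is an illustration only, not the archived implementation, which works on tree sequences. Watterson's theta is S / a_n, where S is the number of segregating sites and a_n is the sum of 1/i for i from 1 to n - 1; pi is the mean number of pairwise differences. The function names and the toy haplotypes are hypothetical, and both values are reported per sequence rather than per site.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Count sites at which the sampled haplotypes are not all identical."""
    return sum(1 for site in zip(*haplotypes) if len(set(site)) > 1)


def wattersons_theta(haplotypes):
    """Watterson's estimator: S divided by the harmonic number a_n."""
    n = len(haplotypes)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n


def nucleotide_diversity(haplotypes):
    """Pi: mean number of differences over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


# Toy sample: four haplotypes, four sites, 0 = ancestral, 1 = derived.
haps = ["0010", "0110", "0000", "1010"]
print(segregating_sites(haps))     # 3
print(wattersons_theta(haps))      # 3 / (1 + 1/2 + 1/3)
print(nucleotide_diversity(haps))  # 1.5
```

In the toy sample, each of the three variable sites splits the four haplotypes one against three, so each contributes 3 of the 6 pairwise comparisons, giving pi = 9 / 6 = 1.5.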
wrapper-run_processing.sh: Submission script that starts a slurm array job to process tree sequences output from SLiM. The user must edit the script to specify the location and naming convention of the .trees files. Key parameter values (average age (A), bottleneck severity (R), replicate number, census population size, and number of burn-in generations) are defined in an “array_key” file, specified as a command line argument to the wrapper script, and identical to the “sim_block” file used for running wrapper-run_slim_all.sh. One job is run per .trees file.
run_processing.sbatch: Executable script that starts running tree_2_sum.py to process .trees files.
Analysis scripts:
get_pWF_sample_sizes.R: This R script reads in genetic diversity and sample size data from the output of tree_2_sum.py and uses it to calculate average sample sizes from the perennial simulations.
make_figures.R: This R script reads in and wrangles genetic diversity data from the output of tree_2_sum.py, calculates the statistics reported in Clark et al. 2024, and performs the permutation and delta power analyses. It also makes all figures in the paper.
viz_functions.R: This R script defines custom functions used in make_figures.R.
All data for this publication were generated via evolutionary simulations in SLiM. Here, we archive all scripts necessary to generate, analyze, and visualize the results presented in Clark et al. 2024.
First, we performed simulations in SLiM using perennial and annual models across a range of average lifespans (for the perennial model) and bottleneck severities. The output of these simulations is (1) a .trees file containing the genealogical history of the population, from which we extract information about genetic diversity; (2) individual-based metadata for all individuals alive during the simulation sampling period: the generation number, the individual's pedigree ID, and the individual's age; and (3) the census population size at each generation in the sampling period.
Second, we used tskit, msprime, and pyslim to load and process .trees files as tree sequences. We then loop through focal sampling points in the tree sequence, sampling individuals to perform age and temporal comparisons. Genetic diversity data from the sampled bins are exported as .txt files.
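The age comparison in this step can be pictured with a small sketch that draws equal-sized samples of younger and older individuals from the metadata for one timepoint. This is only an illustration: the function `sample_age_bins`, the split at the median age rank, and the simulated ages are assumptions, and the archived tree_2_sum.py may define its bins differently.

```python
import random


def sample_age_bins(individuals, n_per_bin, rng):
    """Sample young and old bins from (pedigree_id, age) records.

    Individuals are ranked by age and split at the median rank
    (an assumed binning rule, chosen for illustration).
    """
    ranked = sorted(individuals, key=lambda ind: ind[1])
    half = len(ranked) // 2
    young = rng.sample(ranked[:half], n_per_bin)
    old = rng.sample(ranked[half:], n_per_bin)
    return young, old


rng = random.Random(1)
# Hypothetical metadata: 100 individuals with ages between 0 and 20.
population = [(pedigree_id, rng.randint(0, 20)) for pedigree_id in range(100)]
young, old = sample_age_bins(population, n_per_bin=10, rng=rng)
print("young ages:", sorted(age for _, age in young))
print("old ages:  ", sorted(age for _, age in old))
```

The pedigree IDs in each bin would then be used to select the matching nodes in the tree sequence before computing diversity for that bin.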
Finally, genetic diversity data is loaded in R, permutation tests are performed to test for significant differences in genetic diversity between bins, and figures are created.
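The logic of a permutation test for a difference between two bins can be sketched as follows. The archived analysis is written in R (make_figures.R); this Python version is a generic two-sided test on the difference in means, and the function name, the diversity values, and the number of permutations are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign values to the two bins
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical diversity values for an "old" and a "young" bin.
old_bin = [10.1, 10.3, 9.9, 10.2, 10.0]
young_bin = [8.0, 8.2, 7.9, 8.1, 8.3]
print(permutation_p_value(old_bin, young_bin))
```

A small p-value indicates that a difference as large as the observed one rarely arises when bin labels are assigned at random.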