Evolution is more repeatable in the introduction than range expansion phase of colonization
Data files
Jan 24, 2024 version files (11.02 GB total)
- aux_model_summary_betai.out (225.72 MB)
- aux_model_summary_yij_pij.out (10.41 GB)
- BayPass_manual.pdf (557.82 KB)
- baypass_r0.98_d3_L3_M30_q0.99_a35.txt (379.52 MB)
- core_freq_PCA.csv (176.49 KB)
- core_model_mat_omega.out (119.81 KB)
- go_core_out.txt (1.07 KB)
- go_core.txt (7.07 KB)
- go_edge_out.txt (2.42 KB)
- go_edge.txt (1.90 KB)
- go_founder_out.txt (6.30 KB)
- go_founder.txt (13.54 KB)
- go_shuffled_out.txt (2.74 KB)
- go_shuffled.txt (7.44 KB)
- GO_summary.txt (5.34 KB)
- README.md (4.35 KB)
- suppmat_outlier_df.csv (588.37 KB)
- tribolium.covariates (772 B)
- tribolium.poolsize (288 B)
- Z.csv (2.09 KB)
Abstract
How repeatable is evolution at genomic and phenotypic scales? We studied the repeatability of evolution during 8 generations of colonization using replicated microcosm experiments with the red flour beetle, Tribolium castaneum. Based on the patterns of shared allele frequency changes that occurred in populations from the same generation or experimental location, we found adaptive evolution to be more repeatable in the introduction and establishment phases of colonization than in the spread phase, when populations expand their range. Lastly, by studying changes in allele frequencies at conserved loci, we found evidence for the theoretical prediction that range expansion reduces the efficiency of selection to purge deleterious alleles. Overall, our results increase our understanding of adaptive evolution during colonization, demonstrating that evolution can be highly repeatable, while also showing that stochasticity still plays an important role.
README: Evolution is more repeatable in the introduction than range expansion phase of colonization
https://doi.org/10.5061/dryad.0k6djhb6t
Code and data associated with the analyses and figures.
Sharing/Access information
The full list of NCBI Sequence Read Archive accessions can be found at https://dx.doi.org/10.6084/m9.figshare.c.4440284
Code/Software
Figures were generated with R; the scripts are split up by figure/analysis type.
Details by file
Data
BayPass_manual.pdf
- BayPass software manual (version 2.1), including a complete description of the output files and the variables therein, starting on page 21.
Z.csv
- Key relating sequence library names to treatments. The columns are the sample prefix and dummy variables indicating which treatment each sample came from (founder, core, edge, shuffled).
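As an illustration of how the dummy-variable layout in Z.csv can be turned back into one treatment label per library, here is a minimal Python sketch. It assumes the file has a header row, a sample-prefix column (called `sample` here, a hypothetical name) and one 0/1 column per treatment named `founder`, `core`, `edge` and `shuffled`; check the actual headers before relying on it.

```python
import csv
import io

# Treatment names come from the README; treating them as the exact column
# headers of Z.csv, coded 0/1, is an assumption.
TREATMENTS = ("founder", "core", "edge", "shuffled")

def treatment_by_sample(handle, sample_col="sample"):
    """Map each sample prefix to its single treatment label.

    `sample_col` is a hypothetical header for the sample-prefix column.
    Raises ValueError if a row does not have exactly one dummy set to 1.
    """
    labels = {}
    for row in csv.DictReader(handle):
        hits = [t for t in TREATMENTS if row.get(t, "0").strip() == "1"]
        if len(hits) != 1:
            raise ValueError(f"{row[sample_col]}: expected one treatment, got {hits}")
        labels[row[sample_col]] = hits[0]
    return labels

# Toy input; with the real file use open("Z.csv", newline="") instead.
example = io.StringIO(
    "sample,founder,core,edge,shuffled\n"
    "libA,1,0,0,0\n"
    "libB,0,0,1,0\n"
)
print(treatment_by_sample(example))  # {'libA': 'founder', 'libB': 'edge'}
```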
baypass_r0.98_d3_L3_M30_q0.99_a35.txt, tribolium.covariates, tribolium.poolsize
- Input files required to run BayPass and generate outlier loci. See BayPass_manual.pdf for more details about the file formats.
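The BayPass pool-seq input is a plain count matrix, so per-population allele frequencies can be recovered from it directly. The sketch below assumes the standard BayPass count layout (one row per SNP, two read counts per population, populations in the same order as in tribolium.poolsize) with no header or SNP-ID columns. The function name is hypothetical, and tribolium.covariates is not needed for this step.

```python
import numpy as np

def allele_frequencies(genotype_src, poolsize_src):
    """Per-population frequency of the first allele from BayPass-style counts.

    Assumed layout: each row holds 2 * npop read counts ordered
    (allele 1, allele 2) for pop 1, then pop 2, and so on; the poolsize
    source holds one haploid pool size per population and is used here
    only to check the number of populations.
    """
    counts = np.loadtxt(genotype_src, dtype=int, ndmin=2)
    poolsize = np.loadtxt(poolsize_src, dtype=int, ndmin=1)
    npop = poolsize.size
    if counts.shape[1] != 2 * npop:
        raise ValueError(f"expected {2 * npop} count columns, got {counts.shape[1]}")
    allele1 = counts[:, 0::2]
    allele2 = counts[:, 1::2]
    depth = allele1 + allele2
    # Populations with zero coverage at a SNP get NaN instead of a warning
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = allele1 / depth
    return freq, depth

# np.loadtxt also accepts file paths, e.g.
# allele_frequencies("baypass_r0.98_d3_L3_M30_q0.99_a35.txt", "tribolium.poolsize")
freq, depth = allele_frequencies(["10 30 20 20", "0 0 5 15"], ["50 50"])
print(freq)
```

For the roughly 380 MB input file, reading in chunks would be faster than a single `np.loadtxt` call, but the column logic stays the same.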
aux_model_summary_yij_pij.out, aux_model_summary_betai.out, core_model_mat_omega.out
- Unmodified output files generated by BayPass and required for downstream analyses. See BayPass_manual.pdf for more details about the file formats.
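core_model_mat_omega.out holds the scaled population covariance matrix (Ω) estimated by BayPass. A common way to inspect it is to rescale it to a correlation matrix, analogous to R's `cov2cor()`. A minimal sketch, assuming the file is a whitespace-delimited square numeric matrix with populations in the same order as the input files:

```python
import numpy as np

def omega_to_correlation(omega):
    """Rescale a covariance matrix so its diagonal is 1 (like R's cov2cor)."""
    omega = np.asarray(omega, dtype=float)
    if omega.ndim != 2 or omega.shape[0] != omega.shape[1]:
        raise ValueError("omega must be a square matrix")
    sd = np.sqrt(np.diag(omega))
    # Divide each entry by the product of the two populations' SDs
    return omega / np.outer(sd, sd)

# With the real file, something like:
# omega = np.loadtxt("core_model_mat_omega.out")
omega = np.array([[0.04, 0.01], [0.01, 0.09]])
print(omega_to_correlation(omega))
```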
suppmat_outlier_df.csv
- Merged information for all outlier loci, including genome position and Bayes factors. The first columns (COVARIABLE,MRK,M_Beta,SD_Beta,M_Delta,BF.dB) are generated by BayPass (see page 23 of BayPass_manual.pdf for a description of the (outprefix_)summary_betai.out file format). The subsequent columns are:
- treatment - The treatment (founder, core, edge, shuffled) in which the outlier locus was found.
- chrom - The chromosome name for the outlier locus
- pos - The zero-indexed position of the outlier locus
- pos2 - The one-indexed position of the outlier locus
- ref_alt - The reference and alternate nucleotide found at the locus
- freq - The reference allele frequency at the locus
- chrom_num - Simplified chromosome name for the outlier locus, used for plotting
The following columns follow the VCF format. Some are redundant with the previous columns but were kept as a sanity check after merging several data frames:
- CHROM - Identical to chrom
- POS - Identical to pos
- ID - Unique identifier used for filtering, built from the values of chrom, pos, and ref_alt
- REF - The reference nucleotide found at the locus
- ALT - The alternate nucleotide found at the locus
- QUAL - Unused column, mandatory in the VCF format; kept as "." following the format's conventions.
- FILTER - Unused column, mandatory in the VCF format; kept as "." following the format's conventions.
- INFO - Additional information about the locus; here, output from the Variant Effect Predictor (VEP) software indicating which functional part of the genome the locus falls into (if any).
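Because several columns in suppmat_outlier_df.csv are deliberately redundant, they can be used to re-check the merge. The sketch below reads the file with Python's `csv` module, confirms that `pos2 == pos + 1` and that the VCF-style `CHROM`/`POS` columns match `chrom`/`pos`, then counts outliers per treatment above a Bayes factor cutoff. The 20 dB default is only illustrative (it corresponds to "decisive" evidence under Jeffreys' rule); the cutoff used in the study is not stated here, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter

def count_outliers(handle, min_bf_db=20.0):
    """Re-check merged columns and count outliers per treatment.

    `min_bf_db` is an illustrative cutoff in deciban (dB) units, not
    necessarily the threshold used in the study.
    """
    per_treatment = Counter()
    for row in csv.DictReader(handle):
        where = f"{row['chrom']}:{row['pos']}"
        pos, pos2 = int(row["pos"]), int(row["pos2"])
        # pos is zero-indexed and pos2 one-indexed, so they differ by exactly 1
        if pos2 != pos + 1:
            raise ValueError(f"pos/pos2 mismatch at {where}")
        # Per the README, CHROM and POS are identical copies of chrom and pos
        if row["CHROM"] != row["chrom"] or int(row["POS"]) != pos:
            raise ValueError(f"VCF-style columns disagree at {where}")
        if float(row["BF.dB"]) >= min_bf_db:
            per_treatment[row["treatment"]] += 1
    return per_treatment

# Toy rows with only the columns this check needs; with the real file use
# open("suppmat_outlier_df.csv", newline="") instead.
toy = io.StringIO(
    "treatment,chrom,pos,pos2,CHROM,POS,BF.dB\n"
    "edge,LG2,99,100,LG2,99,25.1\n"
    "core,LG3,5,6,LG3,5,31.0\n"
    "edge,LG2,500,501,LG2,500,12.0\n"
)
print(count_outliers(toy))
```

`csv.DictReader` looks columns up by name, so the extra BayPass and VCF columns in the real file do not need to be listed.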
core_freq_PCA.csv
- Principal components of allele frequencies
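core_freq_PCA.csv is described only as principal components of allele frequencies. For readers who want to compute something comparable from a frequency matrix, a generic PCA via singular value decomposition looks like this. It is a sketch: the centring (per locus, no scaling) and the samples-by-loci orientation are assumptions, not the settings used for the published file.

```python
import numpy as np

def pca_scores(freq, n_components=2):
    """PCA of a samples x loci allele-frequency matrix via SVD.

    Each locus (column) is mean-centred but not scaled; returns the sample
    scores on the first `n_components` axes and their share of variance.
    """
    x = np.asarray(freq, dtype=float)
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Toy matrix: 3 samples x 4 loci of allele frequencies
toy = np.array([
    [0.10, 0.50, 0.30, 0.80],
    [0.20, 0.40, 0.35, 0.70],
    [0.40, 0.20, 0.30, 0.60],
])
scores, explained = pca_scores(toy)
print(scores.shape, explained.round(3))
```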
GO_summary.txt
- Final GO output containing p values averaged across genes and across the treatments in which they were discovered.
go_shuffled.txt, go_founder.txt, go_edge.txt, go_core.txt
- Genes identified with outliers used as input for GO analyses
go_shuffled_out.txt, go_founder_out.txt, go_edge_out.txt, go_core_out.txt
- Output files for GO analyses
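GO_summary.txt is described as p values averaged across genes and treatments. A minimal sketch of that aggregation, assuming a hypothetical in-memory layout of one `(go_term, p_value)` row per gene-term hit for each treatment. The actual column layout of the go_*_out.txt files and the exact averaging done in go.R may differ.

```python
from collections import defaultdict
from statistics import mean

def average_go_pvalues(results):
    """Average GO p values across genes and treatments.

    `results` maps a treatment name to (go_term, p_value) rows, one row per
    gene-term hit; this input layout is hypothetical. Returns
    {go_term: (mean_p, sorted list of treatments where the term was found)}.
    """
    pvals = defaultdict(list)
    found_in = defaultdict(set)
    for treatment, rows in results.items():
        for term, p in rows:
            pvals[term].append(float(p))
            found_in[term].add(treatment)
    return {term: (mean(ps), sorted(found_in[term])) for term, ps in pvals.items()}

# Toy values only
results = {
    "core": [("GO:0006950", 0.02), ("GO:0008152", 0.5)],
    "edge": [("GO:0006950", 0.04)],
}
print(average_go_pvalues(results))
```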
Software
- fais_baypass.py
- Script for making BayPass input from BAM files
- make_files.sh
- Steps for running BayPass
- helper_functions.R
- Helper functions and file loading used by the other scripts
- go.R
- Steps for selecting genes for GO analysis
- TABLE2_allsites_catogries.R
- Steps for identifying where loci of interest fall with respect to genome content.
- get_conserved_pij.py
- Helper script to filter for conserved loci.
- FIGURE1_pca_viz.R
- Steps to create PCA figure.
- FIGURE2_baypass_BFmc.R
- Steps to create outlier figure.
- cons_pipeline.sh
- Steps to identify conserved loci.
- run_mummer.sh
- Script to run MUMmer
- FIGURE5_conserved_sites.R
- Steps to create distribution of allele frequencies at conserved loci.
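The conserved-loci steps listed above filter allele frequencies to conserved sites and plot their distribution. A minimal Python sketch of the binning step, assuming frequencies are available as hypothetical `(chrom, pos, freq)` records and that conserved loci come as a set of `(chrom, pos)` keys (for example, produced by a filter such as get_conserved_pij.py, whose real output format is not documented here):

```python
import numpy as np

def conserved_frequency_histogram(records, conserved, bins=10):
    """Histogram of allele frequencies at conserved loci only.

    records: iterable of (chrom, pos, freq); conserved: set of (chrom, pos).
    Both layouts are hypothetical stand-ins for the pipeline's real files.
    """
    freqs = np.array(
        [freq for chrom, pos, freq in records if (chrom, pos) in conserved],
        dtype=float,
    )
    # Fixed [0, 1] range so histograms from different treatments share bins
    counts, edges = np.histogram(freqs, bins=bins, range=(0.0, 1.0))
    return counts, edges

records = [("LG1", 10, 0.05), ("LG1", 20, 0.95), ("LG2", 5, 0.50), ("LG2", 6, 0.15)]
conserved = {("LG1", 10), ("LG2", 6)}
counts, edges = conserved_frequency_histogram(records, conserved)
print(counts)
```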