How repeatable is evolution at genomic and phenotypic scales? We studied the repeatability of evolution during 8 generations of colonization using replicated microcosm experiments with the red flour beetle, Tribolium castaneum. Based on the patterns of shared allele frequency changes that occurred in populations from the same generation or experimental location, we found adaptive evolution to be more repeatable in the introduction and establishment phases of colonization than in the spread phase, when populations expand their range. Lastly, by studying changes in allele frequencies at conserved loci, we found evidence for the theoretical prediction that range expansion reduces the efficiency of selection to purge deleterious alleles. Overall, our results increase our understanding of adaptive evolution during colonization, demonstrating that evolution can be highly repeatable, while also showing that stochasticity still plays an important role.

Code and data associated with analysis and figures.

Sharing/Access information

The full list of NCBI Sequence Read Archive accessions can be found at https://dx.doi.org/10.6084/m9.figshare.c.4440284

Code/Software

Figures were generated with R; the scripts are split up by figure or analysis type.

Details by file

Data

BayPass_manual.pdf
- BayPass software manual (version 2.1). Includes a complete description of the output files and the variables therein, starting on page 21.
Z.csv
- Key relating sequence library names to treatments. Columns are sample prefix and dummy variables to indicate which treatment each sample was from (founder, core, edge, shuffled).
baypass_r0.98_d3_L3_M30_q0.99_a35.txt, tribolium.covariates, tribolium.poolsize
- Input files required to run BayPass and generate outlier loci. See BayPass_manual.pdf for more details about file formats.
aux_model_summary_yij_pij.out, aux_model_summary_betai.out, core_model_mat_omega.out
- Unmodified output files generated by BayPass that are required for downstream analyses. See BayPass_manual.pdf for more details about file formats.
suppmat_outlier_df.csv
- Merged information for all outlier loci, including genome position and Bayes factors. The first columns (COVARIABLE,MRK,M_Beta,SD_Beta,M_Delta,BF.dB) are generated by BayPass (see page 23 of BayPass_manual.pdf for a description of the (outprefix_)summary_pij.out file format). The subsequent columns are:
  - treatment - The treatment (founder, core, edge, shuffled) in which the outlier locus was found
  - chrom - The chromosome name for the outlier locus
  - pos - The zero-indexed position of the outlier locus
  - pos2 - The one-indexed position of the outlier locus
  - ref_alt - The reference and alternate nucleotide found at the locus
  - freq - The reference allele frequency at the locus
  - chrom_num - Simplified chromosome name for the outlier locus used for plotting
    
    The following columns follow the VCF format. Some are redundant with the previous columns but were kept as a sanity check after merging several data frames.
  - CHROM - Identical to chrom
  - POS - Identical to pos
  - ID - Unique identifier used for filtering consisting of values from chrom, pos, and ref_alt
  - REF - The reference nucleotide found at the locus
  - ALT - The alternate nucleotide found at the locus
  - QUAL - Unused column mandatory for VCF format and kept as "." following conventions for the format.
  - FILTER - Unused column mandatory for VCF format and kept as "." following conventions for the format.
  - INFO - Additional information about the locus. In this case, output from the Variant Effect Predictor (VEP) software indicating which functional part of the genome the locus fell into (if any).
core_freq_PCA.csv
- Principal components of allele frequencies
GO_summary.txt
- Final GO output containing p-values averaged across genes and the treatments in which they were discovered.
go_shuffled.txt, go_founder.txt, go_edge.txt, go_core.txt
- Genes identified with outliers used as input for GO analyses
go_shuffled_out.txt, go_founder_out.txt, go_edge_out.txt, go_core_out.txt
- Output files for GO analyses
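
The redundant columns in suppmat_outlier_df.csv were kept as a sanity check, and that check can be repeated when loading the file. The sketch below is a minimal illustration, not part of the deposited scripts. It assumes the file is comma-separated with a header row using the column names listed above. The function name check_outlier_rows and the two-row excerpt are hypothetical.

```python
import csv
import io


def check_outlier_rows(handle):
    """Return (row_number, message) pairs for rows whose redundant columns disagree."""
    problems = []
    reader = csv.DictReader(handle)
    for row_number, row in enumerate(reader, start=1):
        pos = int(row["pos"])
        # CHROM and POS are described as identical to chrom and pos.
        if row["CHROM"] != row["chrom"]:
            problems.append((row_number, "CHROM differs from chrom"))
        if int(row["POS"]) != pos:
            problems.append((row_number, "POS differs from pos"))
        # pos is zero-indexed and pos2 is one-indexed, so pos2 should be pos + 1.
        if int(row["pos2"]) != pos + 1:
            problems.append((row_number, "pos2 is not pos + 1"))
    return problems


# Hypothetical excerpt with made-up values; the second row is deliberately inconsistent.
example = io.StringIO(
    "chrom,pos,pos2,CHROM,POS\n"
    "chrA,99,100,chrA,99\n"
    "chrA,250,250,chrA,250\n"
)
print(check_outlier_rows(example))
```

To run the same check on the real file, pass an open file handle (for example, one opened with newline="") in place of the in-memory excerpt. An empty list means the redundant columns agree for every row.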

Software

fais_baypass.py
- Script for making input for BayPass from BAM files
make_files.sh
- Steps for running BayPass
helper_functions.R
- Functions and file loading used in other scripts
go.R
- Steps for selecting genes for GO analysis
TABLE2_allsites_catogries.R
- Steps for identifying where loci of interest fall with respect to genome content.
get_conserved_pij.py
- Helper script to filter for conserved loci.
FIGURE1_pca_viz.R
- Steps to create PCA figure.
FIGURE2_baypass_BFmc.R
- Steps to create outlier figure.
cons_pipeline.sh
- Steps to identify conserved loci.
run_mummer.sh
- Script to run mummer
FIGURE5_conserved_sites.R
- Steps to create distribution of allele frequencies at conserved loci.

Evolution is more repeatable in the introduction than range expansion phase of colonization
