Understanding natural selection's effect on genetic variation is a major goal in biology, but the genome-scale consequences of contemporary selection are not well known. In a release-and-recapture field experiment, we transplanted stick insects to native and novel host plants and directly measured allele frequency changes within a generation at 186 576 genetic loci. We observed substantial, genome-wide allele frequency changes during the experiment, most of which could be attributed to random mortality (genetic drift). However, we also documented that selection affected multiple genetic loci distributed across the genome, particularly in transplants to the novel host. Host-associated selection affecting the genome acted on both a known colour-pattern trait and other (unmeasured) phenotypes. We also found evidence that selection associated with elevation affected genome variation, although our experiment was not designed to test this. Our results illustrate how genomic data can identify previously underappreciated ecological sources and phenotypic targets of selection.
mean genotypes
This file contains our point estimate of each individual's genotype at each locus. There are 186,575 rows, one per locus. Each genotype is given as the number of reference alleles (between 0 and 2), estimated as the posterior mean $\sum_{g \in \{0,1,2\}} g \cdot \Pr(g)$, where $\Pr(g)$ is the posterior probability of genotype $g$.
mngenTimemaAll.txt
read counts
This compressed directory contains the sequence read data, with one file per experimental population, labeled by block number and host-plant treatment (A = Adenostoma, C = Ceanothus). The data for each SNP begin with a line giving the locus (GBS contig number) and position (position within the contig). This is followed by one line per individual giving the number of reads carrying the non-reference allele and the number carrying the reference allele.
SnpCnts.tar.gz
source code for null models
This compressed directory contains the C++ source code we used to simulate the allele frequency change expected from genetic drift. The software requires the GNU Scientific Library (GSL). If the libraries are installed in standard locations, the software should compile with: g++ -O3 -o simdrift main.C func.C -lgsl -lm -lgslcblas. After compiling, type the program name (e.g., simdrift) to see usage instructions.
SimNullMods.tar.gz
source code to estimate selection coefficients
This source code implements a Bayesian model to estimate selection coefficients. The software is written in C++ and requires GSL to compile. It takes a genotype file and a phenotype (survival) file; phenotypes are coded 0 or 1 and indicate whether an individual died (0) or lived (1). The input files are formatted as follows:
genotype file:
line 1: number of loci, number of populations
line 2: pop1 sample size, pop2 sample size, ...
line 3: block for pop1, block for pop2, ...
line 4: treatment for pop1, treatment for pop2, ...
line 5-N: population number, locus number, individual number, Pr(AA), Pr(Aa), Pr(aa)
phenotype file:
line 1: phenotypes for pop1 (in the same order as in the genotype file)
line 2: phenotypes for pop2, and so on, with one line per population
The command-line options are:
./wigs version 0.1 -- 21 August 2012
Usage: wigs -g genotypefile -s survivalfile [options]
-g Infile with genotype posterior probabilities
-s Infile with survival data for each individual
-x Number of Markov chain Monte Carlo steps
-b Number of MCMC steps to discard as a burnin
-t Specify chain thinning, write samples every nth step
-d Prior probability a SNP experienced selection [0.5]
-f Outfile prefix
wigs.tar.gz
reference sequence
This FASTA file contains the GBS-based reference sequence to which we assembled the GBS reads. The consensus sequences of the GBS contigs are separated by strings of 30 Ns.
mod_ref.fasta