Genotyping of marine sticklebacks - Predicting future from past: The genomic basis of recurrent and rapid stickleback evolution

Cite this dataset

Jones, Felicity (2021). Genotyping of marine sticklebacks - Predicting future from past: The genomic basis of recurrent and rapid stickleback evolution [Dataset]. Dryad.


The data provided here was used in the following manuscript:

Predicting future from past: The genomic basis of recurrent and rapid stickleback evolution

Garrett A Roberts Kingman, Deven N Vyas, Felicity C Jones, Shannon D Brady, Heidi I Chen, Kerry Reid, Mark Milhaven, Thomas S Bertino, Windsor E Aguirre, David C Heins, Frank A von Hippel, Peter J Park, Melanie Kirch, Devin M Absher, Richard M Myers, Federica Di Palma, Michael A Bell*, David M Kingsley*, Krishna R Veeramah*

Similar forms often evolve repeatedly in nature, raising longstanding questions about the underlying mechanisms. Here we use repeated evolution in sticklebacks to identify a large set of genomic loci that change recurrently during colonization of new freshwater habitats by marine fish. The same loci used repeatedly in extant populations also show rapid allele frequency changes when new freshwater populations are experimentally established from marine ancestors. Dramatic genotypic and phenotypic changes arise within 5-7 years, facilitated by standing genetic variation and linkage between adaptive regions. Both the speed and location of changes can be predicted using empirical observations of recurrence in natural populations or fundamental genomic features like allelic age, recombination rates, density of divergent loci, and overlap with mapped traits. A composite model trained on these stickleback features can also predict the location of key evolutionary loci in Darwin’s finches, suggesting similar features are important for evolution across diverse taxa.


SNP discovery and genotyping array design.  

An Illumina GoldenGate custom SNP genotyping array was designed to genotype large numbers of individuals at SNPs tagging previously identified adaptive loci that have diverged in parallel among marine and freshwater fish throughout the stickleback species range (Jones et al, 2012). SNPs were selected to tag 72 "adaptive" genomic regions (235 SNPs) and 50 "random", putatively neutral genomic regions (128 SNPs). The random regions were defined by cyclic rotation of the BED intervals of the tagged adaptive regions by a random integer, to ensure roughly equal linkage disequilibrium among adaptive loci and random loci.
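
The cyclic rotation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function name `rotate_intervals`, the use of half-open (start, end) coordinates on a single chromosome, and the choice to split intervals that wrap past the chromosome end are all assumptions that the dataset description does not state.

```python
import random


def rotate_intervals(intervals, chrom_length, offset):
    """Cyclically shift half-open (start, end) intervals by `offset` bases.

    Assumption: rotation is applied within one chromosome of length
    `chrom_length`, and an interval that runs past the end wraps around
    to the start and is reported as two pieces.
    """
    rotated = []
    for start, end in intervals:
        length = end - start
        new_start = (start + offset) % chrom_length
        new_end = new_start + length
        if new_end <= chrom_length:
            rotated.append((new_start, new_end))
        else:
            # Interval crosses the chromosome end: split it at the boundary.
            rotated.append((new_start, chrom_length))
            rotated.append((0, new_end - chrom_length))
    return sorted(rotated)


# Hypothetical example: two "adaptive" intervals on a 1 Mb chromosome.
adaptive = [(100_000, 120_000), (640_000, 655_000)]
chrom_length = 1_000_000
offset = random.randrange(1, chrom_length)  # the random integer
random_regions = rotate_intervals(adaptive, chrom_length, offset)
```

Because every interval is shifted by the same offset, the rotated set keeps the sizes and relative spacing of the adaptive regions, which is what makes the two sets comparable.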

Tagged SNPs were ascertained for the array using variant call data from Illumina short read sequencing of 10 marine and 11 freshwater fish (Jones et al, 2012). For targeted adaptive loci, 3 to 5 SNPs with an allele frequency difference >=0.9 and with alternate alleles present in 4 or more individuals of each ecotype were selected for each target region. For "random" loci, 2 to 5 SNPs with a minor allele frequency between 0.35 and 0.65 in the combined sample of 21 fish were selected. Where possible, SNPs with nearby variants in the flanking 60 base pairs were excluded, and the SNPs with the highest 'design score' from Illumina's design verification process were selected for the genotyping array.
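
The two frequency filters above might be expressed roughly as in the sketch below. It is an illustration under stated assumptions, not the pipeline used for the array: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2, or `None` for missing), the function names are hypothetical, and the "4 or more individuals" criterion is read here as requiring the allele enriched in each ecotype to be carried by at least 4 individuals of that ecotype. Exact fractions are used so that thresholds such as 0.9 are not affected by floating-point rounding.

```python
from fractions import Fraction


def alt_frequency(genotypes):
    """Alternate-allele frequency from diploid counts (0/1/2, None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return Fraction(sum(called), 2 * len(called))


def passes_adaptive_filter(marine, freshwater,
                           min_diff=Fraction(9, 10), min_carriers=4):
    """Allele frequency difference >= min_diff between ecotypes.

    Assumed reading of the carrier criterion: the allele enriched in each
    ecotype is present in at least `min_carriers` fish of that ecotype.
    """
    f_marine = alt_frequency(marine)
    f_fresh = alt_frequency(freshwater)
    if f_marine is None or f_fresh is None:
        return False
    if abs(f_marine - f_fresh) < min_diff:
        return False
    if f_fresh > f_marine:
        # Alternate allele is the freshwater allele, reference the marine one.
        fresh_carriers = sum(1 for g in freshwater if g is not None and g > 0)
        marine_carriers = sum(1 for g in marine if g is not None and g < 2)
    else:
        fresh_carriers = sum(1 for g in freshwater if g is not None and g < 2)
        marine_carriers = sum(1 for g in marine if g is not None and g > 0)
    return fresh_carriers >= min_carriers and marine_carriers >= min_carriers


def passes_random_filter(combined,
                         low=Fraction(35, 100), high=Fraction(65, 100)):
    """Alternate-allele frequency within [low, high] in the combined sample.

    An alternate-allele frequency in [0.35, 0.65] is the same condition as
    a minor allele frequency of at least 0.35.
    """
    freq = alt_frequency(combined)
    return freq is not None and low <= freq <= high
```

The flanking-variant and design-score criteria are not covered here, since they depend on Illumina's design verification output.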

Sample collections for SNP Genotyping array. 

Marine stickleback sampled from three sites in Alaska were genotyped with a custom SNP genotyping array: 751 fish from Rabbit Slough, 655 from Resurrection Bay, and 237 from Glacier Spit.

SNP genotyping array samples.

DNA for the SNP genotyping array samples was extracted from clipped spines or fins using overnight proteinase K digestion followed by phenol:chloroform isolation, as previously described (Peichel et al, 2001). DNA concentration was quantified using a NanoDrop spectrophotometer.

SNP genotyping. 

SNP genotyping was performed on 250 ng of genomic DNA using Illumina GoldenGate 384 custom SNP arrays (Illumina) according to the manufacturer's protocols. Genotype calls were made using GenomeStudio 2.0 software (Illumina), with results from each SNP individually inspected to verify genotype clusters.

Ancestral Sequence imputation.

The ancestral genome sequence for Gasterosteus was imputed by comparison of Gasterosteus aculeatus with two outgroup species (Gasterosteus wheatlandi and Pungitius pungitius). The resulting ancestral genome sequence is available here in FASTA format. Please see supplementary information for more details.

Usage notes

The genotyping dataset is in PLINK format with gasAcu1-4 genome coordinates.
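
As a starting point for working with the SNP coordinates, the sketch below parses a PLINK text `.map` file, which has four whitespace-separated columns per variant: chromosome, variant identifier, genetic distance, and base-pair position. Whether this dataset ships as text (`.ped`/`.map`) or binary (`.bed`/`.bim`/`.fam`) files is not stated here, so the text layout is an assumption, and the SNP names and positions in the example are hypothetical.

```python
from typing import NamedTuple


class MapRecord(NamedTuple):
    chrom: str
    snp_id: str
    genetic_distance: float
    position: int  # base-pair coordinate (gasAcu1-4 for this dataset)


def parse_map(lines):
    """Parse lines of a four-column PLINK .map file into MapRecord tuples."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        chrom, snp_id, distance, position = fields
        records.append(MapRecord(chrom, snp_id, float(distance), int(position)))
    return records


# Hypothetical example lines; real data would be read with open(path).
example = ["chrIV snp_0001 0 12800000", "chrXXI\tsnp_0002\t0\t5000"]
snps = parse_map(example)
```

For the binary format, an established reader (PLINK itself or a dedicated library) is a better choice than hand-written parsing.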

The imputed ancestral genome sequence is in FASTA format.
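
A FASTA file can be read with only the Python standard library, as in the minimal sketch below. The function name `read_fasta` and the example records are illustrative assumptions, and the sequence names actually used in the ancestral genome file are not specified here.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    The name is the first whitespace-delimited token after '>'.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Hypothetical example; for the real file, pass an open file handle instead.
example = [">chrI ancestral", "ACGTNNAC", "GGTA", "", ">chrII", "TTGA"]
records = dict(read_fasta(example))
```

Because the function is a generator, it processes a genome-sized file one record at a time, although each chromosome sequence is still held in memory once assembled.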