Data from: Adaptation via pleiotropy and linkage: association mapping reveals a complex genetic architecture within the stickleback Eda locus

Cite this dataset

Archambeault, Sophie; Bärtschi, Luis; Merminod, Aurélie; Peichel, Catherine (2020). Data from: Adaptation via pleiotropy and linkage: association mapping reveals a complex genetic architecture within the stickleback Eda locus [Dataset]. Dryad.


Genomic mapping of the loci associated with phenotypic evolution has revealed genomic “hotspots”, or regions of the genome that control multiple phenotypic traits. This clustering of loci has important implications for the speed and maintenance of adaptation and could be due to pleiotropic effects of a single mutation or tight genetic linkage of multiple causative mutations affecting different traits. The threespine stickleback (Gasterosteus aculeatus) is a powerful model for the study of adaptive evolution because the marine ecotype has repeatedly adapted to freshwater environments across the northern hemisphere in the last 12,000 years. Freshwater ecotypes have repeatedly fixed a 16 kilobase haplotype on chromosome IV that contains Ectodysplasin (Eda), a gene known to affect multiple traits, including defensive armor plates, lateral line sensory hair cells, and schooling behavior. Many additional traits have previously been mapped to a larger region of chromosome IV that encompasses the Eda freshwater haplotype. To identify which of these traits specifically map to this adaptive haplotype, we made crosses of rare marine fish heterozygous for the freshwater haplotype in an otherwise marine genetic background. Further, we performed fine-scale association mapping in a fully interbreeding, polymorphic population of freshwater stickleback to disentangle the effects of pleiotropy and linkage on the phenotypes affected by this haplotype. Although we find evidence that linked mutations have small effects on a few phenotypes, a small 1.4 kb region within the first intron of Eda has large effects on three phenotypic traits: lateral plate count, and both the number and patterning of the posterior lateral line neuromasts. Thus, the Eda haplotype is a hotspot of adaptation in stickleback due to both a small, pleiotropic region affecting multiple traits as well as multiple linked mutations affecting additional traits.


Description of datasets: This work includes association mapping of multiple phenotypes to genetic markers within the Eda locus for three different sets of stickleback collections: (1) lab crosses of Puget Sound marine stickleback heterozygous for the freshwater Eda haplotype; (2) lab crosses of Puget Sound marine stickleback heterozygous for the freshwater NAKA SNP; and (3) wild-caught stickleback from the polymorphic, freshwater Lake Washington population. 

Please see the associated manuscript for sample processing, phenotypic trait measurements and genotyping details.

Genotypes: Genotypes for the Lake Washington wild-caught fish are presented in three versions: raw, trimmed, and filled. The raw file contains the genotypes as collected and therefore includes missing data, because not every fish was successfully genotyped at every marker. The trimmed and filled genotype files have each been processed to remove missing data, in two different ways, as described in the manuscript. The results in the manuscript are based on the trimmed Lake Washington genotype file, but the filled genotype file gives qualitatively similar results. There were relatively few missing genotypes for the NAKA and Puget Sound datasets, so the raw genotypes are uploaded and used in the analysis.

Trait values & outliers: The trait values are uploaded both in unedited form and in trimmed form, with outliers removed. Outliers were defined as values more than 4 standard deviations (SD) from the sample mean. Please see the manuscript for descriptions of additional removed outliers, such as rare wild-caught fish that were outside the age class of the fish caught in the fall. The trimmed dataset is used in the uploaded analysis script.
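For readers who want to reproduce the 4 SD rule on the unedited trait values, the following is a minimal Python sketch, not the authors' code. It assumes the mean and the sample SD are computed once over all non-missing values of a trait (including the candidate outliers) and that removed values become missing. The function name `trim_outliers` and the example values are hypothetical; the manuscript remains the authority on the exact procedure.

```python
from statistics import mean, stdev


def trim_outliers(values, n_sd=4.0):
    """Replace values more than n_sd sample SDs from the mean with None.

    Assumption: mean and SD are computed once over all non-missing
    values, and missing values (None) are passed through unchanged.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return list(values)  # SD is undefined for fewer than two values
    centre = mean(observed)
    spread = stdev(observed)  # sample SD (n - 1 denominator)
    if spread == 0:
        return list(values)  # all values identical, nothing to trim
    return [
        v if v is not None and abs(v - centre) <= n_sd * spread else None
        for v in values
    ]


# Hypothetical trait values: 30 typical measurements plus one extreme value.
example_trait = [30, 31, 32, 33, 34] * 6 + [300]
trimmed_trait = trim_outliers(example_trait)
```

A threshold as wide as 4 SD can only be exceeded in reasonably large samples, because in a small sample a single extreme value inflates the SD enough to hide itself.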

Usage notes

Follow the instructions in the "calculating.LODs.and.p-vals.R" script to run the analyses of the three mapping datasets.
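The R script is the reference for how the analyses were actually run. As a rough illustration of what a LOD score measures in single-marker association mapping, the Python sketch below compares a model with one phenotype mean per genotype class against a model with a single overall mean, using LOD = (n/2) · log10(RSS0/RSS1). This is a generic marker-regression formulation and an assumption here, not a description of the uploaded script. The function name `marker_lod` and the example data are hypothetical, and p-values are not computed.

```python
import math
from collections import defaultdict


def marker_lod(genotypes, phenotypes):
    """LOD score for one marker via one-way marker regression.

    RSS0: residual sum of squares around the overall phenotype mean.
    RSS1: residual sum of squares around per-genotype-class means.
    Individuals missing a genotype or a phenotype (None) are skipped.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        if genotype is None or value is None:
            continue
        groups[genotype].append(value)

    all_values = [v for vals in groups.values() for v in vals]
    n = len(all_values)
    if n == 0:
        raise ValueError("no individuals with both genotype and phenotype")

    grand_mean = sum(all_values) / n
    rss0 = sum((v - grand_mean) ** 2 for v in all_values)

    rss1 = 0.0
    for vals in groups.values():
        class_mean = sum(vals) / len(vals)
        rss1 += sum((v - class_mean) ** 2 for v in vals)

    if rss0 == 0 or rss1 == 0:
        raise ValueError("LOD is undefined when a residual sum of squares is zero")
    return (n / 2) * math.log10(rss0 / rss1)


# Hypothetical example: six fish, one marker, one quantitative trait.
example_genotypes = ["AA", "AA", "AB", "AB", "BB", "BB"]
example_phenotypes = [1, 2, 4, 5, 7, 8]
example_lod = marker_lod(example_genotypes, example_phenotypes)
```

A marker whose genotype classes have identical phenotype means gives RSS0 = RSS1 and therefore a LOD of 0. Larger values indicate that genotype explains more of the trait variance.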