
Genetics of mercury accumulation in Stickleback, genotypes and metal accumulation phenotypes

Cite this dataset

Calboli, Federico; Delahaut, Vyshal (2021). Genetics of mercury accumulation in Stickleback, genotypes and metal accumulation phenotypes [Dataset]. Dryad.


Anthropogenic stressors, such as pollutants, act as selective factors that can leave measurable changes in allele frequencies in the genome. Metals are of particular concern among pollutants because they interfere with vital biological pathways. We use the three-spined stickleback as a model for adaptation to mercury pollution in natural populations. We collected sticklebacks from 21 locations in Flanders (Belgium), measured the accumulated levels of mercury in skeletal muscle tissue, and genotyped the fish using genotyping-by-sequencing (GBS). Muscle mercury content varied considerably across locations, ranging from 21.5 to 327 ng/g dry weight (DW). We then conducted a genome-wide association study (GWAS) between 28,450 single nucleotide polymorphisms (SNPs) and the accumulated levels of mercury, using different approaches. Based on a linear mixed model analysis, the GWAS yielded multiple hits, with a single top hit on Chromosome 4 and eight more SNPs suggestive of association. A second approach, a latent factor mixed model analysis, highlighted one single SNP on Chromosome 11. Finally, an outlier test identified one additional SNP on Chromosome 4 that appeared to be under selection. Of all ten SNPs we identified as associated with mercury in muscle, three, all located on Chromosome 4, were positioned within 2.5 kbp of an annotated gene. Based on these results and the genome coverage of our SNPs, we conclude that the selective effect of mercury pollution in Flanders causes a significant association with at least one locus on Chromosome 4 in three-spined stickleback.


Please see the description in the paper.

Usage notes

The data come as an R data file. The objects in the file correspond to the phenotype, genotype, population, and genetic map data. The data dictionary is as follows:

fish25: an object with the genetic data coded as 0/1/2 (following the PLINK 1.9 convention; see the PLINK 1.9 documentation), with each row corresponding to a fish and each column corresponding to a SNP.
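As a minimal sketch of how the 0/1/2 coding can be used, the Python below stores a toy genotype matrix as a list of rows (one per fish) and computes, for each SNP column, the frequency of the counted allele. It assumes, as in PLINK's additive coding, that each value is the number of copies of one designated allele, and that missing calls are represented as None once the data are in Python. The toy values and the function name `allele_frequencies` are hypothetical and are not taken from `fish25`.

```python
from typing import List, Optional

# Hypothetical toy matrix: rows are fish, columns are SNPs.
# Each entry counts copies of one designated allele (0, 1 or 2); None marks a missing call.
Genotypes = List[List[Optional[int]]]


def allele_frequencies(genotypes: Genotypes) -> List[Optional[float]]:
    """Return, per SNP column, the frequency of the counted allele among non-missing calls."""
    n_snps = len(genotypes[0])
    freqs: List[Optional[float]] = []
    for j in range(n_snps):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            freqs.append(None)  # no genotyped fish for this SNP
            continue
        # Each diploid fish carries two alleles, so divide by 2 * number of called fish.
        freqs.append(sum(calls) / (2 * len(calls)))
    return freqs


toy = [
    [0, 1, 2],
    [1, 1, None],
    [2, 0, 2],
]
print(allele_frequencies(toy))  # [0.5, 0.3333333333333333, 1.0]
```

Which allele is counted depends on how the matrix was exported, so check the PLINK documentation before interpreting the frequencies as minor-allele frequencies.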

bim25: an object containing the genotype map (corresponding to the BIM file used by PLINK). The first column is the chromosome name, the second column the SNP name, the third column the position in cM (unknown for all SNPs), the fourth column is the position in base pairs, the fifth and sixth columns are the two alleles.
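The sketch below, under stated assumptions, models one BIM-style row with the six columns listed above and finds the SNPs on a given chromosome that lie within a chosen distance (here 2,500 bp, matching the 2.5 kbp window in the abstract) of a genomic interval such as an annotated gene. The class `BimRecord`, the helper functions, the chromosome labels, and the coordinates are all hypothetical illustrations; they are not the authors' annotation procedure.

```python
from typing import List, NamedTuple


class BimRecord(NamedTuple):
    """One row of a PLINK-style BIM table (column order as described for bim25)."""
    chrom: str
    snp: str
    cm: float      # genetic position; unknown for all SNPs in this dataset
    bp: int        # physical position in base pairs
    allele1: str
    allele2: str


def distance_to_interval(bp: int, start: int, end: int) -> int:
    """Base-pair distance from a SNP to [start, end]; 0 if the SNP lies inside it."""
    if bp < start:
        return start - bp
    if bp > end:
        return bp - end
    return 0


def snps_near_feature(records: List[BimRecord], chrom: str, start: int, end: int,
                      max_dist: int = 2500) -> List[str]:
    """Names of SNPs on `chrom` within `max_dist` bp of the interval [start, end]."""
    return [
        r.snp
        for r in records
        if r.chrom == chrom and distance_to_interval(r.bp, start, end) <= max_dist
    ]


# Hypothetical rows and gene coordinates, for illustration only.
bim = [
    BimRecord("chrA", "snp1", 0.0, 10_000, "A", "G"),
    BimRecord("chrA", "snp2", 0.0, 14_000, "C", "T"),
    BimRecord("chrA", "snp3", 0.0, 30_000, "G", "T"),
    BimRecord("chrB", "snp4", 0.0, 12_000, "A", "C"),
]
print(snps_near_feature(bim, "chrA", start=11_000, end=12_000))  # ['snp1', 'snp2']
```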

fam25: an object containing the family data (corresponding to the FAM file used by PLINK). The first column (the family ID field in PLINK) holds the population information, and the second column is the individual ID.
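A minimal sketch of two common uses of this table: counting fish per population, and keying genotype rows by individual ID. The pairing assumes that the rows of the FAM-style table and the genotype matrix list the fish in the same order, as in PLINK file sets; verify this for the actual objects before relying on it. All identifiers and values below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical fam25-like rows: (population, individual ID).
FamRow = Tuple[str, str]


def fish_per_population(fam: List[FamRow]) -> Dict[str, int]:
    """Count individuals per population code (first column)."""
    return dict(Counter(pop for pop, _ in fam))


def genotypes_by_id(fam: List[FamRow],
                    genotypes: List[List[Optional[int]]]) -> Dict[str, List[Optional[int]]]:
    """Key genotype rows by individual ID, assuming both tables list fish in the same order."""
    if len(fam) != len(genotypes):
        raise ValueError("fam and genotype tables have different numbers of rows")
    return {ind_id: row for (_, ind_id), row in zip(fam, genotypes)}


fam = [("POP1", "fish01"), ("POP1", "fish02"), ("POP2", "fish03")]
geno = [[0, 1], [2, 1], [1, None]]
print(fish_per_population(fam))               # {'POP1': 2, 'POP2': 1}
print(genotypes_by_id(fam, geno)["fish03"])   # [1, None]
```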

phenos2: an object with phenotype data. The first column is the fish ID, the second is the code for the drainage basin, the third is the sampling location code, the fourth is the sampling location name, the fifth and sixth are the coordinates, the seventh is the mercury in muscle, the eighth is the individual's length, and the ninth (final) column is the individual's sex.
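As an illustration of the column layout above, this sketch averages muscle mercury per sampling location code, using 0-based positions 2 (location code) and 6 (mercury). It assumes rows arrive as tuples in the documented column order and that missing mercury values are None; the rows below are invented. The abstract reports mercury in ng/g dry weight, but the unit of the stored column should be confirmed against the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Column positions as described for phenos2 (0-based).
LOCATION_CODE = 2
MERCURY = 6


def mean_mercury_by_location(rows: List[Sequence]) -> Dict[str, float]:
    """Average muscle mercury per sampling location code, skipping missing values."""
    values: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        hg: Optional[float] = row[MERCURY]
        if hg is not None:
            values[row[LOCATION_CODE]].append(float(hg))
    return {loc: mean(v) for loc, v in values.items()}


# Hypothetical rows: id, basin, location code, location name, coord 1, coord 2, mercury, length, sex.
phenos = [
    ("fish01", "B1", "L01", "Site one", 4.40, 51.00, 40.0, 45.0, "F"),
    ("fish02", "B1", "L01", "Site one", 4.40, 51.00, 60.0, 47.0, "M"),
    ("fish03", "B2", "L02", "Site two", 3.70, 51.05, 150.0, 50.0, "F"),
]
print(mean_mercury_by_location(phenos))  # {'L01': 50.0, 'L02': 150.0}
```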


Research Foundation - Flanders, Award: G053317N

Scientific Research Network, Award: W0.037.10 N
