70K SNP array data for Lumpfish (Cyclopterus lumpus) across the trans-Atlantic
Data files
Apr 24, 2024 version files 144.53 MB
- Lump_recluster_downsampled_MAF.map (1.70 MB)
- Lump_recluster_downsampled_MAF.ped (142.82 MB)
- README.md (5.41 KB)
Abstract
In marine species with large populations and high dispersal potential, large-scale genetic differences and clinal trends in allele frequency can provide insight into the evolutionary processes that shape diversity. Lumpfish, Cyclopterus lumpus, is found throughout the North Atlantic and has traditionally been harvested for roe and more recently used as a cleaner fish in salmon aquaculture. We used a 70K SNP array to evaluate trans-Atlantic differentiation, genetic structuring, and clinal variation across the North Atlantic. Basin-scale structuring between the Northeast and Northwest Atlantic was significant, with enrichment for loci associated with developmental/mitochondrial function. We identified a putative structural variant on chromosome 2, likely contributing to differentiation between Northeast and Northwest Atlantic Lumpfish, and consistent with post-glacial trans-Atlantic secondary contact. Redundancy Analysis identified climate associations both in the Northeast (N = 1269 loci) and Northwest (N = 1637 loci), with 103 shared loci between them. Clinal patterns in allele frequencies were observed in some loci (15% - Northwest and 5% - Northeast) of which 708 loci were shared and involved with growth, developmental processes, and locomotion. The combined evidence of trans-Atlantic differentiation, environmental associations, and clinal loci, suggests that both regional and large-scale potentially-adaptive population structuring is present across the North Atlantic.
https://doi.org/10.5061/dryad.j3tx95xp3
This dataset contains SNPs in .ped/.map format.
Lumpfish has been listed as near threatened in the North Atlantic; however, this assessment has not been updated since 2013 (IUCN Red List). Therefore, in the table below we provide only approximate latitudes and longitudes, to prevent unintended risks to these potentially threatened populations.
Description of the data and file structure
The data are in .ped/.map format. The .ped file contains columns in the following order: Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Affection, and Genotypes. The order of the genotypes is given by the .map file, which has columns in the following order: Chromosome, SNP name, Genetic position, and Physical position. There are no data for Paternal ID, Maternal ID, Sex, or Genetic position, so these columns contain zeros. Affection is missing for all individuals and is coded as -9. Although these columns carry no data, they must be present for programs that read these formats to parse the files. The files must not contain column headers, otherwise such programs will not be able to read them.
Example line #1 from .ped file:
BAI BAI_2019_001_K8.CEL 0 0 0 -9 G G C C G G C T T C T C A G...
Where BAI is the Family ID, BAI_2019_001_K8.CEL is the Individual ID, 0 0 0 -9 is the missing data for Paternal ID, Maternal ID, Sex, and Affection, and G G C C G G C T T C T C A G... is the genotype for this individual.
Example line #1 from .map file:
1 AX-298066833 0 22038
Where 1 is the chromosome, AX-298066833 is the SNP name, 0 is the genetic position, and 22038 is the physical position.
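The column layout described above can be illustrated with a minimal Python sketch. The names `MapRecord`, `PedRecord`, `parse_map_line`, and `parse_ped_line` are hypothetical helpers written for this example, not part of any library, and the .ped line is the truncated example shown above (seven genotypes only).

```python
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    chromosome: str
    snp_name: str
    genetic_position: float
    physical_position: int


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    affection: str
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per SNP


def parse_map_line(line: str) -> MapRecord:
    """Split one .map line into its four whitespace-separated columns."""
    chrom, snp, gen_pos, phys_pos = line.split()
    return MapRecord(chrom, snp, float(gen_pos), int(phys_pos))


def parse_ped_line(line: str) -> PedRecord:
    """Split one .ped line into six leading columns plus allele pairs."""
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two alleles per SNP")
    # Pair consecutive alleles: (a1, a2), (a3, a4), ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(*meta, genotypes)


# Example lines copied from the text above (the .ped line is truncated there).
map_rec = parse_map_line("1 AX-298066833 0 22038")
ped_rec = parse_ped_line(
    "BAI BAI_2019_001_K8.CEL 0 0 0 -9 G G C C G G C T T C T C A G"
)
print(map_rec.snp_name, map_rec.physical_position)
print(ped_rec.family_id, ped_rec.individual_id, ped_rec.genotypes[3])
```

In the full files, the i-th genotype pair of every .ped line corresponds to the i-th line of the .map file.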
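The following table contains all information required for translating Family IDs in the SNP data, including population codes, approximate latitude and longitude, and the number of sampled individuals per location (N). Codes marked with an asterisk (IC, NO2, and TNR) correspond to the sites described under Sampling and DNA preparation as samples of juveniles: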
| Continent | Location | Code | Latitude | Longitude | N |
|---|---|---|---|---|---|
| Europe | Iceland | IC* | 63.9 | -22.6 | 14 |
| Europe | Iceland | ISH | 65.8 | -20.3 | 30 |
| Europe | Iceland | IVO | 65.7 | -14.8 | 30 |
| Europe | Faroe Isl. | FI | 62.1 | -6.5 | 15 |
| Europe | Norway | NO1 | 63.0 | 7.4 | 15 |
| Europe | Norway | NO2* | 64.4 | 11.3 | 8 |
| Europe | Norway | NO3 | 59.0 | 5.8 | 7 |
| Europe | Denmark | DE | 55.5 | 12.2 | 8 |
| Europe | Sweden | SW | 58.3 | 19.1 | 5 |
| Europe | Scotland | OH | 57.2 | -6.1 | 13 |
| Europe | Ireland | IR | 52.1 | -10.3 | 13 |
| Europe | Ireland | KEI | 51.8 | -10.3 | 30 |
| Europe | United Kingdom | UK1 | 49.4 | -2.6 | 15 |
| Europe | United Kingdom | UK2 | 50.6 | -2.4 | 15 |
| North America | NL, Canada | COH | 51.5 | -55.5 | 30 |
| North America | NL, Canada | NIP | 49.7 | -55.8 | 30 |
| North America | NL, Canada | TNR* | 48.6 | -53.5 | 30 |
| North America | NL, Canada | CHA | 48.3 | -53.2 | 30 |
| North America | NL, Canada | WIT | 47.2 | -52.7 | 30 |
| North America | NL, Canada | BAI | 47.1 | -54.8 | 30 |
| North America | Gulf of St. Lawrence, Canada | GSL | various (53 sites) | various (53 sites) | 105 |
| North America | NS, Canada | PPG | 44.6 | -66.8 | 13 |
| North America | Gulf of Maine, USA | US1 | 44.9 | -67.0 | 14 |
| North America | Gulf of Maine, USA | BIM | 44.5 | -67.6 | 6 |
| North America | Gulf of Maine, USA | FBM | 44.4 | -68.2 | 19 |
| North America | Gulf of Maine, USA | US2 | 44.4 | -68.2 | 14 |
| North America | Gulf of Maine, USA | MSB | 42.8 | -70.5 | 15 |
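As a rough illustration of how the table can be used, the sketch below counts individuals per Family ID and looks up the matching site. It is only a sketch: `SITES` holds three rows copied from the table, `count_by_family` is a hypothetical helper, and the toy lines (apart from the first Individual ID) are invented for the example. It also assumes that Family IDs in the .ped file match the codes without the asterisk, which has not been checked here.

```python
from collections import Counter

# Three rows copied from the table: code -> (location, latitude, longitude, N)
SITES = {
    "BAI": ("NL, Canada", 47.1, -54.8, 30),
    "ISH": ("Iceland", 65.8, -20.3, 30),
    "FI": ("Faroe Isl.", 62.1, -6.5, 15),
}


def count_by_family(ped_lines):
    """Count individuals per Family ID (first column of each .ped line)."""
    return Counter(
        line.split(maxsplit=1)[0] for line in ped_lines if line.strip()
    )


# Toy input; a real run would iterate over the lines of the .ped file instead.
toy_lines = [
    "BAI BAI_2019_001_K8.CEL 0 0 0 -9 G G",
    "BAI toy_individual_2 0 0 0 -9 G G",
    "FI toy_individual_3 0 0 0 -9 G C",
]

counts = count_by_family(toy_lines)
for code, n_found in sorted(counts.items()):
    location, lat, lon, n_table = SITES[code]
    print(f"{code}: {location} ({lat}, {lon}), {n_found} found, N = {n_table} in table")
```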
Code/Software
Code that has been used in this submission can be found at https://github.com/babslangille/Genomic-differentiation-structural-variants-and-clinal-variation
Sampling and DNA preparation
A total of 570 individuals were sampled from 26 locations across the North Atlantic Ocean from 2009 to 2019 (Fig. 1, Table 1) in conjunction with scientific research surveys (trawl and beach seines) and commercial fishing activities. There were a total of 11 sites in the NW Atlantic (the same NW sites as in Langille et al. (2023), although in this study samples have been downsampled): Newfoundland (N = 369), Gulf of Saint Lawrence (GSL; N = 108), New Brunswick (N = 13), and the Gulf of Maine (N = 70). There were a total of 15 sites in the NE Atlantic: Iceland (N = 134), the Faroe Islands (N = 15), Norway (N = 30), Denmark (N = 8), Sweden (N = 5), Scotland (N = 13), Ireland (N = 43), and the United Kingdom (N = 30). One site from Iceland (IC), one from Norway (NO2), and one from Newfoundland (TNR) were samples of juveniles; all other samples were of adults. Tissue samples were preserved in 95% ethanol, and subsequent DNA extractions were performed using DNeasy Blood and Tissue kits (Qiagen) according to the manufacturer’s protocols. Genomic DNA was visualized by 1% agarose gel electrophoresis and quantified using Quant-iT PicoGreen dsDNA Assay kits (Thermo Fisher) on a fluorescent plate reader. Genomic DNA was normalized to 15 ng/ml and sent to the Centre of Integrative Genomics (CIGENE, Ås, Norway) for genotyping on the Lumpfish 70K SNP Affymetrix Axiom array (produced by CIGENE and Aquagen).
70K SNP array filtering
A custom Affymetrix 70K SNP array (developed by Aquagen and CIGENE) was mapped to the Lumpfish genome from North America (Holborn et al., 2022). Sample- and SNP-level quality control was performed in Plink 1.9 (Chang et al., 2015): individuals with more than 5% missing genotypes were removed (--mind 0.05), as were SNPs with more than 5% missing calls (--geno 0.05) and SNPs with a minor allele frequency below 0.01 (--maf 0.01). Related individuals were filtered in Plink using the --genome flag and a pi-hat threshold of 0.35.
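To make the SNP-level thresholds concrete, the following Python sketch computes a per-SNP missing rate and minor allele frequency on toy genotypes. It illustrates what the --geno and --maf cut-offs mean and is not Plink's implementation or the authors' pipeline. The function names are hypothetical, and missing alleles are assumed to be coded as "0", the usual default for .ped files.

```python
from collections import Counter


def snp_missing_rate(genotypes):
    """Fraction of individuals with a missing call at one SNP."""
    missing = sum(1 for a, b in genotypes if a == "0" or b == "0")
    return missing / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (biallelic SNP)."""
    counts = Counter()
    for a, b in genotypes:
        if a != "0" and b != "0":
            counts.update((a, b))
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or monomorphic
    return min(counts.values()) / total


def keep_snp(genotypes, max_missing=0.05, min_maf=0.01):
    """Keep a SNP if its missing rate and MAF pass both thresholds."""
    return (
        snp_missing_rate(genotypes) <= max_missing
        and minor_allele_frequency(genotypes) >= min_maf
    )


# Toy SNPs across 20 individuals (invented data, for illustration only).
polymorphic = [("G", "G")] * 19 + [("G", "C")]   # MAF = 1/40, no missing calls
poorly_called = [("A", "A")] * 18 + [("0", "0")] * 2  # 10% missing
monomorphic = [("T", "T")] * 20                  # MAF = 0

for name, snp in [("polymorphic", polymorphic),
                  ("poorly_called", poorly_called),
                  ("monomorphic", monomorphic)]:
    print(name, snp_missing_rate(snp), minor_allele_frequency(snp), keep_snp(snp))
```

The per-individual filter (--mind) follows the same idea applied across SNPs within one sample rather than across samples within one SNP.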
- Langille, Barbara L; Kess, Tony; Nugent, Cameron M et al. (2024). Trans-Atlantic genomic differentiation and parallel environmental and allelic variation in Lumpfish (Cyclopterus lumpus). ICES Journal of Marine Science. https://doi.org/10.1093/icesjms/fsae057
