Data from: Reduced-representation sequencing detects trans-Arctic connectivity and local adaptation in polar cod (Boreogadus saida)
Data files
Version of Jan 07, 2025; total size 55.51 MB
- allele_frequencies_35adaptive.txt (21.61 KB)
- allele_frequencies_845neutral.txt (521.19 KB)
- pcod_1192.vcf (21.78 MB)
- pcod_35adaptive.vcf (642.85 KB)
- pcod_845neutral.vcf (15.55 MB)
- pcod_922.vcf (16.97 MB)
- RDA_Environmental_matrix.txt (12.05 KB)
- README.md (9.44 KB)
Abstract
Information on connectivity and genetic structure of marine organisms remains sparse in frontier ecosystems such as the Arctic Ocean. Filling these knowledge gaps is becoming increasingly urgent, as the Arctic is undergoing rapid physical, ecological, and socio-economic changes. The abundant and widely distributed polar cod (Boreogadus saida) is highly adapted to Arctic waters, and its larvae and juveniles live in close association with sea ice. Through a reduced-representation sequencing approach, this study explored the spatial genetic structure of polar cod at a circum-Arctic scale. Genomic variation was partitioned into neutral and adaptive components to investigate genetic connectivity and local adaptation, respectively. Based on 922 high-quality single nucleotide polymorphism (SNP) markers genotyped in 611 polar cod, broad-scale differentiation was detected among three groups: (i) Beaufort-Chukchi seas, (ii) all regions connected by the Transpolar Drift, ranging from the Laptev Sea to Iceland, including the European Arctic, and (iii) West Greenland. Patterns of neutral genetic structure suggested broad-scale oceanographic and sea ice drift features (i.e. Beaufort Gyre and Transpolar Drift) as important drivers of connectivity. Genomic variation at 35 outlier loci indicated adaptive divergence of the West Greenland and Beaufort-Chukchi seas populations, possibly driven by environmental conditions. Sea ice decline and changing ocean currents can alter or disrupt connectivity between polar cod from the three genetic groups, potentially undermining their resilience to climate change, even in putative refugia, such as the Central Arctic Ocean and the Arctic Archipelago.
README: Data from: Reduced-representation sequencing detects trans-Arctic connectivity and local adaptation in polar cod (Boreogadus saida)
https://doi.org/10.5061/dryad.fqz612k30
Description of the data and file structure
This archive is linked to Maes, Verheye et al. (accepted), exploring connectivity patterns and local adaptation in polar cod (Boreogadus saida), using single nucleotide polymorphism (SNP) datasets derived from RAD-sequencing. Data cover 9 ecoregions of the Arctic Ocean. Genetic data include VCF format datasets of (1) 1192 SNPs, unfiltered for Hardy-Weinberg equilibrium (HWE) and used for outlier analyses, (2) 922 SNPs filtered for HWE, (3) 845 putatively neutral SNPs, obtained by removing all detected outliers from the 922-SNP dataset, and (4) 35 putatively adaptive SNPs identified by at least two outlier detection methods (PCAdapt, OutFLANK, BayeScan). The environmental matrix and the allele frequencies of putatively neutral and adaptive loci for 41 populations of polar cod used in the Redundancy Analyses are also included. Based on 922 high-quality SNPs genotyped in 611 polar cod, broad-scale differentiation was detected among three groups: (i) Beaufort-Chukchi seas, (ii) all regions connected by the Transpolar Drift, ranging from the Laptev Sea to Iceland, including the European Arctic, and (iii) West Greenland. Patterns of neutral genetic structure suggested broad-scale oceanographic and sea ice drift features (i.e. Beaufort Gyre and Transpolar Drift) as important drivers of connectivity. Genomic variation at 35 outlier loci indicated adaptive divergence of the West Greenland and Beaufort-Chukchi seas populations, possibly driven by environmental conditions.
Files and variables
File: RDA_Environmental_matrix.txt
Description: For Redundancy Analyses, we employed a set of 16 climate and biogeochemical variables, as potential drivers of genomic variation in polar cod, to characterise the seascape. We obtained these variables from a georeferenced dataset describing seawater conditions from the Copernicus Marine Environment Monitoring Service (CMEMS: https://marine.copernicus.eu/). This dataset (product ARCTIC_MULTIYEAR_PHY_002_003) includes the following oceanographic variables as monthly mean values with a 12.5 km resolution for the period 1991-2021: sea surface temperature (‘SST’; °C), sea surface salinity (‘SSS’; 10⁻³), mixed layer thickness (‘MLT’; m), surface sea current and sea ice velocity (‘SCV’ and ‘SICV’; m/s), sea ice area fraction (‘SIA’; %) and sea ice thickness (‘SIT’; m). These variables were imported in R as stacks of monthly raster layers to compute the overall means and standard deviations and retrieve the values for each of the sampling sites. For ice-related variables (SIA and SIT), the maximum summer and winter values were calculated, as well as the overall standard deviation.
Variables
- Population: 41 populations of Polar Cod across 9 ecoregions of the Arctic Ocean
- SST_OM: Overall mean in sea surface temperature (°C)
- SST_OSD: Overall standard deviation in sea surface temperature (°C)
- SSS_OM: Overall mean in sea surface salinity (10⁻³)
- SSS_OSD: Overall standard deviation in sea surface salinity (10⁻³)
- SIA_SM: Summer mean in sea ice area fraction (%)
- SIA_WM: Winter mean in sea ice area fraction (%)
- SIA_OSD: Overall standard deviation in sea ice area fraction (%)
- SIT_SM: Summer mean in sea ice thickness (m)
- SIT_WM: Winter mean in sea ice thickness (m)
- SIT_OSD: Overall standard deviation in sea ice thickness (m)
- SCV_OM: Overall mean in sea current velocity (m/s)
- SCV_OSD: Overall standard deviation in sea current velocity (m/s)
- SICV_OM: Overall mean in sea-ice current velocity (m/s)
- SICV_OSD: Overall standard deviation in sea-ice current velocity (m/s)
- MLT_OM: Overall mean in mixed layer thickness (m)
- MLT_OSD: Overall standard deviation in mixed layer thickness (m)
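The summary statistics in this variable list can be illustrated with a minimal Python sketch. It is not the authors' R raster workflow: the season definitions (`SUMMER_MONTHS`, `WINTER_MONTHS`), the use of the sample standard deviation, the function name `summarise_site` and the toy values are all assumptions made for illustration, since the README does not state them.

```python
from statistics import mean, stdev

# ASSUMPTION: illustrative season definitions; the README does not define them.
SUMMER_MONTHS = {7, 8, 9}
WINTER_MONTHS = {1, 2, 3}


def summarise_site(series):
    """Summarise one variable at one site.

    series: list of (month_number, value) pairs, one per monthly layer.
    Returns the overall mean (OM), overall standard deviation (OSD),
    summer mean (SM) and winter mean (WM).
    """
    values = [v for _, v in series]
    summer = [v for m, v in series if m in SUMMER_MONTHS]
    winter = [v for m, v in series if m in WINTER_MONTHS]
    return {
        "OM": mean(values),
        "OSD": stdev(values),  # ASSUMPTION: sample SD (n - 1 denominator)
        "SM": mean(summer),
        "WM": mean(winter),
    }


# Made-up sea ice thickness values (m) for 12 months at a hypothetical site.
toy_sit = list(zip(
    range(1, 13),
    [2.0, 2.2, 2.4, 2.3, 2.0, 1.5, 1.0, 0.8, 0.9, 1.2, 1.6, 1.9],
))
summary = summarise_site(toy_sit)
print({name: round(value, 3) for name, value in summary.items()})
```

In the archived matrix, each of the 41 populations has one such set of values per variable, already computed over the 1991-2021 monthly layers.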
File: pcod_35adaptive.vcf
Description: This file was created as a product of genetic tissue sequencing. Genomic DNA was extracted from fin clips using the NucleoSpin® Tissue kit (Macherey-Nagel). A modified version of the Elshire et al. (2011) genotyping-by-sequencing (GBS) method was used, with a single restriction enzyme (PstI) and size selection (320-720 bp). Individuals were sequenced on an Illumina NovaSeq 6000 platform (PE100). Stacks v2.5 was used to process the GBS data. Reads were aligned to a draft Boreogadus saida reference genome (GCA_900302515.1) using Bowtie2 v2.3.4.3. SNPs were called using default parameters in Stacks, which employs a Bayesian genotype caller, and the results were exported to variant call format (VCF) files. After filtering on missing data, minor allele frequency (MAF), heterozygosity and linkage disequilibrium, a dataset of 1192 SNPs for 611 polar cod individuals was obtained. Three outlier detection methods (PCAdapt, OutFLANK, BayeScan) were applied to this 1192-SNP dataset, and the 35 putatively adaptive SNPs identified by at least two of the methods were retained and exported in VCF format.
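The "at least two of three methods" rule can be sketched in a few lines of Python. This is only an illustration of the consensus logic: the function name `consensus_outliers` and the SNP identifiers are hypothetical, and the per-method outlier lists are assumed to have been produced separately by PCAdapt, OutFLANK and BayeScan.

```python
from collections import Counter


def consensus_outliers(*method_hits, min_methods=2):
    """Return the SNP IDs flagged by at least `min_methods` of the given sets."""
    # Convert each input to a set so a method cannot vote twice for one SNP.
    counts = Counter(snp for hits in method_hits for snp in set(hits))
    return sorted(snp for snp, n in counts.items() if n >= min_methods)


# Hypothetical outlier lists, one per detection method.
pcadapt_hits = {"snp_01", "snp_02", "snp_03"}
outflank_hits = {"snp_02", "snp_04"}
bayescan_hits = {"snp_02", "snp_03", "snp_05"}

adaptive = consensus_outliers(pcadapt_hits, outflank_hits, bayescan_hits)
print(adaptive)
```

Here `snp_02` is flagged by all three methods and `snp_03` by two, so both are retained, while SNPs flagged by a single method are not.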
File: allele_frequencies_35adaptive.txt
Description: For Redundancy Analyses, allele frequencies of 35 putatively adaptive SNPs were computed for 41 populations of polar cod across the Arctic Ocean, using the makefreq function of the R package adegenet.
Variables
Number of variables: 35 putatively adaptive loci
Number of rows: 41 populations
Variable list: (numeric) allele frequencies
Data type: numeric
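For readers unfamiliar with population allele frequencies, the following Python sketch shows the basic calculation for biallelic SNPs. It is not a reimplementation of adegenet's makefreq: the genotype coding (count of alternate alleles, `None` for missing calls), the function name `population_allele_frequencies` and the toy individuals are assumptions for illustration.

```python
def population_allele_frequencies(genotypes, populations):
    """Compute alternate-allele frequencies per population and locus.

    genotypes:   dict individual -> list of alternate-allele counts
                 (0, 1 or 2; None marks a missing genotype).
    populations: dict individual -> population label.
    Returns dict population -> list of frequencies (None if no data).
    """
    n_loci = len(next(iter(genotypes.values())))
    totals = {}
    for individual, calls in genotypes.items():
        pop = populations[individual]
        alt, chromosomes = totals.setdefault(pop, ([0] * n_loci, [0] * n_loci))
        for locus, call in enumerate(calls):
            if call is not None:
                alt[locus] += call
                chromosomes[locus] += 2  # diploid: two allele copies per call
    return {
        pop: [a / c if c else None for a, c in zip(alt, chromosomes)]
        for pop, (alt, chromosomes) in totals.items()
    }


# Hypothetical genotypes at two loci for four individuals in two populations.
toy_genotypes = {
    "ind1": [0, 1],
    "ind2": [1, None],
    "ind3": [1, 1],
    "ind4": [1, 2],
}
toy_populations = {"ind1": "popA", "ind2": "popA", "ind3": "popB", "ind4": "popB"}

freqs = population_allele_frequencies(toy_genotypes, toy_populations)
print(freqs)
```

Missing genotypes are simply excluded from the denominator in this sketch; other treatments of missing data are possible.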
File: allele_frequencies_845neutral.txt
Description: For Redundancy Analyses, allele frequencies of 845 putatively neutral SNPs were computed for 41 populations of polar cod across the Arctic Ocean, using the makefreq function of the R package adegenet.
Variables
Number of variables: 845 putatively neutral loci
Number of rows: 41 populations
Variable list: (numeric) allele frequencies
File: pcod_1192.vcf
Description: This file was created as a product of genetic tissue sequencing. Genomic DNA was extracted from fin clips using the NucleoSpin® Tissue kit (Macherey-Nagel). A modified version of the Elshire et al. (2011) genotyping-by-sequencing (GBS) method was used, with a single restriction enzyme (PstI) and size selection (320-720 bp). Individuals were sequenced on an Illumina NovaSeq 6000 platform (PE100). Stacks v2.5 was used to process the GBS data. Reads were aligned to a draft Boreogadus saida reference genome (GCA_900302515.1) using Bowtie2 v2.3.4.3. SNPs were called using default parameters in Stacks, which employs a Bayesian genotype caller, and the results were exported to variant call format (VCF) files. After filtering on missing data, minor allele frequency (MAF), heterozygosity and linkage disequilibrium, a dataset of 1192 SNPs for 611 polar cod individuals was obtained.
File: pcod_922.vcf
Description: This file was created as a product of genetic tissue sequencing. Genomic DNA was extracted from fin clips using the NucleoSpin® Tissue kit (Macherey-Nagel). A modified version of the Elshire et al. (2011) genotyping-by-sequencing (GBS) method was used, with a single restriction enzyme (PstI) and size selection (320-720 bp). Individuals were sequenced on an Illumina NovaSeq 6000 platform (PE100). Stacks v2.5 was used to process the GBS data. Reads were aligned to a draft Boreogadus saida reference genome (GCA_900302515.1) using Bowtie2 v2.3.4.3. SNPs were called using default parameters in Stacks, which employs a Bayesian genotype caller, and the results were exported to variant call format (VCF) files. After filtering on missing data, minor allele frequency (MAF), heterozygosity, linkage disequilibrium and Hardy-Weinberg equilibrium, a dataset of 922 SNPs for 611 polar cod individuals was obtained.
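A Hardy-Weinberg filter of the kind mentioned above can be sketched as follows. The README does not state which test or significance threshold was used, so the chi-square goodness-of-fit test (1 degree of freedom), the `alpha = 0.05` cut-off, the function name `hwe_chi2` and the genotype counts below are all assumptions chosen for illustration.

```python
from math import erfc, sqrt


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test of Hardy-Weinberg proportions for a biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical genotype counts: (hom. reference, heterozygous, hom. alternate).
toy_counts = {
    "snp_a": (25, 50, 25),  # matches Hardy-Weinberg expectations exactly
    "snp_b": (40, 20, 40),  # strong heterozygote deficit
}
alpha = 0.05  # ASSUMPTION: illustrative threshold
kept = [snp for snp, counts in toy_counts.items() if hwe_chi2(*counts)[1] >= alpha]
print(kept)
```

In practice an exact test and a multiple-testing correction are often preferred for small samples; the chi-square form is used here only because it is compact.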
File: pcod_845neutral.vcf
Description: This file was created as a product of genetic tissue sequencing. Genomic DNA was extracted from fin clips using the NucleoSpin® Tissue kit (Macherey-Nagel). A modified version of the Elshire et al. (2011) genotyping-by-sequencing (GBS) method was used, with a single restriction enzyme (PstI) and size selection (320-720 bp). Individuals were sequenced on an Illumina NovaSeq 6000 platform (PE100). Stacks v2.5 was used to process the GBS data. Reads were aligned to a draft Boreogadus saida reference genome (GCA_900302515.1) using Bowtie2 v2.3.4.3. SNPs were called using default parameters in Stacks, which employs a Bayesian genotype caller, and the results were exported to variant call format (VCF) files. After filtering on missing data, minor allele frequency (MAF), heterozygosity and linkage disequilibrium, a dataset of 1192 SNPs for 611 polar cod individuals was obtained. Three outlier detection methods (PCAdapt, OutFLANK, BayeScan) were applied to this 1192-SNP dataset and identified 77 SNPs as outliers. These SNPs were removed from the 922-SNP dataset filtered for Hardy-Weinberg equilibrium, resulting in the present 'putatively neutral' dataset of 845 SNPs, exported in VCF format.
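The derivation of the neutral dataset amounts to a set difference, sketched below in Python with hypothetical SNP identifiers and a hypothetical helper `neutral_subset`. The reported counts are arithmetically consistent with every one of the 77 outliers being present in the 922-SNP dataset (922 - 77 = 845), although the README does not state this explicitly.

```python
def neutral_subset(hwe_filtered_ids, outlier_ids):
    """Return the HWE-filtered SNP IDs that are not outliers, in original order."""
    outliers = set(outlier_ids)
    return [snp for snp in hwe_filtered_ids if snp not in outliers]


# Hypothetical IDs: five HWE-filtered SNPs, two of which are outliers.
hwe_filtered = ["snp_01", "snp_02", "snp_03", "snp_04", "snp_05"]
outliers = ["snp_02", "snp_05", "snp_99"]  # snp_99 failed the HWE filter

neutral = neutral_subset(hwe_filtered, outliers)
print(neutral)

# Consistency check on the counts reported in this README.
print(922 - 77)
```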
Code/software
R Environment for Statistical Computing
R 4.4.1
VCF files were loaded in R using the vcfR package.
The RDA was performed using the vegan R package.
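For readers working outside R, the sketch below shows one way to inspect a VCF file with the Python standard library alone. It is a simplified reader, not a substitute for vcfR: the function name `read_vcf` and the toy records are hypothetical, and the parser assumes an uncompressed, tab-delimited file whose FORMAT column includes a GT field.

```python
import io


def read_vcf(handle):
    """Parse a VCF text stream into (sample_names, records)."""
    samples, records = [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            continue
        gt_index = fields[8].split(":").index("GT")
        calls = [sample.split(":")[gt_index] for sample in fields[9:]]
        records.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "id": fields[2],
            "ref": fields[3],
            "alt": fields[4],
            "genotypes": dict(zip(samples, calls)),
        })
    return samples, records


# Toy VCF content with two SNPs and two individuals (made-up values).
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "ind1", "ind2"]),
    "\t".join(["scaf1", "101", "snp_1", "A", "G", ".", "PASS", ".",
               "GT:DP", "0/1:12", "1/1:9"]),
    "\t".join(["scaf2", "57", "snp_2", "C", "T", ".", "PASS", ".",
               "GT:DP", "./.:0", "0/0:15"]),
])

samples, records = read_vcf(io.StringIO(TOY_VCF))
print(len(records), "SNPs x", len(samples), "individuals")
```

Applied to an archived file, for example with `open("pcod_922.vcf")` in place of the `io.StringIO` object, the same call should report 922 SNPs and 611 individuals if the file matches the description in this README.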
Methods
A total of 652 polar cod samples were collected in the Central Arctic Ocean, Fram Strait and the Arctic Ocean shelves bordering Alaska, Canada, Russia, Greenland and Northern Europe, during several expeditions between 2003 and 2021. Fish were collected using bottom trawl, bongo net, Young fish trawl, Surface and Under-Ice Trawl, ROVnet, Multpelt 832 pelagic trawl, or zooplankton net. We collected fin clips from all fish sampled and stored them in 96% ethanol. Genomic DNA was extracted from fin clips using the NucleoSpin® Tissue kit (Macherey-Nagel), following the manufacturer’s instructions. A modified version of the Elshire et al. (2011) genotyping-by-sequencing (GBS) method was used, with a single restriction enzyme (PstI) and size selection (320-720 bp). Three GBS libraries were paired-end sequenced on an Illumina NovaSeq 6000 platform (PE100) at the Genomics Core Leuven (www.genomicscore.be). Sequences were quality checked using FastQC v0.11.8, and Stacks v2.5 was used to process the GBS data. Reads were aligned to a draft Boreogadus saida reference genome (GenBank accession number GCA_900302515.1) using Bowtie2 v2.3.4.3. SNPs were called using default parameters in Stacks, which employs a Bayesian genotype caller, and filtered using VCFtools and various R packages (SNPRelate v1.28.0, poppr v2.9.3, hierfstat v0.5-11, pegas v1.1), resulting in SNP datasets for 611 individuals. For the Redundancy Analyses, we employed a set of 16 climate and biogeochemical variables, obtained from a georeferenced dataset describing seawater conditions from the Copernicus Marine Environment Monitoring Service.