Atlantic herring population baseline (genotypes) for genetic stock identification
Data files
Oct 23, 2024 version; files: 4.93 MB
- Pub_genotypes_baseline.xlsx (882.01 KB)
- Pub_genotypes_mixed.csv (3.98 MB)
- Pub_station.xlsx (33.64 KB)
- Pub_Supplementary_Table_1.xlsx (20.53 KB)
- README.md (9.87 KB)
Abstract
Sustainable fisheries management is important for the continued harvest of the world’s marine resources, especially as they are increasingly challenged by a range of climatic and anthropogenic factors. One of the pillars of sustainable fisheries management is the accurate identification of the biological units, i.e., populations. Here, we developed and implemented a genetic baseline for Atlantic herring harvested in the Norwegian offshore fisheries to investigate the validity of the current management boundaries. This was achieved by genotyping >15,000 herring from the northern European seas, including samples of all the known populations in the region, with a panel of population-informative SNPs mined from existing genomic resources. The final genetic baseline consisted of ~1,000 herring from 12 genetically distinct populations. We thereafter used the baseline to investigate mixed catches from the North and Norwegian Seas, revealing that each management area consisted of multiple populations, as previously suspected. However, substantial numbers (up to 50% or more within a sample) of herring were found outside of their expected management areas, e.g., North Sea autumn-spawning herring north of 62°N (average = 19.2%), Norwegian spring-spawning herring south of 62°N (average = 13.5%), and western Baltic spring-spawning herring outside their assumed distribution area in the North Sea (average = 20.0%). Based upon these extensive observations, we conclude that the assessment and management areas currently in place for herring in this region need adjustments to reflect the populations present. Furthermore, we suggest that for migratory species, such as herring, a paradigm shift from using static geographic stock boundaries towards spatial dynamic boundaries is needed to meet the requirements of future sustainable management regimes.
README: Atlantic herring population baseline (genotypes) for genetic stock identification
https://doi.org/10.5061/dryad.rfj6q57kb
Description of the data and file structure
Genetic samples of known herring populations were collected to establish a genetic baseline for population identification. This baseline was used to assign unknown individuals from mixed-population samples to their population of origin.
Files and variables
File: Pub_station.xlsx
Description: Overview of mixed-population samples
Variables
- Year: collecting year
- serialnumber: ID for the sample/haul
- Year_serial: Unique identifier for each sample/haul
- Type: Defines the type of the samples (commercial vs. scientific). In the case of scientific samples, the survey is provided: HERAS = HERring Acoustic Survey; IESNS = International Ecosystem Survey in the Nordic Seas; IESSNS = International Ecosystem Summer Survey in the Nordic Seas
- longitudestart: Longitude of the sample/haul
- latitudestart: Latitude of the sample/haul
- No: Number of individuals per sample/haul
File: Pub_Supplementary_Table_1.xlsx
Description: Information about the 59 SNPs used to establish the baseline. For each SNP (locus), diploid genotypes, i.e., two alleles per locus, are provided where A = Adenine, C = Cytosine, G = Guanine, and T = Thymine.
Variables
- Chromosome: Chromosome of the herring genome where the SNP/locus is located
- Position: Exact position of the SNP/locus (bp)
- SNP-primer-ID: ID of the SNP/locus used in the genotypes datasets
- Number genotypes: Number of genotypes for the SNP/locus
- Splits: Information about the splitting characteristic of the SNP/locus
- Flanking region: Flanking region needed to produce primers to extract the specific SNP/locus
File: Pub_genotypes_baseline.xlsx
Description: Genotypes of the baseline individuals. The dataset includes the 59 SNPs used to establish the baseline. For each SNP (locus), diploid genotypes, i.e., two alleles per locus, are provided where A = Adenine, C = Cytosine, G = Guanine, and T = Thymine.
Missing genotypes or missing variables are marked with "n/a" referring to "not available".
The file includes two separate sheets: "Final Baseline samples" and "All Baseline samples". The first sheet includes only the individuals used to establish the final baseline presented in the paper, whereas the second sheet includes all baseline individuals considered during the process of establishing the final baseline. The table below explains the population abbreviations and shows which sampling sites are represented in the final baseline. The data file Pub_Supplementary_Table_1.xlsx contains information about the individual SNPs (loci).
Sampling site | Pop_short | Baseline |
---|---|---|
Gulf of Riga | BASH-GR | BASH |
Bornholm Basin | BASH-CB | BASH |
Western Baltic Sea (Rügen) | BASH-RU | BASH |
Vistula Lagoon (Gdańsk) | CBSS-GD | CBSS |
Gulf of Finland | CBSS-GF | CBSS |
Southern Bight | Downs | Downs |
North Sea | NSAS | NSAS |
Norwegian Sea | NSS | NSS |
Kattegat | WBSS-KA | WBSS |
Ringkøbing Fjord | WBSS-RF | WBSS |
Western Baltic Sea (Rügen) | WBSS-RU | WBSS |
Skagerrak East (Öckerö) | WBSS-SKE | WBSS-SK |
Skagerrak West (Høvåg) | WBSS-SKW | WBSS-SK |
The Minch, West of Scotland | Sp-6aN | Sp-6a |
Donegal | Sp-6aS | Sp-6a |
Rossfjordvannet | ROSSFJ | Pacific-Hybrids |
Balsfjorden | BALSFJ | Pacific-Hybrids |
Gloppenfjorden | GLOPFJ | Local-Fjords |
Lindåspollen | Lindas | Local-Fjords |
Lustrafjorden | LUFJ | Local-Fjords |
Sognefjorden | SOGNEFJ | Local-Fjords |
Trondheimsfjorden | THF | THF |
North Atlantic (Faroe Islands) | FASH | NASS |
North Atlantic (Iceland) | ISSH | NASS |
North Atlantic (Lofoten) | NASH | NASS |
Kirkefjorden | KIRKFJ | Removed from the final baseline |
Austefjorden | AUFJ | Removed from the final baseline |
Dalsfjorden | DAFJ | Removed from the final baseline |
Gursken | GURS | Removed from the final baseline |
Hjørundfjorden | HJFJ | Removed from the final baseline |
Romsdalsfjorden | ROMSFJ | Removed from the final baseline |
Sykkylven | SYKK | Removed from the final baseline |
Volda | VOL | Removed from the final baseline |
NSS-Strandfjorden | NSSSF | Removed from the final baseline |
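
The table above can be expressed as a lookup from `Pop_short` to the baseline group. The following is a minimal sketch, not part of the published dataset: the names `BASELINE_BY_POP_SHORT`, `REMOVED_FROM_FINAL_BASELINE`, and `baseline_group` are hypothetical, and the values are copied from the table. The lookup is keyed by `Pop_short` rather than by sampling site because "Western Baltic Sea (Rügen)" appears twice, once as BASH-RU and once as WBSS-RU.

```python
# Pop_short -> baseline group, copied from the table above.
BASELINE_BY_POP_SHORT = {
    "BASH-GR": "BASH", "BASH-CB": "BASH", "BASH-RU": "BASH",
    "CBSS-GD": "CBSS", "CBSS-GF": "CBSS",
    "Downs": "Downs",
    "NSAS": "NSAS",
    "NSS": "NSS",
    "WBSS-KA": "WBSS", "WBSS-RF": "WBSS", "WBSS-RU": "WBSS",
    "WBSS-SKE": "WBSS-SK", "WBSS-SKW": "WBSS-SK",
    "Sp-6aN": "Sp-6a", "Sp-6aS": "Sp-6a",
    "ROSSFJ": "Pacific-Hybrids", "BALSFJ": "Pacific-Hybrids",
    "GLOPFJ": "Local-Fjords", "Lindas": "Local-Fjords",
    "LUFJ": "Local-Fjords", "SOGNEFJ": "Local-Fjords",
    "THF": "THF",
    "FASH": "NASS", "ISSH": "NASS", "NASH": "NASS",
}

# Sampling sites listed as "Removed from the final baseline".
REMOVED_FROM_FINAL_BASELINE = {
    "KIRKFJ", "AUFJ", "DAFJ", "GURS", "HJFJ",
    "ROMSFJ", "SYKK", "VOL", "NSSSF",
}


def baseline_group(pop_short):
    """Return the baseline group for a Pop_short code.

    Returns None for sites removed from the final baseline and raises
    KeyError for codes that do not appear in the table at all.
    """
    if pop_short in REMOVED_FROM_FINAL_BASELINE:
        return None
    return BASELINE_BY_POP_SHORT[pop_short]


# Distinct groups in the final baseline, sorted for stable display.
final_groups = sorted(set(BASELINE_BY_POP_SHORT.values()))
```

Counting the distinct values in the mapping gives the number of baseline groups, which can be compared with the 12 genetically distinct populations mentioned in the abstract.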
Variables
- Baseline: Abbreviation of the genetic population used in the Baseline
- Unifier: A unique ID for each individual
- Pop_short: Abbreviation for the actual spawning site
- UherIMR03: genotype
- UherIMR04: genotype
- UherIMR06: genotype
- UherIMR07: genotype
- UherIMR08: genotype
- UherIMR09: genotype
- UherIMR13: genotype
- UherIMR14: genotype
- UherIMR15: genotype
- UherIMR18: genotype
- UherIMR21: genotype
- UherIMR22: genotype
- UherIMR24: genotype
- UherIMR27: genotype
- UherIMR28: genotype
- UherIMR32: genotype
- UherIMR33: genotype
- UherIMR34: genotype
- UherIMR35: genotype
- Uher_10_008: genotype
- Uher_100_037: genotype
- Uher_101_038: genotype
- Uher_122_151: genotype
- Uher_14_010: genotype
- Uher_148_057: genotype
- Uher_1523_141: genotype
- Uher_185: genotype
- Uher_188: genotype
- Uher_189: genotype
- Uher_192: genotype
- Uher_2123_144: genotype
- Uher_222: genotype
- Uher_224: genotype
- Uher_241_077: genotype
- Uher_244: genotype
- Uher_252: genotype
- Uher_256: genotype
- Uher_259: genotype
- Uher_274: genotype
- Uher_29_014: genotype
- Uher_291: genotype
- Uher_294: genotype
- Uher_325: genotype
- Uher_330: genotype
- Uher_333: genotype
- Uher_337: genotype
- Uher_342: genotype
- Uher_343: genotype
- Uher_346: genotype
- Uher_346_096: genotype
- Uher_347: genotype
- Uher_349: genotype
- Uher_356: genotype
- Uher_4386_147: genotype
- Uher_46_024: genotype
- Uher_5_007: genotype
- Uher_899_128: genotype
- Uher_958_131: genotype
- UherC12_15859613: genotype
File: Pub_genotypes_mixed.csv
Description: Genotypes of the individuals from mixed-population samples. These individuals have been assigned to their population of origin using the established baseline. This dataset also includes the 59 SNPs that were used to establish the baseline. For each SNP (locus), diploid genotypes, i.e., two alleles per locus, are provided where A = Adenine, C = Cytosine, G = Guanine, and T = Thymine.
Missing genotypes or missing variables are marked with "n/a" referring to "not available".
Variables
- Unifier: A unique ID for each individual
- Pop_short: Abbreviation for the survey and year
- Year: collecting year
- serialnumber: ID for the sample/haul
- Year_serial: Unique identifier for each sample/haul
- longitudestart: Longitude of the sample/haul
- latitudestart: Latitude of the sample/haul
- Type: Defines the type of the samples (commercial vs. scientific), and which survey it belongs to
- UherIMR03: genotype
- UherIMR04: genotype
- UherIMR06: genotype
- UherIMR07: genotype
- UherIMR08: genotype
- UherIMR09: genotype
- UherIMR13: genotype
- UherIMR14: genotype
- UherIMR15: genotype
- UherIMR18: genotype
- UherIMR21: genotype
- UherIMR22: genotype
- UherIMR24: genotype
- UherIMR27: genotype
- UherIMR28: genotype
- UherIMR32: genotype
- UherIMR33: genotype
- UherIMR34: genotype
- UherIMR35: genotype
- Uher_10_008: genotype
- Uher_100_037: genotype
- Uher_101_038: genotype
- Uher_122_151: genotype
- Uher_14_010: genotype
- Uher_148_057: genotype
- Uher_1523_141: genotype
- Uher_185: genotype
- Uher_188: genotype
- Uher_189: genotype
- Uher_192: genotype
- Uher_2123_144: genotype
- Uher_222: genotype
- Uher_224: genotype
- Uher_241_077: genotype
- Uher_244: genotype
- Uher_252: genotype
- Uher_256: genotype
- Uher_259: genotype
- Uher_274: genotype
- Uher_29_014: genotype
- Uher_291: genotype
- Uher_294: genotype
- Uher_325: genotype
- Uher_330: genotype
- Uher_333: genotype
- Uher_337: genotype
- Uher_342: genotype
- Uher_343: genotype
- Uher_346: genotype
- Uher_346_096: genotype
- Uher_347: genotype
- Uher_349: genotype
- Uher_356: genotype
- Uher_4386_147: genotype
- Uher_46_024: genotype
- Uher_5_007: genotype
- Uher_899_128: genotype
- Uher_958_131: genotype
- UherC12_15859613: genotype
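
As an illustration of the layout described above, the sketch below reads a CSV with the listed metadata columns, treats "n/a" as missing, and splits each diploid genotype into its two alleles. Several details are assumptions rather than facts about Pub_genotypes_mixed.csv: that the delimiter is a comma, that each genotype is stored as a two-letter string such as "AG", and the `Unifier` and `Pop_short` values in the inline sample, which are invented. The function name `read_genotypes` is hypothetical.

```python
import csv
import io

MISSING = "n/a"  # missing-value marker stated in this README

# Metadata columns listed under "Variables"; every other column is a locus.
META_COLUMNS = {
    "Unifier", "Pop_short", "Year", "serialnumber", "Year_serial",
    "longitudestart", "latitudestart", "Type",
}


def read_genotypes(handle, delimiter=","):
    """Yield (Unifier, {locus: (allele1, allele2) or None}) for each row.

    Assumes genotypes are two-letter strings over A, C, G, T; any other
    non-missing value raises ValueError so a wrong assumption is noticed.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        calls = {}
        for column, value in row.items():
            if column in META_COLUMNS:
                continue
            value = (value or "").strip()
            if value in ("", MISSING):
                calls[column] = None
            elif len(value) == 2 and set(value) <= set("ACGT"):
                calls[column] = tuple(value)
            else:
                raise ValueError(f"unexpected genotype {value!r} in {column}")
        yield row["Unifier"], calls


# Invented two-locus sample; the real file has 59 locus columns.
sample = (
    "Unifier,Pop_short,Year,UherIMR03,UherIMR04\n"
    "fish_001,HERAS_2019,2019,AG,n/a\n"
    "fish_002,HERAS_2019,2019,GG,CT\n"
)
records = dict(read_genotypes(io.StringIO(sample)))
```

If the real file uses a different delimiter or genotype encoding, the `delimiter` argument and the two-letter check are the places to adapt.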
Access information
Other publicly accessible locations of the data:
- Not applicable
Data was derived from the following sources:
- Not applicable
Methods
DNA was extracted from all herring (N = 18,138) in 96-well plates using either the Qiagen DNeasy 96 Blood & Tissue Kit or the Beckman Coulter DNAdvance – Genomic DNA Isolation Kit on a Biomek i5 Automated Workstation, following the manufacturers' instructions (Beckman Coulter 2021; Qiagen 2016).
A panel of SNPs distinguishing between populations assumed to occur in the Norwegian Sea and North Sea was established by mining genomic data from the herring genome (Han et al. 2020; Pettersson et al. 2019), following criteria similar to those described in Bekkevold et al. (2023). It should be noted that the 6a spring-spawning herring (Sp-6a) samples were added after the SNP panel was designed, and as such the panel may not be optimized to discriminate this population from other populations. Otherwise, all baseline populations were considered during the SNP panel development. In short, SNPs were selected based on sequence data (Han et al. 2020) and the associated inferences about genomic regions influenced by selective sweeps, as well as association with characteristics such as spawning time, salinity, and geography. Additional markers identified by Han et al. (2020) to discriminate populations along the Norwegian coast and inside the fjords were also included. The final panel consisted of 60 SNPs (Suppl. Table 1) covering 20 chromosomes, all associated with selective outlier regions from Han et al. (2020). In contrast to Bekkevold et al. (2023), which focused on establishing a SNP panel for population differentiation primarily in the North Sea-Baltic Sea transition area, the SNPs chosen here were selected primarily to differentiate populations likely present in the North Sea, Norwegian Sea and along the Norwegian coast. Therefore, the number of SNPs to discriminate among populations within the Baltic Sea, e.g., southern vs. northern central Baltic herring or WBSS from inner Danish waters vs. WBSS from Rügen (Bekkevold et al. 2023), was reduced in the present panel to only identify the overall groups. One of the 60 SNPs displayed linkage with the sex determination gene (Rafati et al. 2020), i.e., it was not informative at the population/stock level, and was therefore excluded from the subsequent baseline analysis, leaving the 59 SNPs reported in the data files.
For the final panel, primers and unextended primers (UEP) were originally organized in four multiplexes and later in three, using the Assay Design software (Agena Bioscience™) for high-throughput genotyping on an Agena MassARRAY® iPLEX® Platform (Agena Bioscience n.d.). We investigated how many individuals from the baseline samples would be removed under three thresholds for acceptable missing data (5.2%, 9.2% and 25%). A compromise threshold of 25% was finally chosen to avoid depleting samples with few individuals. Throughout the baseline-building process, efforts were made to balance the number of individuals per sample to avoid overrepresentation of populations. However, it is worth mentioning that in the final baseline, consisting of genotypes from 1,098 individuals, only 9 of the individuals (i.e., <1%) reached the maximum of 25% missing data, whereas 416 individuals (38%) displayed ≤5.2% missing data.
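
The missing-data filter described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the toy individuals have 4 loci instead of 59, missing genotypes are marked "n/a" as in the data files, and the threshold is treated as inclusive (an individual with exactly 25% missing data is kept), which is how "reached the maximum of 25%" is read here.

```python
MISSING = "n/a"  # missing-genotype marker used in the data files


def missing_fraction(genotypes):
    """Fraction of loci with a missing genotype for one individual."""
    if not genotypes:
        raise ValueError("no loci supplied")
    return sum(g == MISSING for g in genotypes) / len(genotypes)


def keep_individuals(individuals, max_missing=0.25):
    """Keep individuals whose missing fraction is at most max_missing."""
    return {
        uid: calls
        for uid, calls in individuals.items()
        if missing_fraction(calls) <= max_missing
    }


# Toy data with invented IDs and genotypes.
toy = {
    "ind1": ["AA", "AG", "GG", "CT"],    # no missing loci
    "ind2": ["AA", "n/a", "GG", "CT"],   # 1 of 4 loci missing
    "ind3": ["n/a", "n/a", "GG", "CT"],  # 2 of 4 loci missing
}

# Number of individuals retained under each threshold named in the text.
retained = {t: len(keep_individuals(toy, t)) for t in (0.052, 0.092, 0.25)}

# Shares reported for the final baseline of 1,098 individuals.
FINAL_BASELINE_N = 1098
share_at_limit = 9 / FINAL_BASELINE_N       # individuals at 25% missing
share_low_missing = 416 / FINAL_BASELINE_N  # individuals at <=5.2% missing
```

Comparing `retained` across the three thresholds shows the trade-off the text describes: a stricter threshold removes more individuals, which matters most for samples that are already small.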