Data from: Application of a GT-seq SNP panel to quantify interannual variation in mixed-stock composition in the largest commercial fishery for Arctic Charr (Salvelinus alpinus) in Canada

Perrot, Océane1 2 3 4; Harris, Les N.5; Ekaluktutiak Hunters and Trappers Organization (EHTO); Legagneux, Pierre1 3 6; Moore, Jean-Sébastien1 2 3 4; Bernatchez, Louis1 2 4

Published Mar 31, 2026 on Dryad. https://doi.org/10.5061/dryad.np5hqc07g

Data files

Mar 31, 2026 version files: 7.44 MB

Data_generated_with_GTscore.zip: 2.20 MB
GTscore.R: 51.51 KB
GTseq_saal_all_samples_brut.zip: 3.45 MB
GTseq_saal_clean.zip: 1.73 MB
README.md: 4.58 KB

Abstract

Aquatic resources are central to the global food supply and economy, particularly in the Arctic, where Inuit communities have long depended on wild resources. In the 1960s, the community of Iqaluktuuttiaq (Cambridge Bay, Nunavut, Canada) established commercial fisheries targeting Arctic Char (Salvelinus alpinus, L.), providing key employment and income. In a rapidly changing Arctic, molecular tools can enhance sustainable fisheries management. We applied a GT-seq (Genotyping-in-Thousands) panel of 377 SNP loci to distinguish five local stocks (Ekalluk, Halokvik, Jayko, Lauchlan, and Surrey) and quantify their contributions to annual harvests. A total of 1387 samples collected from four commercial sites over eight years (2012–2020) were analyzed to assess temporal variation in stock composition. Results revealed unequal stock contributions, with Ekalluk and Halokvik comprising over 75% of catches. Spring harvests showed stable stock composition across years, while fall harvests exhibited significant interannual variation (p < 0.05). Environmental factors such as ice breakup timing (IB) and duration of 50% marine ice cover (I50) had no significant influence on stock contribution variability, likely due to limited data. Thus, GT-seq offers the opportunity to explore more adaptive management by capturing interannual variability in stock mixing.

Dataset DOI: 10.5061/dryad.np5hqc07g

Description of the data and file structure

This README describes the datasets and file structure associated with the genetic analyses.

Folder: GTseq_saal_all_samples_brut.zip

Description: This folder contains the raw data obtained from the four GT-seq sequencing runs. Below is a description of the variables in each file.

Files: AlleleReads_haplotypes.txt and AlleleReads_singleSNPs.txt

Sample names: sample IDs
Locus/Haplotype: locus or haplotype IDs

File: GTscore_individualSummary.txt

Sample: sample IDs
Total Reads: total number of reads analyzed
Off-target Reads: reads that do not correspond to an expected GT-seq locus
Primer Only Reads: reads containing only the primer sequence(s), with no usable information from the locus
Primer Probe Reads: reads containing both the expected primer and the expected target (“probe”) sequence
Off-target Proportion = Off-target Reads / Total Reads
Primer Only Proportion = Primer Only Reads / Total Reads
Primer Probe Proportion = Primer Probe Reads / Total Reads
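The three proportions above are plain ratios of a read count to Total Reads. The following is a minimal sketch, assuming the counts have already been read from GTscore_individualSummary.txt; the function name `read_proportions`, the dictionary keys, and the handling of samples with zero reads are illustrative assumptions and are not part of GTscore.

```python
def read_proportions(total_reads, off_target, primer_only, primer_probe):
    """Return the three per-sample proportions defined above.

    Assumption: a sample with zero total reads yields None for every
    proportion instead of raising ZeroDivisionError.
    """
    if total_reads == 0:
        return {"off_target": None, "primer_only": None, "primer_probe": None}
    return {
        "off_target": off_target / total_reads,
        "primer_only": primer_only / total_reads,
        "primer_probe": primer_probe / total_reads,
    }


# Made-up counts, for illustration only (not taken from the dataset).
example = read_proportions(total_reads=1000, off_target=100,
                           primer_only=50, primer_probe=800)
```

The same ratio applies to GTscore_locusSummary.txt, with Primer Reads as the denominator.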

File: GTscore_locusSummary.txt

Locus: locus IDs
Primer Reads: total number of reads associated with this locus based on primer detection
Primer Probe Reads: number of reads for this locus where GTscore recognized both the locus primer and the expected in-silico probe for this SNP
Primer Probe Proportion: Primer Probe Reads / Primer Reads

File: LocusTable_haplotypes.txt

Description: detailed loci per haplotype

File: LocusTable_singleSNPs.txt

Description: detailed loci per SNP

File: sampleFiles

Description: sample IDs

Each line begins with a sample ID (e.g., saalBYRs_001_17)
sa = Salvelinus (genus)
al = alpinus (species)
BYR = locality
s (or m) = source (or mixture)
001 = individual number
17 = year of harvest
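The naming scheme above can be split into its components programmatically. This is a rough sketch and not part of the published pipeline: the function name `parse_sample_id` is hypothetical, and it assumes that locality codes are runs of uppercase letters and that two-digit years fall in the 2000s (consistent with the 2012–2020 sampling period).

```python
import re

# Pattern for IDs such as "saalBYRs_001_17".
# Assumptions: the locality code is a run of uppercase letters and the
# year is written with two digits.
SAMPLE_ID = re.compile(
    r"^(?P<genus>sa)(?P<species>al)(?P<locality>[A-Z]+)"
    r"(?P<kind>[sm])_(?P<individual>\d+)_(?P<year>\d{2})$"
)


def parse_sample_id(sample_id):
    """Split a sample ID into the components described above."""
    match = SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError(f"unrecognized sample ID: {sample_id!r}")
    fields = match.groupdict()
    # "s" marks a source (reference) fish, "m" a mixture (fishery) fish.
    fields["kind"] = "source" if fields["kind"] == "s" else "mixture"
    # Assumption: two-digit years belong to the 2000s.
    fields["year"] = 2000 + int(fields["year"])
    return fields


parsed = parse_sample_id("saalBYRs_001_17")
```

The individual number is kept as a string so that leading zeros are preserved.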

Folder: GTseq_saal_clean.zip

Description: This folder contains the cleaned dataset, with individuals removed when sequencing quality was insufficient or genotyping failed.

Files in this folder follow the same structure and variable definitions as those described above, but after removal of low-quality individuals and non-functional loci/SNPs.

Folder: Data_generated_with_GTscore.zip

Description: This folder contains data generated using the GTscore pipeline. The files give the haplotype composition of each individual in both .gen (Genepop) and .txt (RUBIAS) formats and are ready for genetic analysis.

Samples are divided into three categories:

All: all individuals included in the study
Sources: reference individuals with known origin (subset of “All”)
Mixed: individuals from commercial fisheries with unknown origin (subset of “All”)

Genepop files

These files are formatted for use in GENEPOP and contain multilocus genotype data for all individuals.

Title line: general file or dataset description
Locus names: listed after the title line and before the first “Pop” line (e.g., NC_036838_1_8237766_1)
Pop: indicates the beginning of a population group
Individual genotype lines:
Each line begins with a sample ID (e.g., saalBYRs_001_17), structured as described above for sampleFiles
Genotype codes:
Each genotype code corresponds to one locus and follows the standard GENEPOP format
Codes are written as four digits for diploid data
The first two digits = allele 1
The last two digits = allele 2
0000 = missing genotype data
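The layout described above can be read with a few lines of code. This is a minimal sketch, assuming the standard Genepop convention that a comma separates the sample ID from its genotype codes and that locus names appear either one per line or comma-separated; the function names and the toy data are invented for illustration and do not come from the dataset.

```python
def decode_genotype(code):
    """Decode a four-digit diploid code into (allele1, allele2).

    Returns None for the missing-data code "0000".
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected four digits, got {code!r}")
    if code == "0000":
        return None
    return int(code[:2]), int(code[2:])


def parse_genepop(text):
    """Return (title, locus_names, populations) from Genepop-formatted text.

    populations is a list with one dict per "Pop" block, mapping
    sample ID -> list of decoded genotypes (one per locus).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title, body = lines[0], lines[1:]
    loci, populations = [], []
    for line in body:
        if line.lower() == "pop":
            populations.append({})          # start a new population group
        elif not populations:
            # Still before the first "Pop": these lines are locus names.
            loci.extend(n.strip() for n in line.split(",") if n.strip())
        else:
            sample_id, _, genotypes = line.partition(",")
            populations[-1][sample_id.strip()] = [
                decode_genotype(code) for code in genotypes.split()
            ]
    return title, loci, populations


# Toy input with invented locus names and genotypes.
toy = """Toy dataset (invented values)
LocusA
LocusB
Pop
saalBYRs_001_17 , 0102 0000
Pop
saalBYRs_002_17 , 0202 0101
"""
title, loci, populations = parse_genepop(toy)
```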

RUBIAS .txt files

Description: These files are formatted for use in the RUBIAS package and contain sample metadata used for genetic stock identification analyses.

sample_type: indicates the type of sample included in the analysis
reference: individuals of known origin used as baseline samples
mixture: individuals of unknown origin to be assigned
repunit: reporting unit assigned to each individual
collection: collection or baseline population name assigned to each individual
indiv: sample ID
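Before running an analysis, it can be useful to confirm that the four metadata columns are present and that sample_type only takes the two documented values. This is a sketch under stated assumptions: the files are assumed to be tab-delimited with a header row, and the function name `check_rubias_metadata` and the toy rows (including the repunit and collection labels) are invented for illustration. It is a plain-Python check and does not call the RUBIAS package.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_type", "repunit", "collection", "indiv")
SAMPLE_TYPES = {"reference", "mixture"}


def check_rubias_metadata(text, delimiter="\t"):
    """Return a list of problems found in the metadata columns (empty if none).

    Assumption: the file is delimited text with a header row; the default
    tab delimiter is a guess and may need changing.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    seen = set()
    # The header is line 1, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        if row["sample_type"] not in SAMPLE_TYPES:
            problems.append(
                f"line {row_number}: unexpected sample_type {row['sample_type']!r}"
            )
        if row["indiv"] in seen:
            problems.append(f"line {row_number}: duplicate indiv {row['indiv']!r}")
        seen.add(row["indiv"])
    return problems


# Toy rows; the repunit and collection labels are invented placeholders.
toy = (
    "sample_type\trepunit\tcollection\tindiv\n"
    "reference\tREPUNIT_A\tCOLLECTION_A\tsaalBYRs_001_17\n"
    "mixture\tNA\tFISHERY_A\tsaalBYRm_002_17\n"
)
issues = check_rubias_metadata(toy)
```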

File: GTscore.R

Description: The GTscore script from https://github.com/gjmckinney/GTscore, used in our pipeline to generate the datasets in .txt and .gen formats.

Code/software

All data were analyzed using R software.