Data from: Application of a GT-seq SNP panel to quantify interannual variation in mixed-stock composition in the largest commercial fishery for Arctic Charr (Salvelinus alpinus) in Canada
Data files
Mar 31, 2026 version files (7.44 MB)
- Data_generated_with_GTscore.zip (2.20 MB)
- GTscore.R (51.51 KB)
- GTseq_saal_all_samples_brut.zip (3.45 MB)
- GTseq_saal_clean.zip (1.73 MB)
- README.md (4.58 KB)
Abstract
Aquatic resources are central to the global food supply and economy, particularly in the Arctic, where Inuit communities have long depended on wild resources. In the 1960s, the community of Iqaluktuuttiaq (Cambridge Bay, Nunavut, Canada) established commercial fisheries targeting Arctic Char (Salvelinus alpinus, L.), providing key employment and income. In a rapidly changing Arctic, molecular tools can enhance sustainable fisheries management. We applied a GT-seq (Genotyping-in-Thousands) panel of 377 SNP loci to distinguish five local stocks (Ekalluk, Halokvik, Jayko, Lauchlan, and Surrey) and quantify their contributions to annual harvests. A total of 1387 samples collected from four commercial sites over eight years (2012–2020) were analyzed to assess temporal variation in stock composition. Results revealed unequal stock contributions, with Ekalluk and Halokvik comprising over 75% of catches. Spring harvests showed stable stock composition across years, while fall harvests exhibited significant interannual variation (p < 0.05). Environmental factors such as ice breakup timing (IB) and duration of 50% marine ice cover (I50) had no significant influence on stock contribution variability, likely due to limited data. Thus, GT-seq offers the opportunity to explore more adaptive management by capturing interannual variability in stock mixing.
Dataset DOI: 10.5061/dryad.np5hqc07g
Description of the data and file structure
This README describes the datasets and file structure associated with the genetic analyses.
Folder: GTseq_saal_all_samples_brut.zip
Description: This folder contains the raw data obtained from the four GT-seq sequencing runs. Below is a description of the variables in each file.
Files: AlleleReads_haplotypes.txt and AlleleReads_singleSNPs.txt
- Sample names
- Locus/Haplotype
File: GTscore_individualSummary.txt
- Sample: sample IDs
- Total Reads: total number of reads analyzed
- Off-target Reads: reads that do not correspond to an expected GT-seq locus
- Primer Only Reads: reads containing only the primer sequence(s), with no usable information from the locus
- Primer Probe Reads: reads containing both the expected primer and the expected target ("probe") sequence
- Off-target Proportion = Off-target Reads / Total Reads
- Primer Only Proportion = Primer Only Reads / Total Reads
- Primer Probe Proportion = Primer Probe Reads / Total Reads
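The three proportions above are simple ratios against Total Reads. As a minimal sketch (in Python, for illustration only; the function name and the read counts below are hypothetical and are not taken from the dataset), they could be recomputed from the read columns like this:

```python
def read_proportions(total_reads, off_target_reads, primer_only_reads, primer_probe_reads):
    """Recompute the three proportion columns from the read-count columns.

    Each proportion is the corresponding read count divided by Total Reads,
    as defined in the variable list above.
    """
    if total_reads <= 0:
        raise ValueError("Total Reads must be positive to compute proportions")
    return {
        "Off-target Proportion": off_target_reads / total_reads,
        "Primer Only Proportion": primer_only_reads / total_reads,
        "Primer Probe Proportion": primer_probe_reads / total_reads,
    }


# Hypothetical counts for one sample (not real data from this study).
proportions = read_proportions(
    total_reads=1000,
    off_target_reads=100,
    primer_only_reads=50,
    primer_probe_reads=850,
)
```

Recomputing these values is one way to sanity-check a row of GTscore_individualSummary.txt before filtering individuals.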
File: GTscore_locusSummary.txt
- Locus: locus IDs
- Primer Reads: total number of reads associated with this locus based on primer detection
- Primer Probe Reads: number of reads for this locus where GTscore recognized both the locus primer and the expected in-silico probe for this SNP
- Primer Probe Proportion: Primer Probe Reads / Primer Reads
File: LocusTable_haplotypes.txt
Description: locus details, listed per haplotype
File: LocusTable_singleSNPs.txt
Description: locus details, listed per SNP
File: sampleFiles
Description: sample IDs
- Each line begins with a sample ID (e.g., saalBYRs_001_17)
  - sa = Salvelinus (genus)
  - al = alpinus (species)
  - BYR = locality
  - s (or m) = source (or mixture)
  - 001 = individual number
  - 17 = year of harvest
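The sample ID convention above can be split into its parts programmatically. The following is a rough sketch in Python, not part of the authors' pipeline. It assumes that locality codes are written in uppercase letters or digits (as in BYR) and that the fields always appear in the order shown. The function name and the returned keys are hypothetical.

```python
import re

# Pattern for IDs such as "saalBYRs_001_17".
# Assumption: the locality code is uppercase letters/digits, followed by
# a lowercase "s" (source) or "m" (mixture).
SAMPLE_ID_PATTERN = re.compile(
    r"^sa"                      # sa = Salvelinus (genus)
    r"al"                       # al = alpinus (species)
    r"(?P<locality>[A-Z0-9]+)"  # e.g. BYR
    r"(?P<kind>[sm])"           # s = source, m = mixture
    r"_(?P<individual>\d+)"     # e.g. 001
    r"_(?P<year>\d+)$"          # e.g. 17 (year of harvest, two digits)
)


def parse_sample_id(sample_id):
    """Split a sample ID into the fields described in this README."""
    match = SAMPLE_ID_PATTERN.match(sample_id)
    if match is None:
        raise ValueError(f"Unrecognized sample ID format: {sample_id!r}")
    return {
        "genus": "Salvelinus",
        "species": "alpinus",
        "locality": match.group("locality"),
        "sample_type": "source" if match.group("kind") == "s" else "mixture",
        "individual": match.group("individual"),  # kept as text to preserve leading zeros
        "year": match.group("year"),              # kept as the two-digit code from the ID
    }


parsed = parse_sample_id("saalBYRs_001_17")
```

The year is left as the two-digit code from the ID rather than being expanded to a four-digit year, since the README does not state the century explicitly.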
Folder: GTseq_saal_clean.zip
Description: This folder contains the cleaned dataset, with individuals removed when sequencing quality was insufficient or genotyping failed.
Files in this folder follow the same structure and variable definitions as those described above, but after removal of low-quality individuals and non-functional loci/SNPs.
Folder: Data_generated_with_GTscore.zip
Description: This folder contains data generated using the GTscore pipeline. The files give the haplotype composition of each individual in both .gen and .txt formats and are ready for genetic analysis.
Samples are divided into three categories:
- All: all individuals included in the study
- Sources: reference individuals with known origin (subset of “All”)
- Mixed: individuals from commercial fisheries with unknown origin (subset of “All”)
Genepop files
These files are formatted for use in GENEPOP and contain multilocus genotype data for all individuals.
- Title line: general file or dataset description
- Locus names: listed after the title line and before the first “Pop” line (e.g., NC_036838_1_8237766_1)
- Pop: indicates the beginning of a population group
- Individual genotype lines:
  - Each line begins with a sample ID (e.g., saalBYRs_001_17)
    - sa = Salvelinus (genus)
    - al = alpinus (species)
    - BYR = locality
    - s (or m) = source (or mixture)
    - 001 = individual number
    - 17 = year of harvest
  - Genotype codes:
    - Each genotype code corresponds to one locus and follows the standard GENEPOP format
    - Codes are written as four digits for diploid data
      - The first two digits = allele 1
      - The last two digits = allele 2
    - 0000 = missing genotype data
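The four-digit genotype coding described above can be decoded with a few lines of code. This is a minimal sketch in Python, assuming the standard GENEPOP layout in which each individual line is a sample ID, a comma, and then whitespace-separated genotype codes. The function names and the example line are hypothetical, not taken from the dataset.

```python
def decode_genotype(code):
    """Decode one four-digit diploid genotype code.

    Returns a tuple (allele 1, allele 2) of two-digit strings,
    or None when the code is 0000 (missing genotype data).
    """
    code = code.strip()
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"Expected a four-digit genotype code, got {code!r}")
    if code == "0000":
        return None
    return code[:2], code[2:]


def parse_individual_line(line):
    """Split a Genepop individual line into (sample ID, list of decoded genotypes)."""
    sample_id, _, genotype_text = line.partition(",")
    genotypes = [decode_genotype(code) for code in genotype_text.split()]
    return sample_id.strip(), genotypes


# Hypothetical individual line with three loci (not real data).
sample_id, genotypes = parse_individual_line("saalBYRs_001_17 , 0102 0000 0303")
```

In this example, the first locus is heterozygous (alleles 01 and 02), the second is missing, and the third is homozygous for allele 03.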
RUBIAS .txt files
Description: These files are formatted for use in the RUBIAS package and contain sample metadata used for genetic stock identification analyses.
- sample_type: indicates the type of sample included in the analysis
  - reference: individuals of known origin used as baseline samples
  - mixture: individuals of unknown origin to be assigned
- repunit: reporting unit assigned to each individual
- collection: collection or baseline population name assigned to each individual
- indiv: sample ID
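Before loading one of these files into RUBIAS, it can be useful to confirm that the four metadata columns are present and to count individuals per sample type and reporting unit. The sketch below does this in Python. It assumes the files are tab-delimited (the README does not state the delimiter, so adjust `delimiter` if needed). The function name, the collection names, and the example rows are all hypothetical.

```python
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = {"sample_type", "repunit", "collection", "indiv"}


def summarize_rubias(handle, delimiter="\t"):
    """Count individuals per (sample_type, repunit) in a RUBIAS-style table."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    counts = Counter()
    for row in reader:
        counts[(row["sample_type"], row["repunit"])] += 1
    return counts


# Hypothetical rows for illustration only; genotype columns are omitted.
example = (
    "sample_type\trepunit\tcollection\tindiv\n"
    "reference\tEkalluk\tEKA\tsaalEKAs_001_17\n"
    "reference\tEkalluk\tEKA\tsaalEKAs_002_17\n"
    "reference\tHalokvik\tHAL\tsaalHALs_001_17\n"
    "mixture\tNA\tBYR\tsaalBYRm_001_17\n"
)

counts = summarize_rubias(io.StringIO(example))
```

For a real file, the same function can be called on an opened file handle, for example `summarize_rubias(open(path, newline=""))`, where `path` points to one of the .txt files in this folder.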
File: GTscore.R
Description: The GTscore script from https://github.com/gjmckinney/GTscore, used in our pipeline to generate the datasets in .txt and .gen formats.
Code/software
All data were analyzed using R software.
