Mixed-stock analysis reveals long-distance movements and few populations with large harvest contributions in lake-migratory brook trout
Data files
Apr 28, 2025 version; files total 1.23 MB
- Chamlian_et_al_mixed_stock_analysis_data.zip (1.22 MB)
- README.md (8.92 KB)
Abstract
Effective fishery management relies on knowing the relative contributions of distinct populations to mixed-stock harvests. Mixed-stock analyses increasingly adopt single-nucleotide polymorphism (SNP) panels but could make better use of precise spatial information to improve understanding of population distributions, movements, and structure. Together with local partners, we collected and georeferenced 1051 samples of lake-migratory brook trout between 2020 and 2022 from three large lakes in Quebec (Mistassini, Mistasiniishish, Waconichi). We then used a GT-seq (Genotyping-in-Thousands by sequencing) SNP panel to infer population genetic structure and determine spatial harvest contributions of genetically distinct populations. Our results revealed population structure in two of three study lakes, with few populations (n = 1-2) contributing the majority (> 80%) of mixed-stock harvest in each lake. We also detected extensive movements of brook trout within and between lakes, spatial segregation of populations in one lake, and an unknown (unsampled) population in another lake. Our results illustrate the precision afforded by combining GT-seq and georeferencing of samples to generate insights into the ecology and genetics of migratory fishes, thereby facilitating local decision-making for sustainable fisheries.
https://doi.org/10.5061/dryad.wdbrv15z2
Description of the data and file structure
This compressed folder contains all input data and the scripts required to generate outputs from those data. The data are organized in folders, and each folder is named after its related analysis (e.g., the folder "CPUE" contains all data and code associated with the catch-per-unit-effort analyses discussed in our research article "Mixed-stock analysis reveals long-distance movements and few populations with large harvest contributions in lake-migratory brook trout").
Files and variables
Folder: CPUE
Description: Contains two sub-folders: mistassini_rivers_cpue & mixed-stock_fishery_cpue. Each sub-folder contains an R file to run the respective analysis (e.g., R file "annual_mixed_stock_fishery_cpue_analysis" contains the code required to run a linear regression on the catch-per-unit-effort data of the three study lakes). Raw input files for each analysis are stored in the same location as the R files.
.csv file: mistassini_rivers_cpue_raw
population_year: Name of the source population followed by the time period in parentheses.
CPUE: raw catch-per-unit-effort calculated as the number of fish caught per angler per 8-hour fishing day.
R file: mistassini_sources_cpue_visualization
Script that requires R to run. Uses mistassini_rivers_cpue_raw as input data and generates a box plot for visualization.
.csv files: albanel_cpue, mistassini_cpue, waconichi_cpue
year: year the data was collected
CPUE: annual catch-per-unit-effort calculated as the number of fish caught per angler per 8-hour fishing day.
R file: annual_mixed_stock_fishery_cpue_analysis
Script that requires R to run. Uses albanel_cpue, mistassini_cpue, and waconichi_cpue as input data to run a linear regression on each lake's CPUE over time.
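The CPUE definition and the trend analysis described above can be illustrated with a minimal sketch. It is written in Python for a quick check outside R and is not part of the dataset's scripts. The function names, the assumption that effort is recorded as hours fished per angler, and all numbers are hypothetical; the dataset's own R script may compute and model CPUE differently.

```python
from typing import Sequence, Tuple


def cpue(fish_caught: int, anglers: int, hours_per_angler: float) -> float:
    """Fish caught per angler per 8-hour fishing day.

    Assumption: effort is recorded as hours fished by each angler,
    so effort in angler-days is anglers * hours / 8.
    """
    angler_days = anglers * hours_per_angler / 8.0
    return fish_caught / angler_days


def linear_trend(years: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values on years; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for illustration only, not taken from the dataset.
example_cpue = cpue(fish_caught=12, anglers=2, hours_per_angler=6.0)
toy_years = [2018, 2019, 2020, 2021]
toy_cpue = [4.0, 3.5, 3.0, 2.5]
slope, intercept = linear_trend(toy_years, toy_cpue)
print(example_cpue, slope)
```

A negative slope would indicate declining CPUE over the years covered, which is the kind of trend the per-lake regression is meant to detect.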
Folder: Rubias
Description: Contains one R file: mixed_stock_assignments; and one .txt file: dataR_source_and_mixed_stock. This R file contains scripts to run mixed-stock assignment analyses.
.txt file: dataR_source_and_mixed_stock
sample_type: type of collected sample (reference or mixture). Reference samples are collected from spawning individuals in their source rivers, and mixture samples are collected from mixed-stock individuals in the lakes.
repunit: reporting unit of the individual. For mixture individuals, this field is null because their source population is unknown.
collection: river or lake in which the sample was collected.
indiv: individual fish identification.
LG01_100261162_1 to LG42_1816745_1.1: Alleles at the loci included in the SNP panel. Each field contains a single letter (A, C, T, or G), or NA for missing data. The letter is the sequenced nucleotide at one allele of a given locus for a given individual. The ".1" suffix on the final column name suggests that each locus occupies two columns, one per allele.
R file: mixed_stock_assignment
A script that requires R to run. Uses dataR_source_and_mixed_stock as input data to run mixed-stock analyses and assign mixture individuals to their population of origin based on the allele frequencies of the reference samples.
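To make the file layout concrete, the sketch below reads a small table shaped like the description above and computes allele frequencies per reporting unit from the reference samples only, which is the information a mixed-stock assignment draws on. It is a Python illustration, not the dataset's R script. The toy rows, the locus name "locA", the tab delimiter, and the ".1" suffix for the second allele column are all assumptions.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical toy table; the real file has many more loci and individuals.
TOY = (
    "sample_type\trepunit\tcollection\tindiv\tlocA\tlocA.1\n"
    "reference\triver1\triver1\tf1\tA\tG\n"
    "reference\triver1\triver1\tf2\tA\tA\n"
    "reference\triver2\triver2\tf3\tG\tG\n"
    "mixture\tNA\tlake1\tf4\tA\tNA\n"
)


def allele_frequencies(rows, locus):
    """Allele frequencies per reporting unit, using reference samples only.

    Assumption: the two alleles of a locus sit in columns `locus` and
    `locus + ".1"`, and missing calls are coded as the string "NA".
    """
    counts = defaultdict(Counter)
    for row in rows:
        if row["sample_type"] != "reference":
            continue  # mixture fish have no known source population
        for column in (locus, locus + ".1"):
            allele = row[column]
            if allele != "NA":
                counts[row["repunit"]][allele] += 1
    return {
        repunit: {a: n / sum(c.values()) for a, n in c.items()}
        for repunit, c in counts.items()
    }


rows = list(csv.DictReader(io.StringIO(TOY), delimiter="\t"))
freqs = allele_frequencies(rows, "locA")
print(freqs)
```

Mixture individuals are skipped when building frequencies because their reporting unit is unknown; assigning them is the job of the mixed-stock analysis itself.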
Folder: snpR_for_PCA_and_Fst
Description: Contains one R markdown file: snpr_analyses; and two .txt files: 5sources_brooktrout_data & 6sources_brooktrout_data. The R markdown file contains scripts to run population structure analyses.
.txt file: 5sources_brooktrout_data
indiv: individual fish identification.
pop: population of origin. For spawning individuals, this field contains the name of one of five known source populations (named after the river in which they spawn). For mixed-stock individuals, this field contains the name of the lake in which they were sampled.
LG01_100261162_1 to LG42_1816745_1: Loci included in the SNP panel. Each field contains a pair of the following letters: A, C, T, or G (or NN for null fields). The letters correspond to the sequenced nucleotides at both alleles of a given locus for a given individual.
.txt file: 6sources_brooktrout_data
indiv: individual fish identification.
pop: population of origin. For spawning individuals, this field contains the name of one of six source populations (five known populations named after the river in which they spawn, and a putative sixth population detected through structure and mixed-stock analyses; see our research article "Mixed-stock analysis reveals long-distance movements and few populations with large harvest contributions in lake-migratory brook trout"). For mixed-stock individuals, this field contains the name of the lake in which they were sampled.
LG01_100261162_1 to LG42_1816745_1: Loci included in the SNP panel. Each field contains a pair of the following letters: A, C, T, or G (or NN for null fields). The letters correspond to the sequenced nucleotides at both alleles of a given locus for a given individual.
R markdown file: snpr_analyses
A script that requires R to run. Uses 5sources_brooktrout_data or 6sources_brooktrout_data as input data to run population structure analyses via Principal Component Analysis, as well as calculate pairwise fixation index (Fst), expected heterozygosity (He), observed heterozygosity (Ho), Wright’s inbreeding coefficient (Fis), and the number of private alleles of all source populations. Step-by-step comments are included in the code to guide the user through the analysis.
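As a rough guide to what the diversity statistics measure, the sketch below computes observed heterozygosity (Ho), expected heterozygosity (He), and Fis for a single locus from paired-letter genotypes such as those described above. It is a Python illustration using textbook formulas (He = 1 - sum of squared allele frequencies, Fis = 1 - Ho/He), not the R markdown file's code, and the R analysis may apply different estimators or corrections. The function name and the genotypes are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def locus_diversity(genotypes: Sequence[str], missing: str = "NN") -> Tuple[float, float, float]:
    """Return (Ho, He, Fis) for one locus in one population.

    Each genotype is a two-letter string (e.g. "AG"); `missing` marks
    individuals without a call and is excluded from all counts.
    """
    called = [g for g in genotypes if g != missing]
    if not called:
        raise ValueError("no called genotypes at this locus")
    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(1 for g in called if g[0] != g[1]) / len(called)
    # Expected heterozygosity from allele frequencies (no sample-size correction).
    allele_counts = Counter(allele for g in called for allele in g)
    total = sum(allele_counts.values())
    he = 1.0 - sum((n / total) ** 2 for n in allele_counts.values())
    # Fis is undefined when the locus is monomorphic (He = 0).
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Hypothetical genotypes for illustration only.
ho, he, fis = locus_diversity(["AA", "AA", "GG", "AG", "NN"])
print(ho, he, fis)
```

A positive Fis, as in this toy example, means fewer heterozygotes were observed than expected under random mating.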
Folder: STRUCTURE_6sources
Description: Contains one R file: 6sources_STRUCTURE_analysis; one .txt file: labels_pop_all; and one sub-folder: STRUCTURE_runs. The R file contains scripts to generate and analyze STRUCTURE plots for the 6-source dataset, using the raw STRUCTURE runs contained in the STRUCTURE_runs subfolder.
.txt file: labels_pop_all
pop: population of origin. This list is used by the R script to label and group individuals by population for visualization in STRUCTURE plots.
R file: 6sources_STRUCTURE_analysis
A script that requires R to run. Uses the files contained in the sub-folder STRUCTURE_runs to generate STRUCTURE plots and results of individual assignment to population clusters.
Sub-folder: STRUCTURE_runs
Contains 35 files corresponding to 35 runs of the program STRUCTURE.
k1_run_1 to k7_run_35: Individual runs for K = 1 to K = 7, where K is the number of population clusters that the program uses to assign individuals based on relatedness inferred from allele frequencies. Each value of K was run five times for robustness, for a total of 35 runs.
Folder: Temporal_comparison_MANOVA
Description: Contains one R file: temporal_comparison; and two .csv files: 2021_vs_2022 & historical_vs_contemporary. The R file contains the code to run MANOVA tests for stability in the spatial distribution of Mistassini Lake's populations over time. Raw input files are contained in the same location as the R file.
.csv file: 2021_vs_2022
sector: spatial sector of Mistassini Lake (see our research article "Mixed-stock analysis reveals long-distance movements and few populations with large harvest contributions in lake-migratory brook trout" for details on the lake's division into nine sectors).
year: sampling year.
RUP_prop: proportion of individuals originating from Rupert River found in a given sector of the lake.
inflows_prop: proportion of individuals originating from the inflow rivers (Cheno and Papas) found in a given sector of the lake.
.csv file: historical_vs_contemporary
sector: spatial sector of Mistassini Lake (see our research article "Mixed-stock analysis reveals long-distance movements and few populations with large harvest contributions in lake-migratory brook trout" for details on the lake's division into nine sectors).
year: historical or contemporary time period. Historical time period corresponds to "year00to01" (i.e., 2000 to 2001), and contemporary time period corresponds to "year21to22" (i.e., 2021 to 2022).
RUP_prop: proportion of individuals originating from Rupert River found in a given sector of the lake.
inflows_prop: proportion of individuals originating from the inflow rivers (Cheno and Papas) found in a given sector of the lake.
R file: temporal_comparison
Script that requires R to run. Uses 2021_vs_2022 and historical_vs_contemporary as input data to run MANOVAs testing for the effects of sector and time on the population proportions found within Mistassini Lake.
Code/software
R is required to run the scripts contained in this dataset; RStudio is the preferred environment for viewing and running them. The raw data files (.csv or .txt) can be viewed with any text editor.
All required packages are installed and loaded automatically when the scripts are run.
