Data from: Using lower river genetic stock identification (GSI) and a tributary specific mark-recapture program to estimate genetic unit escapements of Pacific Salmon within large river systems

Whitmore, Ryan W.1; Hankin, David G.2; Rondeau, Eric B.1

Published Mar 17, 2026 on Dryad. https://doi.org/10.5061/dryad.pzgmsbd2w

Data files

Mar 17, 2026 version files 1.80 MB

01_Data_v1.1.zip (1.78 MB)

02_Scripts.zip (12.69 KB)

README.md (5.63 KB)

Abstract

The data and scripts in this repository allow reproduction of the analyses described in the manuscript "Using lower river genetic stock identification (GSI) and a tributary-specific mark-recapture program to estimate genetic unit escapements of Pacific Salmon within large river systems". The manuscript presents an escapement estimation approach that combines lower river genetic stock identification (GSI) with an accurate mark-recapture escapement estimate for a single major spawning tributary (the “reference unit”). The example under investigation is Skeena River coho salmon, using the Skeena River Tyee test fishery and the Witset Mark-Recapture program on the Bulkley River. The repository is organized into two parts: first, a folder containing the genetic data underlying the methods (baseline, mixtures, and Colony assignments as made); second, a folder containing the R scripts used to execute the analyses.

Dataset DOI: 10.5061/dryad.pzgmsbd2w

Description of the data and file structure

Data and scripts to perform the analysis in: "Using lower river genetic stock identification (GSI) and a tributary-specific mark-recapture program to estimate genetic unit escapements of Pacific Salmon within large river systems".

Files and variables

File: 01_Data_v1.1.zip

Description:

PBT_Colony_Results

PID20190105_and_more_Skeena_test_GN(19)_and_more_sc240-265-266-306-361-389-434_results.txt.gz - Contains PBT assignments made over the full length of the project, from 2019 to 2024. Brood years considered for assignment were 2 to 4 years before the Tyee sampling year (e.g., for 2024, brood years 2022-2020 were considered).
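The brood-year window described for the PBT assignments is simple arithmetic on the sampling year. The sketch below illustrates it in Python rather than the repository's R. The function name `candidate_brood_years` and its parameters are hypothetical, and it assumes the window is always 2 to 4 years before the sampling year, as in the 2024 example.

```python
def candidate_brood_years(sampling_year, min_offset=2, max_offset=4):
    """Return the brood years considered for a given Tyee sampling year.

    Assumes the window runs from sampling_year - 2 through
    sampling_year - 4, matching the 2024 example (2022, 2021, 2020).
    The function and parameter names are illustrative only.
    """
    return [sampling_year - offset for offset in range(min_offset, max_offset + 1)]


if __name__ == "__main__":
    # Print the assumed brood-year window for each project year.
    for year in range(2019, 2025):
        print(year, candidate_brood_years(year))
```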

Skeena_formatted_baseline

bco_SNP_Skeena_v5.0.1_2025-03-18_CUs-Bulk.txt.gz - Contains the baseline data used for the CU-Bulkley analyses, with collection sites grouped into reporting units following delineations described within the manuscript.

bco_SNP_Skeena_v5.0.1_2025-03-18_GUs_v4.txt.gz - Contains the baseline data used for the GU analyses, with collection sites grouped into reporting units following delineations described within the manuscript. The "v4" suffix indicates the iteration of reporting units used; v4 is the final iteration.

Tyee_mixture_data

mco_2019.txt.gz - Tyee mixed-stock data from the 2019 collection year. Individuals failing QC and species ID screens have been removed from the data.

mco_2020.txt.gz - Tyee mixed-stock data from the 2020 collection year. Individuals failing QC and species ID screens have been removed from the data.

mco_2021.txt.gz - Tyee mixed-stock data from the 2021 collection year. Individuals failing QC and species ID screens have been removed from the data.

mco_2022.txt.gz - Tyee mixed-stock data from the 2022 collection year. Individuals failing QC and species ID screens have been removed from the data.

mco_2023.txt.gz - Tyee mixed-stock data from the 2023 collection year. Individuals failing QC and species ID screens have been removed from the data.

mco_2024.txt.gz - Tyee mixed-stock data from the 2024 collection year. Individuals failing QC and species ID screens have been removed from the data.

All genotype data were generated with the ThermoFisher AgriSeq panel WG00087, with "REF" and "ALT" alleles coded as "1" and "2", respectively. SNP positions and alleles are described fully in Beacham et al. (2020; https://doi.org/10.1139/cjfas-2019-0339), Supplementary Table 2.
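The allele coding can be illustrated with a short decoding helper. This is a minimal Python sketch, not part of the repository's R scripts. The names `ALLELE_LABELS`, `decode_allele`, and `decode_genotype` are hypothetical, and it assumes each genotype is stored as two allele calls. How missing data are encoded is not stated here, so any other value is returned as `None`.

```python
# Coding stated in the data description: "1" = REF, "2" = ALT.
ALLELE_LABELS = {"1": "REF", "2": "ALT"}


def decode_allele(code):
    """Map an allele code to its label.

    Any value other than 1 or 2 (for example a missing-data marker,
    whose form is an unknown here) is returned as None.
    """
    return ALLELE_LABELS.get(str(code).strip())


def decode_genotype(allele_a, allele_b):
    """Decode a genotype stored as two allele calls (an assumption)."""
    return (decode_allele(allele_a), decode_allele(allele_b))
```

The actual nucleotides behind "REF" and "ALT" for each SNP are given in the cited supplementary table, not in this sketch.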

File: 02_Scripts.zip

Description:

Delta_Boot_methods

DeltaBoot.CU.PlusMid. 25. final.fn.R - Delta Boot method to run the CU-Bulkley style analyses

DeltaBoot.GU.Bulkley. 25. final.fn.R - Delta Boot method to run the GU-Bulkley style analyses

Plotting_methods

rubias_LOO_assignment_plot_example_GUs_v4_Blue_PNG.R - Script used to run and visualize the leave-one-out analyses presented in this work. A baseline file and a repunits file are supplied as input for each baseline; the example provided uses "CU-Bulkley". The repunits file has two columns, "repunit" and "Display_order", which sort the data into a meaningful order downstream. Without the repunits file, the data would be ordered as returned by the "unique" function applied to the repunit column of the baseline. The script originated from ADF&G and is currently described at https://github.com/commfish/GCLr/confusion_matrix.r and https://github.com/commfish/GCLr/plot_confusion_matrix.r
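The effect of the repunits file on ordering can be sketched as follows. This is an illustrative Python example, not the R script itself. The function name `order_repunits` and the unit names are hypothetical. It assumes that, without a display order, units keep their order of first appearance, which is how R's `unique` behaves.

```python
def order_repunits(baseline_repunits, display_order=None):
    """Return reporting units in plotting order.

    baseline_repunits: the repunit value of each baseline row.
    display_order: optional dict mapping repunit -> Display_order rank.
    Without it, units keep their order of first appearance, mimicking
    R's unique() on the repunit column.
    """
    # dict.fromkeys drops duplicates while preserving first-seen order.
    first_seen = list(dict.fromkeys(baseline_repunits))
    if display_order is None:
        return first_seen
    return sorted(first_seen, key=lambda unit: display_order[unit])


if __name__ == "__main__":
    rows = ["UnitB", "UnitA", "UnitB", "UnitC"]  # hypothetical unit names
    print(order_repunits(rows))
    print(order_repunits(rows, {"UnitA": 1, "UnitC": 2, "UnitB": 3}))
```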

Rubias_methods

AID20210027_CU-Bulkley_2023.R - Script used to run the blow-up bootstrap methodology described in this work. The example is set up for the 2023 year and the CU-Bulkley baseline. An earlier option to select individuals by random sampling with replacement is retained but was not evaluated in this work. The output is two files: a "standard" run of rubias, and a file of proportions generated by the selected bootstrapping methodology across i iterations, for input into the Delta-boot methodology for calculation of covariances. For the manuscript, i was set to 10,000 across all combinations. Each combination of baseline (GU + CU-Bulkley) and year/run size (2019:2024) was run separately, for a total of 12 sets of outputs from this script.
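To show how a file of bootstrap proportions can feed a covariance calculation, the sketch below computes a sample covariance matrix across iterations. It is a generic Python illustration, not the Delta-boot implementation in the repository's R scripts. The function name `covariance_matrix`, the input layout (one row of proportions per iteration), and the n - 1 denominator are all assumptions.

```python
def covariance_matrix(draws):
    """Sample covariance of reporting-unit proportions across iterations.

    draws: list of iterations, each a list of proportions (one per unit).
    Uses the n - 1 denominator; whether the Delta-boot scripts do the
    same is an assumption.
    """
    n = len(draws)
    k = len(draws[0])
    means = [sum(row[j] for row in draws) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            cov[a][b] = sum(
                (row[a] - means[a]) * (row[b] - means[b]) for row in draws
            ) / (n - 1)
    return cov


if __name__ == "__main__":
    # Three made-up iterations for two hypothetical reporting units.
    toy_draws = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]]
    print(covariance_matrix(toy_draws))
```

In the toy input the two proportions sum to 1 in every iteration, so the off-diagonal covariance is the negative of each variance.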

Run_sizes_N_setting.txt - The run-size values used for each year in the blow-up methodology. Each combination of baseline (GU + CU-Bulkley) and year/run size (2019:2024) was run separately, for a total of 12 sets of outputs from the AID20210027_CU-Bulkley_2023.R script.
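The count of 12 follows from crossing the two baselines with the six years. A minimal Python sketch of that enumeration is given below. The names `BASELINES`, `YEARS`, and `run_combinations` are hypothetical, and the run-size values themselves are deliberately not reproduced; they are in Run_sizes_N_setting.txt.

```python
from itertools import product

BASELINES = ["GU", "CU-Bulkley"]
YEARS = range(2019, 2025)  # 2019 through 2024 inclusive


def run_combinations():
    """List every (baseline, year) pair that would be run separately."""
    return list(product(BASELINES, YEARS))


if __name__ == "__main__":
    combos = run_combinations()
    print(len(combos))  # 2 baselines x 6 years
    for baseline, year in combos:
        print(baseline, year)
```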

Code/software

The data are gzip-compressed, tab-delimited text files and can be opened with any software that reads delimited text.

The scripts are in R, and the required packages are described within them. No special tools are required, although, depending on the resources available, the analyses could be parallelized with additional steps.
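As one way to open the files outside R, the sketch below reads a gzip-compressed, tab-delimited file using only the Python standard library. The function name `read_tab_delimited_gz` is hypothetical, and the sketch assumes the first row of each file is a header, which has not been checked against the archive.

```python
import csv
import gzip


def read_tab_delimited_gz(path):
    """Read a gzip-compressed, tab-delimited text file into a list of dicts.

    Assumes the first row is a header (an assumption, not verified
    against the archive). All values are returned as strings.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Example call; the path is an assumption about where the archive
# was unzipped:
# rows = read_tab_delimited_gz("Tyee_mixture_data/mco_2023.txt.gz")
```

Values are kept as strings so that allele codes such as "1" and "2" are not silently converted.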

Access information

Other publicly accessible locations of the data:

Some of the baseline data are also available in previous Dryad repositories, such as https://datadryad.org/dataset/doi:10.5061/dryad.msbcc2fvk and https://datadryad.org/dataset/doi:10.5061/dryad.g4f4qrfs3, but only as subsets and with reduced marker counts.