Genomics-enabled mixed-stock analysis uncovers intraspecific migratory complexity and detects unsampled populations in a harvested fish
Data files
Mar 04, 2025 version files 2.62 MB
- README.md (35.94 KB)
- Walleye_GTseq_Genotypes.csv (1.90 MB)
- Walleye_GTseq_Metadata_and_assignments_v3.csv (684.36 KB)
Abstract
Population contributions to annual harvests provide key insights to conservation, especially in migratory species that return to specific reproductive areas and may establish genetically distinct populations. In this context, genetic stock identification (GSI) requires reference samples from source populations, yet sampling might be challenging as reproductive areas could be remote and/or unknown. To investigate intraspecific variation in walleye (Sander vitreus) populations harvested in two large lakes in northern Quebec, we used genotyping-by-sequencing data to develop a panel of 336 single nucleotide polymorphisms. We then genotyped 1465 fish and assessed individual migration distances from GPS records. Samples were assigned to a source population using two methods, one requiring allele frequencies of known populations (RUBIAS) and the other without prior knowledge (STRUCTURE). Individual assignments to a known population reached 93% consistency between both methods in the main lake where we identified all five major source populations. However, the analyses also revealed up to three small unsampled populations. Furthermore, populations were characterized by large differences in average migration distance. In contrast, assignment consistency reached 99% in the neighboring lake and walleye were assigned with high confidence to two populations having a similar distribution throughout the lake. The complex population structure and migration patterns in the main lake suggest a more heterogeneous habitat and thus, greater potential for local adaptation. This study highlights how combining analytical approaches can inform the robustness of GSI results in a given system and detect intraspecific diversity and complexity relevant for conservation.
https://doi.org/10.5061/dryad.f1vhhmh62
Description of the data and file structure
| file | Column no. | variable | variable_type | description |
|---|---|---|---|---|
| Walleye_GTseq_Genotypes.csv | 1 | sample_type | RUBIAS_input_file | RUBIAS format sample_type column: (i) "reference" = samples of known origin (source samples) and (ii) "mixture" = samples of unknown origin (also called mixed-stock samples). |
| Walleye_GTseq_Genotypes.csv | 2 | repunit | RUBIAS_input_file | RUBIAS format repunit column: (i) reference samples (source samples) have population labels (non-abbreviated population names) and (ii) mixed-stock samples have "NA". |
| Walleye_GTseq_Genotypes.csv | 3 | collection | RUBIAS_input_file | RUBIAS format collection column: (i) reference samples (source samples) have population labels (ICO, PER, CHA, MAU, TAK, MET, TEM) and (ii) mixed-stock samples have lake labels (MIST=Mistassini; ALB=Mistasiniishish). |
| Walleye_GTseq_Genotypes.csv | 4 | collection_icoper_merged | Alternative_to_collection | RUBIAS format collection column (alternative version with ICO and PER merged): (i) reference samples (source samples) have population labels (ICOPER, CHA, MAU, TAK, MET, TEM) and (ii) mixed-stock samples have lake labels (MIST=Mistassini; ALB=Mistasiniishish). |
| Walleye_GTseq_Genotypes.csv | 5 | indiv | RUBIAS_input_file | sample ID |
| Walleye_GTseq_Genotypes.csv | 6 | structure_k10_Qmax | STRUCTURE_assignments | STRUCTURE assignment at K10 based on maximum Q-value. "NA" denotes samples which were removed from the analyses (e.g., did not pass filters, see paper). |
| Walleye_GTseq_Genotypes.csv | 7 | structure_k10_Q90 | STRUCTURE_assignments | STRUCTURE assignment at K10 based on Q-values above 0.9. Population assignments are indicated when Q>=0.9. "NA" denotes samples with Q<0.9 or which were removed from the analyses (e.g., did not pass filters, see paper). |
| Walleye_GTseq_Genotypes.csv | 8+ | 606 columns: genotypes (alleles) | Alleles (303 biallelic SNPs) | Allele 1 of locus x: LociNAME_1; allele 2 of locus x: LociNAME_2. LociNAME includes the chromosome ID and position (chrID_nPosition). |
| Walleye_GTseq_Metadata_and_assignments.csv | 1 | type | Metadata | Source (reference) or Mixed-Stock (mixture) |
| Walleye_GTseq_Metadata_and_assignments.csv | 2 | lake | Metadata | Lake Mistassini (MIST) or Lake Mistasiniishish (ALB) |
| Walleye_GTseq_Metadata_and_assignments.csv | 3 | population | Metadata | Sampling location: river for source samples; lake for mixed-stock |
| Walleye_GTseq_Metadata_and_assignments.csv | 4 | ID | Metadata | Sample ID |
| Walleye_GTseq_Metadata_and_assignments.csv | 6 | year | Metadata | Sampling year |
| Walleye_GTseq_Metadata_and_assignments.csv | 7 | fishing_method | Metadata | Fishing method (angling or gillnet) |
| Walleye_GTseq_Metadata_and_assignments.csv | 8 | migration_distance_km | Migration | Migration distances in kilometers (km) measured with Google Earth. All source samples have missing values (NA). In addition, we could not confidently assess migration distances for some samples (see paper). |
| Walleye_GTseq_Metadata_and_assignments.csv | 9 | RUBIAS_assign | RUBIAS_output | RUBIAS inferred population assignments |
| Walleye_GTseq_Metadata_and_assignments.csv | 10 | RUBIAS_PofZ | RUBIAS_output | RUBIAS PofZ (posterior means of group membership). Usually an assignment with PofZ<0.8 is considered uncertain (low confidence) |
| Walleye_GTseq_Metadata_and_assignments.csv | 11 | RUBIAS_logLik | RUBIAS_output | RUBIAS log-likelihood |
| Walleye_GTseq_Metadata_and_assignments.csv | 12 | RUBIAS_z-score | RUBIAS_output | RUBIAS z-scores |
| Walleye_GTseq_Metadata_and_assignments.csv | 13 | RUBIAS_Missingloci | RUBIAS_output | RUBIAS missing values (number of missing loci per individual) |
| Walleye_GTseq_Metadata_and_assignments.csv | 14 | K6_QMAX | STRUCTURE_output | STRUCTURE highest Q value at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 15 | K6_clusternb | STRUCTURE_output | STRUCTURE cluster number at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 16 | K6_assign | STRUCTURE_output | STRUCTURE assignment at K6 (population assignment based on the highest Q value) |
| Walleye_GTseq_Metadata_and_assignments.csv | 17 | K7_QMAX | STRUCTURE_output | STRUCTURE highest Q value at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 18 | K7_clusternb | STRUCTURE_output | STRUCTURE cluster number at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 19 | K7_assign | STRUCTURE_output | STRUCTURE assignment at K7 (population assignment based on the highest Q value) |
| Walleye_GTseq_Metadata_and_assignments.csv | 20 | K8_QMAX | STRUCTURE_output | STRUCTURE highest Q value at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 21 | K8_clusternb | STRUCTURE_output | STRUCTURE cluster number at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 22 | K8_assign | STRUCTURE_output | STRUCTURE assignment at K8 (population assignment based on the highest Q value) |
| Walleye_GTseq_Metadata_and_assignments.csv | 23 | K9_QMAX | STRUCTURE_output | STRUCTURE highest Q value at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 24 | K9_clusternb | STRUCTURE_output | STRUCTURE cluster number at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 25 | K9_assign | STRUCTURE_output | STRUCTURE assignment at K9 (population assignment based on the highest Q value) |
| Walleye_GTseq_Metadata_and_assignments.csv | 26 | K10_QMAX | STRUCTURE_output | STRUCTURE highest Q value at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 27 | K10_clusternb | STRUCTURE_output | STRUCTURE cluster number at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 28 | K10_assign | STRUCTURE_output | STRUCTURE assignment at K10 (population assignment based on the highest Q value) |
| Walleye_GTseq_Metadata_and_assignments.csv | 29 | K2_1 | STRUCTURE_output | STRUCTURE Q values at K2 |
| Walleye_GTseq_Metadata_and_assignments.csv | 30 | K2_2 | STRUCTURE_output | STRUCTURE Q values at K2 |
| Walleye_GTseq_Metadata_and_assignments.csv | 31 | K3_1 | STRUCTURE_output | STRUCTURE Q values at K3 |
| Walleye_GTseq_Metadata_and_assignments.csv | 32 | K3_2 | STRUCTURE_output | STRUCTURE Q values at K3 |
| Walleye_GTseq_Metadata_and_assignments.csv | 33 | K3_3 | STRUCTURE_output | STRUCTURE Q values at K3 |
| Walleye_GTseq_Metadata_and_assignments.csv | 34 | K4_1 | STRUCTURE_output | STRUCTURE Q values at K4 |
| Walleye_GTseq_Metadata_and_assignments.csv | 35 | K4_2 | STRUCTURE_output | STRUCTURE Q values at K4 |
| Walleye_GTseq_Metadata_and_assignments.csv | 36 | K4_3 | STRUCTURE_output | STRUCTURE Q values at K4 |
| Walleye_GTseq_Metadata_and_assignments.csv | 37 | K4_4 | STRUCTURE_output | STRUCTURE Q values at K4 |
| Walleye_GTseq_Metadata_and_assignments.csv | 38 | K5_1 | STRUCTURE_output | STRUCTURE Q values at K5 |
| Walleye_GTseq_Metadata_and_assignments.csv | 39 | K5_2 | STRUCTURE_output | STRUCTURE Q values at K5 |
| Walleye_GTseq_Metadata_and_assignments.csv | 40 | K5_3 | STRUCTURE_output | STRUCTURE Q values at K5 |
| Walleye_GTseq_Metadata_and_assignments.csv | 41 | K5_4 | STRUCTURE_output | STRUCTURE Q values at K5 |
| Walleye_GTseq_Metadata_and_assignments.csv | 42 | K5_5 | STRUCTURE_output | STRUCTURE Q values at K5 |
| Walleye_GTseq_Metadata_and_assignments.csv | 43 | K6_1 | STRUCTURE_output | STRUCTURE Q values at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 44 | K6_2 | STRUCTURE_output | STRUCTURE Q values at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 45 | K6_3 | STRUCTURE_output | STRUCTURE Q values at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 46 | K6_4 | STRUCTURE_output | STRUCTURE Q values at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 47 | K6_5 | STRUCTURE_output | STRUCTURE Q values at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 48 | K6_6 | STRUCTURE_output | STRUCTURE Q values at K6 |
| Walleye_GTseq_Metadata_and_assignments.csv | 49 | K7_1 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 50 | K7_2 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 51 | K7_3 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 52 | K7_4 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 53 | K7_5 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 54 | K7_6 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 55 | K7_7 | STRUCTURE_output | STRUCTURE Q values at K7 |
| Walleye_GTseq_Metadata_and_assignments.csv | 56 | K8_1 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 57 | K8_2 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 58 | K8_3 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 59 | K8_4 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 60 | K8_5 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 61 | K8_6 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 62 | K8_7 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 63 | K8_8 | STRUCTURE_output | STRUCTURE Q values at K8 |
| Walleye_GTseq_Metadata_and_assignments.csv | 64 | K9_1 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 65 | K9_2 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 66 | K9_3 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 67 | K9_4 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 68 | K9_5 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 69 | K9_6 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 70 | K9_7 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 71 | K9_8 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 72 | K9_9 | STRUCTURE_output | STRUCTURE Q values at K9 |
| Walleye_GTseq_Metadata_and_assignments.csv | 73 | K10_1 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 74 | K10_2 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 75 | K10_3 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 76 | K10_4 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 77 | K10_5 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 78 | K10_6 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 79 | K10_7 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 80 | K10_8 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 81 | K10_9 | STRUCTURE_output | STRUCTURE Q values at K10 |
| Walleye_GTseq_Metadata_and_assignments.csv | 82 | K10_10 | STRUCTURE_output | STRUCTURE Q values at K10 |
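
The genotype file layout described in the table above (seven descriptor columns, then two allele columns per locus) can be read with a short script. The following is a minimal Python sketch, not the authors' workflow. It runs on a tiny made-up CSV so that it is self-contained; the locus names, allele codes, and repunit labels in it are hypothetical placeholders, and the use of "NA" for a missing allele is an assumption based on the RUBIAS input convention rather than something stated in the table.

```python
import csv
import io

# Tiny made-up stand-in for Walleye_GTseq_Genotypes.csv (NOT real data).
# Columns 1-7 follow the table above; locus names, allele codes and repunit
# labels are hypothetical placeholders chosen only for illustration.
TOY_CSV = """sample_type,repunit,collection,collection_icoper_merged,indiv,structure_k10_Qmax,structure_k10_Q90,chr1_100_1,chr1_100_2,chr2_200_1,chr2_200_2
reference,PopA,CHA,CHA,ind1,CHA,CHA,A,G,C,C
reference,PopB,ICO,ICOPER,ind2,ICO,NA,A,A,C,T
mixture,NA,MIST,MIST,ind3,CHA,CHA,G,G,NA,NA
"""

N_LEADING = 7  # columns 1-7 describe the sample; genotypes start at column 8


def load_genotypes(handle):
    """Return (rows, loci): rows as dicts, loci as a list of locus names."""
    reader = csv.DictReader(handle)
    allele_cols = reader.fieldnames[N_LEADING:]
    if len(allele_cols) % 2:
        raise ValueError("expected two allele columns per locus")
    loci = []
    # Each locus contributes a LociNAME_1 / LociNAME_2 pair of columns.
    for first, second in zip(allele_cols[0::2], allele_cols[1::2]):
        name = first.rsplit("_", 1)[0]
        if first != f"{name}_1" or second != f"{name}_2":
            raise ValueError(f"unexpected allele column pair: {first}, {second}")
        loci.append(name)
    return list(reader), loci


def missing_loci(row, loci, missing="NA"):
    """Count loci where either allele is missing (assumed to be coded 'NA')."""
    return sum(
        1
        for locus in loci
        if row[f"{locus}_1"] == missing or row[f"{locus}_2"] == missing
    )


rows, loci = load_genotypes(io.StringIO(TOY_CSV))
reference = [r for r in rows if r["sample_type"] == "reference"]
mixture = [r for r in rows if r["sample_type"] == "mixture"]
print(len(loci), "loci;", len(reference), "reference;", len(mixture), "mixture")
```

To try this on the real file, replace `io.StringIO(TOY_CSV)` with `open("Walleye_GTseq_Genotypes.csv", newline="")` and check that the missing-value code matches what the file actually uses.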

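The RUBIAS and STRUCTURE assignment columns in the metadata file can be compared per individual, for example to see how often the two methods agree. The sketch below is one hedged way to do that in Python, using made-up rows. It assumes that both assignment columns use the same population labels and that unassigned samples are coded "NA"; the choice of `K10_assign` as the default and the exact agreement criterion are illustrative assumptions, not the procedure used in the paper. The optional threshold reflects the PofZ<0.8 rule of thumb given in the table.

```python
import csv
import io

# Made-up rows for illustration only (NOT real data); only the columns used
# here are included.
TOY_CSV = """ID,RUBIAS_assign,RUBIAS_PofZ,K10_assign
ind1,CHA,0.99,CHA
ind2,MAU,0.95,MAU
ind3,TAK,0.60,MAU
ind4,TEM,0.92,NA
"""


def assignment_agreement(rows, structure_col="K10_assign", min_pofz=0.0):
    """Return (share, n): share of compared individuals with matching labels.

    Rows with "NA" in either assignment column or in RUBIAS_PofZ, or with
    RUBIAS_PofZ below min_pofz, are left out of the comparison.
    """
    compared = 0
    matched = 0
    for row in rows:
        rubias = row["RUBIAS_assign"]
        structure = row[structure_col]
        pofz = row["RUBIAS_PofZ"]
        if "NA" in (rubias, structure, pofz):
            continue
        if float(pofz) < min_pofz:
            continue
        compared += 1
        if rubias == structure:
            matched += 1
    share = matched / compared if compared else float("nan")
    return share, compared


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
all_share, all_n = assignment_agreement(rows)
confident_share, confident_n = assignment_agreement(rows, min_pofz=0.8)
print(f"all: {all_share:.2f} (n={all_n}); PofZ>=0.8: {confident_share:.2f} (n={confident_n})")
```

Passing a different `structure_col` (for example `K6_assign`) compares RUBIAS against another value of K. Restricting the rows to one lake before calling the function would give per-lake figures.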