Data from: Distinct patterns of inheritance shape life-history traits in steelhead trout
Data files
Oct 23, 2023 version files 101.18 MB

Beulke_et_al_2023.zip

README.md
Abstract
Life-history variation is the raw material of adaptation, and understanding its genetic and environmental underpinning is key to designing effective conservation strategies. We used large-scale genetic pedigree reconstruction of anadromous steelhead trout (Oncorhynchus mykiss) from the Russian River, California, USA to elucidate sex-specific patterns of life-history traits and their heritability. SNP data from adults returning from sea over a 14-year period were used to identify 13,474 parent-offspring trios. These pedigrees were used to determine age structure, size distributions, and family sizes for these fish, as well as to estimate the heritability of two key life-history traits, spawn date and age at maturity (first reproduction). Spawn date was highly heritable (h^{2} = 0.73) and had a cross-sex genetic correlation near unity. We provide the first estimate of heritability for age at maturity in ocean-going fish from this species and found it to be highly heritable (h^{2} from 0.29–0.62, depending upon sex and calculation), with a much lower genetic correlation across sexes. We also evaluated genotypes at a migration-associated inversion polymorphism and found sex-specific correlations with age at maturity. The significant heritability of these two key reproductive traits in these imperiled fish, and their patterns of inheritance in the two sexes, is consistent with predictions of both natural and sexually antagonistic selection (sexes experience opposing selection pressures). This emphasizes the importance of anthropogenic factors, including hatchery practices and ecosystem modifications, in shaping the fitness of this species, thus providing important guidance for management and conservation efforts.
README: Data from: Distinct patterns of inheritance shape life-history traits in steelhead trout
https://doi.org/10.5061/dryad.4f4qrfjjq
This dataset contains metadata records, SNP genotype data, fisheries trap counts, pedigree analysis results, Bayesian linear regression results, animal model results, and the R notebooks used to complete the analyses.
The combination of these data and the R code will enable the reproduction of all the results presented in the paper.
Description of the data and file structure
Files in Beulke_et_al_2023.zip:
Primary Data Files:
 RR_steelhead_20072020_complied_genotype_data.csv
 RR_steelhead_trap_counts.csv
Additional Data Files:
 animal_model [directory]
 bayesian_linear_regressions [directory]
 snppit_results [directory]
 RR_steelhead_sibling_spawn_dates.csv
 RR_noSAD_results_1.tsv
 RR_SAD_results_1.tsv
Content Descriptions:
Primary Data Files:
The following two data files are the foundation of all the analyses completed for this paper. All other data are provided to ensure reproducibility in situations where stochasticity played a role in the results.
 RR_steelhead_20072020_complied_genotype_data.csv  Metadata and SNP genotype data for fish spawned at the Russian River fish facilities from 2007 to 2020. Main data used in the analyses.
##### Column Descriptions for RR_steelhead_20072020_complied_genotype_data.csv #####
Metadata columns (missing data reported as NA):
 NMFS_DNA_ID  Unique ID given to each sample
 GENOTYPE_NUMBER  Genotype replicas are recorded here
 BOX_ID  Box number where the DNA sample is stored.
 BOX_POSITION  Position in the plate where the tissue sample is stored
 SAMPLE_ID  ID associated with a sample before reaching our lab
 BATCH_ID  Batch number given to a related group of samples
 PROJECT_NAME  Project name tied to the DNA samples
 GENUS  Genus of the sample
 SPECIES  Species of the sample
 LENGTH  Fork length measurements, millimeters (mm)
 WEIGHT  Body weight, grams (g)
 SEX  Phenotypic sex, recorded at time of sample collection
 AGE  Age of the specimen, if provided with sample
 REPORTED_LIFE_STAGE  Life stage information provided with sample
 PHENOTYPE  Miscellaneous physical characteristics provided with sample
 HATCHERY_MARK  Information on presence/absence of fin clips provided with sample
 TAG_NUMBER  Information on fish tags provided with sample
 COLLECTION_DATE  Date the sample was collected, which is also the fish's spawn date
 ESTIMATED_DATE  If "yes" in this column, the collected date was estimated
 PICKER  Initials of person preparing the samples for DNA extraction
 PICK_DATE  Date the sample was prepared for DNA extraction
 LEFTOVER_SAMPLE  If "no" or "none" in this column, there is no remaining tissue sample
 SAMPLE_COMMENTS  Comments related to the sample
 NMFS_DNA_ID_1  Repeat of the unique identifier (NMFS_DNA_ID)
 STATE_F  State where sample was collected
 COUNTY_F  County where the sample was collected
 WATERSHED  Watershed where the sample was collected
 TRIB_1  Additional tributary information about collection
 TRIB_2  Additional tributary information about collection
 WATER_NAME  Name of river or creek where the sample was collected
 REACH_SITE  Additional reach information about collection
 HATCHERY  Hatchery where the sample was collected
 STRAIN  Additional information on strain of sample collected
 LATITUDE_F  Geolocation information
 LONGITUDE_F  Geolocation information
 LOCATION_COMMENTS_F  Location comments
 Sample_ID  Additional sample information
Genotype columns (missing data reported as 0): Two columns are used for the two alleles of each SNP genotype; the second column carries the suffix _1. The alleles are listed as numbers, where A=1, C=2, G=3, T=4. The column names correspond to SNP loci. A list of SNPs is also in the manuscript's supplemental information.
 Omy_AldA
 Omy_AldA_1
 SexID  These columns are genetic sex identifications from a sex ID SNP
 SexID_1
 SH95489423
 SH95489423_1
 SH10077163
 SH10077163_1
 SH102510682
 SH102510682_1
 SH105115367
 SH105115367_1
 SH108735311
 SH108735311_1
 SH110201359
 SH110201359_1
 SH11312873
 SH11312873_1
 SH117286374
 SH117286374_1
 SH119892365
 SH119892365_1
 SH127645308
 SH127645308_1
 OMGH1PROM1SNP1
 OMGH1PROM1SNP1_1
 Omy_arp630
 Omy_arp630_1
 SH96222125
 SH96222125_1
 SH100974386
 SH100974386_1
 SH102867443
 SH102867443_1
 SH105385406
 SH105385406_1
 SH109243222
 SH109243222_1
 SH110362585
 SH110362585_1
 SH114315438
 SH114315438_1
 SH117370400
 SH117370400_1
 SH120255332
 SH120255332_1
 SH128851273
 SH128851273_1
 Omy_aspAT123
 Omy_aspAT123_1
 Omy_g1282
 Omy_g1282_1
 SH9707773
 SH9707773_1
 SH101554306
 SH101554306_1
 SH103350395
 SH103350395_1
 SH105386347
 SH105386347_1
 SH109525403
 SH109525403_1
 SH110689148
 SH110689148_1
 SH11444887
 SH11444887_1
 SH117540259
 SH117540259_1
 SH120950569
 SH120950569_1
 SH128996481
 SH128996481_1
 Omy_COX1221
 Omy_COX1221_1
 Omy_gh475
 Omy_gh475_1
 SH97954618
 SH97954618_1
 SH101770410
 SH101770410_1
 SH103577379
 SH103577379_1
 SH105714265
 SH105714265_1
 SH109651445
 SH109651445_1
 SH111666301
 SH111666301_1
 SH114587480
 SH114587480_1
 SH11781581
 SH11781581_1
 SH121006131
 SH121006131_1
 SH129870756
 SH129870756_1
 Omy_nramp146
 Omy_nramp146_1
 Omy_gsdf291
 Omy_gsdf291_1
 SH98188405
 SH98188405_1
 SH101832195
 SH101832195_1
 SH103705558
 SH103705558_1
 SH106172332
 SH106172332_1
 SH109693461
 SH109693461_1
 SH112208328
 SH112208328_1
 SH114976223
 SH114976223_1
 SH118175396
 SH118175396_1
 SH123044128
 SH123044128_1
 SH130524160
 SH130524160_1
 Omy_Ogo4304
 Omy_Ogo4304_1
 Omy_mapK3103
 Omy_mapK3103_1
 SH98409549
 SH98409549_1
 SH101993189
 SH101993189_1
 SH104519624
 SH104519624_1
 SH106313445
 SH106313445_1
 SH109874148
 SH109874148_1
 SH112301202
 SH112301202_1
 SH115987812
 SH115987812_1
 SH11865491
 SH11865491_1
 SH12599861
 SH12599861_1
 SH130720100
 SH130720100_1
 OMY_PEPAINT6
 OMY_PEPAINT6_1
 Omy_mcsf371
 Omy_mcsf371_1
 SH98683165
 SH98683165_1
 SH102420634
 SH102420634_1
 SH105075162
 SH105075162_1
 SH107074217
 SH107074217_1
 SH110064419
 SH110064419_1
 SH11282082
 SH11282082_1
 SH116733349
 SH116733349_1
 SH118938341
 SH118938341_1
 SH127236583
 SH127236583_1
 SH131460646
 SH131460646_1
 ONMYCRBF_1SNP1
 ONMYCRBF_1SNP1_1
 SH95318147
 SH95318147_1
 SH99300202
 SH99300202_1
 SH102505102
 SH102505102_1
 SH105105448
 SH105105448_1
 SH10728569
 SH10728569_1
 SH110078294
 SH110078294_1
 SH113109205
 SH113109205_1
 SH11725996
 SH11725996_1
 SH119108357
 SH119108357_1
 SH127510920
 SH127510920_1
 SH131965120
 SH131965120_1
 Omy_R04944
 Omy_R04944_1
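The two-column allele coding described above can be decoded with a few lines of Python. This is a minimal sketch using only the standard library, not part of the authors' R code. The rows below are made up for illustration, and the sketch assumes every locus follows the `NAME` / `NAME_1` column pattern; whether the SexID columns use the same A/C/G/T coding is not stated here, so they are left out.

```python
import csv
import io

# Allele codes as described above: A=1, C=2, G=3, T=4; 0 means missing.
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T"}

def decode_genotype(row, locus):
    """Return the two alleles at `locus` as letters, or None if either is missing (0)."""
    first, second = row[locus], row[locus + "_1"]
    if first == "0" or second == "0":
        return None
    return ALLELE_CODES[first] + ALLELE_CODES[second]

# Made-up rows for illustration only; the IDs and genotypes are not real data.
sample = io.StringIO(
    "NMFS_DNA_ID,Omy_AldA,Omy_AldA_1\n"
    "FISH_A,1,3\n"
    "FISH_B,0,0\n"
)
for row in csv.DictReader(sample):
    print(row["NMFS_DNA_ID"], decode_genotype(row, "Omy_AldA"))
```

To use it on the real file, replace the `io.StringIO` object with an open handle to RR_steelhead_20072020_complied_genotype_data.csv.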
 RR_steelhead_trap_counts.csv  Counts of fish caught in the traps at the fish facilities on the Russian River.
##### Column Descriptions for RR_steelhead_trap_counts.csv #####
 Year  Year of the data collection
 Location  Location of the fish trap
 RR trap count  Number of fish caught in the specified trap on the Russian River (RR)
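As a small illustration of the trap count layout, the sketch below sums counts across trap locations for each year. It is not taken from the authors' code. The location names and counts are made up, and the sketch assumes every count is a whole number with no missing values.

```python
import csv
import io
from collections import defaultdict

def total_by_year(rows):
    """Sum the 'RR trap count' column across locations for each year."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Year"]] += int(row["RR trap count"])
    return dict(totals)

# Made-up rows for illustration only; locations and counts are not real data.
sample = io.StringIO(
    "Year,Location,RR trap count\n"
    "2007,Trap A,10\n"
    "2007,Trap B,5\n"
    "2008,Trap A,7\n"
)
print(total_by_year(csv.DictReader(sample)))
```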
Additional Data: All of the following data are provided to ensure reproducibility in situations where stochasticity played a role in the results.
RR_steelhead_sibling_spawn_dates.csv  Random order of full sibling comparisons that was used to compare their spawn dates.
##### Column Descriptions for RR_steelhead_sibling_spawn_dates.csv #####
 ppair  parent pair, the IDs of the two parent fish are separated by an underscore
 kid_age  age of the offspring of the two parents, calculated by subtracting the parent spawn year from the offspring spawn year
 sib_1  ID of first full sibling
 sib_2  ID of second full sibling
 sib1_dayofyear  spawn date in day of year (001 = January 1) for first sibling
 sib2_dayofyear  spawn date in day of year (001 = January 1) for second sibling
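The sibling comparison file can be read as sketched below: the parent pair is split at the underscore and the spawn dates of the two siblings are compared as days of the year. This is only an illustration, not the authors' analysis. The IDs and dates are made up, the sketch assumes that individual IDs contain no underscores, and it does not handle pairs whose spawn dates fall on either side of a year boundary.

```python
import csv
import io

def sibling_gap(row):
    """Return (parent_1, parent_2, absolute difference in spawn day of year)."""
    # Assumes each individual ID contains no underscore of its own.
    parent_1, parent_2 = row["ppair"].split("_")
    gap = abs(int(row["sib1_dayofyear"]) - int(row["sib2_dayofyear"]))
    return parent_1, parent_2, gap

# Made-up row for illustration only; the IDs and dates are not real data.
sample = io.StringIO(
    "ppair,kid_age,sib_1,sib_2,sib1_dayofyear,sib2_dayofyear\n"
    "P1_P2,3,K1,K2,015,042\n"
)
for row in csv.DictReader(sample):
    print(sibling_gap(row))
```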
RR_noSAD_results_1.tsv; RR_SAD_results_1.tsv  Initial run of SNPPIT pedigree analysis that led to removal of problematic loci from subsequent analyses.
##### Column Descriptions for SNPPIT results RR_noSAD_results_1.tsv and RR_SAD_results_1.tsv #####
 OffspCollection  Offspring Collection, groups the fish by their spawning location
 Kid  ID of offspring
 Pa  ID of potential father
 Ma  ID of potential mother
 PopName  Population Name, groups the potential parents by their spawning location
 SpawnYear  Spawn year of the parents
 FDR  False discovery rate associated with accepting the current individual’s parentage assignment
 Pvalue  p value
 LOD  logarithm of the odds
 P.Pr.C_Se_Se  The posterior probability of trio relationship C_Se_Se, more details in SNPPIT documentation, probability of true parent pair
 P.Pr.Max  The posterior probability of the trio relationship having the highest posterior probability
 MaxP.Pr.Relat  The trio relationship having highest posterior probability, relationship categories explained in SNPPIT documentation
 TotPaNonExc  The total number of putative fathers that were not excluded by Mendelian incompatibility with the kid.
 TotMaNonExc  The total number of putative mothers that were not excluded by Mendelian incompatibility with the kid
 TotUnkNonExc  The total number of putative parents of unknown sex that were not excluded by Mendelian incompatibility with the kid.
 TotPairsMendCompat  total pairs with Mendelian Compatibility (see SNPPIT documentation for more detail)
 TotPairsMendAndLogL  total pairs with Mendelian Compatibility and Log L (see SNPPIT documentation for more detail)
 TotParsMendLoglAndRank  total pairs with Mendelian Compatibility and Log L and Rank (see SNPPIT documentation for more detail)
 TotPairsNonExc  The total number of putative parent pairs that were not excluded by Mendelian incompatibility with the kid.
 KidMiss  The number of ungenotyped loci in the kid of a trio.
 PaMiss  The number of ungenotyped loci in the pa of a trio.
 MaMiss  The number of ungenotyped loci in the ma of a trio
 MI.Kid.Pa  The number of Mendelian incompatibilities between the kid and the pa in a trio.
 MI.Kid.Ma  The number of Mendelian incompatibilities between the kid and the ma in a trio.
 MI.Trio  The total number of Mendelian incompatibilities in a trio
 MendIncLoci  A column holding a comma-separated list of the names of loci at which there were Mendelian incompatibilities at the inferred trio
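Because these initial runs were used to flag problematic loci, one natural way to explore them is to tally how often each locus appears in MendIncLoci. The sketch below is a hedged illustration, not the authors' procedure. The rows are made up, the file is assumed to be tab-delimited, and the set of placeholders treated as "no incompatibilities" (`EMPTY_MARKERS`) is a guess that should be checked against the real files.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholders for "no incompatible loci"; verify against the real files.
EMPTY_MARKERS = {"", "NA", "---"}

def count_incompatible_loci(rows):
    """Count how many trios list each locus in the MendIncLoci column."""
    counts = Counter()
    for row in rows:
        field = row["MendIncLoci"].strip()
        if field in EMPTY_MARKERS:
            continue
        counts.update(name.strip() for name in field.split(",") if name.strip())
    return counts

# Made-up rows for illustration only; they are not real SNPPIT output.
sample = io.StringIO(
    "Kid\tMendIncLoci\n"
    "K1\tOmy_AldA,SH95489423\n"
    "K2\tOmy_AldA\n"
    "K3\t\n"
)
counts = count_incompatible_loci(csv.DictReader(sample, delimiter="\t"))
print(counts.most_common())
```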
animal_model [directory]  Directory containing the outputs from the animal model run with MCMCglmm. The outputs are R data files (.rds) in list format containing the elements "Sol" and "VCV". Details on input files and how to run the model are in the code provided on GitHub.
 RR_genetic_corr_01.rds
 RR_genetic_corr_02.rds
 RR_genetic_corr_03.rds
 RR_genetic_corr_04.rds
 RR_genetic_corr_05.rds
 RR_genetic_corr_06.rds
bayesian_linear_regressions [directory]  Directory of the results of a series of Bayesian linear regressions that were used to calculate the cross-sex genetic correlation in the trait of spawn date. The code used to create these files and to calculate the genetic correlation is on GitHub.
 bayes_lm_sdate_gen_cor.csv
 bayes_lm_sdate_ma_daughter.csv
 bayes_lm_sdate_ma_son.csv
 bayes_lm_sdate_pa_daughter.csv
 bayes_lm_sdate_pa_son.csv
##### Column Descriptions for bayes_lm_sdate_ma_daughter.csv, bayes_lm_sdate_ma_son.csv, bayes_lm_sdate_pa_daughter.csv, and bayes_lm_sdate_pa_son.csv #####
 (Intercept)  coefficient estimates for the intercept of the linear regression
 dayofyear  coefficient estimates for the spawn date day of year for the linear regression
 sigma  sigma value for the linear regression, standard deviation of errors
##### Column Descriptions of bayes_lm_sdate_gen_cor.csv #####
 dayofyear  genetic correlation values for spawn date as day of year, calculated from the Bayesian linear regression results
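The genetic correlation file can be summarized as sketched below. This is an illustration only and assumes that each row of bayes_lm_sdate_gen_cor.csv holds one draw of the genetic correlation, which should be confirmed against the code on GitHub. The values are made up, and the real file may include an extra row-index column depending on how it was written from R.

```python
import csv
import io
import statistics

def summarize_draws(values):
    """Return (lower, median, upper) using the 2.5% and 97.5% quantiles."""
    # n=40 gives cut points every 2.5%, so the first and last bound a 95% interval.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], statistics.median(values), cuts[-1]

# Made-up values for illustration only; they are not results from the paper.
sample = io.StringIO("dayofyear\n0.90\n0.94\n0.96\n0.98\n1.00\n")
draws = [float(row["dayofyear"]) for row in csv.DictReader(sample)]
lower, median, upper = summarize_draws(draws)
print(f"median={median:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```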
snppit_results [directory]  Directory containing the pedigree analysis results from the program SNPPIT.
 no_sad [directory]  Directory containing the SNPPIT results files when sex and date information are not used (no_sad)
 snppit_input.txt  Input data file
 snppit_output_BasicDataSummary.txt  Basic information about the data that got read into the program
 snppit_output_ChosenSMAXes.txt  Information about the smax vectors used in the analysis
 snppit_output_FDR_Summary.txt  Offspring assigned to parents in each population, ranked by false discovery rate.
 snppit_output_ParentageAssignments.txt  Main output file that gives false discovery rates for all offspring with the most likely parents.
 snppit_output_PopSizesAnPiVectors.txt  Sizes of the populations and the expected fraction of different trios thereby implied.
 snppit_output_TrioPosteriors.txt  Posterior probabilities for all non-excluded (by Mendelian incompatibility) parent pairs of every offspring in the data file.
 snppit_seeds  Seeds used to run SNPPIT
 no_sad_no_hatch [directory]  Directory containing the SNPPIT results files when sex, date and hatchery information are not used (no_sad_no_hatch). See above for file descriptions.
 snppit_input.txt
 snppit_output_BasicDataSummary.txt
 snppit_output_ChosenSMAXes.txt
 snppit_output_FDR_Summary.txt
 snppit_output_ParentageAssignments.txt
 snppit_output_PopSizesAnPiVectors.txt
 snppit_output_TrioPosteriors.txt
 snppit_seeds
 sad [directory]  Directory containing the SNPPIT results file when sex and date information are provided to the program (sad). See above for file descriptions.
 snppit_input.txt
 snppit_output_BasicDataSummary.txt
 snppit_output_ChosenSMAXes.txt
 snppit_output_FDR_Summary.txt
 snppit_output_ParentageAssignments.txt
 snppit_output_PopSizesAnPiVectors.txt
 snppit_output_TrioPosteriors.txt
 snppit_seeds
##### Column Description for snppit_output_FDR_Summary.txt files #####
PopName  Spawning location
RankInFDR  Amongst all the kids assigned to parents within a given parental population, this is the rank of the individual when sorted from smallest to largest p-value (and hence also in the FDR).
Kid  offspring ID
Pa  father ID
Ma  mother ID
FDR  The false discovery rate associated with accepting the current individual’s parentage assignment but none of the individuals with higher p-values.
FDC.est.to.pop  The estimated upper bound on the total number of false discoveries of parentage assignments to a particular population if you set your FDR cutoff just above this particular individual
Pvalue  The p value computed by simulation for a trio.
##### Column Descriptions for snppit_output_ParentageAssignments.txt files #####
 OffspCollection  Offspring Collection, groups the fish by their spawning location
 Kid  ID of offspring
 Pa  ID of potential father
 Ma  ID of potential mother
 PopName  Population Name, groups the potential parents by their spawning location
 SpawnYear  Spawn year of the parents
 FDR  False discovery rate associated with accepting the current individual’s parentage assignment
 Pvalue  p value
 LOD  logarithm of the odds
 P.Pr.C_Se_Se  The posterior probability of trio relationship C_Se_Se, more details in SNPPIT documentation, probability of true parent pair
 P.Pr.Max  The posterior probability of the trio relationship having the highest posterior probability
 MaxP.Pr.Relat  The trio relationship having highest posterior probability, relationship categories explained in SNPPIT documentation
 TotPaNonExc  The total number of putative fathers that were not excluded by Mendelian incompatibility with the kid.
 TotMaNonExc  The total number of putative mothers that were not excluded by Mendelian incompatibility with the kid
 TotUnkNonExc  The total number of putative parents of unknown sex that were not excluded by Mendelian incompatibility with the kid.
 TotPairsMendCompat  total pairs with Mendelian Compatibility (see SNPPIT documentation for more detail)
 TotPairsMendAndLogL  total pairs with Mendelian Compatibility and Log L (see SNPPIT documentation for more detail)
 TotParsMendLoglAndRank  total pairs with Mendelian Compatibility and Log L and Rank (see SNPPIT documentation for more detail)
 TotPairsNonExc  The total number of putative parent pairs that were not excluded by Mendelian incompatibility with the kid.
 KidMiss  The number of ungenotyped loci in the kid of a trio.
 PaMiss  The number of ungenotyped loci in the pa of a trio.
 MaMiss  The number of ungenotyped loci in the ma of a trio
 MI.Kid.Pa  The number of Mendelian incompatibilities between the kid and the pa in a trio.
 MI.Kid.Ma  The number of Mendelian incompatibilities between the kid and the ma in a trio.
 MI.Trio  The total number of Mendelian incompatibilities in a trio
 MendIncLoci  A column holding a comma-separated list of the names of loci at which there were Mendelian incompatibilities at the inferred trio
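A common first step with a parentage assignment table is to keep only trios below a chosen FDR cutoff. The sketch below shows one way to do that. It is not the authors' filtering rule: the cutoff of 0.01 is a hypothetical value chosen for illustration, the rows are made up, and the file is assumed to be tab-delimited. Rows whose FDR is not numeric (for example, a placeholder for unassigned offspring) are skipped.

```python
import csv
import io

def confident_trios(rows, max_fdr):
    """Return (Kid, Pa, Ma) tuples whose FDR is numeric and at most `max_fdr`."""
    kept = []
    for row in rows:
        try:
            fdr = float(row["FDR"])
        except ValueError:
            continue  # non-numeric FDR, e.g. a placeholder for an unassigned kid
        if fdr <= max_fdr:
            kept.append((row["Kid"], row["Pa"], row["Ma"]))
    return kept

# Made-up rows for illustration only; they are not real SNPPIT output.
sample = io.StringIO(
    "Kid\tPa\tMa\tFDR\n"
    "K1\tP1\tM1\t0.001\n"
    "K2\tP2\tM2\t0.20\n"
    "K3\t---\t---\t---\n"
)
# 0.01 is a hypothetical cutoff, not a value taken from the paper.
print(confident_trios(csv.DictReader(sample, delimiter="\t"), max_fdr=0.01))
```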
##### Column Descriptions for snppit_output_TrioPosteriors.txt #####
 OffspCollection  Offspring Collection, groups the fish by their spawning location
 Kid  ID of offspring
 Pa  ID of potential father
 Ma  ID of potential mother
 Rank  For a given kid, this is the rank of the parent pair when ranked from largest to smallest posterior probability of being parental.
 LOD  logarithm of the odds
 P.Pr.C_Se_Se  The posterior probability of trio relationship C_Se_Se, probability of true parent pair, see SNPPIT documentation for details on relationship categories
 P.Pr.C_Se_Si  The posterior probability of trio relationship C_Se_Si, see SNPPIT documentation for details on relationship categories
 P.Pr.C_Si_Se  The posterior probability of trio relationship C_Si_Se, see SNPPIT documentation for details on relationship categories
 P.Pr.C_Se_U  The posterior probability of trio relationship C_Se_U, see SNPPIT documentation for details on relationship categories
 P.Pr.C_U_Se  The posterior probability of trio relationship C_U_Se, see SNPPIT documentation for details on relationship categories
 P.Pr.C_Si_Si  The posterior probability of trio relationship C_Si_Si, see SNPPIT documentation for details on relationship categories
 P.Pr.C_Si_U  The posterior probability of trio relationship C_Si_U, see SNPPIT documentation for details on relationship categories
 P.Pr.C_U_Si  The posterior probability of trio relationship C_U_Si, see SNPPIT documentation for details on relationship categories
 P.Pr.C_U_U  The posterior probability of trio relationship C_U_U, see SNPPIT documentation for details on relationship categories
 P.Pr.Se_F  The posterior probability of trio relationship Se_F, see SNPPIT documentation for details on relationship categories
 P.Pr.F_Se  The posterior probability of trio relationship F_Se, see SNPPIT documentation for details on relationship categories
 P.Pr.H_Se  The posterior probability of trio relationship H_Se, see SNPPIT documentation for details on relationship categories
 P.Pr.Se_H  The posterior probability of trio relationship Se_H, see SNPPIT documentation for details on relationship categories
 P.Pr.F_Si  The posterior probability of trio relationship F_Si, see SNPPIT documentation for details on relationship categories
 P.Pr.Si_F  The posterior probability of trio relationship Si_F, see SNPPIT documentation for details on relationship categories
 P.Pr.F_U  The posterior probability of trio relationship F_U, see SNPPIT documentation for details on relationship categories
 P.Pr.U_F  The posterior probability of trio relationship U_F, see SNPPIT documentation for details on relationship categories
 P.Pr.F_F  The posterior probability of trio relationship F_F, see SNPPIT documentation for details on relationship categories
 KidMiss  The number of ungenotyped loci in the kid of a trio.
 PaMiss  The number of ungenotyped loci in the pa of a trio.
 MaMiss  The number of ungenotyped loci in the ma of a trio.
 MI.Kid.Pa  The number of Mendelian incompatibilities between the kid and the pa in a trio.
 MI.Kid.Ma  The number of Mendelian incompatibilities between the kid and the ma in a trio.
 MI.Trio  The total number of Mendelian incompatibilities in a trio.
Code
The R code used to run the analyses can be found at:
https://github.com/abeulke/Beulke_et_al_2023_Molecular_Ecology/