Data from: Distinct patterns of inheritance shape life-history traits in steelhead trout
Data files
Oct 23, 2023 version files 101.18 MB
Abstract
Life-history variation is the raw material of adaptation, and understanding its genetic and environmental underpinning is key to designing effective conservation strategies. We used large-scale genetic pedigree reconstruction of anadromous steelhead trout (Oncorhynchus mykiss) from the Russian River, California, USA to elucidate sex-specific patterns of life-history traits and their heritability. SNP data from adults returning from sea over a 14-year period were used to identify 13,474 parent-offspring trios. These pedigrees were used to determine age structure, size distributions, and family sizes for these fish, as well as to estimate the heritability of two key life-history traits, spawn date and age at maturity (first reproduction). Spawn date was highly heritable (h2 = 0.73) and had a cross-sex genetic correlation near unity. We provide the first estimate of heritability for age at maturity in ocean-going fish from this species and found it to be highly heritable (h2 from 0.29–0.62, depending upon sex and calculation), with a much lower genetic correlation across sexes. We also evaluated genotypes at a migration-associated inversion polymorphism and found sex-specific correlations with age at maturity. The significant heritability of these two key reproductive traits in these imperiled fish, and their patterns of inheritance in the two sexes, are consistent with predictions of both natural and sexually antagonistic selection (sexes experience opposing selection pressures). This emphasizes the importance of anthropogenic factors, including hatchery practices and ecosystem modifications, in shaping the fitness of this species, thus providing important guidance for management and conservation efforts.
README: Data from: Distinct patterns of inheritance shape life-history traits in steelhead trout
https://doi.org/10.5061/dryad.4f4qrfjjq
This dataset contains metadata records, SNP genotype data, fisheries trap counts, pedigree analysis results, Bayesian linear regression results, animal model results, and the R notebooks used to complete the analyses.
The combination of these data and the R code will enable the reproduction of all the results presented in the paper.
Description of the data and file structure
Files in Beulke_et_al_2023.zip:
Primary Data Files:
- RR_steelhead_2007-2020_complied_genotype_data.csv
- RR_steelhead_trap_counts.csv
Additional Data Files:
- animal_model [directory]
- bayesian_linear_regressions [directory]
- snppit_results [directory]
- RR_steelhead_sibling_spawn_dates.csv
- RR_noSAD_results_1.tsv
- RR_SAD_results_1.tsv
Content Descriptions:
Primary Data Files:
The following two data files are the foundation of all the analyses completed for this paper. All other data are provided to ensure reproducibility in situations where stochasticity played a role in the results.
- RR_steelhead_2007-2020_complied_genotype_data.csv -- Metadata and SNP genotype data for fish spawned at the Russian River fish facilities from 2007-2020. Main data used in the analyses.
##### Column Descriptions for RR_steelhead_2007-2020_complied_genotype_data.csv #####
Metadata columns (missing data reported as NA):
- NMFS_DNA_ID -- Unique ID given to each sample
- GENOTYPE_NUMBER -- Genotype replicas are recorded here
- BOX_ID -- Box number where the DNA sample is stored.
- BOX_POSITION -- Position in the plate where the tissue sample is stored
- SAMPLE_ID -- ID associated with a sample before reaching our lab
- BATCH_ID -- Batch number given to a related group of samples
- PROJECT_NAME -- Project name tied to the DNA samples
- GENUS -- Genus of the sample
- SPECIES -- Species of the sample
- LENGTH -- Fork length measurements, millimeters (mm)
- WEIGHT -- Body weight, grams (g)
- SEX -- Phenotypic sex, recorded at time of sample collection
- AGE -- Age of the specimen, if provided with sample
- REPORTED_LIFE_STAGE -- Life stage information provided with sample
- PHENOTYPE -- Miscellaneous physical characteristics provided with sample
- HATCHERY_MARK -- Information on presence/absence of fin clips provided with sample
- TAG_NUMBER -- Information on fish tags provided with sample
- COLLECTION_DATE -- Date the sample was collected, which is also the fish's spawn date
- ESTIMATED_DATE -- If "yes" in this column, the collected date was estimated
- PICKER -- Initials of person preparing the samples for DNA extraction
- PICK_DATE -- Date the sample was prepared for DNA extraction
- LEFTOVER_SAMPLE -- If "no" or "none" in this column, there is no remaining tissue sample
- SAMPLE_COMMENTS -- Comments related to the sample
- NMFS_DNA_ID_1 -- repeat of the unique identifier
- STATE_F -- State where sample was collected
- COUNTY_F -- County where the sample was collected
- WATERSHED -- Watershed where the sample was collected
- TRIB_1 -- Additional tributary information about collection
- TRIB_2 -- Additional tributary information about collection
- WATER_NAME -- Name of river or creek where the sample was collected
- REACH_SITE -- Additional reach information about collection
- HATCHERY -- Hatchery where the sample was collected
- STRAIN -- Additional information on strain of sample collected
- LATITUDE_F -- Geolocation information
- LONGITUDE_F -- Geolocation information
- LOCATION_COMMENTS_F -- Location comments
- Sample_ID -- Additional sample information
Genotype columns (missing data reported as 0): Two columns are used for the two alleles of each SNP genotype. The alleles are listed as numbers, where A=1, C=2, G=3, T=4. The column names correspond to the SNP loci. A list of SNPs is also in the manuscript's supplemental information.
- Omy_AldA
- Omy_AldA_1
- SexID -- These columns are genetic sex identifications from a sex ID SNP
- SexID_1
- SH95489-423
- SH95489-423_1
- SH100771-63
- SH100771-63_1
- SH102510-682
- SH102510-682_1
- SH105115-367
- SH105115-367_1
- SH108735-311
- SH108735-311_1
- SH110201-359
- SH110201-359_1
- SH113128-73
- SH113128-73_1
- SH117286-374
- SH117286-374_1
- SH119892-365
- SH119892-365_1
- SH127645-308
- SH127645-308_1
- OMGH1PROM1-SNP1
- OMGH1PROM1-SNP1_1
- Omy_arp-630
- Omy_arp-630_1
- SH96222-125
- SH96222-125_1
- SH100974-386
- SH100974-386_1
- SH102867-443
- SH102867-443_1
- SH105385-406
- SH105385-406_1
- SH109243-222
- SH109243-222_1
- SH110362-585
- SH110362-585_1
- SH114315-438
- SH114315-438_1
- SH117370-400
- SH117370-400_1
- SH120255-332
- SH120255-332_1
- SH128851-273
- SH128851-273_1
- Omy_aspAT-123
- Omy_aspAT-123_1
- Omy_g12-82
- Omy_g12-82_1
- SH97077-73
- SH97077-73_1
- SH101554-306
- SH101554-306_1
- SH103350-395
- SH103350-395_1
- SH105386-347
- SH105386-347_1
- SH109525-403
- SH109525-403_1
- SH110689-148
- SH110689-148_1
- SH114448-87
- SH114448-87_1
- SH117540-259
- SH117540-259_1
- SH120950-569
- SH120950-569_1
- SH128996-481
- SH128996-481_1
- Omy_COX1-221
- Omy_COX1-221_1
- Omy_gh-475
- Omy_gh-475_1
- SH97954-618
- SH97954-618_1
- SH101770-410
- SH101770-410_1
- SH103577-379
- SH103577-379_1
- SH105714-265
- SH105714-265_1
- SH109651-445
- SH109651-445_1
- SH111666-301
- SH111666-301_1
- SH114587-480
- SH114587-480_1
- SH117815-81
- SH117815-81_1
- SH121006-131
- SH121006-131_1
- SH129870-756
- SH129870-756_1
- Omy_nramp-146
- Omy_nramp-146_1
- Omy_gsdf-291
- Omy_gsdf-291_1
- SH98188-405
- SH98188-405_1
- SH101832-195
- SH101832-195_1
- SH103705-558
- SH103705-558_1
- SH106172-332
- SH106172-332_1
- SH109693-461
- SH109693-461_1
- SH112208-328
- SH112208-328_1
- SH114976-223
- SH114976-223_1
- SH118175-396
- SH118175-396_1
- SH123044-128
- SH123044-128_1
- SH130524-160
- SH130524-160_1
- Omy_Ogo4-304
- Omy_Ogo4-304_1
- Omy_mapK3-103
- Omy_mapK3-103_1
- SH98409-549
- SH98409-549_1
- SH101993-189
- SH101993-189_1
- SH104519-624
- SH104519-624_1
- SH106313-445
- SH106313-445_1
- SH109874-148
- SH109874-148_1
- SH112301-202
- SH112301-202_1
- SH115987-812
- SH115987-812_1
- SH118654-91
- SH118654-91_1
- SH125998-61
- SH125998-61_1
- SH130720-100
- SH130720-100_1
- OMY_PEPA-INT6
- OMY_PEPA-INT6_1
- Omy_mcsf-371
- Omy_mcsf-371_1
- SH98683-165
- SH98683-165_1
- SH102420-634
- SH102420-634_1
- SH105075-162
- SH105075-162_1
- SH107074-217
- SH107074-217_1
- SH110064-419
- SH110064-419_1
- SH112820-82
- SH112820-82_1
- SH116733-349
- SH116733-349_1
- SH118938-341
- SH118938-341_1
- SH127236-583
- SH127236-583_1
- SH131460-646
- SH131460-646_1
- ONMYCRBF_1-SNP1
- ONMYCRBF_1-SNP1_1
- SH95318-147
- SH95318-147_1
- SH99300-202
- SH99300-202_1
- SH102505-102
- SH102505-102_1
- SH105105-448
- SH105105-448_1
- SH107285-69
- SH107285-69_1
- SH110078-294
- SH110078-294_1
- SH113109-205
- SH113109-205_1
- SH117259-96
- SH117259-96_1
- SH119108-357
- SH119108-357_1
- SH127510-920
- SH127510-920_1
- SH131965-120
- SH131965-120_1
- Omy_R04944
- Omy_R04944_1
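As an illustration of the two-column allele coding described above, the following minimal Python sketch decodes one fish's genotype at a locus into bases. The sample rows and fish IDs are hypothetical, and the sketch assumes the allele codes are stored as plain integers in the CSV; only the column naming (`locus` and `locus_1`) and the coding (A=1, C=2, G=3, T=4, 0 = missing) come from this README.

```python
import csv
import io

# Allele codes as described in this README: A=1, C=2, G=3, T=4, 0 = missing.
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T", "0": None}


def decode_genotype(row, locus):
    """Return both alleles of `locus` as a tuple of bases, or None if either is missing."""
    first = ALLELE_CODES[row[locus].strip()]
    second = ALLELE_CODES[row[locus + "_1"].strip()]
    if first is None or second is None:
        return None
    return (first, second)


# Hypothetical two-fish excerpt using the same column layout as the real file.
sample_csv = """NMFS_DNA_ID,Omy_AldA,Omy_AldA_1,SH95489-423,SH95489-423_1
fish_A,1,3,2,2
fish_B,0,0,4,2
"""

genotypes = {}
for row in csv.DictReader(io.StringIO(sample_csv)):
    genotypes[row["NMFS_DNA_ID"]] = {
        locus: decode_genotype(row, locus)
        for locus in ("Omy_AldA", "SH95489-423")
    }

print(genotypes)
```

To run this against the real file, replace the `io.StringIO(sample_csv)` argument with an open handle to RR_steelhead_2007-2020_complied_genotype_data.csv.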
- RR_steelhead_trap_counts.csv -- Counts of fish caught in the traps at the fish facilities on the Russian River.
##### Column Descriptions for RR_steelhead_trap_counts.csv #####
- Year -- Year of the data collection
- Location -- Location of the fish trap
- RR trap count -- Number of fish caught in the specified trap on the Russian River (RR)
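The count column header contains spaces, so it has to be referenced by its full name when the file is read. The short Python sketch below sums counts per year across trap locations; the rows, trap names and counts are hypothetical and only the three column names come from this README.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows; only the header matches the documented columns.
sample_csv = """Year,Location,RR trap count
2019,Trap A,120
2019,Trap B,80
2020,Trap A,95
"""

totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Year is kept as text so that any non-numeric year labels still work.
    totals[row["Year"]] += int(row["RR trap count"])

totals = dict(totals)
print(totals)
```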
Additional Data: All of the following data are provided to ensure reproducibility in situations where stochasticity played a role in the results.
RR_steelhead_sibling_spawn_dates.csv -- Random order of full sibling comparisons that was used to compare their spawn dates.
##### Column Descriptions for RR_steelhead_sibling_spawn_dates.csv #####
- ppair -- parent pair, the IDs of the two parent fish are separated by an underscore
- kid_age -- age of the offspring of the two parents, calculated by subtracting the parent spawn year from the offspring spawn year
- sib_1 -- ID of first full sibling
- sib_2 -- ID of second full sibling
- sib1_dayofyear -- spawn date in day of year (001 = January 1) for first sibling
- sib2_dayofyear -- spawn date in day of year (001 = January 1) for second sibling
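A minimal Python sketch of reading this file is given here, under stated assumptions: the IDs and values are hypothetical, and parent IDs are assumed to contain no underscore of their own, so that a single split of `ppair` recovers both parents. The absolute day-of-year difference is shown only as one simple way to compare siblings and is not necessarily the comparison used in the paper; it also does not account for spawning seasons that span the turn of the year.

```python
import csv
import io

# Hypothetical rows using the column names documented above; all IDs are made up.
sample_csv = """ppair,kid_age,sib_1,sib_2,sib1_dayofyear,sib2_dayofyear
P001_P002,3,K010,K011,045,052
P003_P004,2,K020,K021,030,018
"""

differences = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    # Assumes parent IDs contain no underscore, so one split is enough.
    parent_a, parent_b = row["ppair"].split("_", 1)
    # int() drops the leading zeros of values such as "045".
    gap = abs(int(row["sib1_dayofyear"]) - int(row["sib2_dayofyear"]))
    differences.append((parent_a, parent_b, int(row["kid_age"]), gap))

print(differences)
```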
RR_noSAD_results_1.tsv; RR_SAD_results_1.tsv -- Initial run of SNPPIT pedigree analysis that led to removal of problematic loci from subsequent analyses.
##### Column Descriptions for SNPPIT results RR_noSAD_results_1.tsv and RR_SAD_results_1.tsv #####
- OffspCollection -- Offspring Collection, groups the fish by their spawning location
- Kid -- ID of offspring
- Pa -- ID of potential father
- Ma -- ID of potential mother
- PopName -- Population Name, groups the potential parents by their spawning location
- SpawnYear -- Spawn year of the parents
- FDR -- False discovery rate associated with accepting the current individual’s parentage assignment
- Pvalue -- p value
- LOD -- logarithm of the odds
- P.Pr.C_Se_Se -- The posterior probability of trio relationship C_Se_Se, more details in SNPPIT documentation, probability of true parent pair
- P.Pr.Max -- The posterior probability of the trio relationship having the highest posterior probability
- MaxP.Pr.Relat -- The trio relationship having highest posterior probability, relationship categories explained in SNPPIT documentation
- TotPaNonExc -- The total number of putative fathers that were not excluded by Mendelian incompatibility with the kid.
- TotMaNonExc -- The total number of putative mothers that were not excluded by Mendelian incompatibility with the kid
- TotUnkNonExc -- The total number of putative parents of unknown sex that were not excluded by Mendelian incompatibility with the kid.
- TotPairsMendCompat -- total pairs with Mendelian Compatibility (see SNPPIT documentation for more detail)
- TotPairsMendAndLogL -- total pairs with Mendelian Compatibility and Log L (see SNPPIT documentation for more detail)
- TotParsMendLoglAndRank -- total pairs with Mendelian Compatibility and Log L and Rank (see SNPPIT documentation for more detail)
- TotPairsNonExc -- The total number of putative parent pairs that were not excluded by Mendelian incompatibility with the kid.
- KidMiss -- The number of ungenotyped loci in the kid of a trio.
- PaMiss -- The number of ungenotyped loci in the pa of a trio.
- MaMiss -- The number of ungenotyped loci in the ma of a trio
- MI.Kid.Pa -- The number of Mendelian incompatibilities between the kid and the pa in a trio.
- MI.Kid.Ma -- The number of Mendelian incompatibilities between the kid and the ma in a trio.
- MI.Trio -- The total number of Mendelian incompatibilities in a trio
- MendIncLoci -- A column holding a comma-separated list of the names of loci at which there were Mendelian incompatibilities at the inferred trio
animal_model [directory] -- Directory containing the outputs from the animal model run with MCMCglmm. The outputs are R data files (.rds) in list format containing the elements "Sol" and "VCV". Details on input files and how to run the model are in the code provided on GitHub.
- RR_genetic_corr_01.rds
- RR_genetic_corr_02.rds
- RR_genetic_corr_03.rds
- RR_genetic_corr_04.rds
- RR_genetic_corr_05.rds
- RR_genetic_corr_06.rds
bayesian_linear_regressions [directory] -- Directory of the results of a series of Bayesian linear regressions that were used to calculate the cross-sex genetic correlation in the trait of spawn date. The code used to create these files and to calculate the genetic correlation is on GitHub.
- bayes_lm_sdate_gen_cor.csv
- bayes_lm_sdate_ma_daughter.csv
- bayes_lm_sdate_ma_son.csv
- bayes_lm_sdate_pa_daughter.csv
- bayes_lm_sdate_pa_son.csv
##### Column Descriptions for bayes_lm_sdate_ma_daughter.csv, bayes_lm_sdate_ma_son.csv, bayes_lm_sdate_pa_daughter.csv, and bayes_lm_sdate_pa_son.csv #####
- (Intercept) -- coefficient estimates for the intercept of the linear regression
- dayofyear -- coefficient estimates for the spawn date day of year for the linear regression
- sigma -- sigma value for the linear regression, standard deviation of errors
##### Column Descriptions of bayes_lm_sdate_gen_cor.csv #####
- dayofyear -- genetic correlation values for spawn date as day of year, calculated from the Bayesian linear regression results
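The regression result files can be summarized with a few lines of Python. The sketch below assumes that each row of a file such as bayes_lm_sdate_ma_daughter.csv holds one set of coefficient estimates (for example, one posterior draw); that reading is an assumption and should be checked against the code on GitHub. The numeric values are hypothetical. Note that the intercept column keeps its R-style name `(Intercept)`, parentheses included.

```python
import csv
import io
import statistics

# Hypothetical values; only the three column names come from this README.
sample_csv = """(Intercept),dayofyear,sigma
20.1,0.30,15.2
19.4,0.34,15.0
21.0,0.28,15.5
20.3,0.32,15.1
19.9,0.36,14.9
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Slope of offspring spawn day on parent spawn day, one value per row.
slopes = [float(row["dayofyear"]) for row in rows]
intercepts = [float(row["(Intercept)"]) for row in rows]

slope_mean = statistics.mean(slopes)
slope_median = statistics.median(slopes)
print(slope_mean, slope_median)
```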
snppit_results [directory] -- Directory containing the pedigree analysis results from the program SNPPIT.
- no_sad [directory] -- Directory containing the SNPPIT results files when sex and date information are not used (no_sad)
  - snppit_input.txt -- Input data file
  - snppit_output_BasicDataSummary.txt -- Basic information about the data that got read into the program
  - snppit_output_ChosenSMAXes.txt -- Information about the smax vectors used in the analysis
  - snppit_output_FDR_Summary.txt -- Offspring assigned to parents in each population, ranked by false discovery rate.
  - snppit_output_ParentageAssignments.txt -- Main output file that gives false discovery rates for all offspring with the most likely parents.
  - snppit_output_PopSizesAnPiVectors.txt -- Sizes of the populations and the expected fraction of different trios thereby implied.
  - snppit_output_TrioPosteriors.txt -- Posterior probabilities for all non-excluded (by Mendelian incompatibility) parent pairs of every offspring in the data file.
  - snppit_seeds -- Seeds used to run SNPPIT
- no_sad_no_hatch [directory] -- Directory containing the SNPPIT results files when sex, date and hatchery information are not used (no_sad_no_hatch). See above for file descriptions.
  - snppit_input.txt
  - snppit_output_BasicDataSummary.txt
  - snppit_output_ChosenSMAXes.txt
  - snppit_output_FDR_Summary.txt
  - snppit_output_ParentageAssignments.txt
  - snppit_output_PopSizesAnPiVectors.txt
  - snppit_output_TrioPosteriors.txt
  - snppit_seeds
- sad [directory] -- Directory containing the SNPPIT results file when sex and date information are provided to the program (sad). See above for file descriptions.
  - snppit_input.txt
  - snppit_output_BasicDataSummary.txt
  - snppit_output_ChosenSMAXes.txt
  - snppit_output_FDR_Summary.txt
  - snppit_output_ParentageAssignments.txt
  - snppit_output_PopSizesAnPiVectors.txt
  - snppit_output_TrioPosteriors.txt
  - snppit_seeds
##### Column Descriptions for snppit_output_FDR_Summary.txt files #####
- PopName -- Spawning location
- RankInFDR -- Amongst all the kids assigned to parents within a given parental population, this is the rank of the individual when sorted from smallest to largest p-value (and hence also in the FDR).
- Kid -- offspring ID
- Pa -- father ID
- Ma -- mother ID
- FDR -- The false discovery rate associated with accepting the current individual’s parentage assignment but none of the individuals with higher p-values.
- FDC.est.to.pop -- The estimated upper bound on the total number of false discoveries of parentage assignments to a particular population if you set your FDR cutoff just above this particular individual
- Pvalue -- The p value computed by simulation for a trio.
##### Column Descriptions for snppit_output_ParentageAssignments.txt files #####
- OffspCollection -- Offspring Collection, groups the fish by their spawning location
- Kid -- ID of offspring
- Pa -- ID of potential father
- Ma -- ID of potential mother
- PopName -- Population Name, groups the potential parents by their spawning location
- SpawnYear -- Spawn year of the parents
- FDR -- False discovery rate associated with accepting the current individual’s parentage assignment
- Pvalue -- p value
- LOD -- logarithm of the odds
- P.Pr.C_Se_Se -- The posterior probability of trio relationship C_Se_Se, more details in SNPPIT documentation, probability of true parent pair
- P.Pr.Max -- The posterior probability of the trio relationship having the highest posterior probability
- MaxP.Pr.Relat -- The trio relationship having highest posterior probability, relationship categories explained in SNPPIT documentation
- TotPaNonExc -- The total number of putative fathers that were not excluded by Mendelian incompatibility with the kid.
- TotMaNonExc -- The total number of putative mothers that were not excluded by Mendelian incompatibility with the kid
- TotUnkNonExc -- The total number of putative parents of unknown sex that were not excluded by Mendelian incompatibility with the kid.
- TotPairsMendCompat -- total pairs with Mendelian Compatibility (see SNPPIT documentation for more detail)
- TotPairsMendAndLogL -- total pairs with Mendelian Compatibility and Log L (see SNPPIT documentation for more detail)
- TotParsMendLoglAndRank -- total pairs with Mendelian Compatibility and Log L and Rank (see SNPPIT documentation for more detail)
- TotPairsNonExc -- The total number of putative parent pairs that were not excluded by Mendelian incompatibility with the kid.
- KidMiss -- The number of ungenotyped loci in the kid of a trio.
- PaMiss -- The number of ungenotyped loci in the pa of a trio.
- MaMiss -- The number of ungenotyped loci in the ma of a trio
- MI.Kid.Pa -- The number of Mendelian incompatibilities between the kid and the pa in a trio.
- MI.Kid.Ma -- The number of Mendelian incompatibilities between the kid and the ma in a trio.
- MI.Trio -- The total number of Mendelian incompatibilities in a trio
- MendIncLoci -- A column holding a comma-separated list of the names of loci at which there were Mendelian incompatibilities at the inferred trio
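The columns above can be used to keep only confidently assigned trios. The following Python sketch is illustrative only: the FDR cutoff of 0.01 is a hypothetical value and not necessarily the threshold used in the paper, the rows and IDs are made up, the "NA" placeholder for an unassigned offspring is hypothetical, and the file is assumed to be tab-delimited. It keeps trios whose FDR is at or below the cutoff and whose most probable relationship is C_Se_Se (the true parent pair category described above).

```python
import csv
import io

FDR_CUTOFF = 0.01  # hypothetical threshold, chosen only for illustration

# Hypothetical tab-separated excerpt with a subset of the documented columns.
sample_tsv = (
    "Kid\tPa\tMa\tFDR\tPvalue\tMaxP.Pr.Relat\tMI.Trio\n"
    "K010\tP001\tP002\t0.000\t0.0001\tC_Se_Se\t0\n"
    "K011\tP003\tP004\t0.250\t0.0400\tC_Se_Si\t2\n"
    "K012\tNA\tNA\tNA\tNA\tNA\tNA\n"
)


def to_float(text):
    """Parse a number, returning None for any non-numeric placeholder."""
    try:
        return float(text)
    except ValueError:
        return None


accepted = []
for row in csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"):
    fdr = to_float(row["FDR"])
    if fdr is not None and fdr <= FDR_CUTOFF and row["MaxP.Pr.Relat"] == "C_Se_Se":
        accepted.append((row["Kid"], row["Pa"], row["Ma"]))

print(accepted)
```

The same filter applies to RR_noSAD_results_1.tsv and RR_SAD_results_1.tsv, which share these column names.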
##### Column Descriptions for snppit_output_TrioPosteriors.txt #####
- OffspCollection -- Offspring Collection, groups the fish by their spawning location
- Kid -- ID of offspring
- Pa -- ID of potential father
- Ma -- ID of potential mother
- Rank -- For a given kid, this is the rank of the parent pair when ranked from largest to smallest posterior probability of being parental.
- LOD -- logarithm of the odds
- P.Pr.C_Se_Se -- The posterior probability of trio relationship C_Se_Se, probability of true parent pair, see SNPPIT documentation for details on relationship categories
- P.Pr.C_Se_Si -- The posterior probability of trio relationship C_Se_Si, see SNPPIT documentation for details on relationship categories
- P.Pr.C_Si_Se -- The posterior probability of trio relationship C_Si_Se, see SNPPIT documentation for details on relationship categories
- P.Pr.C_Se_U -- The posterior probability of trio relationship C_Se_U, see SNPPIT documentation for details on relationship categories
- P.Pr.C_U_Se -- The posterior probability of trio relationship C_U_Se, see SNPPIT documentation for details on relationship categories
- P.Pr.C_Si_Si -- The posterior probability of trio relationship C_Si_Si, see SNPPIT documentation for details on relationship categories
- P.Pr.C_Si_U -- The posterior probability of trio relationship C_Si_U, see SNPPIT documentation for details on relationship categories
- P.Pr.C_U_Si -- The posterior probability of trio relationship C_U_Si, see SNPPIT documentation for details on relationship categories
- P.Pr.C_U_U -- The posterior probability of trio relationship C_U_U, see SNPPIT documentation for details on relationship categories
- P.Pr.Se_F -- The posterior probability of trio relationship Se_F, see SNPPIT documentation for details on relationship categories
- P.Pr.F_Se -- The posterior probability of trio relationship F_Se, see SNPPIT documentation for details on relationship categories
- P.Pr.H_Se -- The posterior probability of trio relationship H_Se, see SNPPIT documentation for details on relationship categories
- P.Pr.Se_H -- The posterior probability of trio relationship Se_H, see SNPPIT documentation for details on relationship categories
- P.Pr.F_Si -- The posterior probability of trio relationship F_Si, see SNPPIT documentation for details on relationship categories
- P.Pr.Si_F -- The posterior probability of trio relationship Si_F, see SNPPIT documentation for details on relationship categories
- P.Pr.F_U -- The posterior probability of trio relationship F_U, see SNPPIT documentation for details on relationship categories
- P.Pr.U_F -- The posterior probability of trio relationship U_F, see SNPPIT documentation for details on relationship categories
- P.Pr.F_F -- The posterior probability of trio relationship F_F, see SNPPIT documentation for details on relationship categories
- KidMiss -- The number of ungenotyped loci in the kid of a trio.
- PaMiss -- The number of ungenotyped loci in the pa of a trio.
- MaMiss -- The number of ungenotyped loci in the ma of a trio.
- MI.Kid.Pa -- The number of Mendelian incompatibilities between the kid and the pa in a trio.
- MI.Kid.Ma -- The number of Mendelian incompatibilities between the kid and the ma in a trio.
- MI.Trio -- The total number of Mendelian incompatibilities in a trio.
Code
The R code used to run the analyses can be found at:
https://github.com/abeulke/Beulke_et_al_2023_Molecular_Ecology/