Genetic parentage reveals the (un)natural history of Central Valley Hatchery steelhead
Data files
Mar 11, 2024 version files (61.13 MB total)
- cvsh.csv (14.80 MB)
- cvsh.rds (45.71 MB)
- omy05_genos.csv (618.36 KB)
- README.md (5.92 KB)
Abstract
Populations composed of individuals descended from multiple distinct genetic lineages often feature significant differences in phenotypic frequencies. We considered hatchery production of steelhead, the migratory anadromous form of the salmonid species Oncorhynchus mykiss, and investigated how differences among genetic lineages and environmental variation impacted life history traits. We genotyped 23,670 steelhead returning to the four California Central Valley hatcheries over nine years from 2011–2019, confidently assigning parentage to 13,576 individuals to determine the age and date of spawning, and rates of iteroparity and repeat spawning within each year. We found steelhead from different genetic lineages showed significant differences in adult life-history traits despite inhabiting similar environments. Differences between coastal and Central Valley steelhead lineages contributed to significant differences in age at return, timing of spawning, and rates of iteroparity amongst programs. In addition, adaptive genomic variation associated with life history development in this species varied among hatchery programs and was associated with the age of steelhead spawners only in the coastal lineage population. Environmental variation likely contributed to variation in phenotypic patterns observed over time, as our study period spanned both a marine heatwave and a serious drought in California. Our results highlight evidence of a strong genetic component underlying known phenotypic differences in life history traits between two steelhead lineages.
README: California Central Valley Hatchery steelhead broodstock SNP parentage, 2011-2019
This dataset contains the metadata and SNP genotypes for the steelhead sampled at the four California Central Valley (CCV) hatcheries from 2011-2019.
We reconstructed pedigrees, conducted population genetic analyses, analyzed life history characteristics (age at spawning, iteroparity, timing of spawning), and considered Omy05 genotype and phenotype. Metadata records, SNP genotype data, pedigree analysis results, and R notebooks were used to complete the analyses.
The use of these data and R code will enable the reproduction of all presented results.
We found steelhead from different genetic lineages showed significant differences in adult life-history traits despite inhabiting similar environments. Differences between coastal and Central Valley steelhead lineages contributed to significant differences in age at return, timing of spawning, and rates of iteroparity amongst programs. In addition, adaptive genomic variation associated with life history development in this species varied among hatchery programs and was associated with the age of steelhead spawners only in the coastal lineage population.
Description of the data and file structure
Steelhead were collected at the four Central Valley hatchery programs and fin clipped. Length was recorded (LENGTH), and samples were assigned a sample ID (SAMPLE_ID).
Primary data files:
These data are the base of all analyses for this manuscript.
- cvsh.csv and cvsh.rds contain the unprocessed metadata and genotypes for steelhead from the four California Central Valley steelhead programs from 2011-2019. SNPs are in a two-column format (one column per allele).
Metadata columns and descriptions (missing data reported as NA):
NMFS_DNA_ID -- Unique ID given to each sample
GENOTYPE_NUMBER -- Genotype replicas are recorded here
BOX_ID -- Box number where the DNA sample is stored.
BOX_POSITION -- Position in the plate where the tissue sample is stored
SAMPLE_ID -- ID associated with a sample before reaching our lab
BATCH_ID -- Batch number given to a related group of samples
PROJECT_NAME -- Project name tied to the DNA samples
GENUS -- Genus of the sample
SPECIES -- Species of the sample
LENGTH -- Fork length measurements, millimeters (mm)
WEIGHT -- Body weight, grams (g)
SEX -- Phenotypic sex, recorded at the time of sample collection
AGE -- Age of the specimen, if provided with sample
REPORTED_LIFE_STAGE -- Life stage information provided with sample
PHENOTYPE -- Miscellaneous physical characteristics provided with sample
HATCHERY_MARK -- Information on the presence/absence of fin clips provided with sample
TAG_NUMBER -- Information on fish tags provided with sample
COLLECTION_DATE -- Date the sample was collected, which is also the fish's spawn date
CollectionYear -- Brood year the sample was collected (samples collected in November or December are in the same brood year as those in the new calendar year)
ESTIMATED_DATE -- If "yes" in this column, the collected date was estimated
PICKER -- Initials of the person preparing the samples for DNA extraction
PICK_DATE -- Date the sample was prepared for DNA extraction
LEFTOVER_SAMPLE -- If "no" or "none" in this column, there is no remaining tissue sample
SAMPLE_COMMENTS -- Comments related to the sample
NMFS_DNA_ID_1 -- Repeat of the unique identifier (NMFS_DNA_ID)
STATE_F -- State where the sample was collected
COUNTY_F -- County where the sample was collected
WATERSHED -- Watershed where the samples were collected
TRIB_1 -- Additional tributary information about the collection
TRIB_2 -- Additional tributary information about the collection
WATER_NAME -- Name of river or creek where the sample was collected
REACH_SITE -- Additional reach information about the collection
HATCHERY -- Hatchery where the sample was collected
SNPPIT.Descriptor -- Hatchery acronym
STRAIN -- Additional information on the strain of the sample collected
LATITUDE_F -- Geolocation information
LONGITUDE_F -- Geolocation information
LOCATION_COMMENTS_F -- Location comments
Sample_ID -- Additional sample information
Columns AN through HQ are genotype columns for SNP loci (missing data reported as 0). SNPs are in two-column format (one column per allele), with alleles listed as numbers (A=1, C=2, G=3, T=4).
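As a rough illustration of the two-column allele coding, the following Python sketch decodes a pair of allele columns into a genotype string. The function name `decode_genotype` and the example values are hypothetical and are not part of the deposited scripts; only the numeric coding (A=1, C=2, G=3, T=4, with 0 for missing data) comes from this README.

```python
# Allele coding from the README: 1=A, 2=C, 3=G, 4=T; 0 marks missing data.
ALLELE_CODES = {1: "A", 2: "C", 3: "G", 4: "T"}


def decode_genotype(allele_1, allele_2):
    """Return a genotype such as 'AG', or None if either allele is missing (0).

    Hypothetical helper for illustration only. Values may be ints or the
    numeric strings produced by reading a CSV file.
    """
    a1, a2 = int(allele_1), int(allele_2)
    if a1 == 0 or a2 == 0:
        return None
    # Sort so that heterozygotes read the same regardless of column order.
    return "".join(sorted(ALLELE_CODES[a1] + ALLELE_CODES[a2]))


print(decode_genotype(1, 3))      # AG
print(decode_genotype("4", "4"))  # TT
print(decode_genotype(0, 0))      # None
```

Returning None for a missing call keeps missing genotypes distinct from valid ones when they are later counted or filtered.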
Additional data files:
- Omy05 genotypes are in a separate file (omy05_genos.csv) because Omy_R04944 was added to the panel in 2015. Metadata columns included are:
NMFS_DNA_ID -- Unique ID given to each sample (conserved from cvsh.rds/cvsh.csv)
SEX -- Phenotypic sex, recorded at the time of sample collection
Genetic_Sex -- Genetic sex calculated from Sex_ID genotypes
Sex -- Consensus sex from genetic and phenotypic sex.
COLLECTION_DATE -- Date the sample was collected, which is also the fish's spawn date
CollectionYear -- Brood year the sample was collected (samples collected in November or December are in the same brood year as those in the new calendar year)
Omy_R04944 and Omy_R04044_1 -- Omy05 genotype from SNP locus Omy_R04944. SNPs are in two-column format (one column per allele), with alleles listed as numbers (A=1, C=2, G=3, T=4)
SNPPIT.Descriptor -- Hatchery acronym
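The CollectionYear rule described in the column list can be sketched in Python as follows. This is a minimal illustration, not the code used for the manuscript: the function name `brood_year` is hypothetical, and because the date format stored in COLLECTION_DATE is not stated here, the function takes an already-parsed `datetime.date`.

```python
from datetime import date


def brood_year(collection_date):
    """Assign a brood year to a collection date.

    Rule taken from the README: samples collected in November or December
    belong to the same brood year as those collected in the new calendar
    year. All other months keep their own calendar year.
    """
    if collection_date.month >= 11:
        return collection_date.year + 1
    return collection_date.year


print(brood_year(date(2012, 12, 15)))  # 2013
print(brood_year(date(2013, 2, 1)))    # 2013
```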
Reproducible R scripts to process these data for the analyses featured in the manuscript are included in the scripts/ directory. These scripts are numbered in chronological order for these analyses and include downloads for necessary packages.
Tables cited in the manuscript are in the tables/ directory, and figures cited in the manuscript are in the final_figures/ directory.
The additional/ directory includes some shapefiles used to generate Figure 1. Download information for these shapefiles is provided in the map figure script (7_map.Rmd). The final version of Figure 1 was compiled using Adobe Illustrator.
Sharing/Access information
NA
Methods
SNP loci, genotyping, and basic population genetics analysis
Samples were genotyped with a panel of 96 biallelic SNP markers (Abadía-Cardoso et al. 2013), including a Y chromosome-linked marker to determine genetic sex (Brunelli et al. 2008). However, the marker composition of the panel varied slightly over time, with 92 loci genotyped across all years of the study; markers not typed across all years were removed from downstream analyses (Table S1). All individuals were genotyped using TaqMan assays (Applied Biosystems) on 96.96 Dynamic Genotyping Arrays with the EP1 Genotyping System (Fluidigm Corporation) following the manufacturer’s protocols. Two negative controls were included in each array, and genotypes were called using SNP Genotyping Analysis Software v3.1.1 (Fluidigm).
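Restricting the panel to markers typed in every year amounts to a set intersection across the yearly panels. The Python sketch below shows the idea under stated assumptions: the function name `loci_typed_in_all_years`, the dictionary layout, and the placeholder locus names are all made up for illustration and do not reflect the actual panel composition in Table S1.

```python
def loci_typed_in_all_years(loci_by_year):
    """Return the sorted list of loci present in every year's panel.

    loci_by_year maps a year to an iterable of locus names (hypothetical layout).
    """
    panels = [set(loci) for loci in loci_by_year.values()]
    return sorted(set.intersection(*panels)) if panels else []


# Made-up panels for three years; locus names are placeholders.
example = {
    2011: ["locus_A", "locus_B", "locus_C"],
    2015: ["locus_A", "locus_B", "locus_D"],
    2019: ["locus_A", "locus_B", "locus_C", "locus_D"],
}
print(loci_typed_in_all_years(example))  # ['locus_A', 'locus_B']
```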
To evaluate genotyping error rates for each SNP marker, we inferred parent-offspring trios using parentage analysis (see below) and estimated the minimum genotyping error rate expected to produce the Mendelian incompatibilities observed at each marker across the trios. Of the 23,670 genotyped samples, 83 yielded low-quality genotypes after the initial round of genotyping (indicated primarily by large fractions of missing genotypes). These samples were re-genotyped. Any individuals missing more than 10% of loci (fewer than 82 successful genotype calls) were identified and removed.
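The two quality checks in this paragraph can be illustrated with a short Python sketch. It assumes genotypes are held as (allele_1, allele_2) tuples with 0 marking a missing allele, as in the data files; the function names and the toy genotypes are hypothetical. The Mendelian check only flags whether a single-locus trio is incompatible; it does not reproduce the estimation of minimum genotyping error rates described above.

```python
def fraction_missing(genotypes):
    """Fraction of loci with a missing call; genotypes is a non-empty list of allele pairs."""
    missing = sum(1 for a1, a2 in genotypes if a1 == 0 or a2 == 0)
    return missing / len(genotypes)


def passes_missing_filter(genotypes, max_missing=0.10):
    """Keep an individual unless it is missing more than max_missing of its loci."""
    return fraction_missing(genotypes) <= max_missing


def is_mendelian_compatible(offspring, parent_1, parent_2):
    """True if the offspring could have received one allele from each parent.

    Loci with any missing genotype in the trio are treated as uninformative
    and reported as compatible (an assumption of this sketch).
    """
    if any(0 in geno for geno in (offspring, parent_1, parent_2)):
        return True
    o1, o2 = offspring
    return (o1 in parent_1 and o2 in parent_2) or (o2 in parent_1 and o1 in parent_2)


# Toy individual with 10 loci, one of them missing: exactly 10% missing, so retained.
individual = [(1, 3)] * 9 + [(0, 0)]
print(passes_missing_filter(individual))  # True

# Toy single-locus trios (offspring, parent, parent).
print(is_mendelian_compatible((1, 3), (1, 1), (3, 3)))  # True
print(is_mendelian_compatible((3, 3), (1, 1), (1, 3)))  # False
```

Counting the False results per marker across all inferred trios would give the number of observed Mendelian incompatibilities for that marker.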