Further evidence from common garden rearing experiments of heritable traits separating lean and siscowet lake charr (Salvelinus namaycush) ecotypes
Data files
May 03, 2022 version files 54.15 MB
- DataS1_RDACanidateLoci.fa
- DataS2_QTL_Genes__NCBI_Salvelinus_namaycush_Annotation_Release_100__2021-01-13.csv
- DataS3_biodat.csv
- DataS4_master_metadata_10-19-21.csv
- DataS5_LT_90_parsed_genotype_calls.txt.gz
- DataS6_pops_hdplotmac3nohapsind80mm90.maf_new.vcf.gz
- DataS7_F1-biochemical-lipid.csv
- README
Abstract
Genetic evidence of selection for complex and polygenically regulated phenotypes can easily become masked by neutral population genetic structure and phenotypic plasticity. Without direct evidence of genotype-phenotype associations, it can be difficult to determine to what degree a phenotype is heritable or a product of the environment. Common garden laboratory studies control for environmental stochasticity and help to determine the mechanism that regulates traits. Here we assess lipid content, growth, weight, and length variation in full and hybrid F1 crosses of deep and shallow water sympatric lake charr ecotypes reared for nine years in a common garden experiment. Redundancy analysis (RDA) and quantitative trait loci (QTL) genomic scans are used to identify associations between genotypes at 19,714 single nucleotide polymorphisms (SNPs) aligned to the lake charr genome and individual phenotypes to determine the role that genetic inheritance plays in ecotype phenotypic diversity. Lipid content, growth, length, and weight differed significantly among lake charr crosses throughout the experiment, suggesting that pedigree plays a large role in lake charr development. Polygenic scores of 15 SNPs putatively associated with lipid content and/or condition factor indicated that ecotype-distinguishing traits are polygenically regulated and additive. A QTL identified on chromosome 38 contained >200 genes, some of which were associated with lipid metabolism and growth, demonstrating the complex nature of ecotype diversity. The results of our common garden study further indicate that lake charr ecotypes observed in nature are pre-determined at birth and that ecotypes differ fundamentally in lipid metabolism and growth.
Methods
Lipid content:
From 2015 to 2019, somatic lipid content (as a percent) was measured for each fish using a battery-powered, handheld microwave oscillator (Distell Model 692 Fish Fatmeter, Distell Inc.). The fatmeter emits a low-powered microwave (2 GHz, i.e. 2000 MHz; power 2 mW) that interacts with water within the somatic tissues and uses the inverse relationship between water and lipid content to estimate the lipid concentration (as a percent) in the tissue (Crossin and Hinch, 2005; Kent, 1990). Fatmeter readings were taken on the Research-1 setting for each fish at a site in the epaxial muscle mass just posterior to the head (site "S1" as shown in Sitar et al., 2020). From 2012 to 2014 the fish were too small for the fatmeter, so a subsample of fish from each cross (n = 12 in 2012; n = 20 in 2013-2014) was lethally sampled, and a muscle sample (a cross-section of the epaxial muscle behind the head and anterior to the dorsal fin) was excised and used for lipid analysis by Soxhlet extraction as described in Goetz et al. (2014) and Sitar et al. (2020).
Pedigree:
Parentage of all individuals was assigned using genotypes from a panel of six microsatellite markers, amplified and genotyped for all parents and offspring using previously designed primers for loci SnaMSU 01, 03, 06, 10, 11, and 12 (Rollins et al., 2009). Pedigrees were reconstructed using Colony2 Version 2.0.6.5 with a full-likelihood approach specifying maternal polygamy without inbreeding and no sibship prior (Jones & Wang, 2010).
Genotype data:
Restriction site associated DNA sequencing (RAD-seq) was conducted on 74 parents and 542 F1 offspring using SbfI and the bestRAD library protocols outlined in Ali et al. (2016). An initial library was sequenced at BGI America (Cambridge, MA) on one lane of a HiSeq4000, and the remaining libraries were sent to Novogene (Sacramento, CA), where they were sequenced on seven HiSeq4000 lanes using paired-end 150 bp sequencing. Identification of SNPs and genotyping were conducted in STACKS v.2.3 using the de novo assembly pipeline (Rochette et al., 2019). Samples were demultiplexed with process_radtags (flags = -c, -q, -r, -t 140). Stacks of similar sequences (loci) for each individual were identified with ustacks (flags = -m 3, -M 5, -H, --max_locus_stacks 4, --model_type bounded, --bound_high 0.05), and a catalog of putative loci was generated based on sequences from the parents. Individual stacks were then aligned to the catalog in sstacks, and genotypes for all putative SNPs were assigned using gstacks. Finally, a data file containing genotypes for all individuals at all SNPs with a minor allele count greater than two was generated using populations and subsequently filtered in VCFtools (Danecek et al., 2011). To ensure that paralogs did not influence our findings, we ran HDPlot on unfiltered data for all crosses and removed loci identified as potential paralogs (McKinney et al., 2017). Parameters for this analysis were set by visually choosing threshold values for read depth ratio and proportion of heterozygotes that identified the loci conforming to theoretical expectations for singletons (McKinney et al., 2017; supplemental figure 1). Once putative paralogs were removed, quality filtering was conducted hierarchically in VCFtools. First, all individuals with more than 80% missing data were removed from the dataset. Second, SNPs missing more than 10% of genotypes were removed. Finally, for loci that contained more than one SNP, the SNP with the highest minor allele frequency was retained, leaving a single SNP per locus. All sites that passed the above thresholds were retained in downstream analysis. All bioinformatic analysis was conducted using the Turing High Performance Computing cluster at Old Dominion University, Virginia.
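The three hierarchical filtering steps described above can be illustrated with a minimal Python sketch. This is not the code used in the study (the filtering was run in VCFtools); it only restates the logic on a small in-memory genotype table. The function name `filter_genotypes`, the data layout (alternate-allele counts of 0, 1, or 2, with `None` for a missing call), and the toy data are all assumptions made for illustration, and the earlier minor allele count and paralog filters are not covered.

```python
def filter_genotypes(genotypes, snp_locus,
                     max_ind_missing=0.80, max_snp_missing=0.10):
    """Hypothetical helper mirroring the three filtering steps in the text.

    genotypes: dict mapping individual ID -> list of alternate-allele
               counts (0, 1, 2) or None (missing), one entry per SNP.
    snp_locus: list giving the RAD locus ID of each SNP (same order).
    Returns (kept individual IDs, kept SNP indices).
    """
    n_snps = len(snp_locus)

    # Step 1: remove individuals with more than 80% missing genotypes.
    kept = {ind: calls for ind, calls in genotypes.items()
            if sum(c is None for c in calls) / n_snps <= max_ind_missing}
    n_inds = len(kept)

    # Step 2: remove SNPs missing in more than 10% of remaining individuals,
    # recording the minor allele frequency (MAF) of each SNP that passes.
    maf = {}
    for j in range(n_snps):
        calls = [g[j] for g in kept.values() if g[j] is not None]
        n_missing = n_inds - len(calls)
        if n_inds == 0 or n_missing / n_inds > max_snp_missing:
            continue
        alt_freq = sum(calls) / (2 * len(calls))
        maf[j] = min(alt_freq, 1 - alt_freq)

    # Step 3: within each locus, retain only the SNP with the highest MAF
    # (ties keep the first SNP encountered).
    best = {}
    for j in sorted(maf):
        locus = snp_locus[j]
        if locus not in best or maf[j] > maf[best[locus]]:
            best[locus] = j

    return sorted(kept), sorted(best.values())


# Toy data (invented): four individuals, four SNPs on three loci.
example_loci = ["locus1", "locus1", "locus2", "locus3"]
example_genotypes = {
    "A": [0, 1, 0, 0],
    "B": [0, 1, 1, None],
    "C": [1, 0, 2, 0],
    "D": [None, None, None, None],  # fully missing, removed in step 1
}

individuals, snps = filter_genotypes(example_genotypes, example_loci)
print(individuals, snps)
```

The order of the steps matters: removing poorly genotyped individuals first means that the per-SNP missing-data rate in the second step is computed only over individuals that are retained.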