Data from: A dominance hypothesis argument for historical genetic gains and the fixation of heterosis in octoploid strawberry
Data files
Oct 16, 2024 version files 63.89 MB
- README.md (8.89 KB)
- Supplemental_File_S1_Pedigree_H_MPH_BPH_WPH_6-01-2024.xlsx (240.71 KB)
- Supplemental_File_S2_50K_SNP_Array_Genotypic_Data.txt (34.96 MB)
- Supplemental_File_S3_Individual_Harvest_Yield_Count_Weight.csv (1.62 MB)
- Supplemental_File_S4_Individual_Harvest_TSS_TA_ANC.csv (221.54 KB)
- Supplemental_File_S5_Individual_Harvest_Firnness.csv (478.33 KB)
- Supplemental_File_S6_Cumulative_Harvest_Yield_Count_Weight.csv (156.36 KB)
- Supplemental_File_S7_ROSETTA_STONE.xlsx (11.56 KB)
- Supplemental_File_S8_GWAS.xlsx (26.20 MB)
Abstract
Heterosis was the catalyst for the domestication of cultivated strawberry (Fragaria × ananassa), an interspecific hybrid species that originated in the 1700s. The hybrid origin was discovered because the phenotypes of spontaneous hybrids transgressed those of their parent species. The transgressions included fruit yield increases and other genetic gains in the twentieth century that sparked the global expansion of strawberry production. The importance of heterosis to the agricultural success of the hybrid species, however, has remained a mystery. Here we show that heterosis has disappeared (become fixed) among improved hybrids within a population (the California population) that has been under long-term selection for increased fruit yield, weight, and firmness. We found that the highest yielding hybrids are among the most highly inbred (59-79%), which seems counterintuitive for a highly heterozygous outbreeder carrying heavy genetic loads. Although faint remnants of heterosis were discovered, the between-parent allele frequency differences and dispersed favorable dominant alleles necessary for heterosis have decreased nearly genome-wide within the California population. Conversely, heterosis was prevalent and significant among wide hybrids, especially for fruit count, a significant driver of genetic gains for fruit yield. We attributed the disappearance (fixation) of heterosis within the California population to increased homozygosity of favorable dominant alleles and inbreeding associated with selection, random genetic drift, and selective sweeps. Despite historical inbreeding, the highest yielding hybrids reported to date are estimated to be heterozygous for 20,370-44,280 of the estimated 97,000-108,000 genes in the octoploid genome, the equivalent of an entire diploid genome or more.
https://doi.org/10.5061/dryad.866t1g20j
Description of the data and file structure
These data were collected as part of a study of hybrid vigor (heterosis) in cultivated strawberry, an octoploid hybrid species. This repository holds a single supplemental figure, the raw genotypic and phenotypic data collected as part of this study, and tabular files with statistics estimated from raw data.
Supplemental Figure S1. Genetic diversity of the post-1970 subset of California population individuals (n = 769) depicted in Fig. 1. The genetic relationship matrix used as input for the principal component analysis was estimated from a genome-wide sample of 28,513 single nucleotide polymorphisms (SNPs) genotyped with a 50K Axiom array. (A) This scatterplot shows principal component 1 (PC1) and principal component 2 (PC2) coordinates for 27 elite parents and 356 elite × elite hybrids developed for the heterosis study. Dark blue points and labels identify photoperiod-insensitive (day-neutral flowering) individuals, whereas light blue points and labels identify photoperiod-sensitive (short-day flowering) individuals. (B) This scatterplot shows PC1 and PC2 coordinates for the 27 elite parents (light and dark blue labels and points), 356 elite × elite hybrids (gray points), and 385 additional post-1970 California population hybrid individuals (gray points).
Supplemental File S1. Heterozygosities, pedigrees, and phenotypic means for 27 elite parents, three exotic parents, 113 elite × exotic hybrids, and 356 elite × elite hybrids, plus mid-parent and best-parent heterosis ratios for every parent-hybrid combination. Best-parent heterosis (BPH), mid-parent heterosis (MPH), and worst-parent heterosis (WPH) are defined in the article titled "A dominance hypothesis argument for historical genetic gains and the fixation of heterosis in octoploid strawberry."
71 columns x 530 rows; each row corresponds to a different hybrid individual identified in columns 1 and 2.
[3 & 9] indicate the type of hybrid by flowering type (3) and elite vs exotic status (9), which are described in detail in the associated article.
[4 - 8] Biparental family identifiers, corresponding to the family ID (4) and the parent IDs and aliases (5-8).
"Family" is the 6-digit family identifier,
"Mother ID" is the 10-digit UC identifier for the maternal parent,
"Father ID" is the 10-digit UC identifier for the paternal parent,
"Mother Alias" is the cultivar name or 10-digit UC ID for the maternal parent,
"Father Alias" is the cultivar name or 10-digit UC ID for the paternal parent.
[10 - 12] Genotypic pedigree validation.
Columns 10 and 11 are the genetically validated pedigree assignments for the hybrids in column 1, obtained using the methods presented in Pincot et al (2021) (https://doi.org/10.1093/g3journal/jkab015).
Column 12 is the corresponding trio transgression ratio, defined in the materials and methods and estimated using the methods presented in Pincot et al (2021) (https://doi.org/10.1093/g3journal/jkab015).
[13 - 14] Observed heterozygosity (%) and F_i inbreeding estimates for all hybrid individuals.
[15] PF locus genotype: pfpf = seasonal flowering ("short day") homozygote, PFpf = perpetual flowering ("day neutral", DN) heterozygote, PFPF = perpetual flowering ("day neutral", DN) homozygote.
[16-23] Estimated marginal means (EMMs) from linear mixed model analyses for Yield (g/plant), Count (fruit), Weight (g/fruit), TSS (%), TA (%), TSS/TA (ratio), anthocyanin concentration (ANC; µg/mL), and Firmness (kg-Force).
[24-31] Best parent heterosis (BPH) estimates for Yield, Count, Weight, TSS, TA, TSS/TA, ANC, and Firmness. BPH is a ratio.
[32-39] Mid parent heterosis (MPH) estimates for Yield, Count, Weight, TSS, TA, TSS/TA, ANC, and Firmness. MPH is a ratio.
[40-47] True or False indicators for whether or not the BPH estimates were significant.
[48-55] True or False indicators for whether or not the MPH estimates were significant.
[56-63] Worst parent heterosis (WPH) estimates for Yield, Count, Weight, TSS, TA, TSS/TA, ANC, and Firmness. WPH is a ratio.
[64-71] True or False indicators for whether or not the WPH estimates were significant.
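To make the ratio columns easier to interpret, the sketch below shows one common way such heterosis ratios are formed from a hybrid mean and its two parental means. It is an illustration only: the exact definitions used for this file are the ones given in the associated article, and the function name, the "larger value is best" convention, and the numeric inputs here are all assumptions.

```python
def heterosis_ratios(hybrid, parent_1, parent_2):
    """Return (MPH, BPH, WPH) as ratios of a hybrid mean to parental reference values.

    ASSUMPTIONS (not taken from the data set):
      * each ratio is hybrid / reference, so 1.0 means "no heterosis";
      * the "best" parent is the one with the larger trait value, which
        may not be appropriate for every trait.
    """
    mid = (parent_1 + parent_2) / 2.0   # mid-parent value
    best = max(parent_1, parent_2)      # best-parent value (assumed: larger)
    worst = min(parent_1, parent_2)     # worst-parent value (assumed: smaller)
    return hybrid / mid, hybrid / best, hybrid / worst


# Hypothetical marginal means, not values from Supplemental File S1.
mph, bph, wph = heterosis_ratios(hybrid=1200.0, parent_1=1000.0, parent_2=800.0)
print(round(mph, 3), round(bph, 3), round(wph, 3))
```

Under this convention, a ratio above 1 means the hybrid exceeded the corresponding parental reference value.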
Supplemental File S2. Genotypes for 28,523 SNP loci among 530 individuals.
Individuals (10-digit UC ID) were genotyped using a 50K Axiom array (Hardigan et al 2020; https://doi.org/10.3389/fpls.2019.01789).
Data are a matrix with 530 rows and 28,523 columns.
Genotypes are coded as -1 for allele 1 homozygotes, 0 for heterozygotes, and 1 for allele 2 homozygotes.
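As a minimal sketch of how this coding can be used, the Python function below computes observed heterozygosity (%) for one individual from a row of coded genotype calls. It assumes the row has already been parsed into a list of numbers. Skipping any value other than -1, 0, or 1 (for example, a missing call) is an illustrative assumption, as are the function name and the example row; none of these come from the data set.

```python
def observed_heterozygosity(calls):
    """Percent of heterozygous calls (coded 0) among valid calls (-1, 0, 1).

    ASSUMPTION: anything that is not -1, 0, or 1 is treated as missing
    and ignored. Returns NaN when there are no valid calls.
    """
    valid = [c for c in calls if c in (-1, 0, 1)]
    if not valid:
        return float("nan")
    het = sum(1 for c in valid if c == 0)
    return 100.0 * het / len(valid)


# Hypothetical row: eight loci, one of them missing (None).
row = [-1, 0, 1, 0, 1, None, -1, 0]
print(observed_heterozygosity(row))
```

The same function can be applied row by row once the matrix has been read with whatever delimiter and header layout the text file actually uses.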
Supplemental File S3. Fruit yield (g/plant), count (fruit), and weight (g/fruit) phenotypes for 29 parents, 31 S1 (selfed individuals), and 469 hybrids (outcrossed individuals) from 11-13 harvests/year, three clonal replicates/entry/year, and two years. One of the exotic parents (Alias = 'Oso Flaco', ID = 55C023P001) was not phenotyped.
ID (10 digit UC ID) is the entry name.
Block is the block ID (nested in trial year)
Harvest is the time stamp for the week of the trial.
Yield is the g / plant from each plot
Count is the fruit / plant from each plot
Weight is the g / fruit from each plot
Trial year indicates one of two years that this trial was conducted.
Supplemental File S4. Fruit total soluble solids (%), titratable acidity (%), and anthocyanin concentration (ANC; µg/mL) phenotypes for 29 parents, 31 S1 (selfed individuals), and 469 hybrids (outcrossed individuals) from 11-13 harvests/year, three clonal replicates/entry/year, and two years.
Entry ID (10 digit UC ID) is the entry name.
Block is the block indicator (nested in trial year)
Harvest is the time stamp for the week of the trial.
TSS is the total soluble solids / brix estimate (%)
TA is titratable acidity (%)
TSS/TA is the ratio of TSS and TA.
ANC is the pelargonidin 3-O-glucoside concentration (µg/mL).
Year indicates one of two years that this trial was conducted.
Supplemental File S5. Fruit firmness (kg-force) phenotypes for 29 parents, 31 S1 (selfed individuals), and 469 hybrids (outcrossed individuals) from 11-13 harvests/year, three clonal replicates/entry/year, and two years.
Entry ID (10 digit UC ID) is the entry name.
Block is the block indicator (nested in trial year)
Harvest is the time stamp for the week of the trial.
Firmness is the maximum force measured by the TA.XT plus texture analyzer in kg-force. Each measurement corresponds to a unique fruit.
Season indicates one of two years that this trial was conducted.
Supplemental File S6. Cumulative fruit yield (g/plant) phenotypes for 29 parents, 31 S1 (selfed individuals), and 469 hybrids (outcrossed individuals) summed over 11-13 harvests/year and three clonal replicates/entry/year within and between years.
ID (10 digit UC ID) is the entry name.
Block is the block indicator (nested in trial year).
Yield is the cumulative g/plant from each plot.
Count is the cumulative fruit /plant from each plot.
Weight is the average g / fruit from each plot.
Year indicates one of two years that this trial was conducted.
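The relationship between per-harvest records (as in Supplemental File S3) and cumulative plot totals (as in Supplemental File S6) can be sketched as follows. This is a hedged illustration, not the authors' processing script. The dictionary keys mirror the column descriptions above, but the exact header spellings in the CSV files are assumptions. Computing average weight as total yield divided by total count is also an assumption; the files may use a different average. The entry label is hypothetical.

```python
from collections import defaultdict


def cumulative_by_plot(records):
    """Sum per-harvest Yield and Count within each (ID, Block, Trial year) plot.

    ASSUMPTION: average fruit Weight is taken as total Yield / total Count.
    """
    totals = defaultdict(lambda: {"Yield": 0.0, "Count": 0.0})
    for r in records:
        key = (r["ID"], r["Block"], r["Trial year"])
        totals[key]["Yield"] += float(r["Yield"])
        totals[key]["Count"] += float(r["Count"])

    out = {}
    for key, t in totals.items():
        # Guard against plots with no fruit to avoid division by zero.
        weight = t["Yield"] / t["Count"] if t["Count"] > 0 else float("nan")
        out[key] = {"Yield": t["Yield"], "Count": t["Count"], "Weight": weight}
    return out


# Hypothetical per-harvest records for one plot (values are made up).
harvests = [
    {"ID": "ENTRY_A", "Block": "1", "Harvest": "1", "Yield": "100", "Count": "5", "Trial year": "1"},
    {"ID": "ENTRY_A", "Block": "1", "Harvest": "2", "Yield": "140", "Count": "7", "Trial year": "1"},
]
print(cumulative_by_plot(harvests))
```

Rows read with `csv.DictReader` could be passed in directly, provided the header names match the keys used here.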
Supplemental File S7. Rosetta Stone for cross-referencing linkage group and chromosome nomenclatures.
All columns represent the linkage group and chromosome nomenclature for the given citations in the column names.
Supplemental File S8. GWAS statistics displayed in Fig. 3 Manhattan plots for fruit yield, count, weight, firmness, total soluble solids (TSS), titratable acidity (TA), TSS/TA (sugar-to-acid ratio), and anthocyanin concentration (ANC).
8 tabs corresponding to 8 different GWAS analyses (yield, TSS, TA, size = weight, TSS-TA Ratio, Firmness, Count, and ANC, as defined above).
Columns are:
Marker is the single nucleotide polymorphism (SNP) probe ID.
Chromosome and Position are the genome coordinates of the particular Marker
Allele 1 and Allele 2 are the TOP and BOTTOM alleles from the FanaSNP 50K Array.
Allele 1 Frequency is the frequency of Allele 1.
Beta year 1 All hybrids and Beta year 2 All hybrids are the linear regression coefficients of the trait on the marker genotypes for the Elite + Expanded population.
p-value All hybrids and FDR log10 p-value all hybrids are, respectively, the p-values from the GEMMA software and the FDR-adjusted GEMMA p-values transformed to the -log10() scale, for all hybrids.
Beta year 1 Elite hybrids and Beta year 2 Elite hybrids are the linear regression coefficients of the trait on the marker genotypes for the Elite population.
p-value Elite hybrids and FDR log10 p-value Elite hybrids are, respectively, the p-values from the GEMMA software and the FDR-adjusted GEMMA p-values transformed to the -log10() scale, for elite hybrids only.
The methods are described in detail in the manuscript.
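For readers who want to relate raw p-values to FDR-adjusted values on the -log10 scale, the sketch below shows one widely used approach, the Benjamini-Hochberg procedure, followed by the -log10 transform. The description above does not state which FDR method was applied, so the choice of Benjamini-Hochberg is an assumption, and the function names and example p-values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    ASSUMPTION: BH is used purely for illustration; the FDR method used
    for Supplemental File S8 is described in the manuscript.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(p):
    """Transform a p-value (must be > 0) to the -log10 scale."""
    return -math.log10(p)


# Hypothetical p-values, not taken from the GWAS file.
raw = [0.01, 0.04, 0.03, 0.002]
adj = benjamini_hochberg(raw)
print([round(neg_log10(p), 3) for p in adj])
```

Larger values on the -log10 scale correspond to smaller adjusted p-values, which is why Manhattan plots are drawn on this scale.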
