Leveraging genetic resources and genomic prediction to enhance flavonol content in cranberry fruit
Data files
Jul 11, 2025 version files 15.17 MB
- Supplemental_TableS1.csv (104.61 KB)
- Supplemental_TableS2.csv (4.54 KB)
- Supplemental_TableS3.csv (46.85 KB)
- README.md (7.38 KB)
- Supplemental_DataS1.txt (14.81 MB)
- Supplemental_DataS2.txt (204.46 KB)
Abstract
American cranberry fruit (Vaccinium macrocarpon Ait.) are rich in flavonols, a subgroup of flavonoids that contribute to human health and plant stress resilience. Despite their importance, the genetic diversity and potential for improvement of flavonols in cranberry remain underexplored. We analyzed phenotypic and genetic variation for eight flavonol compounds in a genetically diverse germplasm collection (n = 247) over two years. Myricetin-3-galactoside, quercetin-3-galactoside, quercetin-3-rhamnoside (Q3Rha), and quercetin-3-arabinofuranoside represented 87% of total flavonol content (TFC) in cranberry fruit, with TFC ranging from 0.17 to 0.75 mg/g FW. Wild and landrace accessions in the Rutgers cranberry collection exhibited higher genetic variation and breedability for flavonols than the breeding subgroup. We identified a stable locus on chromosome 3 associated with Q3Rha, explaining 15.8–19.7% of the genetic variance. Genomic prediction models for TFC exhibited predictive ability varying between 0.12 and 0.26, with combined data from both years yielding the highest values. Although further investigation is needed to improve prediction accuracy, simulated crosses showed similar outcomes between phenotypic and genomic selection. These findings elucidate the genetic architecture of flavonols and show the potential of genomic tools to enhance flavonol content and fruit quality in cranberry breeding programs.
https://doi.org/10.5061/dryad.7wm37pw45
Description of the data and file structure
Supplemental Table S1. Rutgers cranberry germplasm collection (n = 247), including 103 wild, 118 native selections or landraces, 16 cultivars, and 10 breeding lines. The entries marked “n/a” represent missing data.
Variables
- Accession ID: Passport identifier.
- Alias: Known alias.
- Location/Country or State: Site and country or U.S. state in which the accession was collected or bred.
- Germplasm Type: Material designation (wild population, landrace, cultivar, or breeding line).
- Birth or Collection Year: Year of collection or crossing.
- Genotyped: Whether the accession was genotyped with the 17K cranberry Flex-Seq platform.
- Hierarchical Cluster: Group designation based on clustering analysis from flavonol estimated marginal means (EMMs) in combination with single-nucleotide polymorphisms (SNPs).
- Raw Phenotypic Data (mg/g FW): Total flavonol content (TFC), myricetin-3-galactoside (My3Gal), myricetin-3-arabinoside (My3Ara), quercetin-3-galactoside (Q3Gal), quercetin-3-xylopyranoside (Q3Xyl), quercetin-3-arabinopyranoside (Q3Arap), quercetin-3-arabinofuranoside (Q3Araf), and quercetin-3-rhamnoside (Q3Rha) measured in 2016 and 2017.
- EMM Data: TFC, My3Gal, My3Ara, Q3Gal, Q3Xyl, Q3Arap, Q3Araf, and Q3Rha EMMs for combined years.
- GEBV Data: TFC genomic estimated breeding values (GEBVs) for combined years.
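As a minimal sketch of how the “n/a” entries in Supplemental Table S1 might be handled on import, the Python snippet below parses CSV text with the standard library and maps “n/a” cells to `None`. The column headers `TFC_2016` and `TFC_2017`, the accession IDs, and the values in the sample are hypothetical placeholders; the actual headers in the file may differ and should be checked before use.

```python
import csv
import io

MISSING = "n/a"  # missing-data marker described for Supplemental Table S1


def read_table(text):
    """Parse CSV text into a list of dicts, mapping "n/a" cells to None."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({key: (None if value == MISSING else value)
                     for key, value in row.items()})
    return rows


def to_float(value):
    """Convert a cell to float, keeping None for missing entries."""
    return None if value is None else float(value)


# Hypothetical two-row excerpt; real column names and values may differ.
sample = (
    "Accession ID,Germplasm Type,TFC_2016,TFC_2017\n"
    "ACC001,wild,0.41,n/a\n"
    "ACC002,cultivar,0.33,0.36\n"
)
rows = read_table(sample)
tfc_2017 = [to_float(r["TFC_2017"]) for r in rows]
```

To apply the same idea to the real file, the contents of Supplemental_TableS1.csv can be read into a string and passed to `read_table`. Keeping missing values as `None` rather than zero avoids biasing any means or medians computed afterwards.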
Supplemental Table S2. Identification of accessions with high flavonol content in cranberry fruit (n = 78) using the median distribution of estimated marginal means for combined years as a threshold.
Variables
- Accession ID: Passport identifier.
- Alias: Known alias.
- Germplasm Type: Material designation (wild population, landrace, cultivar, or breeding line).
- Flavonol Content: “YES” or “NO” indicates whether the accession qualified as “high-flavonol” for total flavonol content (TFC), myricetin-3-galactoside (My3Gal), myricetin-3-arabinoside (My3Ara), quercetin-3-galactoside (Q3Gal), quercetin-3-xylopyranoside (Q3Xyl), quercetin-3-arabinopyranoside (Q3Arap), quercetin-3-arabinofuranoside (Q3Araf), and quercetin-3-rhamnoside (Q3Rha).
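The median-based threshold described for Supplemental Table S2 can be illustrated with a short Python sketch. This is one possible reading, not the authors' exact procedure: it assumes an accession qualifies only when its estimated marginal mean is strictly above the median of the observed values, and that missing values are skipped when computing the median. The accession IDs and values are hypothetical.

```python
from statistics import median


def flag_high(emms):
    """Return {accession: "YES"/"NO"} using the median EMM as the cutoff.

    Assumption: an accession qualifies only if its EMM is strictly above
    the median; missing values (None) are excluded from the median and
    are flagged "NO".
    """
    observed = [v for v in emms.values() if v is not None]
    cutoff = median(observed)
    return {acc: ("YES" if v is not None and v > cutoff else "NO")
            for acc, v in emms.items()}


# Hypothetical EMMs (mg/g FW) for a single compound.
example = {"ACC001": 0.52, "ACC002": 0.31, "ACC003": 0.44, "ACC004": None}
flags = flag_high(example)
```

With these hypothetical values the median is 0.44, so only ACC001 is flagged “YES”. Whether ties at the median count as “high” is a choice the sketch makes arbitrarily.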
Supplemental Table S3. Genomic region underlying the Q3Rha locus on chromosome 3, detailing the most significant markers (FDR-corrected p < 0.01), the genes within the Q3Rha locus, and their corresponding homologs in Arabidopsis thaliana. The entries marked “n/a” represent missing data.
Variables
- Category: Whether the entry is a SNP marker or annotated gene based on the reference genome ‘Ben Lear’ v1.0.
- Identifier: SNP or gene ID based on the reference genome ‘Ben Lear’ v1.0.
- Position (bp): Physical position of the SNP or annotated gene based on the reference genome ‘Ben Lear’ v1.0.
- AGI Homolog: Best-matching unique Arabidopsis Genome Initiative (AGI) gene code for the annotated gene in the reference genome ‘Ben Lear’ v1.0.
- Amino Acid Identity (%): Percentage of identical amino acid residues in the highest‑scoring local BLASTP alignment between the annotated ‘Ben Lear’ v1.0 protein and its best-match sequence in the Arabidopsis thaliana database.
- AGI Homolog Function: Gene function annotation from Arabidopsis thaliana.
- GO Term: Gene ontology annotation from Arabidopsis thaliana.
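To make the “FDR-corrected p < 0.01” criterion concrete, the Python sketch below adjusts raw p-values and keeps the markers that pass the cutoff. It assumes the Benjamini-Hochberg procedure, which is a common choice but is not confirmed by this description; the marker names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_markers(pvalues_by_marker, alpha=0.01):
    """Names of markers whose adjusted p-value is below alpha."""
    names = list(pvalues_by_marker)
    adj = bh_adjust([pvalues_by_marker[name] for name in names])
    return [name for name, q in zip(names, adj) if q < alpha]


# Hypothetical raw p-values for four markers.
raw = {"snp_a": 0.0001, "snp_b": 0.004, "snp_c": 0.03, "snp_d": 0.5}
hits = significant_markers(raw)
```

In this toy input, `snp_a` and `snp_b` have adjusted values of about 0.0004 and 0.008 and pass the 0.01 cutoff, while `snp_c` (about 0.04) and `snp_d` (0.5) do not.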
Supplemental Data S1. The study used single-nucleotide polymorphism (SNP) data from 234 cranberry accessions. The alignment of 17K raw cranberry Flex-Seq data with the reference genome ‘Ben Lear’ v1.0 yielded 24,154 SNPs for analysis.
Supplemental Data S2. Mapping population built from 218 genotypes derived from a cross originating in the Rutgers breeding program. A total of 6,464 SNPs were mapped in 12 linkage groups using the R package MAPpoly (Mollinari et al., 2020).
Supplemental Figures: Uploaded to Zenodo
Supplemental Figure S1. Linkage disequilibrium (LD) decay among the 5,099 SNPs used in the genome-wide association analysis for flavonols, with a threshold of r² = 0.3 applied to estimate the decay distance.
Supplemental Figure S2. Phenotypic correlations and estimated marginal mean differences for myricetin-3-galactoside (My3Gal; A-B), myricetin-3-arabinoside (My3Ara; C-D), quercetin-3-galactoside (Q3Gal; E-F), quercetin-3-glucoside (Q3Glc; G-H), quercetin-3-xylopyranoside (Q3Xyl; I-J), quercetin-3-arabinopyranoside (Q3Arap; K-L), quercetin-3-arabinofuranoside (Q3Araf; M-N), and quercetin-3-rhamnoside (Q3Rha; O-P) across the germplasm collection, breeding lines, landraces, and wild accessions over two years, 2016 and 2017.
Supplemental Figure S3. Dendrogram illustrates the hierarchical clustering of 234 germplasm accessions based on genotypic data of 24,154 SNPs mapped to the ‘Ben Lear’ v1.0 reference genome and eight flavonols: myricetin-3-galactoside, myricetin-3-arabinoside, quercetin-3-galactoside, quercetin-3-glucoside, quercetin-3-rhamnoside, quercetin-3-xylopyranoside, quercetin-3-arabinopyranoside, and quercetin-3-arabinofuranoside. Five distinct groups are color-coded: I (orange), II (green), III (brown), IV (blue), and V (purple). The scale bar indicates the genetic distance.
Supplemental Figure S4. Manhattan plots display the genome-wide association study results in the Rutgers germplasm collection (n = 234) for A) total flavonol content, B) myricetin-3-galactoside, C) myricetin-3-arabinoside, D) quercetin-3-galactoside, E) quercetin-3-glucoside, F) quercetin-3-xylopyranoside, G) quercetin-3-arabinopyranoside, and H) quercetin-3-arabinofuranoside. The results for 2016 are presented at the top, and those for 2017 are presented at the bottom. The y-axis represents the negative logarithm of the p-values adjusted for the false discovery rate (FDR), with a horizontal dashed line indicating the significance threshold at 0.05.
Supplemental Figure S5. Manhattan plots display the genome-wide association study results in the wild subgroup (n = 102) for A) total flavonol content, B) myricetin-3-galactoside, C) myricetin-3-arabinoside, D) quercetin-3-galactoside, E) quercetin-3-glucoside, F) quercetin-3-xylopyranoside, G) quercetin-3-arabinopyranoside, and H) quercetin-3-arabinofuranoside. The results for 2016 are presented at the top, and those for 2017 are presented at the bottom. The y-axis represents the negative logarithm of the p-values adjusted for the false discovery rate (FDR), with a horizontal dashed line indicating the significance threshold at 0.05.
Supplemental Figure S6. Manhattan plots display the genome-wide association study results in the landrace subgroup (n = 99) for A) total flavonol content, B) myricetin-3-galactoside, C) myricetin-3-arabinoside, D) quercetin-3-galactoside, E) quercetin-3-glucoside, F) quercetin-3-xylopyranoside, G) quercetin-3-arabinopyranoside, and H) quercetin-3-arabinofuranoside. The results for 2016 are presented at the top, and those for 2017 are presented at the bottom. The y-axis represents the negative logarithm of the p-values adjusted for the false discovery rate (FDR), with a horizontal dashed line indicating the significance threshold at 0.05.