Data from: Genomic approaches to accelerate American chestnut restoration
Data files
Dec 18, 2025 version files 292.51 MB
-
OxO_neg_2024_R1T14.jpg
4.45 MB
-
OxO_neg_2024_R1T6.jpg
5.06 MB
-
OxO_pos_2023_R10T10.jpg
5.24 MB
-
OxO_pos_2023_R1T8.jpg
5.13 MB
-
OxO_pos_2023_R7T13.jpg
4.99 MB
-
OxO_pos_2024_R1T29.jpg
4.19 MB
-
OxO_pos_2024_R1T45.jpg
4.40 MB
-
OxO_pos_2024_R1T53.jpg
3.84 MB
-
OxO_pos_2024_R2T5.jpg
4.34 MB
-
OxO_pos_2024_R2T57.jpg
3.75 MB
-
README.md
39.54 KB
-
SIData1.blight.phenotypes.all.oct2025.v4.txt
1.30 MB
-
SIData10.comb.bed.txt
17.39 MB
-
SIData11.dentata.pangenes.txt
15.49 MB
-
SIData12.combinedWindowedBlockCoordinates.txt
92.68 KB
-
SIData13.chestnut.rnaseq.metadata.v2.txt
53.98 KB
-
SIData14.dentata.blight.rnaseq.count.matrix.txt
11.79 MB
-
SIData15.mollissima.blight.rnaseq.count.matrix.txt
10.33 MB
-
SIData16.blight.rnaseq.within.between.species.allele.contrasts.3d.10d.txt
44.69 MB
-
SIData17.mol.v.den.copy.number.expansion.gene.expression.txt
30.85 KB
-
SIData18.mollissima.dentata.degs.kegg.enrichment.txt
169.13 KB
-
SIData19.chestnut.metabolite.summary.oct25.txt
16.24 KB
-
SIData2.tacf.gbs.genotypes.imputed.filtered.vcf.gz
105.13 MB
-
SIData20.fungal.growth.metabolites.rawdata.txt
3.81 KB
-
SIData21.tacf.root.rot.phenotypic.rawdata.2024.txt
1.71 MB
-
SIData22.root.rot.phenotypic.summary.oct2025.txt
74.02 KB
-
SIData23.ahmm.genotypes.qtl.nov2024.csv
8.91 MB
-
SIData24.ahmm.qtl.gmap.nov2024.csv
26.57 KB
-
SIData25.qtl.effects.all.oct25.v4.csv
137.84 KB
-
SIData26.blight.prr.qtl.gwas.intervals.metadata.oct2025.v4.csv
13.90 KB
-
SIData27.blight.root.rot.gwas.pvalues.effects.oct25.txt
12.39 MB
-
SIData28.tacf.blight.root.rot.height.blups.inoculated.11.1.2025.txt
802.74 KB
-
SIData29.chestnut.genomic.selection.simulation.outputs.txt
12.06 MB
-
SIData3.tacf.castanea.hybrid.species.ancestry.txt
559.94 KB
-
SIData4.gmbigxhorn.genetic.map.txt
55.02 KB
-
SIData5.blockCoords.qtl.gwas.oct25.txt
3.18 MB
-
SIData6.tacf.pedigree.2024.txt
463.46 KB
-
SIData7.progeny.blight.resistance.data.txt
122.04 KB
-
SIData8.scrivener.ssa.2025.csv
16.17 KB
-
SIData9.darling54.blight.height.phenotypes.purdue.txt
42.10 KB
Abstract
Over a century after two introduced pathogens killed billions of American chestnuts, introgression of pre-adapted resistance alleles from Chinese chestnuts has contributed to the recovery of self-sustaining populations. However, progress has been slow because of the complex genetic architecture of resistance. To better understand blight resistance, we compared reference genomes, gene expression responses, and stem metabolite profiles of the resistant Chinese and susceptible American chestnut species. To accelerate resistance breeding, we conducted large-scale phenotyping and genotyping in hybrids of these species. Simulation and inoculation experiments suggest that significant resistance gains are possible through selectively breeding trees with an average of 70% to 85% American chestnut ancestry. The resources developed here are foundational for breeding to create diverse restoration populations with sufficient disease resistance and competitive growth.
https://doi.org/10.5061/dryad.4xgxd25mj
Description of the data and file structure
Supplementary Data 1 | Blight and growth phenotypes for 5.5k trees
Supplementary Data 2 | Genotyping data at ~93k SNPs from 5,003 trees in variant calling format (VCF)
Supplementary Data 3 | Hybrid ancestry estimates for ~5k genotyped trees
Supplementary Data 4 | Genetic maps for the ‘GMBig x Horn’ C. dentata full sib family
Supplementary Data 5 | Genome (‘Ellis-1’) coordinates for C. dentata and C. mollissima ancestry in hybrids
Supplementary Data 6 | Pedigree for trees phenotyped in this study
Supplementary Data 7 | Seedling blight resistance data for ~1k backcross progeny
Supplementary Data 8 | Seedling blight resistance data for LSAC progeny
Supplementary Data 9 | Blight resistance and height data for ~500 ‘Darling 54’ progeny
Supplementary Data 10 | A bed file with orthogroup and tandem array information for each gene
Supplementary Data 11 | Pan-genome sets for C. dentata and C. mollissima genomes in ‘Ellis’ coordinates
Supplementary Data 12 | Syntenic block breakpoints
Supplementary Data 13 | Chestnut blight timecourse RNA-seq metadata
Supplementary Data 14 | Chestnut blight timecourse RNA-seq count matrix for C. dentata
Supplementary Data 15 | Chestnut blight RNA-seq count matrix for C. mollissima
Supplementary Data 16 | Species- and allele-specific gene expression responses to Cr. parasitica.
Supplementary Data 17 | Blight-responsive orthogroups with higher copy number and expression upregulation in resistant C. mollissima v. susceptible C. dentata
Supplementary Data 18 | KEGG pathway enrichment among single copy orthologs with species-specific and common expression responses to chestnut blight fungal inoculation
Supplementary Data 19 | Secondary metabolites present at higher concentrations in the stems of resistant C. mollissima v. susceptible C. dentata
Supplementary Data 20 | Cryphonectria parasitica fungal growth in the presence of metabolites that are at higher concentrations in resistant C. mollissima stems v. susceptible C. dentata stems
Supplementary Data 21 | Root rot resistance phenotypes for ~27 k open-pollinated progeny of C. dentata backcross hybrids, resistant C. mollissima, and susceptible C. dentata controls
Supplementary Data 22 | Family means for root rot resistance used for genetic analyses
Supplementary Data 23 | C. dentata v. C. mollissima ancestry calls used for QTL mapping
Supplementary Data 24 | Genetic map used for QTL mapping
Supplementary Data 25 | Effect of inheriting a C. mollissima allele on blight resistance and root rot survival across 641 loci used in QTL analysis
Supplementary Data 26 | Coordinates and effect sizes for blight and root rot QTL intervals and GWAS peaks
Supplementary Data 27 | P-values from genome-wide association studies of blight and root rot resistance
Supplementary Data 28 | Breeding values for blight and root rot resistance estimated with single-step GBLUP
Supplementary Data 29 | Genomic estimated breeding values for blight resistance, root rot resistance, and height growth from simulation of progeny genotypes
Files and variables
File: SIData1.blight.phenotypes.all.oct2025.v4.txt
Description: Blight and growth phenotypes. Blight presence/absence phenotypes were coded into 0 (susceptible) and 1 (resistant) phenotypic classes.
Variables
- shortcode: tree id
- mom: mother of tree
- dad: father of tree
- mainstemalive: is the main stem alive? (1 = yes, 0 = no)
- largecankers: are large cankers present? (1 = no, 0 = yes)
- blightcontained: are the cankers expanded beyond initial margins? (1 = no, 0 = yes)
- exposedwood: are most cankers > 15 cm in length? (1 = no, 0 = yes)
- sunkenswollen: are cankers generally sunken (0), swollen (1), or flat (2)?
- sporulation: are cankers forming fruiting bodies? (1 = no, 0 = yes)
- stumpsprouts: are stump sprouts present? (1 = no, 0 = yes)
- propcanopyalive: proportion of canopy that is alive
- dbhlargest_cm: diameter at breast height of the largest living stem in cm
- height_m: height of the tallest living stem in meters
- chapter: chapter of the American Chestnut Foundation that grew the tree
- parcel: property name where the tree is growing
- orchard: orchard name where the tree is growing
- inoculated: inoculated with C. parasitica (1 = yes, 0 = no)
- plantdate: date that the tree was planted
- obsdate: date that the tree was phenotyped
- age: age in years that the tree was phenotyped
- tmax_c: average maximum monthly temperature in degrees Celsius
- tmin_c: average minimum monthly temperature in degrees Celsius
- prec_mm: average monthly precipitation in mm
- ph: average soil pH down to 30 m depth interpolated from USGS soil survey data
- sand: average % sand in soil down to 30 m depth interpolated from USGS soil survey data
- height_m_adj: residual variation in tree height after adjusting for the effects of age, inoculated, tmax_c, prec_mm, sand, and ph
- blightresistanceindex: sum of mainstemalive, largecankers, blightcontained, exposedwood, sunkenswollen, sporulation, and stumpsprouts after correcting phenotypes for average effects of inoculated, age, tmax_c, prec_mm, ph, and sand (see methods)
- dentata_ahmm: C. dentata ancestry estimated from ancestry hmm (if NA, tree was not genotyped)
- mollissima_ahmm: C. mollissima ancestry estimated from ancestry hmm
- crenata_ahmm: C. crenata ancestry estimated from ancestry hmm
- pumila_ahmm: C. pumila ancestry estimated from ancestry hmm
- resistance source: C. mollissima or parent from other species (e.g. C. crenata or LSAC) that contributed resistance to progeny
- generation: hybrid or backcross generation
- gwas: TRUE = included in blight resistance genome wide association study
- clapper_blight_qtl: TRUE = included in ancestry based QTL analysis for Clapper descendants
- graves_blight_qtl: TRUE = included in ancestry based QTL analysis for Graves descendants
- dentataparent: last known wild type C. dentata parent
- h2: TRUE = included in estimation of heritability
- gblup.xval: TRUE = included in cross validations to estimate genomic selection accuracy
- ssgblup: TRUE = included in estimation of breeding values using the single step GBLUP method
- survival: whether or not tree was alive or dead at time of study
- survival.obs.year: last year that survival was observed
File: SIData2.tacf.gbs.genotypes.imputed.filtered.vcf.gz
Description: Genotyping-by-sequencing data for 93,333 SNPs in 5,003 trees in variant calling format (VCF). After filtering to sites with < 20% missing genotypes, missing genotypes were imputed in Beagle.
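As a minimal sketch of how the genotypes in this VCF could be turned into alternate-allele dosages, the Python below uses only the standard library. It assumes a standard VCF layout in which sample columns start at the tenth field and the GT value is the first colon-separated subfield; the sample names and the demo record are hypothetical and are not taken from the file.

```python
import gzip


def alt_dosage(sample_field):
    """Count non-reference alleles in a VCF sample field such as '0|1' or '1/1:...'."""
    genotype = sample_field.split(":")[0]
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing genotype
    return sum(allele != "0" for allele in alleles)


def iter_site_dosages(lines):
    """Yield (chrom, pos, {sample: dosage}) for each record in an iterable of VCF lines."""
    samples = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            continue
        dosages = dict(zip(samples, map(alt_dosage, fields[9:])))
        yield fields[0], int(fields[1]), dosages


def open_vcf(path):
    """Open a gzip-compressed VCF as text, e.g. the SIData2 file."""
    return gzip.open(path, "rt")


# Hypothetical two-sample record, for illustration only.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ttreeA\ttreeB\n",
    "Chr01\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n",
]
sites = list(iter_site_dosages(demo))
```

For the real file, `iter_site_dosages(open_vcf(path))` would stream one site at a time, which avoids loading all 93,333 SNPs into memory at once.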
File: SIData3.tacf.castanea.hybrid.species.ancestry.txt
Description: Hybrid species ancestry inferences for ~5 k trees from Ancestry_HMM
Variables
- shortcode: tree id
- dentata_ahmm: C. dentata ancestry proportion
- mollissima_ahmm: C. mollissima ancestry proportion
- crenata_ahmm: C. crenata ancestry proportion
- sativa_ahmm: C. sativa ancestry proportion
- pumila_ahmm: C. pumila ancestry proportion
- henryi_ahmm: C. henryi ancestry proportion
- contributor: chapter that contributed the sample
- new_generation: backcross generation that was corrected for samples with ancestry that deviated from original pedigree expectations.
- resistancesource: source of resistance
- vcf.header.gbs: original sample id in the vcf (ignore)
File: SIData4.gmbigxhorn.genetic.map.txt
Description: Genetic maps for the ‘GMBig x Horn’ C. dentata full sib family. Contains genetic map positions in centiMorgans and physical coordinates in the ‘Ellis’ genome for each parent from this cross.
Variables
- map: separate maps were created for marker sets that were segregating in the GMBig or Horn parents
- chr: chromosome
- mb: physical position in the 'Ellis' genome in megabases
- cm: genetic position in centiMorgans
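Because each row pairs a physical position (mb) with a genetic position (cm), one simple use of this file is to estimate local recombination rates between adjacent markers. The sketch below is an illustration only, not part of the authors' analysis; it assumes the rows have already been filtered to a single map and chromosome and sorted by mb.

```python
def recombination_rates(markers):
    """Return (mb_start, mb_end, cM_per_Mb) for each pair of adjacent markers.

    markers: list of (mb, cm) tuples for one map and one chromosome, sorted by mb.
    Pairs with zero physical distance are skipped to avoid division by zero.
    """
    rates = []
    for (mb0, cm0), (mb1, cm1) in zip(markers, markers[1:]):
        span = mb1 - mb0
        if span > 0:
            rates.append((mb0, mb1, (cm1 - cm0) / span))
    return rates


# Hypothetical marker positions, for illustration only.
demo_rates = recombination_rates([(1.0, 0.0), (2.0, 3.0), (4.0, 4.0)])
```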
File: SIData5.blockCoords.qtl.gwas.oct25.txt
Description: Ellis-1 genome coordinates of ancestry tracts estimated with Ancestry_HMM for C. dentata hybrids
Variables
- shortcode: tree id
- chr: chromosome
- block.id: id of the ancestry block
- call: A - homozygous C. dentata ancestry, H - heterozygous C. mollissima and C. dentata ancestry, C - homozygous for C. mollissima ancestry
- start: start coordinate for ancestry block
- end: end coordinate for ancestry block
- n.markers: number of markers contained in ancestry block
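The block calls can be summarized into a per-tree ancestry proportion. The sketch below assumes that A, H, and C blocks carry 0, 1, and 2 C. mollissima alleles respectively and that block length can be taken as end minus start; it is a rough length-weighted summary and may not match the genome-wide proportions reported by Ancestry_HMM in SIData3.

```python
from collections import defaultdict

# Assumed fraction of C. mollissima alleles per call: A = 0/2, H = 1/2, C = 2/2.
MOLLISSIMA_FRACTION = {"A": 0.0, "H": 0.5, "C": 1.0}


def mollissima_proportion(blocks):
    """Length-weighted C. mollissima ancestry per tree.

    blocks: iterable of (shortcode, call, start, end) tuples.
    Returns {shortcode: proportion between 0 and 1}.
    """
    weighted = defaultdict(float)
    total = defaultdict(float)
    for shortcode, call, start, end in blocks:
        length = end - start
        weighted[shortcode] += MOLLISSIMA_FRACTION[call] * length
        total[shortcode] += length
    return {tree: weighted[tree] / total[tree] for tree in total if total[tree] > 0}


# Hypothetical blocks for one tree, for illustration only.
example_blocks = [
    ("tree1", "A", 0, 600),
    ("tree1", "H", 600, 1000),
]
example_proportions = mollissima_proportion(example_blocks)
```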
File: SIData6.tacf.pedigree.2024.txt
Description: Multi-generation pedigree for all trees phenotyped in this study. Includes maternal and paternal parents for multiple generations of hybrid trees in The American Chestnut Foundation breeding program.
Variables
- shortcode: name of tree
- mom: mother of tree
- dad: father of tree
- generation: hybrid or backcross generation
- genorder: variable to order the pedigree such that early generations are first
- source: name of the original source of resistance
- resparent: whether mother, father, both, or neither parent contributed resistance to progeny
- dentataparent: id of the last known C. dentata parent
File: SIData7.progeny.blight.resistance.data.txt
Description: Progeny blight resistance phenotypic data. Seedling blight resistance phenotypes for ~1 K progeny from 30 full or half sib families
Variables
- treeid: individual tree
- cross: cross from which tree was derived
- mom: id of tree's mother
- dad: id of tree's father
- cross.type: BC x F1 = backcross tree crossed with C. dentata x C. mollissima F1 hybrid, BC x BC = intercross between two backcross trees, BC x op = open pollinated progeny of a backcross tree, LSA x LSA = intercross between large surviving American chestnuts, mollissima = resistant C. mollissima controls, F1 = C. dentata x C. mollissima, dentata = susceptible C. dentata controls.
- site: site where trees were inoculated (University of Tennessee Chattanooga or Berry College, GA)
- orangezone_mm: length of orange zone of the canker in mm
- oz.bins: canker length quartile
- spores: whether or not canker was sporulating (0 = no, 1 = yes)
- rating: canker length quartile + 1 if spores present
- ht_cm: height of the seedling before inoculation in cm
- scaled.orange.zone: orange zone scaled from 0 = C. dentata mean to 100 = C. mollissima mean.
- scaled.rating: rating scaled from 0 = C. dentata mean to 100 = C. mollissima mean.
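The scaled variables place each tree on a 0 to 100 axis anchored by the two control means. A minimal sketch of that rescaling, assuming a simple linear transformation between the C. dentata and C. mollissima control means (the function and argument names are hypothetical):

```python
from statistics import mean


def scale_to_controls(value, dentata_values, mollissima_values):
    """Rescale a phenotype so the C. dentata mean maps to 0 and the C. mollissima mean to 100."""
    dentata_mean = mean(dentata_values)
    mollissima_mean = mean(mollissima_values)
    if dentata_mean == mollissima_mean:
        raise ValueError("control means are identical; scale is undefined")
    return 100.0 * (value - dentata_mean) / (mollissima_mean - dentata_mean)


# Hypothetical orange zone lengths in mm: susceptible controls have longer cankers.
scaled_example = scale_to_controls(60.0, [80.0, 100.0], [20.0, 40.0])
```

Because the denominator carries the sign of the difference between control means, the same formula works whether larger raw values indicate more or less resistance. Values outside 0 to 100 are possible for trees more extreme than either control mean.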
File: SIData8.scrivener.ssa.2025.csv
Description: Blight resistance data from open pollination progeny from large surviving American chestnuts planted in the Scrivener orchard in Maryland
Variables
- cross: description of cross (op = open pollinated)
- letter: shortcode for cross
- tag: greenhouse id for tree
- inoculated: 1 = tree was inoculated with the highly virulent EP155 strain of Cr. parasitica.
- height_mm: height of tree in mm at time of inoculation
- site: site where trees were inoculated (University of Tennessee Chattanooga or Berry College, GA)
- survival: Y = main inoculated stem alive at 60 days post inoculation
- full_zone_mm: full length (in millimeters) of the necrotic tissue around the Cr. parasitica inoculation site
- orange_zone_mm: length of the orange coloration within the canker
- sunken: whether canker was sunken, swollen, or flat
- rating: cankers were rated on a scale of 1 (inoculation failure), 2 (minimal canker expansion within 5 mm of inoculation site), 3 (large canker with no sporulation), 4 (large, sunken, sporulating canker), and 5 (tree dead with large cankers).
- basal_canker: presence of natural blight infection
- spores: whether or not canker was sporulating (0 = no, 1 = yes)
- rating: canker length quartile + 1 if spores present
- ht_cm: height of the seedling before inoculation in cm
- scaled.orange.zone: orange zone scaled from 0 = C. dentata mean to 100 = C. mollissima mean.
- scaled.rating: rating scaled from 0 = C. dentata mean to 100 = C. mollissima mean.
File: SIData9.darling54.blight.height.phenotypes.purdue.txt
Description: Canker length, canker severity ratings, and height on ~500 3 year old T3 OxO+ and OxO- full sibling progeny of 'Darling 54' from a field trial near Purdue University
Variables
- mom: female (wild type) C. dentata parent
- dad: male T2 'Darling 54' parent
- family: mom x dad
- year: year trees were inoculated
- oxo: 1 = OxO+ and 0 = OxO-
- row: row within plot
- tree: tree number within row
- select: 1 = high resistance and vigor
- ht: total tree height in cm
- gld: ground line diameter in mm
- inoc: 1 = inoculated, 0 = uninoculated
- len: canker length in mm
- width: canker width in mm
- rating: canker severity rating (5 = large, sunken, and sporulating to 1 = canker contained and superficial)
- sporulation: 1 = Cryphonectria parasitica fruiting bodies present or 0 = absent
- natcanker: 1 = presence of a natural canker in addition to the canker from inoculation
- dead: 1 = tree dead
File: SIData10.comb.bed.txt
Description: Combined bed-like annotation file. This contains syntenic orthogroup information, as produced by GENESPACE. The standard bed file with a set of additional fields presenting orthogroup and tandem array information for each gene.
Variables
- chr: chromosome where the gene is located
- start: Start position of the gene on the chromosome
- end: End position of the gene on the chromosome
- id: Unique gene name
- ofID: orthofinder ID, a unique identifier for orthologous groups.
- pepLen: length of the peptide sequence associated with the feature
- ord: order of the genome within the pangenome
- genome: name of the genome to which the gene belongs
- arrayID: identifier for the array or cluster of related genes
- isArrayRep: Boolean indicating whether the feature is a representative of its array
- globOG: Global ortholog group identifier, representing orthologs across multiple genomes
- globHOG: Global hierarchical ortholog group identifier, representing a higher-level grouping of orthologs
- noAnchor: boolean indicating whether the feature lacks an anchor point in synteny analysis
- og: ortholog group identifier, representing a set of orthologous genes
File: SIData11.dentata.pangenes.txt
Description: GENESPACE pan-gene sets against the Ellis coordinate system. Pangenes file presenting the interpolated syntenic position of each syntenic phylogenetically hierarchical orthogroup.
Variables
- ofID: refers to OrthoFinder ID, which is assigned to genes and proteins by OrthoFinder
- pgID: pan-genome ID, a unique identifier for each entry in the pan-genome annotation
- interpChr: interpolated chromosome, referring to the inferred chromosomal location of a gene based on syntenic relationships
- interpOrd: the inferred gene order within a chromosome
- pgRepID: pan-genome representative ID, identifying the representative gene for a group of related genes in the pan-genome
- genome: the specific genome to which a gene belongs
- og: orthogroup, a group of genes descended from a single gene in the last common ancestor of all the species being considered
- flag: a marker indicating specific attributes of a gene, such as syntenic relationships or array membership
- id: gene identifier, unique within each genome
- chr: chromosome on which the gene is located
- start: start position of the gene on the chromosome
- end: end position of the gene on the chromosome
- ord: refers to the ordinal position or rank of the gene within its genome or chromosome
File: SIData12.combinedWindowedBlockCoordinates.txt
Description: Syntenic block breakpoints. Syntenic blocks were inferred with DEEPSPACE using default parameters except as noted in the methods. This is a standard .paf file except that the first field contains the DEEPSPACE file identifier and the last two fields contain the identifier for the query (windows) and target genomes.
Variables
- file: name of the Pairwise mApping Format (PAF) file
- qname: query sequence name
- qlen: query length
- qstart: query start
- qend: query end
- strand: + or -
- tname: target sequence name
- tlen: target length
- tstart: target start
- tend: target end
- nmatch: number of matching bases
- alen: alignment length
- mapq: a score that indicates the confidence level of a read's alignment to a reference genome
- windows: query genome
- target: target genome
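A record in this file can be read into named fields using the column order listed above. The sketch below assumes the file is tab-delimited with exactly these 15 columns in this order; the demo line, including its file and genome identifiers, is hypothetical.

```python
BLOCK_COLUMNS = [
    "file", "qname", "qlen", "qstart", "qend", "strand",
    "tname", "tlen", "tstart", "tend", "nmatch", "alen", "mapq",
    "windows", "target",
]
INTEGER_COLUMNS = {
    "qlen", "qstart", "qend", "tlen", "tstart", "tend", "nmatch", "alen", "mapq",
}


def parse_block(line):
    """Parse one tab-delimited syntenic block record into a dict with typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(BLOCK_COLUMNS):
        raise ValueError(f"expected {len(BLOCK_COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(BLOCK_COLUMNS, fields))
    for name in INTEGER_COLUMNS:
        record[name] = int(record[name])
    return record


def match_fraction(record):
    """Fraction of the alignment length made up of matching bases."""
    return record["nmatch"] / record["alen"]


# Hypothetical record, for illustration only.
demo_line = "demo.paf\tChr01\t5000\t100\t1100\t+\tChr01\t6000\t200\t1200\t900\t1000\t60\tgenomeA\tgenomeB\n"
demo_block = parse_block(demo_line)
```

If the file has a header row, it would need to be skipped before calling `parse_block`, since the integer conversion would fail on column names.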
File: SIData13.chestnut.rnaseq.metadata.v2.txt
Description: RNA-seq metadata. Contains species, tissue, and treatment information for samples used for genome annotation and C. parasitica gene expression responses.
Variables
- lib: id of RNA seq library (libraries sequenced across two flow cells are denoted with 1 and 2)
- species: indicates whether sequences came from Castanea mollissima, Castanea dentata, or C. mollissima x C. dentata F1 hybrids
- genotype: name of tree sequenced
- tissue: tissue type sequenced
- timepoint: days post inoculation (dpi) for stem samples that were inoculated with C. parasitica
- treatment: denotes whether samples were uninoculated, wounded only, or inoculated with C. parasitica
- R1R2.size(GB): Gigabases of sequence obtained for each library
- read_1: id of library for read 1 of paired end sequences
- read_2: id of library for read 2 of paired end sequence
File: SIData14.dentata.blight.rnaseq.count.matrix.txt
Description: transcript count matrix for C. parasitica inoculation timecourse aligned to the C. dentata Ellis genome.
Variables
- rows - transcript IDs from Ellis genome. Genomic features format files and annotation information for these transcripts can be downloaded from Phytozome
- columns - library IDs - genotype, treatment, and timepoint information associated with library IDs can be accessed from SIData13.
File: SIData15.mollissima.blight.rnaseq.count.matrix.txt
Description: transcript count matrix for C. parasitica inoculation timecourse aligned to the C. mollissima Mahogany Hap1 genome.
Variables
- rows - transcript IDs from Mahogany Hap1 genome. Genomic features format files and annotation information for these transcripts can be downloaded from Phytozome
- columns - library IDs - genotype, treatment, and timepoint information associated with library IDs can be accessed from SIData13.
File: SIData16.blight.rnaseq.within.between.species.allele.contrasts.3d.10d.txt
Description: Species and allele specific expression responses to C. parasitica. Log2fold change and P-values for species- and allele-specific expression responses to C. parasitica inoculation.
Variables
- gene.id: unique gene identifier
- baseMean: The average of the normalized count values across all samples. It represents the overall expression level of a gene, accounting for sequencing depth but not gene length.
- log2FoldChange: The logarithm (base 2) of the fold change in expression between the comparison and control groups. It indicates the magnitude and direction of differential expression.
- lfcSE: Standard error of the log2FoldChange estimate
- stat: Wald statistic, which is the log2FoldChange divided by its standard error. It measures the significance of the fold change
- pvalue: raw p-value from the Wald test, indicating the probability of observing such an extreme fold change by chance
- padj: The adjusted p-value, corrected for multiple testing using the Benjamini-Hochberg method to control the false discovery rate. This value should be used to determine statistically significant differential expression.
- species: indicates whether contrast is within C. dentata (dentata), C. mollissima (mollissima), between C. mollissima and C. dentata (mollissima - dentata), within C. dentata alleles in F1s (dentata ase), within C. mollissima alleles in F1s (mollissima ase), or between C. mollissima and C. dentata alleles in F1s (mollissima - dentata ase)
- timepoint: indicates whether contrast was made at 3 or 10 days post inoculation (dpi)
- contrast: type of expression contrast including inoculation - wound only treatment (within species), inoculation - wound only treatment (between species), inoculation - wound only treatment (within allele) in F1s, and inoculation - wound only treatment (between alleles) in F1s
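The relationship between log2FoldChange, lfcSE, stat, and padj described above can be sketched in a few lines. This is a plain implementation of the Wald ratio and the Benjamini-Hochberg adjustment under the definitions given in the variable list; it is not the authors' pipeline, and the software that produced the file may apply additional steps (for example filtering before adjustment), so recomputed values need not match the padj column exactly.

```python
def wald_statistic(log2_fold_change, lfc_se):
    """Wald statistic: the log2 fold change divided by its standard error."""
    return log2_fold_change / lfc_se


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so each adjusted value
    # is the minimum of p * n / rank over its own and all larger ranks.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
demo_padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```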
File: SIData17.mol.v.den.copy.number.expansion.gene.expression.txt
Description: Comparison of gene expression responses between C. mollissima and C. dentata among orthogroups with greater copy number in C. mollissima
Variables
- og: orthogroup from C. mollissima Mahogany Hap1 pangenome
- mol.copy.n: average copy number among all four C. mollissima genome assemblies
- den.copy.n: copy number in the Ellis C. dentata genome
- tpm.den.3d: transcripts per million for the orthogroup in the C. parasitica-inoculated C. dentata samples 3 days post inoculation
- tpm.den.3d: transcripts per million for the orthogroup in the C. parasitica-inoculated C. mollissima samples 3 days post inoculation
- tpm.diff.3d: differences (Cmol - Cden) in orthogroup TPM at 3 dpi
- p.adj.3d: P-value for the t-test between Cmol - Cden tpm contrast adjusted with the Benjamini-Hochberg false discovery rate.
- tpm.den.10d: transcripts per million for the orthogroup in the C. parasitica-inoculated C. dentata samples 10 days post inoculation
- tpm.den.10d: transcripts per million for the orthogroup in the C. parasitica-inoculated C. mollissima samples 10 days post inoculation
- tpm.diff.10d: differences (Cmol - Cden) in orthogroup TPM at 10 dpi
- p.adj.10d: P-value for the t-test between Cmol - Cden tpm contrast adjusted with the Benjamini-Hochberg false discovery rate.
- id: C. mollissima mahogany hap1 genes from orthogroup that were significantly upregulated in response to C. parasitica.
- Best-hit-arabi-defline: annotation of ortholog in Arabidopsis thaliana
- Best-hit-rice-defline: annotation of ortholog in rice (Oryza sativa)
File: SIData18.mollissima.dentata.degs.kegg.enrichment.txt
Description: Enrichment of KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways among genes specifically upregulated in C. mollissima 3 or 10 days post inoculation with C. parasitica
Variables
- pathway.code: KEGG pathway in Arabidopsis
- pathway.name: name of pathway
- Annotated: number of orthologs in pathway that were specifically upregulated in C. mollissima
- TotalGenes: total number of genes in pathway
- arabidopsis_genes: Arabidopsis orthologs to genes specifically upregulated in C. mollissima
- mollissima_genes: C. mollissima Mahogany Hap1 genes in pathway that were uniquely upregulated
- timepoint: 3 or 10 days post inoculation (dpi)
- p.adjust: Benjamini-Hochberg false discovery rate to control for false positives from multiple testing
File: SIData19.chestnut.metabolite.summary.oct25.txt
Description: Metabolites at higher concentrations in the stems of resistant C. mollissima (n = 8) v. susceptible C. dentata (n = 11)
Variables
- Metabolite: name of metabolite. Unknowns are designated by their retention times and key mass-to-charge ratios.
- cden_avg: mean concentration in ug/g dry weight in C. dentata stems
- cden_sem: standard error on cden_avg
- cmol_avg: mean concentration in ug/g dry weight in C. mollissima stems
- cmol_sem: standard error on cmol_avg
- cmol_cden_fc: cmol_avg/cden_avg
- cmol_cden_p: t-test P-value for cmol_avg - cden_avg
- hyb_avg: mean concentration in ug/g dry weight in C. mollissima x C. dentata F1 stems
- hyb_sem: standard error on hyb_avg
- hyb_cden_fc: hyb_avg/cden_avg
- hyb_cden_p: t-test P-value for hyb_avg - cden_avg
- cden_wsw1_26 ..... hyb_vso_88: concentration (ug/g dry weight) of metabolite in a specific sample where sample ID = species.tree.sample number. Species include Castanea dentata ("cden"), Castanea mollissima ("cmol"), or C. dentata x C. mollissima F1 hybrid ("hyb").
- cmol: C = Cropper, Q = Qing (both provided by Greg Miller/Empire Chestnut ~2006-2009)
- hyb: K = KL-BC1, G = GR68-B1 (both backcross-1 generation, I believe both were provided by TACF/Sara via Chuck Maynard ~2006-2009)
- hyb: V = Luvall's Monster (complex commercial hybrid)
- cden: W = WB275-27, Z = Zoar, L = Lasdon all three origins/sources described in https://apsjournals.apsnet.org/doi/abs/10.1094/PDIS-01-13-0047-RE
- Other parts of the sample names indicate the tissue type, site, and replicate number. Thus, sample WSO1 is WB275, Stem, Open field, Replicate 1.
All trees were grown in Syracuse NY, and ranged in age from 2-7 years at the time of sample collection.
Samples were collected in July-August 2013 with sterilized hand pruners.
Tissue was woody stem segments from the previous year's growth (not fresh green tissue), approx 3-7mm diameter.
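The summary columns (means, standard errors, and fold changes) can in principle be recomputed from the per-sample concentration columns. The sketch below assumes the standard error is the sample standard deviation divided by the square root of the number of samples, which the source does not state explicitly; the concentrations shown are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of concentrations."""
    return mean(values), stdev(values) / sqrt(len(values))


def fold_change(numerator_avg, denominator_avg):
    """Ratio of two mean concentrations, e.g. cmol_avg / cden_avg."""
    return numerator_avg / denominator_avg


# Hypothetical concentrations in ug/g dry weight, for illustration only.
cden_avg, cden_sem = mean_and_sem([2.0, 4.0, 6.0])
cmol_avg, cmol_sem = mean_and_sem([6.0, 8.0, 10.0])
cmol_cden_fc = fold_change(cmol_avg, cden_avg)
```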
File: SIData20.fungal.growth.metabolites.rawdata.txt
Description: growth of C. parasitica in the presence of metabolites present at higher concentrations in C. mollissima v. C. dentata bark
Variables
- treatment: metabolite in PDA growth medium
- concentration: concentration of metabolite
- solvent: solvent used for metabolite
- rep: replicate number for each metabolite treatment
- area_2d: fungal area 2 days after culture initiation
- area_4d: fungal area 4 days after culture initiation
- area_6d: fungal area 6 days after culture initiation
- area_8d: fungal area 8 days after culture initiation
- area_10d: fungal area 10 days after culture initiation
File: SIData21.tacf.root.rot.phenotypic.rawdata.2024.txt
Description: Root rot resistance phenotypes. Survival and root lesion severity data after Phytophthora cinnamomi inoculation for ~27 k open pollinated progeny of C. dentata backcross hybrids, resistant C. mollissima, & susceptible C. dentata controls.
Variables
- mother: mother of progeny screened for root rot resistance
- father: father of progeny screened for root rot resistance
- source: founder tree that potentially contributed root rot resistance to the progeny
- id: seedling id
- mom_generation: generation of mother
- progeny_generation: generation of progeny
- rating: 0 = no lesions on roots, plant healthy; 1 = minimal lesions limited to secondary roots; 2 = any lesions on the tap root or extensive lesions on secondary roots; 3 = severe root rot, plant dead
- mortality: 1 = dead, 0 = alive at end of trial
- site: site of screening
- year: year of screening
- tub: tub where trees were screened (relevant for Chestnut Return Farm, where families were screened in a randomized complete block)
- isolate: P. cinnamomi isolate used for inoculation
File: SIData22.root.rot.phenotypic.summary.oct2025.txt
Description: family means for root rot resistance among 662 open pollinated families
Variables
- mother: mother of progeny screened for root rot resistance
- nmort: number of progeny phenotyped for mortality after P. cinnamomi inoculation
- mort.adj: mortality proportion after adjusting for average effect of year, site, and strain of P. cinnamomi
- nrate: number of progeny phenotyped for lesion severity rating
- rate.adj: average lesion severity rating after accounting for the average effect of year, site, and strain of P. cinnamomi
- subjectpargen: generation of mother tree
- progenygeneration: generation of progeny that were inoculated with P. cinnamomi
- source: source of resistance for the progeny
- nphenotyped: number of progeny phenotyped
- survprop: raw survival proportion (with no adjustment for covariates)
- avglesion rating: family average lesion rating (with no adjustment for covariates)
- ssgblup: TRUE = breeding value estimated with single-step GBLUP
- dentata_ahmm: C. dentata ancestry proportion of the mother tree
- mollissima_ahmm: C. mollissima ancestry proportion of the mother tree
- crentata_ahmm: C. crenata ancestry proportion of the mother tree
- h2: TRUE = tree included in heritability estimation
- gwas: TRUE = tree included in genome-wide association study of root rot survival
- gblup.xval: TRUE = tree included in cross-validations to estimate genomic prediction accuracy
- graves_rootrot_qtl: TRUE = tree included in QTL analysis for the Graves source of resistance
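As an illustration of how the unadjusted family columns relate to the raw seedling records in SIData21, the sketch below groups rows by `mother` and computes a raw survival proportion and mean lesion rating. It assumes that the raw survival proportion is one minus the mean of `mortality`; this README does not state the exact calculation, and the adjusted columns (`mort.adj`, `rate.adj`) come from a model that is not reproduced here. The example rows are made up.

```python
from collections import defaultdict

def family_summary(rows):
    """Summarize raw seedling records by mother.

    rows: iterable of dicts with 'mother', 'mortality' (1 = dead, 0 = alive)
    and 'rating' (0-3) keys; missing values are given as None.
    """
    dead = defaultdict(list)
    ratings = defaultdict(list)
    for row in rows:
        if row["mortality"] is not None:
            dead[row["mother"]].append(row["mortality"])
        if row["rating"] is not None:
            ratings[row["mother"]].append(row["rating"])

    summary = {}
    for mother in sorted(set(dead) | set(ratings)):
        d = dead.get(mother, [])
        r = ratings.get(mother, [])
        summary[mother] = {
            "nmort": len(d),
            # Assumption: raw survival proportion = 1 - mean(mortality).
            "survprop": 1 - sum(d) / len(d) if d else None,
            "nrate": len(r),
            "avglesion": sum(r) / len(r) if r else None,
        }
    return summary

# Made-up records for illustration only (not taken from the dataset).
example_rows = [
    {"mother": "momA", "mortality": 0, "rating": 0},
    {"mother": "momA", "mortality": 1, "rating": 3},
    {"mother": "momA", "mortality": 0, "rating": 1},
    {"mother": "momA", "mortality": 0, "rating": 2},
    {"mother": "momB", "mortality": 1, "rating": 3},
    {"mother": "momB", "mortality": 1, "rating": None},
]
example_summary = family_summary(example_rows)
```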
File: SIData23.ahmm.genotypes.qtl.nov2024.csv
Description: genotype matrix of ancestry calls from Ancestry HMM that was subset to perform QTL analyses. Individuals are in rows, markers (Chr_position in Ellis coordinates) are in columns. Genotype calls are AA (homozygous for C. dentata ancestry), AC (heterozygous for C. dentata and C. mollissima ancestry), and CC (homozygous for C. mollissima ancestry).
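For downstream analyses it can be convenient to recode the ancestry calls as a count of C. mollissima alleles (AA = 0, AC = 1, CC = 2). The following is a minimal sketch, assuming the first column of the CSV holds the individual id and every other column is a marker; the marker names and rows in the example are hypothetical and only mimic the described layout.

```python
import csv
import io

# Ancestry call -> number of C. mollissima alleles.
DOSAGE = {"AA": 0, "AC": 1, "CC": 2}

def read_dosages(handle):
    """Return (marker_names, {individual_id: [dosage or None, ...]})."""
    reader = csv.reader(handle)
    header = next(reader)
    markers = header[1:]  # assumption: first column is the individual id
    dosages = {}
    for row in reader:
        # Anything other than AA/AC/CC (e.g. blank or NA) becomes None.
        dosages[row[0]] = [DOSAGE.get(call.strip()) for call in row[1:]]
    return markers, dosages

# Made-up example in the assumed layout (not real data).
example = io.StringIO(
    "id,Chr01_1000,Chr01_2000,Chr02_500\n"
    "treeA,AA,AC,CC\n"
    "treeB,AC,NA,AA\n"
)
markers, dosages = read_dosages(example)
```

To read the real file, the `io.StringIO` object would be replaced by an open file handle, e.g. `open("SIData23.ahmm.genotypes.qtl.nov2024.csv", newline="")`.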
File: SIData24.ahmm.qtl.gmap.nov2024.csv
Description: genetic map used for ancestry-based QTL analysis
Variables
- marker: chromosome and physical position in Ellis-1 genome
- chr: chromosome
- pos: genetic position in centimorgans
File: SIData25.qtl.effects.all.oct25.v4.csv
Description: Effect of inheriting a C. mollissima allele at 641 markers used in ancestry-based blight and root rot resistance QTL mapping
Variables
- marker: chromosome and physical position in the 'Ellis-1' genome
- chr: chromosome
- gpos: physical position in the 'Ellis-1' genome.
- source: donor parent for resistance including the 'Clapper' BC1 tree and the 'Graves' F1 tree
- mol.effect: average effect of inheriting one C. mollissima allele compared to inheriting two alleles from C. dentata (i.e. AC - AA)
- trait: trait used in QTL analysis including the blight resistance index or root rot survival
File: SIData26.blight.prr.qtl.gwas.intervals.metadata.oct2025.v4.csv
Description: Metadata for blight and root rot resistance quantitative trait locus (QTL) intervals and genome-wide association study (GWAS) hits. Peak and border coordinates and effect sizes for blight and phytophthora root rot (PRR) QTL intervals across the C. dentata and C. mollissima genomes.
Variables
- disease: disease for which resistance QTL was discovered (blight or root rot)
- trait: trait for which QTL was discovered
- qtl: identifier for QTL
- source: source of resistance
- generation: generation of QTL mapping population
- pop: identifier of population used for QTL mapping
- study: study in which QTLs were reported (current study, Fan et al. 2024 or Zhebentyayeva et al. 2019)
- n: number of progeny phenotyped for QTL discovery
- chr: chromosome identifier
- peak.dentata: position of QTL peak in the 'Ellis-1' genome
- start.dentata: position of QTL left border in the 'Ellis-1' genome. For GWAS, estimated as the distance at which linkage disequilibrium r2 values between the peak and surrounding markers decayed to < 0.2
- end.dentata: position of QTL right border in the 'Ellis-1' genome. For GWAS, estimated as the distance at which linkage disequilibrium r2 values between the peak and surrounding markers decayed to < 0.2
- widthMb: width of the QTL (right - left border) in megabases
- lod.or.P: logarithm of the odds that QTL position has zero effect or P-value for association
- PVE: percent of phenotypic variance explained by the QTL, calculated as 100 × [1 - 10^(-2 × LOD / N)]
- effect: effect of inheriting a C. mollissima allele (QTL) or alternative allele (GWAS) on the phenotype
- chfreq.or.maf: frequency of C. mollissima alleles across the mapping population (QTL) or minor allele frequency (GWAS)
- moll.mah.hap1.peak: position of QTL peak in the 'Mahogany' hap1 genome
- moll.mah.hap1.start: position of QTL left border in the 'Mahogany' hap1 genome. For GWAS, assumed to be peak - 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.mah.hap1.end: position of QTL right border in the 'Mahogany' hap1 genome. For GWAS, assumed to be peak + 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.mah.hap2.peak: position of QTL peak in the 'Mahogany' hap2 genome
- moll.mah.hap2.start: position of QTL left border in the 'Mahogany' hap2 genome. For GWAS, assumed to be peak - 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.mah.hap2.end: position of QTL right border in the 'Mahogany' hap2 genome. For GWAS, assumed to be peak + 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.nan.hap1.peak: position of QTL peak in the 'Nanking' hap1 genome
- moll.nan.hap1.start: position of QTL left border in the 'Nanking' hap1 genome. For GWAS, assumed to be peak - 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.nan.hap1.end: position of QTL right border in the 'Nanking' hap1 genome. For GWAS, assumed to be peak + 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.nan.hap2.peak: position of QTL peak in the 'Nanking' hap2 genome
- moll.nan.hap2.start: position of QTL left border in the 'Nanking' hap2 genome. For GWAS, assumed to be peak - 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- moll.nan.hap2.end: position of QTL right border in the 'Nanking' hap2 genome. For GWAS, assumed to be peak + 73 kb, corresponding to the rate of LD decay to r2 < 0.2
- analysis: QTL or GWAS
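The PVE column follows the formula 100 × [1 - 10^(-2 × LOD / N)]. The sketch below shows the arithmetic for a single QTL; it assumes that N corresponds to the `n` column (number of progeny phenotyped) and that `lod.or.P` holds a LOD score for the row in question (it holds a P-value for GWAS rows, where this formula does not apply). The function name is hypothetical.

```python
def pve_percent(lod, n):
    """Percent of phenotypic variance explained: 100 * (1 - 10 ** (-2 * LOD / N))."""
    if n <= 0:
        raise ValueError("n must be a positive number of phenotyped progeny")
    return 100.0 * (1.0 - 10.0 ** (-2.0 * lod / n))

# Illustrative values only (not taken from the dataset):
# a LOD of 5 in 200 progeny corresponds to roughly 10.9% of variance.
example_pve = pve_percent(lod=5.0, n=200)
```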
File: SIData27.blight.root.rot.gwas.pvalues.effects.oct25.txt
Description: P-values and effect sizes from genome wide association studies of blight and root rot resistance.
Variables
- SNP: Marker identifier (chromosome_position)
- Chromosome: Chromosome number
- Position: Physical position of the SNP on the chromosome (base pairs, 1-based coordinates)
- P.value: Raw association P-value from BLINK GWAS linear model for marker-trait test
- maf: Minor allele frequency for the SNP in the analyzed population
- nobs: number of individuals included in the GWAS analysis
- effect: estimated allelic substitution effect for inheriting the alternative allele (example: 2.10116386)
- P.adj: Benjamini-Hochberg adjusted P-value for multiple-testing correction
- trait: Trait measured for association (e.g., blight resistance index)
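For readers who want to see how a Benjamini-Hochberg adjustment relates raw P-values to adjusted ones, here is a minimal generic sketch of the procedure. It is not the code used to produce `P.adj`, and this README does not say whether the adjustment was applied within each trait or across all tests, so the grouping is left to the caller.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that adjusted
    # values never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative P-values only (not taken from the dataset).
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```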
File: SIData28.tacf.blight.root.rot.height.blups.inoculated.11.1.2025.txt
Description: Estimated breeding values for blight resistance, root rot resistance, and tree height. Breeding values for blight and root rot resistance indices were scaled from 0 (C. dentata mean) to 100 (C. mollissima mean). Resistance data were merged with hybrid species ancestry estimates from Ancestry_HMM
Variables
- shortcode: tree id
- blightindex: breeding value for the blight resistance index scaled from 0 = C. dentata mean to 100 = C. mollissima mean. To create the blight resistance index, phenotypes were adjusted for fixed effects, residuals were weighted by trait heritability, and the following variables were summed: ‘mainstemalive’, ‘largecankers’, ‘blightcontained’, ‘exposedwood’, ‘sporulation’, ‘sunkenswollen’, and ‘stumpsprouts’.
- blightindex_accuracy: estimated accuracy of breeding values for the blight resistance index. Accuracy was calculated from standard errors (SE) of breeding values as sqrt(1 - [SE^2]/var(g)), where var(g) is the estimated genetic variance.
- phenotyped.blight: boolean indicating whether tree was phenotyped for blight resistance.
- heightblup: breeding values for tree height in meters
- heightblup_accuracy: estimated accuracy of breeding values for tree height. Accuracy was calculated from standard errors (SE) of breeding values as sqrt(1 - [SE^2]/var(g)), where var(g) is the estimated genetic variance.
- phenotyped.height: boolean indicating whether tree was phenotyped for height
- prr.survival.blup.scaled: breeding value for root rot survival scaled from 0 = C. dentata mean to 100 = C. mollissima mean
- prr.survival.blup.accuracy: estimated accuracy of breeding values for prr survival
- prr.phenotyped: boolean indicating whether tree was phenotyped for prr resistance.
- dentata ahmm: C. dentata ancestry proportions estimated in Ancestry_HMM
- mollissima_ahmm: C. mollissima ancestry proportions estimated in Ancestry_HMM
- crenata_ahmm: C. crenata ancestry proportions estimated in Ancestry_HMM
- sativa_ahmm: C. sativa ancestry proportions estimated in Ancestry_HMM
- pumila_ahmm: C. pumila ancestry proportions estimated in Ancestry_HMM
- mom: mother of tree
- dad: father of tree
- generation: backcross generation
- dentataparent: last known C. dentata parent
- latitude: latitude of last known C. dentata parent
- longitude: longitude of last known C. dentata parent
- source: original source of resistance
- chapter: chapter where tree originated
- parcel: parcel where tree was grown
- inoculated: whether tree was inoculated (1) or not (0)
- plantdate: date that tree was planted
- obsdate: date that tree was phenotyped
- age: age in years that tree was phenotyped
- seedzone: breeding zone from Sandercock et al. 2024.
- survival: last observation of whether the tree was alive or dead
- survival.obs.year: year when survival was last recorded on tree
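Two calculations described for this file can be sketched briefly: the accuracy formula sqrt(1 - SE^2/var(g)) and the 0 to 100 scaling of breeding values. The scaling function assumes a simple linear rescaling between the C. dentata mean and the C. mollissima mean, which this README implies but does not spell out. Clipping negative values to zero before the square root is an added safeguard, not something stated in the source. Function names are hypothetical.

```python
import math

def blup_accuracy(se, var_g):
    """Accuracy of a breeding value: sqrt(1 - SE^2 / var(g))."""
    if var_g <= 0:
        raise ValueError("var_g must be positive")
    reliability = 1.0 - (se ** 2) / var_g
    # Assumption: clip at 0 so an SE larger than sqrt(var_g) yields 0, not an error.
    return math.sqrt(max(reliability, 0.0))

def scale_0_100(value, dentata_mean, mollissima_mean):
    """Linearly rescale so the C. dentata mean maps to 0 and the C. mollissima mean to 100."""
    if mollissima_mean == dentata_mean:
        raise ValueError("species means must differ")
    return 100.0 * (value - dentata_mean) / (mollissima_mean - dentata_mean)

# Illustrative values only (not taken from the dataset).
example_accuracy = blup_accuracy(se=0.6, var_g=1.0)
example_scaled = scale_0_100(value=1.5, dentata_mean=1.0, mollissima_mean=2.0)
```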
File: SIData29.chestnut.genomic.selection.simulation.outputs.txt
Description: Predicted blight resistance, root rot resistance, height growth, and C. dentata ancestry after one generation of controlled pollinations and genomic selection
Variables
- shortcode: tree id
- generation: unselected = progeny without selection, first selected = selected progeny, current unselected = trees from the current generation
- type: selection type (blightheight = selection for blight resistance and height growth, blightheightrootrot = selection for blight resistance, root rot resistance, and height growth)
- replicate: simulation replicate where different parents that met selection criteria were randomly selected for controlled pollinations (inoculated reps - both parents were inoculated with Cr. parasitica, inoculated + naturally infected - parents either inoculated or naturally infected)
- variable: blight resistance, height growth, root rot resistance, or C. dentata ancestry
- value: genomic estimated breeding value or C. dentata ancestry for the tree
Files: Original photos of 'Darling 54' progeny
- OxO_pos_2024_R2T57.jpg
- OxO_pos_2024_R2T5.jpg
- OxO_pos_2024_R1T53.jpg
- OxO_pos_2024_R1T45.jpg
- OxO_pos_2024_R1T29.jpg
- OxO_pos_2023_R10T10.jpg
- OxO_pos_2023_R7T13.jpg
- OxO_pos_2023_R1T8.jpg
- OxO_neg_2024_R1T14.jpg
- OxO_neg_2024_R1T6.jpg
Description: Original photos of Darling 54 progeny from Fig S5. File names encode the tree ID: OxO_pos = inherited the oxalate oxidase gene, OxO_neg = did not inherit the oxalate oxidase gene, 2023 or 2024 = year that three-year-old cohorts were inoculated with Cr. parasitica, RxTx = row and tree number of the tree. Raw canker data can be accessed in SIData9.
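The photo file names can be split into their components programmatically. The sketch below is one way to do it with a regular expression built from the ten file names listed above; the function name and the keys of the returned dictionary are hypothetical.

```python
import re

# Pattern inferred from the listed names, e.g. OxO_pos_2023_R10T10.jpg
PHOTO_NAME = re.compile(
    r"^OxO_(?P<status>pos|neg)_(?P<year>\d{4})_R(?P<row>\d+)T(?P<tree>\d+)\.jpg$"
)

def parse_photo_name(filename):
    """Split a photo file name into OxO status, inoculation year, row, and tree."""
    match = PHOTO_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return {
        "oxo_inherited": match["status"] == "pos",
        "inoculation_year": int(match["year"]),
        "row": int(match["row"]),
        "tree": int(match["tree"]),
    }

example_photo = parse_photo_name("OxO_pos_2023_R10T10.jpg")
```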
Access information
Other publicly accessible locations of the data:
- Raw PacBio and Illumina sequence data + RNA seq data for reference genomes can be accessed via NCBI bioproject PRJNA1147634
- Ellis, Mahogany, and Nanking genomes can be accessed on Phytozome
- Raw fastq files from genotyping-by-sequencing can be accessed through NCBI under bioproject PRJNA507748
