Genetic architecture underlying response to the fungal pathogen Dothistroma septosporum in lodgepole pine, jack pine, and their hybrids
Data files
Feb 03, 2025 version files (231.97 MB)
- all_info.txt.zip (527 B)
- fil_mask_MAD_rnd.input.zip (4.07 MB)
- introgress_input.zip (52.78 KB)
- long_key.txt.zip (460 B)
- majorAF_rnd.input.zip (6.01 MB)
- manhattan_input.zip (5.67 MB)
- pl_1_input.tar.gz (36.14 MB)
- pl_2_input.tar.gz (36.26 MB)
- pl_LD.zip (67.80 MB)
- popool_fst_rnd.txt.zip (8.19 MB)
- pxj_1_input.tar.gz (16.68 MB)
- pxj_2_input.tar.gz (16.68 MB)
- pxj_LD.zip (34.42 MB)
- README.md (4.14 KB)
Abstract
In recent decades, Dothistroma needle blight (DNB), a pine tree disease caused by the fungal pathogen Dothistroma septosporum, has severely damaged lodgepole pine (Pinus contorta Dougl. ex. Loud.) in British Columbia, Canada, and raised health concerns for jack pine (Pinus banksiana Lamb.). The pathogen has already shown signs of host shift eastward to the hybrid populations between lodgepole pine and jack pine (Pinus contorta × P. banksiana), and possibly into pure jack pine. However, we have little knowledge about mechanisms of resistance to D. septosporum, especially the underlying genetic basis of variation in pines. In this study, we conducted controlled inoculations to induce infection by D. septosporum and performed a genome-wide case-control association study with pooled sequencing (pool-seq) data to dissect the genetic architecture underlying response in lodgepole pine, jack pine, and their hybrids. We identified candidate genes associated with D. septosporum response in lodgepole pine and in hybrid samples. We also assessed genetic structure in hybrid populations and inferred how introgression may affect the distribution of genetic variation involved in D. septosporum response in the studied samples. These results can be used to develop genomic tools to evaluate DNB risk, guide forest management strategies, and potentially select for resistant genotypes.
README: Genetic architecture underlying response to the fungal pathogen Dothistroma septosporum in lodgepole pine, jack pine, and their hybrids
Description of the data and file structure
The datasets here were used for the analyses in the manuscript. The code is available on Zenodo: https://zenodo.org/records/14756909
1) For generating genetic structure
fil_mask_MAD_rnd.input
is the input including “randomly chosen” major allele depth for adegenet.
majorAF_rnd.input
is the input including major allele frequencies for corrplot.
long_key.txt
includes the index to re-order the populations based on longitude.
popool_fst_rnd.txt
is the final input for calculating pairwise Fst using randomly chosen loci.
all_info.txt
includes the population IDs and ploidy, which are necessary for running pairwise Fst.
2) For pool-GWAS using CMH
Abbreviations:
- LPp -- pine tree seedlot;
- 1R -- resistant trees inoculated with Dothistroma septosporum isolate 1 (D1);
- 1S -- susceptible trees inoculated with Dothistroma septosporum isolate 1 (D1);
- 2R -- resistant trees inoculated with Dothistroma septosporum isolate 2 (D2);
- 2S -- susceptible trees inoculated with Dothistroma septosporum isolate 2 (D2);
- pl -- lodgepole pine samples (LP in the manuscript);
- pxj -- combined jack pine and lodgepole pine x jack pine hybrid samples (JP + LPxJP in the manuscript)
pl_1_input
includes allele frequency tables for each lodgepole pine population infected by D1.
pl_2_input
includes allele frequency tables for each lodgepole pine population infected by D2.
pxj_1_input
includes allele frequency tables for each jack or hybrid pine population infected by D1.
pxj_2_input
includes allele frequency tables for each jack or hybrid pine population infected by D2.
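For readers unfamiliar with the Cochran-Mantel-Haenszel (CMH) test named in the heading of this section, the following is a minimal Python sketch of the statistic for a single SNP. It is not the code used in the manuscript (see the Zenodo archive for that). It assumes each stratum is one population, each 2x2 table holds allele counts for the resistant and susceptible pools, and no continuity correction is applied; the function name `cmh_test` and the counts are hypothetical.

```python
import math


def cmh_test(tables):
    """CMH chi-square (1 df, no continuity correction) for 2x2 tables.

    Each table is ((a, b), (c, d)): rows are the two pools (assumed here
    to be resistant and susceptible), columns are the two alleles, and
    each table is one stratum (assumed here to be one population).
    Returns (statistic, p_value).
    """
    diff_sum = 0.0  # sum over strata of observed minus expected for cell a
    var_sum = 0.0   # sum over strata of the hypergeometric variance of a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, col1 = a + b, a + c
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * (c + d) * col1 * (b + d) / (n * n * (n - 1))
    if var_sum == 0:
        return 0.0, 1.0
    stat = diff_sum ** 2 / var_sum
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical allele counts for one SNP in three populations.
tables = [((20, 5), (5, 20)), ((18, 7), (6, 19)), ((22, 3), (8, 17))]
stat, p = cmh_test(tables)
print(f"CMH chi-square = {stat:.2f}, p = {p:.3g}")
```

Stratifying by population lets a consistent allele-frequency difference between resistant and susceptible pools accumulate across populations, while differences in baseline allele frequency among populations do not inflate the statistic.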
manhattan_input
includes the input for the Manhattan plot in Figure 3.
- pl_chr_input_for_Fig3.bed includes the positions and GWAS p-values of SNPs amongst lodgepole pine samples.
- pl_top50_chr.txt includes the outlier intervals amongst lodgepole pine samples.
- pxj_chr_input_for_Fig3.bed includes the positions and GWAS p-values of SNPs amongst jack-hybrid pine samples.
- pxj_top50_chr.txt includes the outlier intervals amongst jack-hybrid pine samples.
pl_LD
is a folder containing the lodgepole pine inputs that were used to calculate pairwise r2 on each scaffold and to estimate the decay of linkage disequilibrium (LD) with distance in Figure 4.
- plall_AF_add.txt is the input for plotting LD decay across all scaffolds amongst lodgepole pine samples.
- plall_AF_sig.txt is the input for plotting LD decay across only the scaffolds including identified outliers amongst lodgepole pine samples.
- sig_list_within_pl.txt is the scaffold list containing identified outliers amongst lodgepole pine samples.
- sig_notsig_scaf_within_pl.txt is the whole scaffold list amongst lodgepole pine samples.
pxj_LD
is a folder containing the jack-hybrid pine inputs that were used to calculate pairwise r2 on each scaffold and to estimate the decay of linkage disequilibrium (LD) with distance in Figure 4.
- pxjall_AF_add.txt is the input for plotting LD decay across all scaffolds amongst jack-hybrid pine samples.
- pxjall_AF_sig.txt is the input for plotting LD decay across only the scaffolds including identified outliers amongst jack-hybrid pine samples.
- sig_list_within_pxj.txt is the scaffold list containing identified outliers amongst jack-hybrid pine samples.
- sig_notsig_scaf_within_pxj.txt is the whole scaffold list amongst jack-hybrid pine samples.
3) For plotting introgression patterns
introgress_input
is a folder containing inputs for Figure 5.
- fst_input_for_Fig5ab.txt is the input for calculating Fst values of D. septosporum response outliers between jack pine samples and pure lodgepole pine samples.
- lrinput_for_Fig5c is the input for calculating regression patterns of Fst values with longitude. The Fst values were calculated between the jack-hybrid pine samples and the six pure lodgepole pine samples.
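As an illustration of the kind of regression that lrinput_for_Fig5c feeds, here is a minimal ordinary least-squares sketch in Python. It is not the manuscript's code, the function name `least_squares` is hypothetical, and the longitude and Fst values are invented for the example. They do not come from the dataset and do not reflect the reported results.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values: one longitude (decimal degrees) and one Fst per population.
longitudes = [-119.0, -117.5, -116.0, -114.5, -113.0]
fst_values = [0.05, 0.08, 0.10, 0.14, 0.17]

slope, intercept = least_squares(longitudes, fst_values)
print(f"Fst changes by {slope:.4f} per degree of longitude")
```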
Methods
Plant materials
Seeds were obtained from 40 natural seedlots across Western Canada (Figure 1; seedlot numbers and locations can be found in Table S1; seed contributors are listed at http://adaptree.forestry.ubc.ca/seed-contributors/), including 25 LP seedlots from British Columbia (BC_LP) and three from Alberta (AB_LP), seven LP × JP seedlots from Alberta (AB_LPxJP), and five JP seedlots from Alberta (AB_JP). The range maps were downloaded from https://sites.ualberta.ca/~ahamann/data/rangemaps.html (Hamann et al., 2005). When the seeds were collected in the wild, they were assigned to pure LP, pure JP, or LP × JP based on their location and morphological traits such as cone and branch characteristics, microfibril angle, and cell area (Wheeler & Guries, 1987; Wood et al., 2009; Yeatman & Teich, 1969). The proportions of LP and JP ancestry of the collected seeds were estimated by Cullingham et al. (2012) using 11 microsatellite loci. Briefly, seeds were germinated to obtain seedlings, and DNA was isolated from the seedlings. The DNA was used to amplify the 11 microsatellite loci, and allele sizes were determined for genotyping as described by Cullingham et al. (2011).
For the present study, seeds from the 40 seedlots were used to grow seedlings in a greenhouse at the University of British Columbia, Vancouver, BC, for their first year (a flow chart of the experimental procedure is shown in Figure S1). Resistance was phenotyped by inoculating 100 individuals per seedlot with one of two D. septosporum isolates (D1 and D2, 50 individuals per isolate). D1 was isolated from needles of an infected LP × JP seedlot in Alberta, Canada, while D2 was isolated from an infected LP seedlot in the Kispiox Valley region close to Smithers in Northwestern British Columbia, Canada (Feau et al., 2021; Ramsfield et al., 2021). The seedlots were identified as LP or LP × JP as described above (Cullingham et al., 2012). The controlled inoculation experiment was performed as described in Kabir et al. (2013) and Feau et al. (2021). Briefly, the one-year-old seedlings were placed in a completely randomized experimental design in a growth chamber set to 16 hours of daylight at 20 °C, 8 hours of night at 12 °C, and a minimum relative humidity of 80%. D. septosporum conidia were harvested from colonies grown for 10 to 16 days on Dothistroma sporulation medium plates and then suspended in sterile distilled water. Each seedling was sprayed twice, with a 10-day interval, with a standard inoculum of approximately 3 mL of 1.6 × 10⁶ conidia mL⁻¹, using a household trigger action atomizer. Once the needles were dry (~45 minutes), the seedlings were wrapped in transparent plastic bags and kept in the growth chamber. After 48 h the plastic bags were removed and the seedlings were kept in the growth chamber for 15 weeks until phenotyping. A mister spraying tap water was activated hourly for 3 minutes during the trial to maintain needle wetness. Control seedlings, at least one for each seedlot, were inoculated with sterile water.
Fifteen weeks after inoculation, seedlings were rated for the proportion of necrotic needles with red bands and/or fruiting bodies on a disease severity scale of 1 to 5 (1 = least severe, 5 = most severe). Following disease rating, the ten most- and ten least-infected individuals per seedlot (five of each for each D. septosporum isolate) were identified and retained for DNA extraction. Genomic DNA of each of these individuals was extracted as described in Lind et al. (2022), using the Nucleospin 96 Plant II Core kit (Macherey–Nagel GmbH & Co. KG, Germany) on an Eppendorf epMotion 5075 liquid-handling platform. For each of the two D. septosporum isolates, the DNA of the five most-infected individuals and that of the five least-infected individuals were each combined in equimolar amounts, giving four pooled libraries per population: two susceptible (D1-S, D2-S) and two resistant (D1-R, D2-R). Hence, 160 pooled DNA libraries were generated.
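The pooling design can be checked with a short enumeration. The sketch below is only an illustration: the seedlot IDs are placeholders, and the pool labels simply combine a placeholder ID with the isolate and phenotype codes used above.

```python
from itertools import product

# Placeholder IDs standing in for the 40 seedlots (real IDs are in Table S1).
seedlots = [f"seedlot_{i:02d}" for i in range(1, 41)]
isolates = ["D1", "D2"]
phenotypes = ["R", "S"]  # resistant, susceptible
INDIVIDUALS_PER_POOL = 5

# One pooled library per seedlot x isolate x phenotype combination.
pools = [f"{s}_{iso}-{ph}" for s, iso, ph in product(seedlots, isolates, phenotypes)]

print(len(pools))                         # 160 pooled libraries
print(len(pools) * INDIVIDUALS_PER_POOL)  # 800 individuals contributing DNA
```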
Sequence capture and pool-seq genotyping
The capture probes comprised two sets. The first set was designed based on an existing LP sequence capture array (Suren et al., 2016) by removing the probes that did not yield successful genotyping in Yeaman et al. (2016). Probe sequences of the existing capture array were aligned to the reference genome using GMAP v2019-03-15 (Wu & Watanabe, 2005). Probes that covered genomic regions with called SNPs in the dataset from Yeaman et al. (2016) were retained; all other probes were discarded. Since there is no available LP reference genome, a masked Pinus taeda reference genome Pita.2_01.masked3k2.fa (https://treegenesdb.org/FTP/Genomes/Pita/v2.01/genome/) (Neale et al., 2014) was used instead. Pinus taeda is a closely related species to LP (Jin et al., 2021). The second set consisted of newly designed probes derived from the D. septosporum-induced genes, which were based on an LP reference transcriptome assembled using the RNA-seq data of D. septosporum infected LP samples (Lu et al., 2021). To avoid duplicates, only those D. septosporum-induced genes with low homology to the retained probe sequences were used to design the new probes. To do so, the retained probe sequences were aligned to this transcriptome using blastn v2.9.0 with an E-value of 1e-10. A total of 8,778 D. septosporum-induced genes did not have any aligned probe sequences. These non-duplicate D. septosporum-induced genes were subsequently aligned to the reference genome to predict the exon-intron boundaries using GMAP v2019-03-15. Exon sequences from these induced genes with a length of at least 100 bp were combined with the previously designed working probe sequences, and this combined set of sequences was submitted to Roche NimbleGen (Roche Sequencing Solutions, Inc., CA, USA) for Custom SeqCap EZ probe design (design name: 180321_lodgepole_v2_EZ).
Combining the two sets of probes, this updated LP sequence capture array has a capture space of 44 Mbp, containing roughly 35,467 assembled genes. Most LP genes responsive to environmental stress and fungal pathogen attack were included in the current capture probe design. Although genes expressed only in particular developmental stages might be missed, genes with evidence of substantial expression are covered by this capture probe design.
The capture libraries for each of the 160 pools were constructed following the NimbleGen SeqCap EZ Library SR User’s Guide and as described in Lind et al. (2022). Then the R (resistant) and S (susceptible) libraries (two libraries per capture, R1+S1 or R2+S2, indexed with different barcodes) per population and per isolate were combined for sequence capture and enrichment. Sequencing was performed on the Illumina NovaSeq 6000 S4 PE 150 platform at the Centre d'expertise et de services Génome Québec. Our in-house pool-seq pipeline (Lind, 2021) was employed to align the reads to the reference genome and call SNPs. From the raw SNPs, only bi-allelic loci in regions without annotated repetitive elements or potentially paralogous genes were retained. The annotated repetitive elements were acquired from the LP genome annotation (Wegrzyn et al., 2014). The potentially paralogous genes were identified as described in Lind et al. (2022) using haploid megagametophyte sequences, because heterozygous SNP calls in haploid sequences are likely to represent misalignments of paralogs. Afterwards, SNP loci with depth (DP) < 10, DP > 400, global minor allele frequency < 0.05, or > 25% missing data were also removed.
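The final filtering thresholds can be expressed as a single predicate. The following Python sketch is not the pool-seq pipeline itself. The record type `SnpRecord` and its fields are hypothetical, and because the text does not state how DP is aggregated across pools, the sketch takes one depth value per locus as given.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    """Hypothetical per-locus summary used only for this illustration."""
    locus: str
    dp: int                  # read depth (aggregation across pools is assumed)
    global_maf: float        # global minor allele frequency
    missing_fraction: float  # fraction of pools with missing data


def passes_filters(snp, min_dp=10, max_dp=400, min_maf=0.05, max_missing=0.25):
    """Keep a SNP unless DP < 10, DP > 400, MAF < 0.05, or > 25% of data is missing."""
    if snp.dp < min_dp or snp.dp > max_dp:
        return False
    if snp.global_maf < min_maf:
        return False
    if snp.missing_fraction > max_missing:
        return False
    return True


# Made-up records: the first passes, the second fails on depth.
records = [
    SnpRecord("scaffold1:100", dp=85, global_maf=0.12, missing_fraction=0.05),
    SnpRecord("scaffold1:250", dp=7, global_maf=0.30, missing_fraction=0.00),
]
kept = [r.locus for r in records if passes_filters(r)]
print(kept)
```

Values that sit exactly on a threshold (DP of 10 or 400, MAF of 0.05, 25% missing) are retained, matching the strict inequalities in the text.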