
Variant discovery in full-sibling families of Pinus taeda L

Cite this dataset

Lauer, Eddie (2021). Variant discovery in full-sibling families of Pinus taeda L [Dataset]. Dryad. https://doi.org/10.5061/dryad.hhmgqnkgg

Abstract

Fusiform rust disease, caused by the endemic fungus Cronartium quercuum f. sp. fusiforme, is the most damaging disease affecting economically important pine species in the southeastern United States. In this report, we detail the genomic localization and sequence-level discovery of candidate race-nonspecific, broad-spectrum fusiform rust resistance genes in Pinus taeda L. Two full-sib families, each with ~1000 progeny, were challenged with a complex inoculum consisting of over 150 pathogen isolates. High-density linkage mapping revealed three QTL distributed on two linkage groups. The two QTL on linkage group 2 were additive with respect to their effects on the probability of disease outcome. All three QTL were validated using a population of 2057 cloned pine genotypes in a six-year-old multi-environment field trial. As a complement to the QTL mapping approach, bulked segregant RNA-seq analysis revealed a small number of candidate nucleotide-binding leucine-rich repeat genes harboring SNPs significantly associated with disease resistance. The results of this study demonstrate that single qualitative resistance genes can confer effective resistance against genetically diverse mixtures of an endemic pathogen.

Methods

A total of 15 samples had adequate quantity and quality of RNA for library preparation. Each sample was sequenced on two lanes of an S2 flow cell of the Illumina NovaSeq 6000 sequencer, resulting in 7.6 × 10^9 50 bp paired-end reads. For each sample, reads originating from lanes 1 and 2 were combined into a single FASTQ file for each mate and aligned to the PacBio reference transcriptome using bwa mem with the default options (Li, 2013). Around 70% of the read pairs from each sample had both mates properly mapped to the transcriptome, with an average quality score of 35.8, an average insert size of ~275 bp, and an average depth of 163x.
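The lane merging and alignment step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names and all file names are hypothetical, and the only detail taken from the text is that bwa mem was run with default options on one merged file per mate.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Concatenate the per-lane FASTQ files for one mate into a single file.

    Byte-level concatenation is valid both for plain-text FASTQ and for
    gzip-compressed FASTQ, because concatenated gzip members form a valid
    gzip stream.
    """
    merged_path = Path(merged_path)
    with merged_path.open("wb") as out:
        for lane_file in lane_files:
            with Path(lane_file).open("rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path


def bwa_mem_command(reference, mate1, mate2):
    """Build a paired-end `bwa mem` command line with default options.

    The reference must already be indexed with `bwa index`.
    """
    return ["bwa", "mem", str(reference), str(mate1), str(mate2)]
```

The returned list can be passed to `subprocess.run` with `stdout` redirected to a SAM file; `bwa mem` writes its alignments to standard output.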

Following alignment, variants were called using Freebayes version 0.9.6 (Garrison & Marth, 2012). All BAM files from a single family were combined in one variant discovery run. Since each sample represented a bulk of 100 (for the random) or 50 (for the disease status) individuals, the population model was specified as ‘pooled’ using the ‘-J’ option. Complex alleles of up to 25 bp were allowed using the ‘--max-complex-gap’ option. Biallelic SNPs with a minimum of 10 observations of the alternate allele were considered for downstream analysis.
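The variant calling and filtering step could look roughly like the Python sketch below. Several points are assumptions rather than details from the text: the function names are hypothetical, the `-f` reference flag and positional BAM arguments follow current freebayes usage and were not checked against version 0.9.6, and the "10 observations" threshold is applied to the site-level `AO` INFO field, whereas the authors may have applied it per sample.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def freebayes_command(reference, bam_files, max_complex_gap=25):
    """Build a pooled-model freebayes call over all BAM files of one family."""
    return [
        "freebayes", "-f", str(reference),
        "-J",                                   # pooled population model
        "--max-complex-gap", str(max_complex_gap),
        *[str(bam) for bam in bam_files],
    ]


def is_biallelic_snp(ref, alt):
    """True when REF and ALT are each exactly one nucleotide (one ALT allele)."""
    return ref.upper() in NUCLEOTIDES and alt.upper() in NUCLEOTIDES


def keep_record(line, min_alt_obs=10):
    """Decide whether one VCF data line passes the SNP and AO filters."""
    fields = line.rstrip("\n").split("\t")
    ref, alt, info_field = fields[3], fields[4], fields[7]
    if not is_biallelic_snp(ref, alt):
        return False
    # Parse INFO entries of the form KEY=VALUE separated by semicolons.
    info = dict(entry.partition("=")[::2] for entry in info_field.split(";"))
    ao = info.get("AO", "")
    return ao.isdigit() and int(ao) >= min_alt_obs


def filter_vcf_lines(lines, min_alt_obs=10):
    """Yield header lines unchanged, plus data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_alt_obs):
            yield line
```

Multi-allelic sites are rejected because their ALT field contains a comma and therefore is not a single nucleotide.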

Usage notes

Sample IDs appearing in the .vcf files are described below.

Family E4
S15 random bulk collected prior to inoculation (100 full-sib)
S11 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S19 non-diseased bulk collected 10 months post-inoculation (100 full-sib)
S2 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S18 diseased bulk collected 10 months post-inoculation (100 full-sib)
S1 diseased bulk collected 7 months post-inoculation (50 full-sib)
S5 diseased bulk collected 7 months post-inoculation (50 full-sib)

Family E9
S16 random bulk collected prior to inoculation (100 full-sib)
S12 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S17 non-diseased bulk collected 10 months post-inoculation (100 full-sib)
S3 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S7 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S20 diseased bulk collected 10 months post-inoculation (100 full-sib)
S4 diseased bulk collected 7 months post-inoculation (50 full-sib)
S8 diseased bulk collected 7 months post-inoculation (50 full-sib)
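
For programmatic use, the sample descriptions above can be encoded as a lookup table keyed by the sample ID found in the .vcf files. The sketch below is one possible encoding: the `Bulk` type, its field names, and the `samples_for` helper are hypothetical, and `None` is used here as a convention for bulks collected prior to inoculation.

```python
from typing import NamedTuple, Optional


class Bulk(NamedTuple):
    family: str
    status: str                             # "random", "non-diseased", or "diseased"
    months_post_inoculation: Optional[int]  # None = collected prior to inoculation
    n_full_sibs: int


BULKS = {
    # Family E4
    "S15": Bulk("E4", "random", None, 100),
    "S11": Bulk("E4", "non-diseased", 7, 50),
    "S19": Bulk("E4", "non-diseased", 10, 100),
    "S2": Bulk("E4", "non-diseased", 7, 50),
    "S18": Bulk("E4", "diseased", 10, 100),
    "S1": Bulk("E4", "diseased", 7, 50),
    "S5": Bulk("E4", "diseased", 7, 50),
    # Family E9
    "S16": Bulk("E9", "random", None, 100),
    "S12": Bulk("E9", "non-diseased", 7, 50),
    "S17": Bulk("E9", "non-diseased", 10, 100),
    "S3": Bulk("E9", "non-diseased", 7, 50),
    "S7": Bulk("E9", "non-diseased", 7, 50),
    "S20": Bulk("E9", "diseased", 10, 100),
    "S4": Bulk("E9", "diseased", 7, 50),
    "S8": Bulk("E9", "diseased", 7, 50),
}


def samples_for(family, status=None):
    """Return sample IDs for a family, optionally restricted to one status."""
    return [
        sample_id
        for sample_id, bulk in BULKS.items()
        if bulk.family == family and (status is None or bulk.status == status)
    ]
```

For example, `samples_for("E9", "diseased")` selects the VCF sample columns to compare against `samples_for("E9", "non-diseased")` within family E9.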

Funding

National Institute of Food and Agriculture, Award: 2019-67013-29169