PacBio IsoSeq reference transcriptomes for Pinus taeda L.
Data files
Version of Jul 07, 2021; files total 255.38 MB
- isoseq_FamilyE4.fasta
- isoseq_FamilyE9.fasta
Abstract
Fusiform rust disease, caused by the endemic fungus Cronartium quercuum f. sp. fusiforme, is the most damaging disease affecting economically important pine species in the southeastern United States. In this report, we detail the genomic localization and sequence-level discovery of candidate race-nonspecific, broad-spectrum fusiform rust resistance genes in Pinus taeda L. Two full-sib families, each with ~1000 progeny, were challenged with a complex inoculum consisting of over 150 pathogen isolates. High-density linkage mapping revealed three QTL distributed on two linkage groups. The two QTL on linkage group 2 were additive with respect to their effects on the probability of disease outcome. All three QTL were validated using a population of 2057 cloned pine genotypes in a six-year-old multi-environmental field trial. As a complement to the QTL mapping approach, bulked segregant RNAseq analysis revealed a small number of candidate nucleotide-binding leucine-rich repeat genes harboring SNPs significantly associated with disease resistance. The results of this study demonstrate that single qualitative resistance genes can confer effective resistance against genetically diverse mixtures of an endemic pathogen.
Methods
Reference transcriptomes for both families were produced using the standard isoseq3 pipeline implemented in SMRTLink v7.0 from PacBio. Consensus sequences were generated from the ‘pooled’ and the ‘size-selected’ fractions of each family. The ccs command-line application was used to generate circular consensus sequences (CCS) from the subreads.bam files, using a minimum length of 100 bp. The circular consensus sequences were then clustered using the isoseq3 cluster command-line application, and final error correction was performed with the isoseq3 polish tool. The reference transcriptome for each family was produced by concatenating the polished high-quality transcript sequences from both fractions. The reference transcriptome for family E4 contained 38,634 contigs with a contig N50 of 4,236 bp and a total length of 115,867,422 bp. For family E9, there were 42,881 contigs with a contig N50 of 3,967 bp and a total length of 133,843,719 bp.
No trimming or other post-hoc manipulation of the sequences was performed. All contigs are presented as full-length sequences, which were the primary output of the isoseq3 pipeline.
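The summary statistics reported above (contig count, total length, contig N50) can be recomputed from either FASTA file. The following is a minimal sketch in plain Python and not the tool used to produce the reported values; the function names `read_fasta`, `n50`, and `summarize` are illustrative. It assumes the common N50 definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total bases.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Return the contig N50 under the definition assumed in the text above."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Contig count, total length, and N50 for one FASTA input."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "n50_bp": n50(lengths),
    }


# Example usage (file name taken from the data file list):
# with open("isoseq_FamilyE4.fasta") as handle:
#     print(summarize(handle))
```

Because the function takes any iterable of lines, an open file handle can be passed directly and the whole file does not need to be read into memory as a single string.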
Usage notes
The raw 'subreads.bam' files used to produce the transcriptomes via the isoseq3 pipeline were deposited at NCBI under BioProject accession number PRJNA719490.
Sequence IDs are consistent with the IDs posted in Supplementary Table 3 of the manuscript.
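To pull the sequences for a set of IDs of interest out of one of the FASTA files, something along the following lines may be enough. This is a hedged sketch: the function name `records_by_id` is hypothetical, and it assumes that the ID is the first whitespace-delimited token of each FASTA header line, which should be checked against the actual headers and against the table before use.

```python
from typing import Dict, Iterable, Set


def records_by_id(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Return {sequence_id: sequence} for FASTA records whose ID is in `wanted`.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    found: Dict[str, str] = {}
    current = None  # ID of the record being collected, or None to skip
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            current = seq_id if seq_id in wanted else None
            if current is not None:
                found[current] = ""
        elif current is not None:
            found[current] += line
    return found


# Example usage with placeholder IDs (replace with IDs from the table):
# with open("isoseq_FamilyE9.fasta") as handle:
#     hits = records_by_id(handle, {"some_id_1", "some_id_2"})
```

IDs requested but absent from the returned dictionary were not found in the file, which is a quick way to spot a mismatch in ID formatting.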