Characterization and improvement of novel bioenergy grasses (Tripidium spp.)
Data files
Jan 25, 2023 version files 95.55 MB
- ListofFASTASequences.docx (42.06 KB)
- MAREN_SUPPLEMENTAL_DATASET_Readme.txt (96.85 KB)
- SF1-TranscriptomeAssembly.pptx (216.46 KB)
- SF2-AnnotationStatisticsPrimaryDeNovo.pptx (534.24 KB)
- SF3-AnnotationStatisticsClusterEnrichedAssembly.pptx (320.27 KB)
- SF4-AnnotationStatisticsPBIsoSeqAssembly.pptx (695.45 KB)
- ST10-160P-INFLORESCENCEDVLP-FINAL.xlsx (537 KB)
- ST11-200P-INFLORESCENCEDVLP-FINAL.xlsx (853.44 KB)
- ST12-FLOWERDVLP-ALL-FINAL.xlsx (4.80 MB)
- ST13-FT-FLOWERDVLP-FINAL.xlsx (577.08 KB)
- ST14-PAF-FLOWERDVLP-FINAL.xlsx (1.60 MB)
- ST15-ST-FLOWERDVLP-FINAL.xlsx (3.93 MB)
- ST16-ANT-FLOWERDVLP-FINAL.xlsx (2.09 MB)
- ST17-SEEDDVLP-ALL-FINAL.xlsx (1.08 MB)
- ST18-ANT-SEEDDVLP-FINAL.xlsx (231.96 KB)
- ST19-IS-SEEDDVLP-FINAL.xlsx (344 KB)
- ST20-MS-SEEDDVLP-FINAL.xlsx (583.83 KB)
- ST21-ClusterEnrichedDeNovoTranscriptomeAnnotations.txt (28.68 MB)
- ST22-CollapsedIsoSeqAnnotations.txt (39.08 MB)
- ST5-INFLORESCENCEDVLP-ALL-FINAL.xlsx (5.06 MB)
- ST6-20P-INFLORESCENCEDVLP-FINAL.xlsx (3.51 MB)
- ST7-40P-INFLORESCENCEDVLP-FINAL.xlsx (40.74 KB)
- ST8-80P-INFLORESCENCEDVLP-FINAL.xlsx (79.82 KB)
- ST9-120P-INFLORESCENCEDVLP-FINAL.xlsx (201.38 KB)
- Table_S1_Sequencing_statistics.docx (15.99 KB)
- Table_S2_GO_enrichment_analysis_for_inflorescence_development.docx (108 KB)
- Table_S3_GO_enrichment_analysis_during_floral_development.docx (118.90 KB)
- Table_S4_GO_enrichment_analysis_during_seed_development.docx (121.44 KB)
Abstract
Growing economies, limited fossil fuel reserves, and environmental concerns have justified expanded research on renewable energy sources, including bioenergy crops. Taxa in the Poaceae Subtribe Saccharinae have gained attention as bioenergy crops based on their broad adaptability, pest resistance, and high biomass yields. Tripidium (syn. Erianthus, syn. Saccharum) is of particular interest due to perenniality, cold-hardiness, and high biomass yields. Tripidium ravennae is a cold-hardy, diploid species (2n = 2x = 20). Tripidium arundinaceum is a sub-tropical polyploid species (2n = 3x, 4x, 6x = 30, 40, 60) with high biomass yields.
Conventional breeding efforts focused on developing Tripidium as a competitive bioenergy feedstock for temperate climates. Advanced interspecific hybrids between T. arundinaceum and T. ravennae were evaluated in field plots relative to Miscanthus ×giganteus over three years. Data were collected on biomass yield, plant fertility, cytogenetics, and composition for lignocellulosic ethanol and forage utility. Cytology and cytometry confirmed that the hybrids were tetraploid with 2n = 4x = 40 (2C genome size = 5.06 pg). Dry biomass yields varied as a function of year and accession and increased each year, ranging from 3.4 to 10.6, 8.6 to 37.3, and 23.7 to 60.6 Mg/ha for Tripidium hybrids compared to 2.3, 16.2, and 27.9 Mg/ha for M. ×giganteus in 2016, 2017, and 2018, respectively. Variations in yield and composition contributed to variations in theoretical ethanol yields, which ranged from 10,181 to 27,546 L/ha for Tripidium accessions compared to 13,095 L/ha for M. ×giganteus. These initial findings for Tripidium hybrids are promising and warrant further development of Tripidium as a temperate bioenergy feedstock.
A robust understanding of the molecular mechanism of flowering in Tripidium will enable future biotechnology applications by harnessing floral and seed development. Therefore, a differential gene expression (DGE) analysis was conducted to identify the differentially expressed genes (DEGs) associated with flower and seed development in T. ravennae. In the early phases of inflorescence development, the type II subfamily of MADS-box transcription factors was over-represented in both GO enrichment and differential expression analyses. As developing inflorescences matured, there was increased expression of inflorescence determinacy regulators, as well as transcripts related to meiotic and multicellular organism developmental processes. In seed developmental samples, transcripts of multiple unigenes related to oxidative-reductive processes were identified. These results provide insights into the molecular regulation of reproductive development of Tripidium and provide a foundational database for future investigations and analyses, including genome annotation, functional genomics characterization, gene family evolutionary studies, comparative genomics, and precision breeding.
The ability to improve value-added traits of Tripidium hybrids via biotechnology would significantly enhance crop improvement opportunities. The objective of this portion of the research was to develop an efficient regeneration and transformation procedure for genetic modification of Tripidium hybrids. Multiple studies investigated the effects of various hormones, tissue culture media adjuncts, culture duration, Agrobacterium density, and hygromycin concentration on callus induction, maintenance, regeneration, and transformation efficiency. Callus induction media containing 10 to 40 µM 2,4-D with 12.5 mM L-proline generated callus that maximized the number of regenerated shoots (mean of 37 to 45 shoots per explant). Callus maintenance media containing 4 µM 2,4-D and 12.5 mM L-proline for durations of less than 12 weeks resulted in callus that maximized shoot number (mean of 13 to 18 shoots per explant) following regeneration. Final experiments evaluating the effect of hygromycin concentration on selection efficiency and of bacterial density on transformation efficiency are in progress.
Collectively, these research projects served to 1) evaluate and characterize new Tripidium hybrids as potential bioenergy crops, 2) establish foundational transcriptomic resources on flowering and reproductive development of Tripidium, and 3) develop a regeneration and transformation system to enable future biotechnology applications in these crops. These efforts will enable strategic advances in the development of Tripidium as a new bioenergy crop.
Methods
Plant material and sample collection
Vegetative and inflorescence meristems were collected from three plants during the week of August 9, 2017, at the North Carolina Arboretum, Asheville, NC. Inflorescence meristematic tissue from reproductive culms was collected at various heights from the ground, representing a progression of floral development. Inflorescences had emerged from more developed culms, but inflorescence meristems at earlier stages of development were apparent within the leaf sheath of less developed culms. Floral development tissues were collected at the floret boot, pre-anthesis, and anthesis stages, along with mature stamens. Spikelets containing immature and mature seeds were collected from the same plants in September and October of 2017, respectively. Culm segments, inflorescences, and developing seed samples containing target tissues were collected and immediately placed in 15 or 50 mL centrifuge tubes with 5-20 mL of RNAlater® Stabilization Solution (Ambion®, Life Technologies™). Centrifuge tubes containing sample tissue were frozen in the field on a bed of dry ice before transport to the laboratory and stored at -80 °C (Table 2.1). Excess plant tissue was trimmed and removed, and target tissues were enriched, under a stereomicroscope in sterile 100 mm Petri dishes containing approximately 5-10 mL of fresh RNAlater®. Immature floret tissue samples (FT) were purified from bulk collected inflorescence tissue prior to emergence from the flag leaf sheath. Pre-anthesis floret tissue samples (PAF) were purified from bulk collected inflorescence tissue that had emerged from the flag leaf sheath. PAF samples were identified by spikelet expansion and glume extrusion without any evidence of anther or stigma extrusion. Stamen tissue samples (ST) were purified from bulk collected floral spikelet tissues when anthers were exposed entirely outside of the glumes and pollen was visibly dehiscing.
Samples of florets at anthesis (ANT) were purified from bulk collected floral spikelet tissues by amassing florets showing stigma protrusion from the lemma and palea of the floret. Immature and mature seeds were processed by removing first- and second-order rachillae from the bulk collected tissue before tissue lysis and homogenization. Sample tissues were lysed and homogenized in liquid nitrogen with a mortar and pestle.
RNA isolation, library preparation, and sequencing
Total RNA was extracted from all tissues using the Spectrum® Plant Total RNA Kit (Sigma-Aldrich, Burlington, MA). DNA was digested on-column with the Sigma-Aldrich DNase10 (DNASE10) kit per the manufacturer's instructions. Before library preparation, RNA concentration was quantitated with the Qubit™ fluorimeter (Life Technologies™) and RNA integrity was assessed with the 2100 Bioanalyzer (Agilent). RNA samples were poly(A)-purified, and cDNA libraries were prepared using the BiooScientific (a PerkinElmer Co.) NEXTFlex Rapid Directional RNA-Seq kit with a target insert size of 200-300 bp. Libraries were sequenced on the Illumina HiSeq 4000 (150 bp paired-end) by Novogene (Sacramento, CA). RNA from all samples was pooled and used to construct Pacific Biosciences Iso-Seq libraries (Protocol # 101-070-200 version 6) with three size fractions (no size selection, < 4 kb, and > 4 kb). The libraries were sequenced with four cells of a PacBio Sequel I system at the NC State Genomic Sciences Laboratory.
Transcriptome assembly and functional annotation
Read quality was inspected with FastQC. Trimming was conducted with CLC Genomics WB (CLC-GWB, V11.0.1, QIAGEN) to remove adapter sequences and low-quality reads (Q < 20). Multiple de novo transcriptome assemblies were constructed with CLC-GWB using different combinations of de Bruijn graph k-mer (word) size and bubble size, and were assessed for the number of contigs, contiguity, and N50. The assembly with the lowest number of contigs but the largest N50 was selected. The final assembly was mapped (GMAP, V2015-07-23) to a draft genome assembly of T. ravennae (Maren et al., Unpub.) as well as multiple reference genomic assemblies within the Andropogoneae tribe, including Sorghum bicolor, Saccharum officinarum, and Zea mays. The GMAP mapping was carried out to enrich the transcriptome for plant transcripts and eliminate the transcripts of sample surface contaminants. Contigs with a 95% identity and match score to two or more reference genomes were retained. The transcriptome was analyzed for redundancy and cluster enriched with CD-HIT software at 95% identity to make a nonredundant set. Error-corrected Iso-Seq transcripts were derived from the SAM file using the CUPCAKE package (https://github.com/Magdoll/cDNA_Cupcake). Nonredundant de novo assembled transcripts were concatenated with the collapsed Iso-Seq set, reanalyzed for redundancy, and cluster enriched with CD-HIT software at 95% identity to make a nonredundant set. Functional annotation was carried out on a local server using BLASTx and the nr (NCBI non-redundant protein, 12/2018 version) database. Searches were limited to the first 20 significant results with an E-value cutoff of 1.0E-6. The unitigs and their BLASTx results were imported into the BLAST2GO package and functionally annotated using its default annotation rules.
Gene ontology (GO) term and functional annotation assignments followed InterProScan, using the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) database, KEGG pathway analysis, Rfam annotation, and GO mapping-based characterizations performed online in the BLAST2GO package.
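The assembly comparison described above relies on the N50 statistic. The following is a minimal Python sketch of how N50 can be computed and used to rank candidate assemblies; it is not the CLC Genomics Workbench implementation. The assembly names and contig lengths are hypothetical, and the ranking rule (largest N50 first, ties broken by fewer contigs) is an assumption, since the source does not state how the two criteria were weighed against each other.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


def pick_assembly(assemblies):
    """Rank assemblies by largest N50, then by fewest contigs.
    This ordering is an assumed reading of the selection rule."""
    return max(
        assemblies,
        key=lambda name: (n50(assemblies[name]), -len(assemblies[name])),
    )


# Hypothetical contig lengths (bp) for two parameter combinations.
assemblies = {
    "k25_bubble50": [1200, 900, 800, 400, 300, 200, 200],
    "k45_bubble300": [2500, 1500, 1000, 600, 400],
}

best = pick_assembly(assemblies)
print(best, n50(assemblies[best]))
```

In practice the contig lengths would be read from each assembly's FASTA file rather than typed in by hand.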
Differentially expressed gene (DEG) and gene ontology (GO) enrichment analysis
Quality-trimmed and filtered reads from all samples were mapped to the final cluster-enriched transcriptome assembly with default parameters in CLC-GWB. Statistical tests for differential gene expression used an exact test-like generalized linear model (GLM) similar to that performed in DESeq and edgeR. In developing inflorescences, the test of differential expression used non-flowering controls for comparison. The statistical tests for differential expression in developing floral spikelets used two or more pairwise comparisons between developing inflorescences and the sample query (e.g., FT, PAF, ST, ANT). DEG calls comprised two or more pairwise statistical tests among inflorescence controls, floral samples, and developing seeds. Genes of interest were filtered from the table of differentially expressed genes in CLC-GWB using a threshold of p ≤ 0.05 and a two-fold change threshold.
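The filtering step at the end of this section can be illustrated with a short Python sketch. It is only an approximation of what was done in CLC-GWB: the field names (gene, log2_fc, p_value) and the example values are hypothetical, and the code assumes that the two-fold threshold applies to both up- and down-regulated genes.

```python
import math


def filter_degs(rows, p_max=0.05, min_fold=2.0):
    """Keep rows with p <= p_max and an absolute fold change >= min_fold.
    Fold change is expected on a log2 scale in the 'log2_fc' field."""
    min_log2 = math.log2(min_fold)
    return [
        row
        for row in rows
        if row["p_value"] <= p_max and abs(row["log2_fc"]) >= min_log2
    ]


# Hypothetical rows standing in for an exported differential expression table.
rows = [
    {"gene": "unigene_A", "log2_fc": 3.1, "p_value": 0.001},
    {"gene": "unigene_B", "log2_fc": 0.4, "p_value": 0.010},
    {"gene": "unigene_C", "log2_fc": -1.5, "p_value": 0.030},
    {"gene": "unigene_D", "log2_fc": 2.2, "p_value": 0.200},
]

kept = [row["gene"] for row in filter_degs(rows)]
print(kept)
```

Here unigene_B fails the fold-change threshold and unigene_D fails the p-value threshold, so only unigene_A and unigene_C are retained.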
Gene expression analysis by RT-qPCR
Bioinformatically derived differential expression statistics of inflorescence, floret, and seed development were screened for novel and putative genes in reproductive development to validate the sample set with RT-qPCR. GO enrichment by Fisher's exact test (FDR p ≤ 0.05; BLAST2GO) aided in selecting sequences from the test set for over-representation (Supplementary Tables S2, S3, and S4). Unigenes were filtered for significant differential expression (FDR p ≤ 0.05) within the subset of relevant inflorescence samples. Final transcript selections were based on the unique mapping (GMAP; V2015-07-23) of the transcript to the reference genome assembly with concomitant support for gene architecture from the PacBio Iso-Seq data set. Primers were designed for each gene to maximize coverage of gene structures that uniquely identified the isoform of interest. Multiple internal controls were selected from the RNA-seq data set by filtering the expression data for unigenes with a minimum expression value of 200 transcripts per million (TPM), a mean value of less than 2000 TPM, and a CV of less than 0.35. Relative gene expression values were calculated from the qPCR data following the 2^(-ΔΔCt) method.
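The internal-control screen and the relative expression calculation can be sketched in Python as follows. This is an illustrative example under stated assumptions, not the authors' analysis code: "minimum expression value" is read as the lowest TPM across samples, the CV is computed with the sample standard deviation, and all TPM and Ct values shown are hypothetical. The relative expression formula is the standard Livak and Schmittgen calculation, ratio = 2^(-ΔΔCt), where ΔCt = Ct(target) - Ct(reference) and ΔΔCt = ΔCt(sample) - ΔCt(control).

```python
from statistics import mean, stdev


def is_stable_reference(tpm_values, min_tpm=200, max_mean=2000, max_cv=0.35):
    """Apply the three internal-control criteria to one unigene's TPM values.
    Needs at least two samples so a standard deviation can be computed."""
    if min(tpm_values) < min_tpm:
        return False
    avg = mean(tpm_values)
    if avg >= max_mean:
        return False
    return stdev(tpm_values) / avg < max_cv


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Return the fold change of the target gene by the 2^(-ddCt) method."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical TPM values across four samples for three candidate controls.
candidates = {
    "candidate_1": [450, 520, 480, 500],
    "candidate_2": [150, 900, 400, 300],
    "candidate_3": [300, 1500, 250, 900],
}
stable = [name for name, tpm in candidates.items() if is_stable_reference(tpm)]

# Hypothetical Ct values: target and reference gene in sample and control.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(stable, fold_change)
```

With these example values the target gene's ΔCt is 4 in the sample and 6 in the control, so ΔΔCt is -2 and the estimated fold change is 4.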
Wu TD, Watanabe CK: GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 2005, 21(9):1859-1875.
Garsmeur O, Droc G, Antonise R, Grimwood J, Potier B, Aitken K, Jenkins J, Martin G, Charron C, Hervouet C et al: A mosaic monoploid reference sequence for the highly complex genome of sugarcane. Nat Commun 2018, 9(1):2638.
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA et al: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326(5956):1112-1115.
Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22(13):1658-1659.
Fu LM, Niu BF, Zhu ZW, Wu ST, Li WZ: CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012, 28(23):3150-3152.
Conesa A, Gotz S: Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008, 2008:619832.
Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 2000, 28(1):27-30.
Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J et al: Rfam 12.0: updates to the RNA families database. Nucleic Acids Res 2015, 43(Database issue):D130-137.
Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A: High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research 2008, 36(10):3420-3435.
Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26(1):139-140.
Robinson MD, Smyth GK: Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 2008, 9(2):321-332.
Cheng Y, Bian W, Pang X, Yu J, Ahammed GJ, Zhou G, Wang R, Ruan M, Li Z, Ye Q et al: Genome-Wide Identification and Evaluation of Reference Genes for Quantitative RT-PCR Analysis during Tomato Fruit Development. Front Plant Sci 2017, 8:1440.
Livak KJ, Schmittgen TD: Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 2001, 25(4):402-408.
Preparation of plasmids and Agrobacterium
A copy of the GUSPlus™ gene was obtained from the Gateway-compatible vector 6B described in Mann et al. (2012) and cloned into pCR8/GW/TOPO per the manufacturer's instructions (Invitrogen). The GUSPlus™ gene was recombined using the Gateway® LR Clonase® II enzyme mix (Invitrogen) into the destination 6A overexpression cassette described by Mann et al. (2012). Resulting vectors were transformed into DH5α by standard heat shock methods and selected on kanamycin-containing LB media. Transformation-competent Agrobacterium preparations followed the freeze-thaw method with strain EHA105 containing pTok47 (Hofgen and Willmitzer, 1988; Jin et al., 1987; Lu et al., 2008). Single colony selections were placed in 1.5 mL microcentrifuge tubes containing 1 mL of YEP and an additional 50 mg·L⁻¹ kanamycin to grow overnight at 28 °C. The following day, the 1 mL starter cultures were added to flasks containing 100 mL of YEP and 50 mg·L⁻¹ kanamycin. Forty-five mL of YEP medium and 200 µM acetosyringone (final concentration) were added to the overnight cultures, and growth continued until a culture density of OD600 ~ 0.8 was reached.
Mann, D.G.J., P.R. Lafayette, L.L. Abercrombie, Z.R. King, M. Mazarei, M.C. Halter, C.R. Poovaiah, H. Baxter, H. Shen, R.A. Dixon, W.A. Parrott, and C. Neal Stewart Jr. 2012. Gateway-compatible vectors for high-throughput gene functional analysis in switchgrass (Panicum virgatum L.) and other monocot species. Plant Biotech. J. 10:226-236. doi: 10.1111/j.1467-7652.2011.00658.x.
Hofgen, R. and L. Willmitzer. 1988. Storage of competent cells for Agrobacterium transformation. Nucleic Acids Research 16:9877-9877. doi: 10.1093/nar/16.20.9877.
Jin, S.G., T. Komari, M.P. Gordon, and E.W. Nester. 1987. Genes responsible for the supervirulence phenotype of Agrobacterium tumefaciens a281. J. Bact. 169:4417-4425. doi: 10.1128/jb.169.10.4417-4425.1987.
Lu, J., E. Sivamani, K. Azhakanandam, P. Samadder, X. Li, and R. Qu. 2008. Gene expression enhancement mediated by the 5' UTR intron of the rice rubi3 gene varied remarkably among tissues in transgenic rice plants. Mol. Genet. Genomics 279:563-572. doi: 10.1007/s00438-008-0333-6.