Nodules harboring nitrogen-fixing rhizobia are a well-known trait of legumes, but nodules also occur in other plant lineages, with rhizobia or the actinomycete Frankia as microsymbiont. It is generally assumed that nodulation evolved independently multiple times. However, molecular-genetic support for this hypothesis is lacking, as the genetic changes underlying nodule evolution remain elusive. We conducted genetic and comparative genomics studies using Parasponia species (Cannabaceae), the only nonlegumes that can establish nitrogen-fixing nodules with rhizobium. Intergeneric crosses between Parasponia andersonii and its nonnodulating relative Trema tomentosa demonstrated that nodule organogenesis, but not intracellular infection, is a dominant genetic trait. Comparative transcriptomics of P. andersonii and the legume Medicago truncatula revealed utilization of at least 290 orthologous symbiosis genes in nodules. Among these are key genes that, in legumes, are essential for nodulation, including NODULE INCEPTION (NIN) and RHIZOBIUM-DIRECTED POLAR GROWTH (RPG). Comparative analysis of genomes from three Parasponia species and related nonnodulating plant species shows evidence of parallel loss in nonnodulating species of putative orthologs of NIN, RPG, and NOD FACTOR PERCEPTION. Parallel loss of these symbiosis genes indicates that these nonnodulating lineages lost the potential to nodulate. Taken together, our results challenge the view that nodulation evolved in parallel and raise the possibility that nodulation originated ∼100 Mya in a common ancestor of all nodulating plant species, but was subsequently lost in many descendant lineages. This will have profound implications for translational approaches aimed at engineering nitrogen-fixing nodules in crop plants.
Gene phylogenies based on Bayesian analysis
Phylogenetic analyses of genes of interest (EPR, HB, NFP, HCT, EPR, NIN, and RPG). Amino acid sequence alignments were generated using MAFFT version 7.017. Analyses were performed using MrBayes version 3.2.6, running 2.2 million generations, setting gamma-distributed rate variation, and integrating over different models of amino acid sequence evolution (aamodelpr=mixed). For NFP, analyses were based on the full-length sequences and, separately, on the kinase domain only.
Gene_phylogenies_Bayesian.zip
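The Bayesian settings described above could be written as a MrBayes command block along the following lines. This is a minimal sketch, not the input actually used for the archived analyses: the function name `mrbayes_block` is hypothetical, and every setting not stated in the description (number of runs and chains, sampling frequency, burn-in) is assumed to stay at the MrBayes default.

```python
def mrbayes_block(ngen: int = 2_200_000) -> str:
    """Return a MrBayes command block for a protein alignment.

    Only the settings named in the description are set explicitly;
    everything else is left at the program defaults (an assumption).
    """
    commands = [
        "prset aamodelpr=mixed",  # integrate over fixed amino acid rate matrices
        "lset rates=gamma",       # gamma-distributed rate variation among sites
        f"mcmc ngen={ngen}",      # number of MCMC generations
    ]
    body = "\n".join(f"  {cmd};" for cmd in commands)
    return f"begin mrbayes;\n{body}\nend;\n"


if __name__ == "__main__":
    print(mrbayes_block())
```

The resulting block would be appended to a NEXUS file that already contains the MAFFT alignment as a data matrix.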
Gene phylogenies based on maximum likelihood
Phylogenetic analyses of 146 orthogroups comprising genes that function in nitrogen-fixing root nodulation in legumes (Legume symbiosis genes), and 415 orthogroups comprising Parasponia genes that are more highly expressed in nodules than in roots (Nodule enhanced genes). Amino acid sequence alignments were generated using MAFFT version 7.017. Analyses were performed using RAxML version 8.2.11, setting gamma-distributed rate variation, estimating optimal models of amino acid sequence evolution (PROTGAMMAAUTO), and running 100 fast bootstrap replicates to assess clade support.
Gene_phylogenies_Maximum_likelihood.zip
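A RAxML invocation matching the maximum likelihood settings described above might be assembled as follows. This is a sketch under stated assumptions: "fast bootstrap" is read here as RAxML's rapid bootstrap mode (`-f a` with `-x`), and the binary name, random seed, and file names are placeholders rather than values taken from the archived analyses.

```python
import shlex


def raxml_command(alignment, run_name, seed=12345, bootstraps=100,
                  binary="raxmlHPC"):
    """Build a RAxML 8 argument list for one orthogroup alignment.

    The seed and binary name are assumptions; only the model and the
    number of bootstrap replicates come from the description.
    """
    return [
        binary,
        "-f", "a",              # rapid bootstrapping plus best-scoring ML tree search
        "-m", "PROTGAMMAAUTO",  # gamma rate variation, automatic protein model choice
        "-p", str(seed),        # random seed for parsimony starting trees
        "-x", str(seed),        # random seed for rapid bootstrapping
        "-#", str(bootstraps),  # number of bootstrap replicates
        "-s", alignment,        # input alignment
        "-n", run_name,         # suffix for output files
    ]


if __name__ == "__main__":
    # Hypothetical file and run names, for illustration only.
    print(shlex.join(raxml_command("orthogroup.aln.fasta", "orthogroup")))
```

Returning a list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting issues, which is convenient when looping over several hundred orthogroups.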
Draft genomes of Parasponia and Trema species
Draft genome assemblies of Parasponia rigida, Parasponia rugosa, Trema levigata, and Trema orientalis accession RG16 based on medium-coverage sequence data. Read data are available at GenBank under bioprojects PRJNA272486 (P. rigida), PRJNA272880 (P. rugosa), PRJNA38059 (T. levigata), and PRJNA272878 (T. orientalis RG16). Assembly was performed with the iterative de Bruijn graph assembler IDBA-UD version 1.1.1, iterating from 30-mers to 120-mers, with incremental steps of 20.
Draft_genomes.zip
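The k-mer iteration described above corresponds to IDBA-UD's `--mink`, `--maxk`, and `--step` options. The sketch below assembles such a command; the read file and output directory names are hypothetical, and any option not mentioned in the description is assumed to stay at its default.

```python
def idba_ud_command(reads_fasta, out_dir, mink=30, maxk=120, step=20,
                    binary="idba_ud"):
    """Build an IDBA-UD argument list with the stated k-mer range."""
    return [
        binary,
        "-r", reads_fasta,    # input reads in FASTA format
        "--mink", str(mink),  # smallest k-mer size
        "--maxk", str(maxk),  # largest k-mer size
        "--step", str(step),  # increment between iterations
        "-o", out_dir,        # output directory
    ]


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(" ".join(idba_ud_command("reads.fa", "assembly_out")))
```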
Phylogenetic analyses of Cannabaceae
Nucleotide alignments were generated using MAFFT version 7.017. The first phylogenetic reconstruction of Cannabaceae (MarkerData) was based on four plastid markers with five optimal partitions and models of sequence evolution: atpB-rbcL combined with trnL-F (GTR+I+G); first codon position of rbcL (GTR+I+G); second codon position of rbcL (SYM+I+G); third codon position of rbcL (GTR+G); and rps16 (GTR+G). The second phylogenetic reconstruction of Cannabaceae (GenomeData) was based on whole chloroplast genomes with eight optimal partitions and models of sequence evolution: tRNA sequence (HKY+I); rRNA sequence (GTR+I); large single copy region (LSC) coding sequence (GTR+I+G); LSC non-coding sequence (GTR+G); small single copy region (SSC) coding sequence (GTR+G); SSC non-coding sequence (GTR+G); inverted repeat region (IR) coding sequence (GTR+G); and IR non-coding sequence (GTR+G).
Cannabaceae_phylogeny.zip
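The two partition schemes described above can be recorded as simple lookup tables, and each model label can be decomposed into its substitution scheme and rate-heterogeneity components. The sketch below translates the labels into MrBayes-style settings. Note that the description does not say which program was used for the Cannabaceae reconstructions, so the choice of MrBayes notation is an assumption, as are all variable and function names.

```python
# Partition -> model, transcribed from the description above.
MARKER_DATA = {
    "atpB-rbcL + trnL-F": "GTR+I+G",
    "rbcL codon position 1": "GTR+I+G",
    "rbcL codon position 2": "SYM+I+G",
    "rbcL codon position 3": "GTR+G",
    "rps16": "GTR+G",
}
GENOME_DATA = {
    "tRNA": "HKY+I",
    "rRNA": "GTR+I",
    "LSC coding": "GTR+I+G",
    "LSC non-coding": "GTR+G",
    "SSC coding": "GTR+G",
    "SSC non-coding": "GTR+G",
    "IR coding": "GTR+G",
    "IR non-coding": "GTR+G",
}


def mrbayes_settings(model: str) -> dict:
    """Translate a label such as 'GTR+I+G' into MrBayes-style settings."""
    base, *modifiers = model.split("+")
    # Number of substitution types: 6 for GTR and SYM, 2 for HKY.
    settings = {"nst": {"GTR": 6, "SYM": 6, "HKY": 2}[base]}
    has_i, has_g = "I" in modifiers, "G" in modifiers
    if has_i and has_g:
        settings["rates"] = "invgamma"
    elif has_g:
        settings["rates"] = "gamma"
    elif has_i:
        settings["rates"] = "propinv"
    else:
        settings["rates"] = "equal"
    if base == "SYM":
        # SYM is GTR with equal base frequencies.
        settings["statefreqpr"] = "fixed(equal)"
    return settings


if __name__ == "__main__":
    for name, model in MARKER_DATA.items():
        print(name, model, mrbayes_settings(model))
```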
Orthogroup inference
Orthogroups were inferred with OrthoFinder version 0.4.0. Since an orthogroup is defined as the set of genes descended from a single gene in the last common ancestor of all the species being considered, orthogroups can comprise orthologous as well as paralogous genes. Our analysis included proteomes of selected species from the Eurosid clade: Arabidopsis thaliana (Brassicaceae, Brassicales) and Eucalyptus grandis (Myrtaceae, Myrtales) from the Malvid clade; and Populus trichocarpa (Salicaceae, Malpighiales), the legumes Medicago truncatula and Glycine max (Fabaceae, Fabales), Fragaria vesca (Rosaceae, Rosales), and P. andersonii and T. orientalis (Cannabaceae, Rosales) from the Fabid clade. Sequences were retrieved from Phytozome (www.phytozome.net).
Orthofinder_analysis.zip
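Because an orthogroup can hold several genes from one species (paralogs) and none from another, a per-species gene count is a convenient first summary. The sketch below tallies such counts from a single orthogroup line. The line layout (`ID: gene gene ...`) is modeled on OrthoFinder's plain-text output and should be checked against the archived files; the gene identifiers and the gene-to-species lookup are invented for illustration.

```python
from collections import Counter

# Hypothetical gene-to-species lookup; real identifiers will differ.
GENE_TO_SPECIES = {
    "PanA": "P. andersonii",
    "PanB": "P. andersonii",
    "TorA": "T. orientalis",
    "MtrA": "M. truncatula",
}


def species_counts(line, gene_to_species):
    """Parse 'OG_ID: gene1 gene2 ...' and count genes per species."""
    og_id, genes = line.split(":", 1)
    counts = Counter(gene_to_species[gene] for gene in genes.split())
    return og_id.strip(), counts


def missing_species(counts, all_species):
    """List species with no gene in the orthogroup."""
    return sorted(set(all_species) - set(counts))


if __name__ == "__main__":
    og_id, counts = species_counts("OG0000001: PanA PanB MtrA", GENE_TO_SPECIES)
    print(og_id, dict(counts))
    print(missing_species(counts, GENE_TO_SPECIES.values()))
```

A count above one for a species points to paralogs within the orthogroup, while an absent species only flags a candidate gene loss that still needs confirmation against the genome assembly.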