Unresolved questions about the evolution of the large and diverse legume family include the timing of polyploidy (whole-genome duplications; WGDs) relative to the origin of the major lineages within the Fabaceae and to the origin of symbiotic nitrogen fixation. Previous work has established that a WGD affects most lineages in the Papilionoideae and occurred some time after the divergence of the papilionoid and mimosoid clades, but the exact timing has been unknown. The history of WGD has also not been established for legume lineages outside the Papilionoideae. We investigated the presence and timing of WGDs in the legumes by querying thousands of phylogenetic trees constructed from transcriptome and genome data from 20 diverse legumes and 17 outgroup species. The timing of duplications in the gene trees indicates that the papilionoid WGD occurred in the common ancestor of all papilionoids. The earliest diverging lineages of the Papilionoideae include both nodulating taxa, such as the genistoids (e.g. lupin), dalbergioids (e.g. peanut), phaseoloids (e.g. beans), and galegoids (= Hologalegina, e.g. clovers), and clades with non-nodulating taxa, including Xanthocercis and Cladrastis (evaluated in this study). We also found evidence for several independent WGDs near the base of other major legume lineages, including the Mimosoid-Cassiinae-Caesalpinieae (MCC), Detarieae, and Cercideae clades. Nodulation is found in the MCC and papilionoid clades, both of which experienced ancestral WGDs. However, there are numerous non-nodulating lineages in both clades, making it unclear whether the phylogenetic distribution of nodulation is due to independent gains or to a single origin followed by multiple losses.
assemblies
Filtered Trinity assemblies (>1% per-component read representation) of the legume transcriptomes used in this study, which are part of the Thousand Plant Transcriptome (1kp) project. These are multifasta files.
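Several files in this record are described as multifasta. As a minimal sketch of how such a file could be read in Python, the hypothetical helper `read_multifasta` below collects sequences keyed by the first whitespace-delimited token of each header line. The function name and the record names in the usage note are illustrative assumptions, not part of the dataset.

```python
def read_multifasta(lines):
    """Parse FASTA-formatted lines into a dict of {record_id: sequence}.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' on each header line.
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            # sequence lines may be wrapped, so collect and join them later
            chunks[name].append(line)
    return {key: "".join(value) for key, value in chunks.items()}
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `read_multifasta(open("some_assembly.fa"))`, where the filename is a placeholder.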
peptide translations
Peptide translations derived from a blastx search of the filtered Trinity assemblies against the 22-genome dataset (Amborella Genome Consortium, 2013), with conceptual translation by Genewise. These are multifasta files.
pep_translations.tar.gz
CDS translations
Corresponding CDS sequences for the peptide translations. These are multifasta files.
cds_translations.tar.gz
trees - bipartitions
The RAxML bipartitions trees for the 3360 orthogroup alignments containing Glycine max syntelogs. These are Newick tree files.
trees_bipartitions.tar.gz
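RAxML bipartitions trees write bootstrap support as integer labels on internal nodes, immediately after the closing parenthesis of each clade. A minimal sketch of how those values might be pulled out of a Newick string in Python follows. The helper name `bootstrap_supports` and the taxon names in the usage note are hypothetical, and the regular expression assumes integer support labels.

```python
import re

def bootstrap_supports(newick):
    """Return integer support labels that directly follow a ')' in a Newick string.

    Assumption: supports are written as integer internal-node labels,
    e.g. '(A:0.1,B:0.2)87:0.05'.
    """
    # The lookahead requires that the label ends at a Newick delimiter.
    return [int(label) for label in re.findall(r"\)(\d+)(?=[:,);])", newick)]
```

For instance, calling it on `"((A:0.1,B:0.2)87:0.05,(C:0.3,D:0.4)100:0.02,E:0.5);"` returns `[87, 100]`. For anything beyond a quick summary of support values, a dedicated tree-parsing library is the safer choice.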
Legume pairs - ksplots
Putative paralog pairs estimated from Ks plots for multiple species in the study. These were used to test for WGD events other than the papilionoid WGD (PWGD). This is a tab-delimited file: the first and second columns hold the putative paralogs identified from the Ks plots, and the third column holds a designation for the putative event.
Legume_pairs_ksplots.txt.gz
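As a minimal sketch of how the three-column layout described above could be loaded in Python, the hypothetical helper `load_paralog_pairs` groups the pairs by the event designation in the third column. It assumes the file has no header row and is gzip-compressed, as the filename suggests. The function name is an illustrative assumption.

```python
import csv
import gzip
from collections import defaultdict

def load_paralog_pairs(path):
    """Group putative paralog pairs by the event designation in column 3.

    Assumptions: gzip-compressed, tab-delimited text with no header row.
    """
    pairs_by_event = defaultdict(list)
    # Text mode ("rt") lets gzip decompress and decode in one step.
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            gene_a, gene_b, event = row[0], row[1], row[2]
            pairs_by_event[event].append((gene_a, gene_b))
    return dict(pairs_by_event)
```

Calling `load_paralog_pairs("Legume_pairs_ksplots.txt.gz")` would return a dictionary mapping each event designation to its list of gene pairs. If the file turns out to have a header line, that line would need to be skipped first.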
alignments for other WGD tests
Alignments of the orthogroups used to examine putative WGD events in the data set. These are multifasta files.
alignments_for_other_WGD_tests.tar.gz
trees for other WGD tests
RAxML best trees with bootstrap values added (bipartitions) for the orthogroups used to test for putative WGD events, other than the PWGD, that were identified in the data set through Ks plots. These are Newick tree files.
trees_for_other_WGD_tests.tar.gz
single copy trees
Low-copy trees used to estimate a species phylogeny using MP-EST methods.
single_copy_trees.tar.gz
concat_alllegume.fsa
Concatenated nucleotide alignments for 101 orthogroups used to estimate a species tree in RAxML. This is a multifasta file.
RAxML_bipartitions.concat_alllegume.fsa.raxml.out
RAxML best tree with bootstrap values, estimated from the concatenation of 101 orthogroup alignments. This is a Newick tree file.
DNA alignments
cDNA alignments for all gene families in the project, made using pal2nal. These are multifasta alignment files.
dna_alignments.tar.gz
protein-sequence alignments
Peptide alignments for orthogroup clusters in the project, calculated using MUSCLE. These are multifasta alignment files.
pep_alignments.tar.gz