Data for: Phylogenetic resolution and conflict in the species-rich flowering plant family Leguminosae
Data files
Feb 11, 2026 version files: 441.18 MB
- MG39-pf.txt.reduced (1.07 KB)
- MG39.fas.reduced.fas (54.89 MB)
- Mito-39CDSs-aln.zip (530.47 KB)
- Mito-39genes-aln.zip (1.90 MB)
- Mito-9introns.aln.zip (1.15 MB)
- PCN203_696-2.nogap.fa (363.26 MB)
- PCN203_696-2.parts (8.11 KB)
- Platstome.122intergenes.aln.zip (10.65 MB)
- Platstome.81genes.aln.zip (4.98 MB)
- README.md (6.10 KB)
- scripts.zip (18.70 KB)
- sub-51plastomes.gb.zip (3.55 MB)
- Supplemental_file_1_newicktrees.zip (231.40 KB)
Abstract
The Tree of Life is central to evolutionary biology, yet resolving deep, recalcitrant phylogenetic relationships remains challenging due to complex processes such as incomplete lineage sorting (ILS), hybridization, and polyploidization. Although previous phylogenetic studies have advanced our understanding of Leguminosae (Fabaceae), a species-rich and ecologically diverse family, many deep relationships at the tribal and higher levels remain unresolved. Integrating newly generated genome skimming data for 231 species with previously published plastid genomic, mitochondrial genomic, and transcriptomic data, we reconstructed a phylogeny of the family using whole plastomes, 39 mitochondrial genes, and 1559 low-copy nuclear genes, achieving dense taxonomic sampling across almost all recognized tribes and major unplaced lineages. Our results supported the monophyly of the six subfamilies and 49 recognized tribes, resolved ten clades worthy of recognition as new tribes in subfamily Papilionoideae, and clarified many contentious relationships. However, nuclear-nuclear and cytonuclear conflicts persist at multiple nodes among trees inferred from different datasets and analytical methods. We propose the most probable resolution for 22 contentious nodes by applying nuclear gene-tree quartet analysis, corroborated by support from nuclear Maximum Likelihood (ML) and ASTRAL trees. Our results indicate that ILS contributes significantly to the observed phylogenetic conflicts, while gene flow represents an additional and previously underappreciated factor that mainly contributes to cytonuclear conflicts, particularly along the branches of the Angylocalyceae + Dipterygeae + Amburaneae (ADA) clade and Wisterieae. Both processes likely underlie recalcitrant phylogenetic relationships, such as those within the 50-kb inversion clade of Papilionoideae.
Our study uses multiple data partitions and analytical methods to resolve contentious phylogenetic relationships in Leguminosae, resulting in a robust phylogenomic framework to guide further investigations in this economically important and exceptionally diverse family.
https://doi.org/10.5061/dryad.wstqjq2wm
Description of the data and file structure
All files were produced during sampling, study design, and data analysis.
Files and variables
Mito-39CDSs-aln.zip
This folder contains multiple sequence alignments used for phylogenetic analyses.
Files are in FASTA format.
File names identify the alignments of the 39 mitochondrial protein-coding sequences (CDSs).
These sequences were aligned using MAFFT v7.475.
Mito-39genes-aln.zip
This folder contains multiple sequence alignments used for phylogenetic analyses.
Files are in FASTA format.
File names identify the alignments of the 39 mitochondrial protein-coding genes.
These alignments differ from the CDS alignments in Mito-39CDSs-aln.zip because they include introns. The sequences were aligned using MAFFT v7.475.
Mito-9introns.aln.zip
This folder contains multiple sequence alignments used for phylogenetic analyses.
Files are in FASTA format.
File names identify the alignments of the 9 mitochondrial intron sequences.
These sequences were aligned using MAFFT v7.475.
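As a quick sanity check on the three alignment archives above, the following is a minimal sketch that lists each FASTA file in a zip archive with its number of sequences and alignment length. It uses only the Python standard library. The function names and the list of file suffixes are assumptions made for illustration; the actual file extensions inside the archives are not stated here and may need adjusting.

```python
import zipfile


def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID -> sequence string."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def summarize_alignments(zip_source, suffixes=(".fas", ".fasta", ".fa", ".aln")):
    """Return {member name: (n_sequences, alignment_length)} for FASTA files in a zip.

    `suffixes` is an assumption about how the alignment files are named.
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(suffixes):
                continue
            records = parse_fasta(archive.read(member).decode("utf-8"))
            lengths = {len(seq) for seq in records.values()}
            if len(lengths) > 1:
                raise ValueError(f"{member}: sequences differ in length")
            summary[member] = (len(records), lengths.pop() if lengths else 0)
    return summary
```

For example, `summarize_alignments("Mito-39CDSs-aln.zip")` would return one entry per alignment file, and raises an error if any file contains sequences of unequal length.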
MG39.fas.reduced.fas
This file contains the dataset concatenated from the alignments of 39 mitochondrial genes, which was used to construct the maximum likelihood phylogenetic tree of 459 samples. This file is provided in FASTA format.
MG39-pf.txt.reduced
This file provides the gene partition information for the "MG39.fas.reduced.fas" dataset.
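To recover a single gene from the concatenated alignment, the partition file can be used to slice columns. The sketch below assumes the partition file follows the RAxML convention of one line per partition, such as `DNA, atp1 = 1-1530`, with 1-based inclusive coordinates; this format is an assumption and should be checked against the file itself. The function names are hypothetical.

```python
import re

# Assumed RAxML-style line: "<model>, <name> = <start>-<end>[, <start>-<end>...]"
PARTITION_RE = re.compile(
    r"^\s*(?P<model>[^,]+),\s*(?P<name>[^=]+?)\s*=\s*(?P<ranges>.+?)\s*$"
)


def parse_partitions(text):
    """Parse partition lines into [(name, [(start, end, step), ...]), ...].

    Coordinates stay 1-based and inclusive, as written in the file.
    """
    partitions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = PARTITION_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized partition line: {line!r}")
        ranges = []
        for part in match.group("ranges").split(","):
            step = 1
            if "\\" in part:
                # A trailing "\3" selects every third site (codon positions).
                part, step_text = part.split("\\")
                step = int(step_text)
            start_text, _, end_text = part.strip().partition("-")
            start = int(start_text)
            end = int(end_text) if end_text else start
            ranges.append((start, end, step))
        partitions.append((match.group("name"), ranges))
    return partitions


def extract_partition(sequence, ranges):
    """Return the columns of one aligned sequence covered by a partition."""
    return "".join(sequence[start - 1:end:step] for start, end, step in ranges)
```

Applied to each sequence in "MG39.fas.reduced.fas", `extract_partition` would yield the per-gene alignment for one entry of `parse_partitions`. The same approach should apply to "PCN203_696-2.parts" if it uses the same format.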
PCN203_696-2.nogap.fa
This file contains the PCN203 plastome dataset, comprising all 203 plastid loci (both coding and noncoding), which was used to construct the PCN203 ML tree.
The file is in FASTA format.
PCN203_696-2.parts
This file provides the gene partition information for the "PCN203_696-2.nogap.fa" dataset.
Platstome.122intergenes.aln.zip
This folder contains alignments of 122 noncoding plastome regions, which were used to construct the PN122 ML tree.
Files are in FASTA format.
Platstome.81genes.aln.zip
This folder contains alignments of 81 plastome genes, which were used to construct the PC81 ML tree.
Supplemental_file_1_newicktrees.zip
This folder contains the 17 phylogenetic trees inferred in this study, in Newick format.
Trees were inferred using maximum likelihood or coalescent-based methods.
X1_PCN203.ML.concord.cf.tre was inferred from 203 coding and noncoding sequences using RAxML v8.1.2, with gene concordance factors (gCF) evaluated in IQ-TREE v1.6.6.
X2_PCN203.ASTRAL.concord.cf.tre was inferred from 203 coding and noncoding gene trees using ASTRAL-III v5.1.170, with gene concordance factors (gCF) evaluated in IQ-TREE v1.6.6.
X3_Nucl1559.ML.concord.cf.tre was inferred from 1559 nuclear orthologs (this dataset was produced in Zhao et al., 2021) using RAxML v8.1.2, with gene concordance factors (gCF) evaluated in IQ-TREE v1.6.6.
X4_Nucl1559.ASTRAL.concord.cf.tre was inferred from 1559 gene trees of nuclear orthologs (this dataset was produced in Zhao et al., 2021) using ASTRAL-III v5.1.170, with gene concordance factors (gCF) evaluated in IQ-TREE v1.6.6.
X5_PCN203.ASTRAL.MLBScutoff10.tre was inferred from 203 coding and noncoding gene trees (bootstrap support cutoff = 10%) using ASTRAL-III v5.1.170, with gene concordance factors (gCF) evaluated in IQ-TREE v1.6.6.
X6_PC81.ML.tre was inferred from the concatenated alignment of the 81 genes using RAxML v8.1.2.
X7_PN122.ML.tre was inferred from the concatenated alignment of the 122 noncoding sequences using RAxML v8.1.2.
X8_PCN203-trimAl.ML.tre was inferred from the concatenated alignment of the 203 coding and noncoding sequences using RAxML v8.1.2. These gene alignments were trimmed with trimAl using default parameters.
X9_PC81-trimAl.ML.tre was inferred from the concatenated alignment of the 81 genes using RAxML v8.1.2. These gene alignments were trimmed with trimAl using default parameters.
X10_PN122-trimAl.ML.tre was inferred from the concatenated alignment of the 122 noncoding sequences using RAxML v8.1.2. These alignments were trimmed with trimAl using default parameters.
X11_Nucl1559.ASTRAL.MLBScutoff10.tre was inferred from the 1559 nuclear gene trees (bootstrap support cutoff = 10%) using ASTRAL-III v5.1.170, with gene concordance factors (gCF) evaluated in IQ-TREE v1.6.6.
X12_MG39.ML.tre was inferred from the concatenated alignment of the 39 mitochondrial genes using RAxML v8.1.2.
X13_MG39-trimAl.ML.tre was inferred from the concatenated alignment of the 39 mitochondrial genes using RAxML v8.1.2. These gene alignments were trimmed with trimAl using default parameters.
X14_MC39.ML.tre was inferred from the concatenated alignment of the 39 mitochondrial coding regions using RAxML v8.1.2.
X15_MC39-trimAl.ML.tre was inferred from the concatenated alignment of the 39 mitochondrial coding regions using RAxML v8.1.2. These alignments were trimmed with trimAl using default parameters.
X16_MI9.ML.tre was inferred from the concatenated alignment of the 9 mitochondrial introns using RAxML v8.1.2.
X17_MI9-trimAl.ML.tre was inferred from the concatenated alignment of the 9 mitochondrial introns using RAxML v8.1.2. These alignments were trimmed with trimAl using default parameters.
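Because the trees above come from datasets with different taxon sampling, it can help to compare tip sets before comparing topologies. The following is a minimal standard-library sketch that extracts tip labels from a Newick string and compares two trees. It assumes unquoted tip labels and that node annotations (such as support or gCF values) appear as internal node labels or bracketed comments; both are assumptions about these files, and a dedicated tree library is the better choice for anything beyond this check. The function names are hypothetical.

```python
import re

# A tip label directly follows "(" or ","; internal node labels follow ")".
TIP_RE = re.compile(r"(?<=[(,])\s*([^\s(),:;\[\]]+)")
COMMENT_RE = re.compile(r"\[[^\]]*\]")


def tip_labels(newick):
    """Return tip labels in the order they appear in a Newick string.

    Assumes unquoted labels and a parenthesized tree with at least two tips.
    """
    cleaned = COMMENT_RE.sub("", newick)  # drop [bracketed] comments first
    return TIP_RE.findall(cleaned)


def compare_tips(newick_a, newick_b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of tip labels."""
    tips_a = set(tip_labels(newick_a))
    tips_b = set(tip_labels(newick_b))
    return (
        sorted(tips_a & tips_b),
        sorted(tips_a - tips_b),
        sorted(tips_b - tips_a),
    )
```

For example, reading X1_PCN203.ML.concord.cf.tre and X3_Nucl1559.ML.concord.cf.tre as text and passing both strings to `compare_tips` would list the samples shared between the plastome and nuclear trees.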
sub-51plastomes.gb.zip
This folder includes GenBank (gb) files for 51 incomplete plastomes assembled from genome skimming data using GetOrganelle v1.6.3. Information on all plastomes is available in supplemental Table S1 on the journal website.
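Since these plastomes are incomplete, a per-record summary of sequence length and annotated features may be useful before reuse. The sketch below reads GenBank flat-file text with the standard library only, taking the record name and length from the `LOCUS` line and counting feature keys in the `FEATURES` table. It assumes the files follow the standard GenBank flat-file layout (feature keys indented by five spaces); the function name is hypothetical, and a full parser such as the one in Biopython is preferable for real analyses.

```python
import re
from collections import Counter

# Feature keys start in column 6; qualifier lines are indented further.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S")


def summarize_genbank(text):
    """Return one dict per record: name, length in bp, and feature-key counts."""
    records = []
    current = None
    in_features = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            tokens = line.split()
            length = None
            # The length is the token immediately before "bp" on the LOCUS line.
            if "bp" in tokens[3:]:
                length = int(tokens[tokens.index("bp", 3) - 1])
            current = {"name": tokens[1], "length": length, "features": Counter()}
            records.append(current)
            in_features = False
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
        elif in_features and current is not None:
            match = FEATURE_RE.match(line)
            if match:
                current["features"][match.group(1)] += 1
    return records
```

Each returned record can then be checked, for instance, by comparing `record["features"]["gene"]` across samples to see which plastomes are missing annotated genes.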
scripts.zip
This folder contains scripts used to generate figures or summary statistics.
Scripts are provided for reference and may require R and additional R packages.
Access information
Our new sequencing data will be uploaded to a public database following the manuscript review process. Other data in this study are sourced from the following databases and articles.
Other publicly accessible locations of the data:
- https://doi.org/10.5061/dryad.1vhhmgqpb
- GenBank (https://www.ncbi.nlm.nih.gov/genbank/): accession numbers are found in supplemental Table S1.
