Exploration of plastid phylogenomic conflict yields new insights into the deep relationships of Leguminosae

Cite this dataset

Zhang, Rong et al. (2022). Exploration of plastid phylogenomic conflict yields new insights into the deep relationships of Leguminosae [Dataset]. Dryad.


Abstract

Phylogenomic analyses have helped resolve many recalcitrant relationships in the angiosperm tree of life. Yet the backbone of the Leguminosae, one of the largest and most economically and ecologically important plant families, remains poorly resolved because previous studies generally relied on limited molecular data and incomplete taxon sampling. Here, we resolve many of the thorniest nodes in the Leguminosae through comprehensive analyses of plastome-scale data, using multiple modified coding and noncoding datasets from 187 species that represent almost all major clades of the family. We also thoroughly characterize conflicting phylogenomic signal across the plastome in light of the family's complex history of plastome evolution. Most analyses produced largely congruent, strongly supported topologies and resolved some long-controversial deep relationships among the early-diverging lineages of the subfamilies Caesalpinioideae and Papilionoideae. The robust phylogenetic backbone reconstructed here provides a framework for future studies of legume classification, evolution, and diversification. However, conflicting phylogenetic signal was detected and quantified at several key nodes, and this conflict prevents their confident resolution using plastome data alone.


Methods

Each coding and noncoding locus was aligned individually with the L-INS-i method of MAFFT. To avoid loci with limited information or few sampled species, loci represented by fewer than four species were excluded, as were aligned regions shorter than 22 bp. tRNA genes were also excluded because they are short (≤ 93 bp) and highly conserved. The final set comprised 226 alignments: 81 coding and 145 noncoding loci. From these we constructed three basic datasets: PC (plastid coding regions; the concatenation of the 81 coding genes), PN (plastid noncoding regions; the concatenation of the 145 noncoding loci), and PCN (PC and PN concatenated). We also developed a new custom script, "", to rapidly concatenate the alignments of separate loci and generate an accompanying configuration file for downstream partitioned analyses. Multiple strategies were then applied to each of the three basic datasets to reduce systematic error, yielding the corresponding derived datasets.
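The locus filtering and concatenation steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' custom script. The thresholds (four species, 22 bp) come from the text; everything else is hypothetical: the in-memory layout (locus name mapped to taxon-to-sequence dictionaries), the function names `filter_loci` and `concatenate`, counting a species only if its row contains at least one non-gap site, passing tRNA loci as an explicit set of names, padding unsampled taxa with `-`, and emitting a RAxML-style partition file.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory layout: locus name -> {taxon name: aligned sequence}.
Alignment = Dict[str, str]

MIN_TAXA = 4     # loci with fewer than four species are excluded
MIN_LENGTH = 22  # aligned regions shorter than 22 bp are excluded
MISSING = "-"    # assumed fill character for taxa absent from a locus


def alignment_length(aln: Alignment) -> int:
    """Return the alignment length, checking that all rows are equally long."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError(f"ragged or empty alignment: lengths {sorted(lengths)}")
    return lengths.pop()


def n_taxa_with_data(aln: Alignment) -> int:
    """Count taxa whose row has at least one site that is not a gap or missing."""
    return sum(1 for seq in aln.values() if seq.strip("-?Nn"))


def filter_loci(loci: Dict[str, Alignment], trna_loci: Set[str]) -> Dict[str, Alignment]:
    """Drop tRNA loci, loci with < MIN_TAXA species and alignments < MIN_LENGTH bp."""
    kept: Dict[str, Alignment] = {}
    for name, aln in loci.items():
        if name in trna_loci:
            continue  # short and highly conserved, so excluded outright
        if n_taxa_with_data(aln) < MIN_TAXA:
            continue
        if alignment_length(aln) < MIN_LENGTH:
            continue
        kept[name] = aln
    return kept


def concatenate(loci: Dict[str, Alignment]) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate loci in input order and build RAxML-style partition lines."""
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[str] = []
    start = 1  # partition coordinates are 1-based and inclusive
    for name, aln in loci.items():
        length = alignment_length(aln)
        for taxon in taxa:
            # Taxa not sampled for this locus are padded with missing data.
            rows[taxon].append(aln.get(taxon, MISSING * length))
        end = start + length - 1
        partitions.append(f"DNA, {name} = {start}-{end}")
        start = end + 1
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


if __name__ == "__main__":
    # Toy loci, invented purely for illustration.
    loci = {
        "rbcL": {t: "ACGT" * 6 for t in "ABCD"},        # 24 bp, 4 taxa: kept
        "matK": {t: "ACGTAC" * 5 for t in "ABCDE"},     # 30 bp, 5 taxa: kept
        "short": {t: "ACGTA" * 2 for t in "ABCD"},      # 10 bp: dropped
        "few": {t: "ACGTAC" * 5 for t in "ABC"},        # 3 taxa: dropped
        "trnL-UAA": {t: "ACGTAC" * 5 for t in "ABCD"},  # tRNA: dropped
    }
    kept = filter_loci(loci, trna_loci={"trnL-UAA"})
    matrix, partitions = concatenate(kept)
    print("\n".join(partitions))
    # DNA, rbcL = 1-24
    # DNA, matK = 25-54
```

Writing `"\n".join(partitions)` to a file gives a partition file in the RAxML-style layout, which IQ-TREE also accepts; other programs may expect a different format, such as NEXUS charset blocks. Identifying tRNA loci by an explicit name set, rather than by a `trn` prefix, avoids accidentally dropping intergenic spacers such as `trnL-trnF`, whose names also begin with `trn`.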

Usage notes

See Supplementary Table S6 for the list of loci removed from each dataset.


Funding

Strategic Priority Research Program of Chinese Academy of Sciences, Award: XDB31010000

National Natural Science Foundation of China, Award: 31720103903

Large-scale Scientific Facilities of the Chinese Academy of Sciences, Award: 2017-LSF-GBOWS-02

CNPq Research Productivity Fellowships, Award: 306736/2015-2

CNPq Research Productivity Fellowships, Award: 303585/2016-1