Data from: Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees
Burleigh, J. Gordon et al. (2010), Data from: Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees, Dryad, Dataset, https://doi.org/10.5061/dryad.7881
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony is a phylogenetic optimization criterion that selects the species tree minimizing the number of gene duplications induced among a set of gene trees. The run-time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of gene tree parsimony on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (a combined alignment length of more than 2.9 million characters). The relationships inferred from the gene tree parsimony analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among gene tree parsimony bootstrap replicates. Excluding these taxa either before or after the gene tree parsimony analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade, and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
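To make the gene tree parsimony criterion concrete, the following is a minimal sketch of the duplication cost it minimizes. It is not the software used in this study. It assumes rooted binary trees written as nested Python tuples, with each gene-tree leaf labeled by the species it was sampled from. It uses the standard LCA (least common ancestor) mapping: each gene-tree node maps to the smallest species-tree clade containing all of its species, and the node counts as a duplication when it maps to the same clade as one of its children. All function names are hypothetical. The real analysis searches tree space heuristically, whereas this sketch only scores a few hand-picked candidate species trees.

```python
from typing import Union

# Hypothetical representation: a tree is either a species name (leaf)
# or a tuple of subtrees (internal node).
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    """Return the set of species names at the leaves of a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(species_tree: Tree) -> list:
    """Collect every clade (as a leaf set) of the species tree."""
    out = []

    def walk(t: Tree) -> None:
        out.append(leaf_set(t))
        if not isinstance(t, str):
            for child in t:
                walk(child)

    walk(species_tree)
    return out


def lca_cluster(species: frozenset, clades: list) -> frozenset:
    """Smallest species-tree clade containing all given species (their LCA)."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Number of duplications induced by reconciling one gene tree via LCA mapping."""
    clades = clusters(species_tree)
    dups = 0

    def visit(g: Tree) -> frozenset:
        nonlocal dups
        if isinstance(g, str):
            return lca_cluster(frozenset([g]), clades)
        child_maps = [visit(child) for child in g]
        mapped = lca_cluster(frozenset().union(*child_maps), clades)
        # A node mapping to the same clade as a child implies a duplication.
        if any(cm == mapped for cm in child_maps):
            dups += 1
        return mapped

    visit(gene_tree)
    return dups


def gtp_score(gene_trees: list, species_tree: Tree) -> int:
    """Total duplication cost of a candidate species tree over all gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


def best_species_tree(gene_trees: list, candidates: list) -> Tree:
    """Pick the candidate species tree with the fewest induced duplications."""
    return min(candidates, key=lambda s: gtp_score(gene_trees, s))
```

For example, with the gene trees `[(("A", "B"), "C"), (("A", "B"), ("A", "B"))]`, the second of which contains two copies of the same A+B subtree, the species tree `(("A", "B"), "C")` costs one duplication in total, while `(("A", "C"), "B")` and `(("B", "C"), "A")` each cost two, so the first is preferred.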